commit     b70a2a21dc9d4ad455931b53131a0cb4fc01fafe
parent     3fbe724424fb104aaca9973389b4a9df428c3e2a
tree       95ad6c804009a5867ac991cc1edf414e163b40b4 /include/linux/swap.h
Author:    Johannes Weiner <hannes@cmpxchg.org>            2014-10-09 15:28:56 -0700
Committer: Linus Torvalds <torvalds@linux-foundation.org>  2014-10-09 22:25:59 -0400

mm: memcontrol: fix transparent huge page allocations under pressure
In a memcg with even just moderate cache pressure, success rates for
transparent huge page allocations drop to zero, wasting a lot of effort
that the allocator puts into assembling these pages.

The reason for this is that the memcg reclaim code was never designed for
higher-order charges. It reclaims in small batches until there is room
for at least one page. Huge page charges only succeed when these batches
add up over a series of huge faults, which is unlikely under any
significant load involving order-0 allocations in the group.
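As a rough illustration of that design (a simplified sketch, not the
actual mm/memcontrol.c code; mem_cgroup_has_headroom() is a hypothetical
helper standing in for the charge attempt against the limit):

	/* Pre-patch behavior: reclaim a small, fixed batch per pass and
	 * retry, hoping the batches eventually add up to nr_pages. */
	while (!mem_cgroup_has_headroom(memcg, nr_pages)) {
		/* old signature: no goal argument; each call frees on
		 * the order of SWAP_CLUSTER_MAX pages */
		if (!try_to_free_mem_cgroup_pages(memcg, gfp_mask, noswap))
			return -ENOMEM;		/* charge fails */
	}

A 512-page THP charge only succeeds if those small batches accumulate
faster than concurrent order-0 charges consume them, which is why the
success rate collapses under cache pressure.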
Remove that loop on the memcg side in favor of passing the actual reclaim
goal to direct reclaim, which is already set up and optimized to meet
higher-order goals efficiently.
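With the signature introduced below (a sketch only; the surrounding
retry policy in mm/memcontrol.c is elided), a single invocation can be
handed the whole goal:

	/* Post-patch behavior: pass the actual charge size as the
	 * reclaim goal, e.g. nr_pages == HPAGE_PMD_NR (512 on x86-64
	 * with 2MB hugepages) for a THP charge */
	nr_reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages,
						    gfp_mask, may_swap);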
This brings memcg's THP policy in line with the system policy: if the
allocator painstakingly assembles a hugepage, memcg will at least make an
honest effort to charge it. As a result, transparent hugepage allocation
rates amid cache activity are drastically improved:
                        vanilla                 patched
pgalloc              4717530.80 ( +0.00%)    4451376.40 (  -5.64%)
pgfault               491370.60 ( +0.00%)     225477.40 ( -54.11%)
pgmajfault                 2.00 ( +0.00%)          1.80 (  -6.67%)
thp_fault_alloc            0.00 ( +0.00%)        531.60 (+100.00%)
thp_fault_fallback       749.00 ( +0.00%)        217.40 ( -70.88%)

[ Note: this may in turn increase memory consumption from internal
fragmentation, which is an inherent risk of transparent hugepages.
Some setups may have to adjust the memcg limits accordingly to
accommodate this - or, if the machine is already packed to capacity,
disable the transparent huge page feature. ]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Dave Hansen <dave@sr71.net>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include/linux/swap.h')
 include/linux/swap.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index ea4f926e6b9b..37a585beef5c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -327,8 +327,10 @@ extern void lru_cache_add_active_or_unevictable(struct page *page,
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
 extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
-extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
-						  gfp_t gfp_mask, bool noswap);
+extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
+						  unsigned long nr_pages,
+						  gfp_t gfp_mask,
+						  bool may_swap);
 extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 						gfp_t gfp_mask, bool noswap,
 						struct zone *zone,
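The vmscan side of the patch is not visible in this view (the diffstat
is limited to include/linux/swap.h), but the new nr_pages argument has
to become the reclaim target there. A minimal sketch, assuming the
conventional scan_control plumbing:

	struct scan_control sc = {
		/* never target less than one normal reclaim batch */
		.nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX),
		.gfp_mask = gfp_mask,
		.may_swap = may_swap,
		/* remaining fields elided */
	};

Flooring the goal at SWAP_CLUSTER_MAX keeps small order-0 charges
reclaiming in the same batch size as before, while THP-sized goals pass
through intact.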