sched/core: Rewrite and improve select_idle_siblings()

select_idle_siblings() is a known pain point for a number of
workloads; it either does too much or not enough and sometimes just
does plain wrong.

This rewrite attempts to address a number of issues (but sadly not
all).

The current code does an unconditional sched_domain iteration; with
the intent of finding an idle core (on SMT hardware). The problems
which this patch tries to address are:

 - it's pointless to look for idle cores if the machine is real busy;
   at which point you're just wasting cycles.

 - its behaviour is inconsistent between SMT and !SMT hardware in
   that !SMT hardware ends up doing a scan for any idle CPU in the LLC
   domain, while SMT hardware does a scan for idle cores and if that
   fails, falls back to a scan for idle threads on the 'target' core.

The new code replaces the sched_domain scan with 3 explicit scans:

 1) search for an idle core in the LLC
 2) search for an idle CPU in the LLC
 3) search for an idle thread in the 'target' core

where 1 and 3 are conditional on SMT support and 1 and 2 have runtime
heuristics to skip the step.

Step 1) is conditional on sd_llc_shared->has_idle_cores; when a CPU
goes idle and sd_llc_shared->has_idle_cores is false, we scan all SMT
siblings of the CPU going idle. Similarly, we clear
sd_llc_shared->has_idle_cores when we fail to find an idle core.

Step 2) tracks the average cost of the scan and compares this to the
average idle time guestimate for the CPU doing the wakeup. There is a
significant fudge factor involved to deal with the variability of the
averages. Esp. hackbench was sensitive to this.

Step 3) is unconditional; we assume (also per step 1) that scanning
all SMT siblings in a core is 'cheap'.

With this; SMT systems gain step 2, which cures a few benchmarks --
notably one from Facebook.

One 'feature' of the sched_domain iteration, which we preserve in the
new code, is that it would start scanning from the 'target' CPU,
instead of scanning the cpumask in CPU id order. This avoids multiple
CPUs in the LLC scanning for idle to gang up and find the same CPU
quite as much. The down side is that tasks can end up hopping across
the LLC for no apparent reason.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
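
The three scans described in the message can be sketched as a small user-space model. This is not the kernel implementation: struct llc_model, NR_CPUS, SMT_WIDTH, the sibling layout (CPUs 2c and 2c+1 share core c), the 'fudge' divisor (which must be nonzero) and the final fallback to 'target' are all assumptions made for illustration. Each scan starts at 'target' and wraps around, as the message describes.

```c
#include <stdbool.h>

#define NR_CPUS   8  /* hypothetical LLC size */
#define SMT_WIDTH 2  /* hypothetical: CPUs 2c and 2c+1 share core c */

/* Hypothetical stand-in for the per-LLC state the message refers to. */
struct llc_model {
	bool idle[NR_CPUS];          /* per-CPU idle state */
	bool has_idle_cores;         /* hint consulted by step 1 */
	unsigned long avg_scan_cost; /* average cost of a step 2 scan */
};

/* True when every SMT sibling of 'core' is idle. */
static bool core_is_idle(const struct llc_model *llc, int core)
{
	for (int t = 0; t < SMT_WIDTH; t++)
		if (!llc->idle[core * SMT_WIDTH + t])
			return false;
	return true;
}

/* Step 1: look for a fully idle core, wrapping around from target's core. */
static int scan_idle_core(struct llc_model *llc, int target)
{
	int nr_cores = NR_CPUS / SMT_WIDTH;
	int first = target / SMT_WIDTH;

	if (!llc->has_idle_cores)
		return -1;
	for (int i = 0; i < nr_cores; i++) {
		int core = (first + i) % nr_cores;

		if (core_is_idle(llc, core))
			return core * SMT_WIDTH;
	}
	llc->has_idle_cores = false; /* a failed scan clears the hint */
	return -1;
}

/* Step 2: look for any idle CPU, unless the scan looks too expensive. */
static int scan_idle_cpu(const struct llc_model *llc, int target,
			 unsigned long avg_idle, unsigned long fudge)
{
	if (avg_idle / fudge < llc->avg_scan_cost)
		return -1;
	for (int i = 0; i < NR_CPUS; i++) {
		int cpu = (target + i) % NR_CPUS;

		if (llc->idle[cpu])
			return cpu;
	}
	return -1;
}

/* Step 3: look for an idle thread in target's own core. */
static int scan_idle_smt(const struct llc_model *llc, int target)
{
	int base = (target / SMT_WIDTH) * SMT_WIDTH;

	for (int t = 0; t < SMT_WIDTH; t++)
		if (llc->idle[base + t])
			return base + t;
	return -1;
}

/* Run the three scans in order; fall back to 'target' if all fail. */
static int select_idle_model(struct llc_model *llc, int target,
			     unsigned long avg_idle, unsigned long fudge)
{
	int cpu = scan_idle_core(llc, target);

	if (cpu < 0)
		cpu = scan_idle_cpu(llc, target, avg_idle, fudge);
	if (cpu < 0)
		cpu = scan_idle_smt(llc, target);
	return cpu < 0 ? target : cpu;
}
```

In this model a small avg_idle relative to avg_scan_cost skips step 2 entirely, so only the cheap per-core scan of step 3 runs; that is the "machine is real busy" case the message wants to stop paying for.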
author: Peter Zijlstra <peterz@infradead.org> 2016-05-09 10:38:05 +0200
committer: Ingo Molnar <mingo@kernel.org> 2016-09-30 11:03:09 +0200
commit: 10e2f1acd0106c05229f94c70a344ce3a2c8008b (patch)
tree: ea286530e7e44d7c5fa5b7c694e298eaf3d88acf /kernel/sched/idle_task.c
parent: 0e369d757578b23ac50b893f920aa50fdbc45fb6 (diff)
1 files changed, 1 insertions, 1 deletions
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index dedc81ecbb2e..5405d3feb112 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -27,7 +27,7 @@ static struct task_struct *
 pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
 {
 	put_prev_task(rq, prev);
-
+	update_idle_core(rq);
 	schedstat_inc(rq->sched_goidle);
 	return rq->idle;
 }
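
The hunk above only adds the update_idle_core() call on the path where a CPU picks the idle task; the body of that function is not part of this file's diff. Going by the commit message (when a CPU goes idle and has_idle_cores is false, scan all SMT siblings of that CPU), its effect can be modelled roughly as follows. This is a user-space sketch, not the kernel function: struct llc_hint, NR_CPUS, SMT_WIDTH and the sibling layout are hypothetical.

```c
#include <stdbool.h>

#define NR_CPUS   8  /* hypothetical LLC size */
#define SMT_WIDTH 2  /* hypothetical: CPUs 2c and 2c+1 share core c */

/* Hypothetical stand-in for the shared per-LLC hint. */
struct llc_hint {
	bool idle[NR_CPUS];  /* per-CPU idle state */
	bool has_idle_cores; /* set once a fully idle core is seen */
};

/*
 * Model of what happens when 'cpu' is about to go idle: if the hint is
 * not already set, check the other SMT siblings of this core and set
 * the hint only when all of them are idle.
 */
static void update_idle_core_model(struct llc_hint *llc, int cpu)
{
	int base = (cpu / SMT_WIDTH) * SMT_WIDTH;

	if (llc->has_idle_cores)
		return; /* hint already set, nothing to learn */

	for (int t = 0; t < SMT_WIDTH; t++) {
		int sibling = base + t;

		if (sibling != cpu && !llc->idle[sibling])
			return; /* a busy sibling: the core is not idle */
	}
	llc->has_idle_cores = true;
}
```

The early return when the hint is already set keeps the common case cheap, which matches the message's assumption that scanning the siblings of one core is inexpensive but still worth avoiding when redundant.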