diff options
author | Chengming Zhou <zhouchengming@bytedance.com> | 2022-10-14 19:05:51 +0800 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2023-07-27 08:50:37 +0200 |
commit | 7d8bba4da1a8f11f304a15b6f0de69ac99ebd67f (patch) | |
tree | 7c2ba9c197067c6bcfae884331455085f5bd4fa7 /include | |
parent | 45f739e8fb345cc4352da091602660f94320431f (diff) | |
download | linux-stable-7d8bba4da1a8f11f304a15b6f0de69ac99ebd67f.tar.gz linux-stable-7d8bba4da1a8f11f304a15b6f0de69ac99ebd67f.tar.bz2 linux-stable-7d8bba4da1a8f11f304a15b6f0de69ac99ebd67f.zip |
sched/psi: Fix avgs_work re-arm in psi_avgs_work()
[ Upstream commit 2fcd7bbae90a6d844da8660a9d27079281dfbba2 ]
Pavan reported a problem that PSI avgs_work idle shutoff is not
working at all. Because PSI_NONIDLE condition would be observed in
psi_avgs_work()->collect_percpu_times()->get_recent_times() even if
only the kworker running avgs_work on the CPU.
Although commit 1b69ac6b40eb ("psi: fix aggregation idle shut-off")
avoided the ping-pong wake problem when the worker sleep, psi_avgs_work()
still will always re-arm the avgs_work, so shutoff is not working.
This patch changes to use PSI_STATE_RESCHEDULE to flag whether to
re-arm avgs_work in get_recent_times(). For the current CPU, we re-arm
avgs_work only when (NR_RUNNING > 1 || NR_IOWAIT > 0 || NR_MEMSTALL > 0),
for other CPUs we can just check PSI_NONIDLE delta. The new flag
is only used in psi_avgs_work(), so we check in get_recent_times()
that current_work() is avgs_work.
One potential problem is that the brief period of non-idle time
incurred between the aggregation run and the kworker's dequeue will
be stranded in the per-cpu buckets until avgs_work run next time.
The buckets can hold 4s worth of time, and future activity will wake
the avgs_work with a 2s delay, giving us 2s worth of data we can leave
behind when shut off the avgs_work. If the kworker run other works after
avgs_work shut off and doesn't have any scheduler activities for 2s,
this maybe a problem.
Reported-by: Pavan Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20221014110551.22695-1-zhouchengming@bytedance.com
Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Diffstat (limited to 'include')
-rw-r--r-- | include/linux/psi_types.h | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h index 14a1ebb74e11..1e0a0d7ace3a 100644 --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -72,6 +72,9 @@ enum psi_states { /* Use one bit in the state mask to track TSK_ONCPU */ #define PSI_ONCPU (1 << NR_PSI_STATES) +/* Flag whether to re-arm avgs_work, see details in get_recent_times() */ +#define PSI_STATE_RESCHEDULE (1 << (NR_PSI_STATES + 1)) + enum psi_aggregators { PSI_AVGS = 0, PSI_POLL, |