psi: Optimize switching tasks inside shared cgroups
author: Johannes Weiner <hannes@cmpxchg.org>
Mon, 16 Mar 2020 19:13:32 +0000 (15:13 -0400)
committer: Peter Zijlstra <peterz@infradead.org>
Fri, 20 Mar 2020 12:06:19 +0000 (13:06 +0100)
commit: 36b238d5717279163859fb6ba0f4360abcafab83
tree: 1b3282b27b593262f09686f704b1e00767ff76f6
parent: b05e75d611380881e73edc58a20fd8c6bb71720b

When switching tasks running on a CPU, the psi state of a cgroup
containing both of these tasks does not change. Right now, we don't
exploit that, and can perform many unnecessary state changes in nested
hierarchies, especially when most activity comes from one leaf cgroup.

This patch implements an optimization where we only update cgroups
whose state actually changes during a task switch. These are all
cgroups that contain one task but not the other, up to the first
shared ancestor. When both tasks are in the same group, we don't need
to update anything at all.

We can identify the first shared ancestor by walking the groups of the
incoming task until we see TSK_ONCPU set on the local CPU; that's the
first group that also contains the outgoing task.
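
The walk described above can be sketched in user-space C. This is a
simplified model, not the code from this patch: struct group, struct
task, the oncpu flag (standing in for TSK_ONCPU on a single CPU), the
updates counter, and the helpers task_start() and task_switch() are
all hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one cgroup's state on ONE CPU. */
struct group {
	struct group *parent;	/* NULL for the root group */
	bool oncpu;		/* stands in for TSK_ONCPU on this CPU */
	unsigned int updates;	/* state changes performed on this group */
};

/* Hypothetical task: only its leaf group matters for this sketch. */
struct task {
	struct group *group;
};

/* Initial condition: t is running, so every ancestor is marked oncpu. */
static void task_start(struct task *t)
{
	for (struct group *g = t->group; g; g = g->parent)
		g->oncpu = true;
}

/*
 * Switch the CPU from prev to next and return the number of groups
 * whose state was touched. Groups at or above the first shared
 * ancestor are left alone.
 */
static unsigned int task_switch(struct task *prev, struct task *next)
{
	struct group *common = NULL;
	unsigned int touched = 0;

	/*
	 * Walk the incoming task's groups. The first one that already
	 * has oncpu set must also contain the outgoing task.
	 */
	for (struct group *g = next->group; g; g = g->parent) {
		if (g->oncpu) {
			common = g;
			break;
		}
		g->oncpu = true;
		g->updates++;
		touched++;
	}

	/* Clear the outgoing task's groups below the shared ancestor. */
	for (struct group *g = prev->group; g && g != common; g = g->parent) {
		g->oncpu = false;
		g->updates++;
		touched++;
	}

	return touched;
}
```

In this model, with a hierarchy root -> a -> {b, c}, switching from a
task in b to a task in c touches only b and c; a and root are never
visited past the point where the shared ancestor is found. Switching
between two tasks in the same group touches nothing.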

The new psi_task_switch() is similar to psi_task_change(). To allow
code reuse, move the task flag maintenance code into a new function
and the poll/avg worker wakeups into the shared psi_group_change().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200316191333.115523-3-hannes@cmpxchg.org
include/linux/psi.h
kernel/sched/psi.c
kernel/sched/stats.h