rcu-tasks: Use schedule_hrtimeout_range() to wait for grace periods
author Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tue, 8 Mar 2022 17:54:13 +0000 (09:54 -0800)
committer Paul E. McKenney <paulmck@kernel.org>
Tue, 12 Apr 2022 00:06:42 +0000 (17:06 -0700)
The synchronous RCU-tasks grace-period-wait primitives invoke
schedule_timeout_idle() to give readers a chance to exit their
read-side critical sections.  Unfortunately, this fails during early
boot on PREEMPT_RT because PREEMPT_RT relies solely on ksoftirqd to run
timer handlers.  Because ksoftirqd cannot operate until its kthreads
are spawned, there is a brief period of time following scheduler
initialization where PREEMPT_RT cannot run the timer handlers that
schedule_timeout_idle() relies on, resulting in a hang.

To avoid this boot-time hang, this commit replaces schedule_timeout_idle()
with schedule_hrtimeout_range() in HRTIMER_MODE_REL_HARD mode, so that the
timer expires in hardirq context.  This ensures that the timer fires even on
PREEMPT_RT throughout the irqs-enabled portions of boot as well as during
runtime.

The timer is set to expire between fract and fract + HZ / 2 jiffies in
order to align with any other timers that might expire during that time,
thus reducing the number of wakeups.
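As a rough user-space illustration of that window, the sketch below converts the soft expiry (fract jiffies) and the slack (HZ / 2 jiffies) to nanoseconds, mirroring the two jiffies_to_nsecs() calls in the diff. The helpers sketch_jiffies_to_nsecs() and sketch_wait_window() are hypothetical names, not kernel APIs, and the conversion assumes HZ divides 10^9 evenly; it is not a substitute for the kernel's own jiffies_to_nsecs().

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical user-space stand-in for jiffies_to_nsecs().
 * Assumption: HZ divides NSEC_PER_SEC evenly (true for 100, 250, 1000). */
static uint64_t sketch_jiffies_to_nsecs(uint64_t j, unsigned int hz)
{
	assert(hz > 0 && NSEC_PER_SEC % hz == 0);
	return j * (NSEC_PER_SEC / hz);
}

/* Window in which the range timer may fire: the soft expiry is
 * fract jiffies from now, and the slack adds up to HZ / 2 jiffies. */
static void sketch_wait_window(unsigned long fract, unsigned int hz,
			       uint64_t *earliest_ns, uint64_t *latest_ns)
{
	uint64_t exp = sketch_jiffies_to_nsecs(fract, hz);
	uint64_t delta = sketch_jiffies_to_nsecs(hz / 2, hz);

	*earliest_ns = exp;
	*latest_ns = exp + delta;
}
```

With HZ = 1000 and fract = 10, the window runs from 10 ms to 510 ms.  Because the slack is HZ / 2 jiffies, it amounts to half a second for any even HZ, independent of fract, which leaves the hrtimer code room to expire this timer together with others.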

Note that RCU-tasks grace periods are infrequent, so the use of hrtimer
should be fine.  In contrast, in common-case code, use of hrtimer could
result in performance issues.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
kernel/rcu/tasks.h

index 4b91cb2..71fe340 100644
@@ -647,13 +647,16 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
        fract = rtp->init_fract;
 
        while (!list_empty(&holdouts)) {
+               ktime_t exp;
                bool firstreport;
                bool needreport;
                int rtst;
 
                // Slowly back off waiting for holdouts
                set_tasks_gp_state(rtp, RTGS_WAIT_SCAN_HOLDOUTS);
-               schedule_timeout_idle(fract);
+               exp = jiffies_to_nsecs(fract);
+               __set_current_state(TASK_IDLE);
+               schedule_hrtimeout_range(&exp, jiffies_to_nsecs(HZ / 2), HRTIMER_MODE_REL_HARD);
 
                if (fract < HZ)
                        fract++;
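The backoff in this loop can be modeled in user space as follows.  This is a minimal sketch assuming each pass sleeps exactly its soft expiry of fract jiffies (slack and early wakeups are ignored); sketch_next_fract() and sketch_min_total_wait() are hypothetical names, and init_fract stands in for rtp->init_fract, whose actual value is not shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "if (fract < HZ) fract++;": each scan of the holdout list
 * waits one jiffy longer than the previous one, capped at HZ jiffies. */
static unsigned long sketch_next_fract(unsigned long fract, unsigned long hz)
{
	if (fract < hz)
		fract++;
	return fract;
}

/* Nominal wait, in jiffies, summed over the first 'scans' passes,
 * starting from init_fract.  Assumption: every pass sleeps exactly
 * its soft expiry, so the up-to-HZ/2 slack per pass is not counted. */
static uint64_t sketch_min_total_wait(unsigned long init_fract,
				      unsigned long hz, unsigned int scans)
{
	uint64_t total = 0;
	unsigned long fract = init_fract;

	assert(hz > 0);
	for (unsigned int i = 0; i < scans; i++) {
		total += fract;
		fract = sketch_next_fract(fract, hz);
	}
	return total;
}
```

For example, starting from init_fract = 1 with HZ = 1000, the first three passes wait 1, 2 and 3 jiffies, 6 in total.  Once fract reaches HZ, each further pass waits one second plus up to half a second of slack.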