Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.19b', 'lockde...
author	Paul E. McKenney <paulmck@kernel.org>
	Fri, 20 Nov 2020 03:37:47 +0000 (19:37 -0800)
committer	Paul E. McKenney <paulmck@kernel.org>
	Fri, 20 Nov 2020 03:37:47 +0000 (19:37 -0800)
cpuinfo.2020.11.06a: Speedups for /proc/cpuinfo.
doc.2020.11.06a: Documentation updates.
fixes.2020.11.19b: Miscellaneous fixes.
lockdep.2020.11.02a: Lockdep-RCU updates to avoid "unused variable".
tasks.2020.11.06a: Tasks-RCU updates.
torture.2020.11.06a: Torture-test updates.

44 files changed:
Documentation/RCU/Design/Requirements/Requirements.rst
Documentation/RCU/checklist.rst
Documentation/RCU/rcu_dereference.rst
Documentation/RCU/whatisRCU.rst
arch/x86/kernel/cpu/aperfmperf.c
arch/x86/kernel/cpu/mtrr/mtrr.c
arch/x86/kernel/smpboot.c
include/linux/kernel.h
include/linux/list.h
include/linux/lockdep.h
include/linux/rcupdate.h
include/linux/rcupdate_trace.h
include/linux/rcutiny.h
include/linux/rcutree.h
include/linux/sched/task.h
include/net/sch_generic.h
include/net/sock.h
kernel/locking/locktorture.c
kernel/rcu/Kconfig
kernel/rcu/rcu_segcblist.h
kernel/rcu/rcuscale.c
kernel/rcu/rcutorture.c
kernel/rcu/refscale.c
kernel/rcu/srcutree.c
kernel/rcu/tree.c
kernel/rcu/tree.h
kernel/rcu/tree_plugin.h
kernel/rcu/tree_stall.h
kernel/scftorture.c
kernel/sysctl.c
kernel/torture.c
tools/include/nolibc/nolibc.h
tools/testing/selftests/rcutorture/bin/console-badness.sh
tools/testing/selftests/rcutorture/bin/functions.sh
tools/testing/selftests/rcutorture/bin/kvm-check-branches.sh
tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuscale.sh
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
tools/testing/selftests/rcutorture/bin/kvm.sh
tools/testing/selftests/rcutorture/bin/parse-console.sh
tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon
tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01 [new file with mode: 0644]
tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01.boot [new file with mode: 0644]

index 1ae79a1..e8c84fc 100644 (file)
@@ -1929,16 +1929,46 @@ The Linux-kernel CPU-hotplug implementation has notifiers that are used
 to allow the various kernel subsystems (including RCU) to respond
 appropriately to a given CPU-hotplug operation. Most RCU operations may
 be invoked from CPU-hotplug notifiers, including even synchronous
-grace-period operations such as ``synchronize_rcu()`` and
-``synchronize_rcu_expedited()``.
-
-However, all-callback-wait operations such as ``rcu_barrier()`` are also
-not supported, due to the fact that there are phases of CPU-hotplug
-operations where the outgoing CPU's callbacks will not be invoked until
-after the CPU-hotplug operation ends, which could also result in
-deadlock. Furthermore, ``rcu_barrier()`` blocks CPU-hotplug operations
-during its execution, which results in another type of deadlock when
-invoked from a CPU-hotplug notifier.
+grace-period operations such as ``synchronize_rcu()`` and
+``synchronize_rcu_expedited()``.  However, these synchronous operations
+do block and therefore cannot be invoked from notifiers that execute via
+``stop_machine()``, specifically those between the ``CPUHP_AP_OFFLINE``
+and ``CPUHP_AP_ONLINE`` states.
+
+In addition, all-callback-wait operations such as ``rcu_barrier()`` may
+not be invoked from any CPU-hotplug notifier.  This restriction is due
+to the fact that there are phases of CPU-hotplug operations where the
+outgoing CPU's callbacks will not be invoked until after the CPU-hotplug
+operation ends, which could also result in deadlock. Furthermore,
+``rcu_barrier()`` blocks CPU-hotplug operations during its execution,
+which results in another type of deadlock when invoked from a CPU-hotplug
+notifier.
+
+Finally, RCU must avoid deadlocks due to interaction between hotplug,
+timers and grace period processing. It does so by maintaining its own set
+of books that duplicate the centrally maintained ``cpu_online_mask``,
+and also by reporting quiescent states explicitly when a CPU goes
+offline.  This explicit reporting of quiescent states avoids any need
+for the force-quiescent-state loop (FQS) to report quiescent states for
+offline CPUs.  However, as a debugging measure, the FQS loop does splat
+if offline CPUs block an RCU grace period for too long.
+
+An offline CPU's quiescent state will be reported either:
+
+1.  As the CPU goes offline using RCU's hotplug notifier (``rcu_report_dead()``).
+2.  When grace period initialization (``rcu_gp_init()``) detects a
+    race either with CPU offlining or with a task unblocking on a leaf
+    ``rcu_node`` structure whose CPUs are all offline.
+
+The CPU-online path (``rcu_cpu_starting()``) should never need to report
+a quiescent state for an offline CPU.  However, as a debugging measure,
+it does emit a warning if a quiescent state was not already reported
+for that CPU.
+
+During the checking/modification of RCU's hotplug bookkeeping, the
+corresponding CPU's leaf node lock is held. This avoids race conditions
+between RCU's hotplug notifier hooks, the grace period initialization
+code, and the FQS loop, all of which refer to or modify this bookkeeping.
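
For illustration, a minimal sketch of a CPU-hotplug callback that stays within
the rules above might look as follows.  The subsystem name, callback, and state
are hypothetical; the key points are that a dynamically allocated online
callback runs in process context (not via ``stop_machine()``), so
``synchronize_rcu()`` is permitted there, while ``rcu_barrier()`` must not be
invoked from any hotplug notifier::

	#include <linux/cpu.h>
	#include <linux/cpuhotplug.h>
	#include <linux/rcupdate.h>

	/* Hypothetical online callback: runs in process context on the new CPU. */
	static int my_subsys_online(unsigned int cpu)
	{
		synchronize_rcu();	/* Synchronous grace periods are OK here. */
		/* rcu_barrier() would NOT be OK here (possible deadlock). */
		return 0;
	}

	static int __init my_subsys_init(void)
	{
		int ret;

		/* Dynamic state: runs after CPUHP_AP_ONLINE, so callbacks may block. */
		ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mysubsys:online",
					my_subsys_online, NULL);
		return ret < 0 ? ret : 0;
	}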
 
 Scheduler and RCU
 ~~~~~~~~~~~~~~~~~
index 2efed99..bb7128e 100644 (file)
@@ -314,6 +314,13 @@ over a rather long period of time, but improvements are always welcome!
        shared between readers and updaters.  Additional primitives
        are provided for this case, as discussed in lockdep.txt.
 
+       One exception to this rule is when data is only ever added to
+       the linked data structure, and is never removed during any
+       time that readers might be accessing that structure.  In such
+       cases, READ_ONCE() may be used in place of rcu_dereference()
+       and the read-side markers (rcu_read_lock() and rcu_read_unlock(),
+       for example) may be omitted.
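
As a rough sketch of this exception (all names below are made up for
illustration), an add-only structure is published with rcu_assign_pointer()
or smp_store_release() and may then be traversed with plain READ_ONCE(),
with no read-side markers::

	#include <linux/rcupdate.h>
	#include <linux/spinlock.h>

	struct cfg_entry {			/* hypothetical add-only list node */
		int key;
		int value;
		struct cfg_entry *next;
	};

	static struct cfg_entry *cfg_head;
	static DEFINE_SPINLOCK(cfg_lock);

	void cfg_add(struct cfg_entry *e)	/* updater: adds, never removes */
	{
		spin_lock(&cfg_lock);
		e->next = cfg_head;
		smp_store_release(&cfg_head, e);	/* publish initialized entry */
		spin_unlock(&cfg_lock);
	}

	int cfg_lookup(int key)			/* reader: no rcu_read_lock() needed */
	{
		struct cfg_entry *e;

		for (e = READ_ONCE(cfg_head); e; e = READ_ONCE(e->next))
			if (e->key == key)
				return e->value;
		return -1;
	}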
+
 10.    Conversely, if you are in an RCU read-side critical section,
        and you don't hold the appropriate update-side lock, you -must-
        use the "_rcu()" variants of the list macros.  Failing to do so
index c9667eb..f3e587a 100644 (file)
@@ -28,6 +28,12 @@ Follow these rules to keep your RCU code working properly:
        for an example where the compiler can in fact deduce the exact
        value of the pointer, and thus cause misordering.
 
+-      In the special case where data is added but is never removed
+       while readers are accessing the structure, READ_ONCE() may be used
+       instead of rcu_dereference().  In this case, use of READ_ONCE()
+       takes on the role of the lockless_dereference() primitive that
+       was removed in v4.15.
+
 -      You are only permitted to use rcu_dereference on pointer values.
        The compiler simply knows too much about integral values to
        trust it to carry dependencies through integer operations.
index fb3ff76..1a4723f 100644 (file)
@@ -497,8 +497,7 @@ long -- there might be other high-priority work to be done.
 In such cases, one uses call_rcu() rather than synchronize_rcu().
 The call_rcu() API is as follows::
 
-       void call_rcu(struct rcu_head * head,
-                     void (*func)(struct rcu_head *head));
+       void call_rcu(struct rcu_head *head, rcu_callback_t func);
 
 This function invokes func(head) after a grace period has elapsed.
 This invocation might happen from either softirq or process context,
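
For illustration, typical use of this form looks roughly like the following,
where struct foo and its fields are hypothetical; the callback receives the
rcu_head and recovers the enclosing object with container_of()::

	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct foo {
		int data;
		struct rcu_head rh;
	};

	static void foo_reclaim(struct rcu_head *head)
	{
		struct foo *fp = container_of(head, struct foo, rh);

		kfree(fp);	/* runs after a grace period has elapsed */
	}

	/* Updater: called after fp has been unlinked from readers' view. */
	static void foo_defer_free(struct foo *fp)
	{
		call_rcu(&fp->rh, foo_reclaim);
	}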
index e2f319d..22911de 100644 (file)
 #include <linux/cpufreq.h>
 #include <linux/smp.h>
 #include <linux/sched/isolation.h>
+#include <linux/rcupdate.h>
 
 #include "cpu.h"
 
 struct aperfmperf_sample {
        unsigned int    khz;
+       atomic_t        scfpending;
        ktime_t time;
        u64     aperf;
        u64     mperf;
@@ -62,17 +64,20 @@ static void aperfmperf_snapshot_khz(void *dummy)
        s->aperf = aperf;
        s->mperf = mperf;
        s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
+       atomic_set_release(&s->scfpending, 0);
 }
 
 static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
 {
        s64 time_delta = ktime_ms_delta(now, per_cpu(samples.time, cpu));
+       struct aperfmperf_sample *s = per_cpu_ptr(&samples, cpu);
 
        /* Don't bother re-computing within the cache threshold time. */
        if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
                return true;
 
-       smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
+       if (!atomic_xchg(&s->scfpending, 1) || wait)
+               smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
 
        /* Return false if the previous iteration was too long ago. */
        return time_delta <= APERFMPERF_STALE_THRESHOLD_MS;
@@ -89,6 +94,9 @@ unsigned int aperfmperf_get_khz(int cpu)
        if (!housekeeping_cpu(cpu, HK_FLAG_MISC))
                return 0;
 
+       if (rcu_is_idle_cpu(cpu))
+               return 0; /* Idle CPUs are completely uninteresting. */
+
        aperfmperf_snapshot_cpu(cpu, ktime_get(), true);
        return per_cpu(samples.khz, cpu);
 }
@@ -108,6 +116,8 @@ void arch_freq_prepare_all(void)
        for_each_online_cpu(cpu) {
                if (!housekeeping_cpu(cpu, HK_FLAG_MISC))
                        continue;
+               if (rcu_is_idle_cpu(cpu))
+                       continue; /* Idle CPUs are completely uninteresting. */
                if (!aperfmperf_snapshot_cpu(cpu, now, false))
                        wait = true;
        }
@@ -118,6 +128,8 @@ void arch_freq_prepare_all(void)
 
 unsigned int arch_freq_get_on_cpu(int cpu)
 {
+       struct aperfmperf_sample *s = per_cpu_ptr(&samples, cpu);
+
        if (!cpu_khz)
                return 0;
 
@@ -131,6 +143,8 @@ unsigned int arch_freq_get_on_cpu(int cpu)
                return per_cpu(samples.khz, cpu);
 
        msleep(APERFMPERF_REFRESH_DELAY_MS);
+       atomic_set(&s->scfpending, 1);
+       smp_mb(); /* ->scfpending before smp_call_function_single(). */
        smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
 
        return per_cpu(samples.khz, cpu);
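
The ->scfpending logic above is an instance of a more general pattern: use
atomic_xchg() so that a cross-CPU call is issued only when one is not already
in flight.  A simplified sketch of that pattern, with invented names and a
stand-in measurement:

	#include <linux/atomic.h>
	#include <linux/percpu.h>
	#include <linux/smp.h>
	#include <linux/types.h>

	struct remote_sample {
		atomic_t pending;	/* nonzero while a refresh IPI is in flight */
		u64 value;
	};

	static DEFINE_PER_CPU(struct remote_sample, remote_sample);

	static void refresh_remote(void *unused)
	{
		struct remote_sample *s = this_cpu_ptr(&remote_sample);

		s->value++;				/* stand-in for the real work */
		atomic_set_release(&s->pending, 0);	/* order result before reuse */
	}

	static void request_refresh(int cpu, bool wait)
	{
		struct remote_sample *s = per_cpu_ptr(&remote_sample, cpu);

		/* Skip the IPI if one is already pending, unless the caller must wait. */
		if (!atomic_xchg(&s->pending, 1) || wait)
			smp_call_function_single(cpu, refresh_remote, NULL, wait);
	}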
index 6a80f36..5f436cb 100644 (file)
@@ -794,8 +794,6 @@ void mtrr_ap_init(void)
        if (!use_intel() || mtrr_aps_delayed_init)
                return;
 
-       rcu_cpu_starting(smp_processor_id());
-
        /*
         * Ideally we should hold mtrr_mutex here to avoid mtrr entries
         * changed, but this routine will be called in cpu boot time,
index de776b2..99bdceb 100644 (file)
@@ -229,6 +229,7 @@ static void notrace start_secondary(void *unused)
 #endif
        cpu_init_exception_handling();
        cpu_init();
+       rcu_cpu_starting(raw_smp_processor_id());
        x86_cpuinit.early_percpu_clock_init();
        preempt_disable();
        smp_callin();
index 2f05e91..4b5fd3d 100644 (file)
@@ -536,6 +536,7 @@ extern int panic_on_warn;
 extern unsigned long panic_on_taint;
 extern bool panic_on_taint_nousertaint;
 extern int sysctl_panic_on_rcu_stall;
+extern int sysctl_max_rcu_stall_to_panic;
 extern int sysctl_panic_on_stackoverflow;
 
 extern bool crash_kexec_post_notifiers;
index a18c87b..89bdc92 100644 (file)
@@ -9,7 +9,7 @@
 #include <linux/kernel.h>
 
 /*
- * Simple doubly linked list implementation.
+ * Circular doubly linked list implementation.
  *
  * Some of the internal functions ("__xxx") are useful when
  * manipulating whole lists rather than single entries, as
index f559487..ccc3ce6 100644 (file)
@@ -375,6 +375,12 @@ static inline void lockdep_unregister_key(struct lock_class_key *key)
 
 #define lockdep_depth(tsk)     (0)
 
+/*
+ * Dummy forward declarations that allow users to write less ifdef-y code
+ * and depend on dead code elimination.
+ */
+extern int lock_is_held(const void *);
+extern int lockdep_is_held(const void *);
 #define lockdep_is_held_type(l, r)             (1)
 
 #define lockdep_assert_held(l)                 do { (void)(l); } while (0)
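
The dummy declarations above let callers reference lockdep_is_held() without
an #ifdef: with CONFIG_LOCKDEP=n the macros that consume such helpers reduce
to constant expressions, so the call is discarded as dead code and the
never-defined function is never referenced at link time.  A hypothetical
caller (names invented) might look like this:

	#include <linux/lockdep.h>
	#include <linux/rcupdate.h>
	#include <linux/spinlock.h>

	struct my_item;

	struct my_data {
		spinlock_t lock;
		struct my_item __rcu *item;
	};

	/* No #ifdef CONFIG_LOCKDEP needed around this helper. */
	static inline bool my_data_is_locked(struct my_data *d)
	{
		return lockdep_is_held(&d->lock);
	}

	/* With PROVE_RCU=n the condition is never evaluated, so the helper call
	 * (and thus lockdep_is_held()) is eliminated by the compiler. */
	#define my_data_dereference(d) \
		rcu_dereference_protected((d)->item, my_data_is_locked(d))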
index 6cdd015..de08264 100644 (file)
@@ -241,6 +241,11 @@ bool rcu_lockdep_current_cpu_online(void);
 static inline bool rcu_lockdep_current_cpu_online(void) { return true; }
 #endif /* #else #if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PROVE_RCU) */
 
+extern struct lockdep_map rcu_lock_map;
+extern struct lockdep_map rcu_bh_lock_map;
+extern struct lockdep_map rcu_sched_lock_map;
+extern struct lockdep_map rcu_callback_map;
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
 static inline void rcu_lock_acquire(struct lockdep_map *map)
@@ -253,10 +258,6 @@ static inline void rcu_lock_release(struct lockdep_map *map)
        lock_release(map, _THIS_IP_);
 }
 
-extern struct lockdep_map rcu_lock_map;
-extern struct lockdep_map rcu_bh_lock_map;
-extern struct lockdep_map rcu_sched_lock_map;
-extern struct lockdep_map rcu_callback_map;
 int debug_lockdep_rcu_enabled(void);
 int rcu_read_lock_held(void);
 int rcu_read_lock_bh_held(void);
@@ -327,7 +328,7 @@ static inline void rcu_preempt_sleep_check(void) { }
 
 #else /* #ifdef CONFIG_PROVE_RCU */
 
-#define RCU_LOCKDEP_WARN(c, s) do { } while (0)
+#define RCU_LOCKDEP_WARN(c, s) do { } while (0 && (c))
 #define rcu_sleep_check() do { } while (0)
 
 #endif /* #else #ifdef CONFIG_PROVE_RCU */
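
The "while (0 && (c))" form above is a standard idiom: the constant 0
guarantees that (c) is never evaluated at run time, but because (c) still
appears in the expansion, variables referenced only inside it no longer
trigger "unused variable" warnings, and gross type errors are still
diagnosed.  The same idiom in a stand-alone, hypothetical macro:

	#ifdef MY_DEBUG
	#define MY_WARN_IF(c)	WARN_ON_ONCE(c)		/* debug build: really check */
	#else
	#define MY_WARN_IF(c)	do { } while (0 && (c))	/* (c) parsed but never run */
	#endif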
index 3e7919f..86c8f6c 100644 (file)
 #include <linux/sched.h>
 #include <linux/rcupdate.h>
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-
 extern struct lockdep_map rcu_trace_lock_map;
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+
 static inline int rcu_read_lock_trace_held(void)
 {
        return lock_is_held(&rcu_trace_lock_map);
index 7c1ecdb..2a97334 100644 (file)
@@ -89,6 +89,8 @@ static inline void rcu_irq_enter_irqson(void) { }
 static inline void rcu_irq_exit(void) { }
 static inline void rcu_irq_exit_preempt(void) { }
 static inline void rcu_irq_exit_check_preempt(void) { }
+#define rcu_is_idle_cpu(cpu) \
+       (is_idle_task(current) && !in_nmi() && !in_irq() && !in_serving_softirq())
 static inline void exit_rcu(void) { }
 static inline bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 {
index 59eb5cd..df578b7 100644 (file)
@@ -50,6 +50,7 @@ void rcu_irq_exit(void);
 void rcu_irq_exit_preempt(void);
 void rcu_irq_enter_irqson(void);
 void rcu_irq_exit_irqson(void);
+bool rcu_is_idle_cpu(int cpu);
 
 #ifdef CONFIG_PROVE_RCU
 void rcu_irq_exit_check_preempt(void);
index 85fb2f3..c0f71f2 100644 (file)
@@ -47,9 +47,7 @@ extern spinlock_t mmlist_lock;
 extern union thread_union init_thread_union;
 extern struct task_struct init_task;
 
-#ifdef CONFIG_PROVE_RCU
 extern int lockdep_tasklist_lock_is_held(void);
-#endif /* #ifdef CONFIG_PROVE_RCU */
 
 extern asmlinkage void schedule_tail(struct task_struct *prev);
 extern void init_idle(struct task_struct *idle, int cpu);
index d8fd867..749db62 100644 (file)
@@ -435,7 +435,6 @@ struct tcf_block {
        struct mutex proto_destroy_lock; /* Lock for proto_destroy hashtable. */
 };
 
-#ifdef CONFIG_PROVE_LOCKING
 static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
 {
        return lockdep_is_held(&chain->filter_chain_lock);
@@ -445,17 +444,6 @@ static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
 {
        return lockdep_is_held(&tp->lock);
 }
-#else
-static inline bool lockdep_tcf_chain_is_locked(struct tcf_block *chain)
-{
-       return true;
-}
-
-static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
-{
-       return true;
-}
-#endif /* #ifdef CONFIG_PROVE_LOCKING */
 
 #define tcf_chain_dereference(p, chain)                                        \
        rcu_dereference_protected(p, lockdep_tcf_chain_is_locked(chain))
index a5c6ae7..198d548 100644 (file)
@@ -1566,13 +1566,11 @@ do {                                                                    \
        lockdep_init_map(&(sk)->sk_lock.dep_map, (name), (key), 0);     \
 } while (0)
 
-#ifdef CONFIG_LOCKDEP
 static inline bool lockdep_sock_is_held(const struct sock *sk)
 {
        return lockdep_is_held(&sk->sk_lock) ||
               lockdep_is_held(&sk->sk_lock.slock);
 }
-#endif
 
 void lock_sock_nested(struct sock *sk, int subclass);
 
index 62d215b..fd838ce 100644 (file)
@@ -29,6 +29,7 @@
 #include <linux/slab.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/torture.h>
+#include <linux/reboot.h>
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
@@ -60,6 +61,7 @@ static struct task_struct **reader_tasks;
 
 static bool lock_is_write_held;
 static bool lock_is_read_held;
+static unsigned long last_lock_release;
 
 struct lock_stress_stats {
        long n_lock_fail;
@@ -74,6 +76,7 @@ static void lock_torture_cleanup(void);
  */
 struct lock_torture_ops {
        void (*init)(void);
+       void (*exit)(void);
        int (*writelock)(void);
        void (*write_delay)(struct torture_random_state *trsp);
        void (*task_boost)(struct torture_random_state *trsp);
@@ -90,12 +93,13 @@ struct lock_torture_cxt {
        int nrealwriters_stress;
        int nrealreaders_stress;
        bool debug_lock;
+       bool init_called;
        atomic_t n_lock_torture_errors;
        struct lock_torture_ops *cur_ops;
        struct lock_stress_stats *lwsa; /* writer statistics */
        struct lock_stress_stats *lrsa; /* reader statistics */
 };
-static struct lock_torture_cxt cxt = { 0, 0, false,
+static struct lock_torture_cxt cxt = { 0, 0, false, false,
                                       ATOMIC_INIT(0),
                                       NULL, NULL};
 /*
@@ -571,6 +575,11 @@ static void torture_percpu_rwsem_init(void)
        BUG_ON(percpu_init_rwsem(&pcpu_rwsem));
 }
 
+static void torture_percpu_rwsem_exit(void)
+{
+       percpu_free_rwsem(&pcpu_rwsem);
+}
+
 static int torture_percpu_rwsem_down_write(void) __acquires(pcpu_rwsem)
 {
        percpu_down_write(&pcpu_rwsem);
@@ -595,6 +604,7 @@ static void torture_percpu_rwsem_up_read(void) __releases(pcpu_rwsem)
 
 static struct lock_torture_ops percpu_rwsem_lock_ops = {
        .init           = torture_percpu_rwsem_init,
+       .exit           = torture_percpu_rwsem_exit,
        .writelock      = torture_percpu_rwsem_down_write,
        .write_delay    = torture_rwsem_write_delay,
        .task_boost     = torture_boost_dummy,
@@ -632,6 +642,7 @@ static int lock_torture_writer(void *arg)
                lwsp->n_lock_acquired++;
                cxt.cur_ops->write_delay(&rand);
                lock_is_write_held = false;
+               WRITE_ONCE(last_lock_release, jiffies);
                cxt.cur_ops->writeunlock();
 
                stutter_wait("lock_torture_writer");
@@ -786,9 +797,10 @@ static void lock_torture_cleanup(void)
 
        /*
         * Indicates early cleanup, meaning that the test has not run,
-        * such as when passing bogus args when loading the module. As
-        * such, only perform the underlying torture-specific cleanups,
-        * and avoid anything related to locktorture.
+        * such as when passing bogus args when loading the module.
+        * However, cxt.cur_ops->init() may have been invoked, so besides
+        * performing the underlying torture-specific cleanups, cur_ops->exit()
+        * will be invoked if needed.
         */
        if (!cxt.lwsa && !cxt.lrsa)
                goto end;
@@ -828,6 +840,11 @@ static void lock_torture_cleanup(void)
        cxt.lrsa = NULL;
 
 end:
+       if (cxt.init_called) {
+               if (cxt.cur_ops->exit)
+                       cxt.cur_ops->exit();
+               cxt.init_called = false;
+       }
        torture_cleanup_end();
 }
 
@@ -868,14 +885,17 @@ static int __init lock_torture_init(void)
                goto unwind;
        }
 
-       if (nwriters_stress == 0 && nreaders_stress == 0) {
+       if (nwriters_stress == 0 &&
+           (!cxt.cur_ops->readlock || nreaders_stress == 0)) {
                pr_alert("lock-torture: must run at least one locking thread\n");
                firsterr = -EINVAL;
                goto unwind;
        }
 
-       if (cxt.cur_ops->init)
+       if (cxt.cur_ops->init) {
                cxt.cur_ops->init();
+               cxt.init_called = true;
+       }
 
        if (nwriters_stress >= 0)
                cxt.nrealwriters_stress = nwriters_stress;
@@ -1038,6 +1058,10 @@ static int __init lock_torture_init(void)
 unwind:
        torture_init_end();
        lock_torture_cleanup();
+       if (shutdown_secs) {
+               WARN_ON(!IS_MODULE(CONFIG_LOCK_TORTURE_TEST));
+               kernel_power_off();
+       }
        return firsterr;
 }
 
index b71e21f..cdc57b4 100644 (file)
@@ -221,19 +221,23 @@ config RCU_NOCB_CPU
          Use this option to reduce OS jitter for aggressive HPC or
          real-time workloads.  It can also be used to offload RCU
          callback invocation to energy-efficient CPUs in battery-powered
-         asymmetric multiprocessors.
+         asymmetric multiprocessors.  The price of this reduced jitter
+         is that the overhead of call_rcu() increases and that some
+         workloads will incur significant increases in context-switch
+         rates.
 
          This option offloads callback invocation from the set of CPUs
          specified at boot time by the rcu_nocbs parameter.  For each
          such CPU, a kthread ("rcuox/N") will be created to invoke
          callbacks, where the "N" is the CPU being offloaded, and where
-         the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched
-         (!PREEMPTION kernels).  Nothing prevents this kthread from running
-         on the specified CPUs, but (1) the kthreads may be preempted
-         between each callback, and (2) affinity or cgroups can be used
-         to force the kthreads to run on whatever set of CPUs is desired.
-
-         Say Y here if you want to help to debug reduced OS jitter.
+         the "x" is "p" for RCU-preempt (PREEMPTION kernels) and "s" for
+         RCU-sched (!PREEMPTION kernels).  Nothing prevents this kthread
+         from running on the specified CPUs, but (1) the kthreads may be
+         preempted between each callback, and (2) affinity or cgroups can
+         be used to force the kthreads to run on whatever set of CPUs is
+         desired.
+
+         Say Y here if you need reduced OS jitter, despite added overhead.
          Say N here if you are unsure.
 
 config TASKS_TRACE_RCU_READ_MB
index 5c293af..492262b 100644 (file)
@@ -62,7 +62,7 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
 /* Is the specified rcu_segcblist offloaded?  */
 static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
 {
-       return rsclp->offloaded;
+       return IS_ENABLED(CONFIG_RCU_NOCB_CPU) && rsclp->offloaded;
 }
 
 /*
index 2819b95..06491d5 100644 (file)
@@ -38,6 +38,7 @@
 #include <asm/byteorder.h>
 #include <linux/torture.h>
 #include <linux/vmalloc.h>
+#include <linux/rcupdate_trace.h>
 
 #include "rcu.h"
 
@@ -294,6 +295,35 @@ static struct rcu_scale_ops tasks_ops = {
        .name           = "tasks"
 };
 
+/*
+ * Definitions for RCU-tasks-trace scalability testing.
+ */
+
+static int tasks_trace_scale_read_lock(void)
+{
+       rcu_read_lock_trace();
+       return 0;
+}
+
+static void tasks_trace_scale_read_unlock(int idx)
+{
+       rcu_read_unlock_trace();
+}
+
+static struct rcu_scale_ops tasks_tracing_ops = {
+       .ptype          = RCU_TASKS_FLAVOR,
+       .init           = rcu_sync_scale_init,
+       .readlock       = tasks_trace_scale_read_lock,
+       .readunlock     = tasks_trace_scale_read_unlock,
+       .get_gp_seq     = rcu_no_completed,
+       .gp_diff        = rcu_seq_diff,
+       .async          = call_rcu_tasks_trace,
+       .gp_barrier     = rcu_barrier_tasks_trace,
+       .sync           = synchronize_rcu_tasks_trace,
+       .exp_sync       = synchronize_rcu_tasks_trace,
+       .name           = "tasks-tracing"
+};
+
 static unsigned long rcuscale_seq_diff(unsigned long new, unsigned long old)
 {
        if (!cur_ops->gp_diff)
@@ -754,7 +784,7 @@ rcu_scale_init(void)
        long i;
        int firsterr = 0;
        static struct rcu_scale_ops *scale_ops[] = {
-               &rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops,
+               &rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops, &tasks_tracing_ops
        };
 
        if (!torture_init_begin(scale_type, verbose))
@@ -772,7 +802,6 @@ rcu_scale_init(void)
                for (i = 0; i < ARRAY_SIZE(scale_ops); i++)
                        pr_cont(" %s", scale_ops[i]->name);
                pr_cont("\n");
-               WARN_ON(!IS_MODULE(CONFIG_RCU_SCALE_TEST));
                firsterr = -EINVAL;
                cur_ops = NULL;
                goto unwind;
@@ -846,6 +875,10 @@ rcu_scale_init(void)
 unwind:
        torture_init_end();
        rcu_scale_cleanup();
+       if (shutdown) {
+               WARN_ON(!IS_MODULE(CONFIG_RCU_SCALE_TEST));
+               kernel_power_off();
+       }
        return firsterr;
 }
 
index c811f23..528ed10 100644 (file)
@@ -917,7 +917,8 @@ static int rcu_torture_boost(void *arg)
                oldstarttime = boost_starttime;
                while (time_before(jiffies, oldstarttime)) {
                        schedule_timeout_interruptible(oldstarttime - jiffies);
-                       stutter_wait("rcu_torture_boost");
+                       if (stutter_wait("rcu_torture_boost"))
+                               sched_set_fifo_low(current);
                        if (torture_must_stop())
                                goto checkwait;
                }
@@ -937,7 +938,8 @@ static int rcu_torture_boost(void *arg)
                                                                 jiffies);
                                call_rcu_time = jiffies;
                        }
-                       stutter_wait("rcu_torture_boost");
+                       if (stutter_wait("rcu_torture_boost"))
+                               sched_set_fifo_low(current);
                        if (torture_must_stop())
                                goto checkwait;
                }
@@ -969,7 +971,8 @@ static int rcu_torture_boost(void *arg)
                }
 
                /* Go do the stutter. */
-checkwait:     stutter_wait("rcu_torture_boost");
+checkwait:     if (stutter_wait("rcu_torture_boost"))
+                       sched_set_fifo_low(current);
        } while (!torture_must_stop());
 
        /* Clean up and exit. */
@@ -992,6 +995,7 @@ rcu_torture_fqs(void *arg)
 {
        unsigned long fqs_resume_time;
        int fqs_burst_remaining;
+       int oldnice = task_nice(current);
 
        VERBOSE_TOROUT_STRING("rcu_torture_fqs task started");
        do {
@@ -1007,7 +1011,8 @@ rcu_torture_fqs(void *arg)
                        udelay(fqs_holdoff);
                        fqs_burst_remaining -= fqs_holdoff;
                }
-               stutter_wait("rcu_torture_fqs");
+               if (stutter_wait("rcu_torture_fqs"))
+                       sched_set_normal(current, oldnice);
        } while (!torture_must_stop());
        torture_kthread_stopping("rcu_torture_fqs");
        return 0;
@@ -1027,9 +1032,11 @@ rcu_torture_writer(void *arg)
        bool gp_cond1 = gp_cond, gp_exp1 = gp_exp, gp_normal1 = gp_normal;
        bool gp_sync1 = gp_sync;
        int i;
+       int oldnice = task_nice(current);
        struct rcu_torture *rp;
        struct rcu_torture *old_rp;
        static DEFINE_TORTURE_RANDOM(rand);
+       bool stutter_waited;
        int synctype[] = { RTWS_DEF_FREE, RTWS_EXP_SYNC,
                           RTWS_COND_GET, RTWS_SYNC };
        int nsynctypes = 0;
@@ -1148,7 +1155,8 @@ rcu_torture_writer(void *arg)
                                       !rcu_gp_is_normal();
                }
                rcu_torture_writer_state = RTWS_STUTTER;
-               if (stutter_wait("rcu_torture_writer") &&
+               stutter_waited = stutter_wait("rcu_torture_writer");
+               if (stutter_waited &&
                    !READ_ONCE(rcu_fwd_cb_nodelay) &&
                    !cur_ops->slow_gps &&
                    !torture_must_stop() &&
@@ -1160,6 +1168,8 @@ rcu_torture_writer(void *arg)
                                        rcu_ftrace_dump(DUMP_ALL);
                                        WARN(1, "%s: rtort_pipe_count: %d\n", __func__, rcu_tortures[i].rtort_pipe_count);
                                }
+               if (stutter_waited)
+                       sched_set_normal(current, oldnice);
        } while (!torture_must_stop());
        rcu_torture_current = NULL;  // Let stats task know that we are done.
        /* Reset expediting back to unexpedited. */
@@ -1919,7 +1929,9 @@ static void rcu_torture_fwd_prog_nr(struct rcu_fwd *rfp,
        unsigned long stopat;
        static DEFINE_TORTURE_RANDOM(trs);
 
-       if  (cur_ops->call && cur_ops->sync && cur_ops->cb_barrier) {
+       if (!cur_ops->sync)
+               return; // Cannot do need_resched() forward progress testing without ->sync.
+       if (cur_ops->call && cur_ops->cb_barrier) {
                init_rcu_head_on_stack(&fcs.rh);
                selfpropcb = true;
        }
@@ -2109,6 +2121,7 @@ static struct notifier_block rcutorture_oom_nb = {
 /* Carry out grace-period forward-progress testing. */
 static int rcu_torture_fwd_prog(void *args)
 {
+       int oldnice = task_nice(current);
        struct rcu_fwd *rfp = args;
        int tested = 0;
        int tested_tries = 0;
@@ -2127,7 +2140,8 @@ static int rcu_torture_fwd_prog(void *args)
                        rcu_torture_fwd_prog_cr(rfp);
 
                /* Avoid slow periods, better to test when busy. */
-               stutter_wait("rcu_torture_fwd_prog");
+               if (stutter_wait("rcu_torture_fwd_prog"))
+                       sched_set_normal(current, oldnice);
        } while (!torture_must_stop());
        /* Short runs might not contain a valid forward-progress attempt. */
        WARN_ON(!tested && tested_tries >= 5);
@@ -2143,8 +2157,8 @@ static int __init rcu_torture_fwd_prog_init(void)
 
        if (!fwd_progress)
                return 0; /* Not requested, so don't do it. */
-       if (!cur_ops->stall_dur || cur_ops->stall_dur() <= 0 ||
-           cur_ops == &rcu_busted_ops) {
+       if ((!cur_ops->sync && !cur_ops->call) ||
+           !cur_ops->stall_dur || cur_ops->stall_dur() <= 0 || cur_ops == &rcu_busted_ops) {
                VERBOSE_TOROUT_STRING("rcu_torture_fwd_prog_init: Disabled, unsupported by RCU flavor under test");
                return 0;
        }
@@ -2491,13 +2505,13 @@ rcu_torture_cleanup(void)
                        torture_stop_kthread(rcu_torture_reader,
                                             reader_tasks[i]);
                kfree(reader_tasks);
+               reader_tasks = NULL;
        }
 
        if (fakewriter_tasks) {
-               for (i = 0; i < nfakewriters; i++) {
+               for (i = 0; i < nfakewriters; i++)
                        torture_stop_kthread(rcu_torture_fakewriter,
                                             fakewriter_tasks[i]);
-               }
                kfree(fakewriter_tasks);
                fakewriter_tasks = NULL;
        }
@@ -2654,7 +2668,6 @@ rcu_torture_init(void)
                for (i = 0; i < ARRAY_SIZE(torture_ops); i++)
                        pr_cont(" %s", torture_ops[i]->name);
                pr_cont("\n");
-               WARN_ON(!IS_MODULE(CONFIG_RCU_TORTURE_TEST));
                firsterr = -EINVAL;
                cur_ops = NULL;
                goto unwind;
@@ -2822,6 +2835,10 @@ rcu_torture_init(void)
 unwind:
        torture_init_end();
        rcu_torture_cleanup();
+       if (shutdown_secs) {
+               WARN_ON(!IS_MODULE(CONFIG_RCU_TORTURE_TEST));
+               kernel_power_off();
+       }
        return firsterr;
 }
 
index 952595c..23ff36a 100644 (file)
@@ -658,7 +658,6 @@ ref_scale_init(void)
                for (i = 0; i < ARRAY_SIZE(scale_ops); i++)
                        pr_cont(" %s", scale_ops[i]->name);
                pr_cont("\n");
-               WARN_ON(!IS_MODULE(CONFIG_RCU_REF_SCALE_TEST));
                firsterr = -EINVAL;
                cur_ops = NULL;
                goto unwind;
@@ -681,6 +680,12 @@ ref_scale_init(void)
        // Reader tasks (default to ~75% of online CPUs).
        if (nreaders < 0)
                nreaders = (num_online_cpus() >> 1) + (num_online_cpus() >> 2);
+       if (WARN_ONCE(loops <= 0, "%s: loops = %ld, adjusted to 1\n", __func__, loops))
+               loops = 1;
+       if (WARN_ONCE(nreaders <= 0, "%s: nreaders = %d, adjusted to 1\n", __func__, nreaders))
+               nreaders = 1;
+       if (WARN_ONCE(nruns <= 0, "%s: nruns = %d, adjusted to 1\n", __func__, nruns))
+               nruns = 1;
        reader_tasks = kcalloc(nreaders, sizeof(reader_tasks[0]),
                               GFP_KERNEL);
        if (!reader_tasks) {
@@ -712,6 +717,10 @@ ref_scale_init(void)
 unwind:
        torture_init_end();
        ref_scale_cleanup();
+       if (shutdown) {
+               WARN_ON(!IS_MODULE(CONFIG_RCU_REF_SCALE_TEST));
+               kernel_power_off();
+       }
        return firsterr;
 }
 
index c13348e..0f23d20 100644 (file)
@@ -177,11 +177,13 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
        INIT_DELAYED_WORK(&ssp->work, process_srcu);
        if (!is_static)
                ssp->sda = alloc_percpu(struct srcu_data);
+       if (!ssp->sda)
+               return -ENOMEM;
        init_srcu_struct_nodes(ssp, is_static);
        ssp->srcu_gp_seq_needed_exp = 0;
        ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
        smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */
-       return ssp->sda ? 0 : -ENOMEM;
+       return 0;
 }
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -906,7 +908,7 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
 {
        struct rcu_synchronize rcu;
 
-       RCU_LOCKDEP_WARN(lock_is_held(&ssp->dep_map) ||
+       RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
                         lock_is_held(&rcu_bh_lock_map) ||
                         lock_is_held(&rcu_lock_map) ||
                         lock_is_held(&rcu_sched_lock_map),
index 06895ef..516c689 100644 (file)
@@ -177,7 +177,7 @@ module_param(rcu_unlock_delay, int, 0444);
  * per-CPU. Object size is equal to one page. This value
  * can be changed at boot time.
  */
-static int rcu_min_cached_objs = 2;
+static int rcu_min_cached_objs = 5;
 module_param(rcu_min_cached_objs, int, 0444);
 
 /* Retrieve RCU kthreads priority for rcutorture */
@@ -341,6 +341,14 @@ static bool rcu_dynticks_in_eqs(int snap)
        return !(snap & RCU_DYNTICK_CTRL_CTR);
 }
 
+/* Return true if the specified CPU is currently idle from an RCU viewpoint.  */
+bool rcu_is_idle_cpu(int cpu)
+{
+       struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
+       return rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp));
+}
+
 /*
  * Return true if the CPU corresponding to the specified rcu_data
  * structure has spent some time in an extended quiescent state since
@@ -546,12 +554,12 @@ static int param_set_next_fqs_jiffies(const char *val, const struct kernel_param
        return ret;
 }
 
-static struct kernel_param_ops first_fqs_jiffies_ops = {
+static const struct kernel_param_ops first_fqs_jiffies_ops = {
        .set = param_set_first_fqs_jiffies,
        .get = param_get_ulong,
 };
 
-static struct kernel_param_ops next_fqs_jiffies_ops = {
+static const struct kernel_param_ops next_fqs_jiffies_ops = {
        .set = param_set_next_fqs_jiffies,
        .get = param_get_ulong,
 };
@@ -928,8 +936,8 @@ void __rcu_irq_enter_check_tick(void)
 {
        struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 
-        // Enabling the tick is unsafe in NMI handlers.
-       if (WARN_ON_ONCE(in_nmi()))
+       // If we're here from NMI there's nothing to do.
+       if (in_nmi())
                return;
 
        RCU_LOCKDEP_WARN(rcu_dynticks_curr_cpu_in_eqs(),
@@ -1093,8 +1101,11 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
  * CPU can safely enter RCU read-side critical sections.  In other words,
  * if the current CPU is not in its idle loop or is in an interrupt or
  * NMI handler, return true.
+ *
+ * Make notrace because it can be called from ftrace's internal functions;
+ * marking it notrace avoids unnecessary recursion.
  */
-bool rcu_is_watching(void)
+notrace bool rcu_is_watching(void)
 {
        bool ret;
 
@@ -1149,7 +1160,7 @@ bool rcu_lockdep_current_cpu_online(void)
        preempt_disable_notrace();
        rdp = this_cpu_ptr(&rcu_data);
        rnp = rdp->mynode;
-       if (rdp->grpmask & rcu_rnp_online_cpus(rnp))
+       if (rdp->grpmask & rcu_rnp_online_cpus(rnp) || READ_ONCE(rnp->ofl_seq) & 0x1)
                ret = true;
        preempt_enable_notrace();
        return ret;
@@ -1603,8 +1614,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
 {
        bool ret = false;
        bool need_qs;
-       const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-                              rcu_segcblist_is_offloaded(&rdp->cblist);
+       const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 
        raw_lockdep_assert_held_rcu_node(rnp);
 
@@ -1715,6 +1725,7 @@ static void rcu_strict_gp_boundary(void *unused)
  */
 static bool rcu_gp_init(void)
 {
+       unsigned long firstseq;
        unsigned long flags;
        unsigned long oldmask;
        unsigned long mask;
@@ -1758,6 +1769,12 @@ static bool rcu_gp_init(void)
         */
        rcu_state.gp_state = RCU_GP_ONOFF;
        rcu_for_each_leaf_node(rnp) {
+               smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
+               firstseq = READ_ONCE(rnp->ofl_seq);
+               if (firstseq & 0x1)
+                       while (firstseq == READ_ONCE(rnp->ofl_seq))
+                               schedule_timeout_idle(1);  // Can't wake unless RCU is watching.
+               smp_mb(); // Pair with barriers used when updating ->ofl_seq to even values.
                raw_spin_lock(&rcu_state.ofl_lock);
                raw_spin_lock_irq_rcu_node(rnp);
                if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
@@ -2048,8 +2065,7 @@ static void rcu_gp_cleanup(void)
                needgp = true;
        }
        /* Advance CBs to reduce false positives below. */
-       offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-                   rcu_segcblist_is_offloaded(&rdp->cblist);
+       offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
        if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
                WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
                WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
@@ -2248,8 +2264,7 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
        unsigned long flags;
        unsigned long mask;
        bool needwake = false;
-       const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-                              rcu_segcblist_is_offloaded(&rdp->cblist);
+       const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
        struct rcu_node *rnp;
 
        WARN_ON_ONCE(rdp->cpu != smp_processor_id());
@@ -2399,6 +2414,7 @@ int rcutree_dead_cpu(unsigned int cpu)
        if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
                return 0;
 
+       WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
        /* Adjust any no-longer-needed kthreads. */
        rcu_boost_kthread_setaffinity(rnp, -1);
        /* Do any needed no-CB deferred wakeups from this CPU. */
@@ -2417,8 +2433,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 {
        int div;
        unsigned long flags;
-       const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-                              rcu_segcblist_is_offloaded(&rdp->cblist);
+       const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
        struct rcu_head *rhp;
        struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
        long bl, count;
@@ -2675,8 +2690,7 @@ static __latent_entropy void rcu_core(void)
        unsigned long flags;
        struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
        struct rcu_node *rnp = rdp->mynode;
-       const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-                              rcu_segcblist_is_offloaded(&rdp->cblist);
+       const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 
        if (cpu_is_offline(smp_processor_id()))
                return;
@@ -2978,8 +2992,7 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
                                   rcu_segcblist_n_cbs(&rdp->cblist));
 
        /* Go handle any RCU core processing required. */
-       if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-           unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
+       if (unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
                __call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
        } else {
                __call_rcu_core(rdp, head, flags);
@@ -3084,6 +3097,9 @@ struct kfree_rcu_cpu_work {
  *     In order to save some per-cpu space the list is singular.
  *     Even though it is lockless an access has to be protected by the
  *     per-cpu lock.
+ * @page_cache_work: A work to refill the cache when it is empty
+ * @work_in_progress: Indicates that page_cache_work is running
+ * @hrtimer: A hrtimer for scheduling a page_cache_work
  * @nr_bkv_objs: number of allocated objects at @bkvcache.
  *
  * This is a per-CPU structure.  The reason that it is not included in
@@ -3100,6 +3116,11 @@ struct kfree_rcu_cpu {
        bool monitor_todo;
        bool initialized;
        int count;
+
+       struct work_struct page_cache_work;
+       atomic_t work_in_progress;
+       struct hrtimer hrtimer;
+
        struct llist_head bkvcache;
        int nr_bkv_objs;
 };
@@ -3217,10 +3238,10 @@ static void kfree_rcu_work(struct work_struct *work)
                        }
                        rcu_lock_release(&rcu_callback_map);
 
-                       krcp = krc_this_cpu_lock(&flags);
+                       raw_spin_lock_irqsave(&krcp->lock, flags);
                        if (put_cached_bnode(krcp, bkvhead[i]))
                                bkvhead[i] = NULL;
-                       krc_this_cpu_unlock(krcp, flags);
+                       raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
                        if (bkvhead[i])
                                free_page((unsigned long) bkvhead[i]);
@@ -3347,6 +3368,57 @@ static void kfree_rcu_monitor(struct work_struct *work)
                raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
 
+static enum hrtimer_restart
+schedule_page_work_fn(struct hrtimer *t)
+{
+       struct kfree_rcu_cpu *krcp =
+               container_of(t, struct kfree_rcu_cpu, hrtimer);
+
+       queue_work(system_highpri_wq, &krcp->page_cache_work);
+       return HRTIMER_NORESTART;
+}
+
+static void fill_page_cache_func(struct work_struct *work)
+{
+       struct kvfree_rcu_bulk_data *bnode;
+       struct kfree_rcu_cpu *krcp =
+               container_of(work, struct kfree_rcu_cpu,
+                       page_cache_work);
+       unsigned long flags;
+       bool pushed;
+       int i;
+
+       for (i = 0; i < rcu_min_cached_objs; i++) {
+               bnode = (struct kvfree_rcu_bulk_data *)
+                       __get_free_page(GFP_KERNEL | __GFP_NOWARN);
+
+               if (bnode) {
+                       raw_spin_lock_irqsave(&krcp->lock, flags);
+                       pushed = put_cached_bnode(krcp, bnode);
+                       raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+                       if (!pushed) {
+                               free_page((unsigned long) bnode);
+                               break;
+                       }
+               }
+       }
+
+       atomic_set(&krcp->work_in_progress, 0);
+}
+
+static void
+run_page_cache_worker(struct kfree_rcu_cpu *krcp)
+{
+       if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
+                       !atomic_xchg(&krcp->work_in_progress, 1)) {
+               hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC,
+                       HRTIMER_MODE_REL);
+               krcp->hrtimer.function = schedule_page_work_fn;
+               hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
+       }
+}
+
 static inline bool
 kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 {
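
run_page_cache_worker() above combines two idioms: an atomic_xchg() gate so
that at most one refill is in flight, and an hrtimer whose handler performs
the queue_work() call, deferring it out of the current raw-spinlock-protected
context and into a workqueue where GFP_KERNEL allocation is legal.  Reduced to
a hypothetical skeleton (names invented):

	#include <linux/atomic.h>
	#include <linux/hrtimer.h>
	#include <linux/workqueue.h>

	struct refill_state {
		atomic_t in_progress;
		struct hrtimer timer;
		struct work_struct work;	/* INIT_WORK()ed to refill_work_fn at boot */
	};

	static void refill_work_fn(struct work_struct *w)
	{
		struct refill_state *rs = container_of(w, struct refill_state, work);

		/* ... GFP_KERNEL allocations to refill the cache go here ... */
		atomic_set(&rs->in_progress, 0);	/* allow the next refill */
	}

	static enum hrtimer_restart refill_timer_fn(struct hrtimer *t)
	{
		struct refill_state *rs = container_of(t, struct refill_state, timer);

		queue_work(system_highpri_wq, &rs->work);
		return HRTIMER_NORESTART;
	}

	/* Safe to call with a raw spinlock held: nothing here sleeps or allocates. */
	static void kick_refill(struct refill_state *rs)
	{
		if (atomic_xchg(&rs->in_progress, 1))
			return;				/* a refill is already pending */
		hrtimer_init(&rs->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
		rs->timer.function = refill_timer_fn;
		hrtimer_start(&rs->timer, 0, HRTIMER_MODE_REL);
	}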
@@ -3363,32 +3435,8 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
        if (!krcp->bkvhead[idx] ||
                        krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
                bnode = get_cached_bnode(krcp);
-               if (!bnode) {
-                       /*
-                        * To keep this path working on raw non-preemptible
-                        * sections, prevent the optional entry into the
-                        * allocator as it uses sleeping locks. In fact, even
-                        * if the caller of kfree_rcu() is preemptible, this
-                        * path still is not, as krcp->lock is a raw spinlock.
-                        * With additional page pre-allocation in the works,
-                        * hitting this return is going to be much less likely.
-                        */
-                       if (IS_ENABLED(CONFIG_PREEMPT_RT))
-                               return false;
-
-                       /*
-                        * NOTE: For one argument of kvfree_rcu() we can
-                        * drop the lock and get the page in sleepable
-                        * context. That would allow to maintain an array
-                        * for the CONFIG_PREEMPT_RT as well if no cached
-                        * pages are available.
-                        */
-                       bnode = (struct kvfree_rcu_bulk_data *)
-                               __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-               }
-
                /* Switch to emergency path. */
-               if (unlikely(!bnode))
+               if (!bnode)
                        return false;
 
                /* Initialize the new block. */
@@ -3452,12 +3500,10 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
                goto unlock_return;
        }
 
-       /*
-        * Under high memory pressure GFP_NOWAIT can fail,
-        * in that case the emergency path is maintained.
-        */
        success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
        if (!success) {
+               run_page_cache_worker(krcp);
+
                if (head == NULL)
                        // Inline if kvfree_rcu(one_arg) call.
                        goto unlock_return;
@@ -3567,7 +3613,7 @@ void __init kfree_rcu_scheduler_running(void)
  * During early boot, any blocking grace-period wait automatically
  * implies a grace period.  Later on, this is never the case for PREEMPTION.
  *
- * Howevr, because a context switch is a grace period for !PREEMPTION, any
+ * However, because a context switch is a grace period for !PREEMPTION, any
  * blocking grace-period wait automatically implies a grace period if
  * there is only one CPU online at any point time during execution of
  * either synchronize_rcu() or synchronize_rcu_expedited().  It is OK to
@@ -3583,7 +3629,20 @@ static int rcu_blocking_is_gp(void)
                return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;
        might_sleep();  /* Check for RCU read-side critical section. */
        preempt_disable();
-       ret = num_online_cpus() <= 1;
+       /*
+        * If the rcu_state.n_online_cpus counter is equal to one,
+        * there is only one CPU, and that CPU sees all prior accesses
+        * made by any CPU that was online at the time of its access.
+        * Furthermore, if this counter is equal to one, its value cannot
+        * change until after the preempt_enable() below.
+        *
+        * Furthermore, if rcu_state.n_online_cpus is equal to one here,
+        * all later CPUs (both this one and any that come online later
+        * on) are guaranteed to see all accesses prior to this point
+        * in the code, without the need for additional memory barriers.
+        * Those memory barriers are provided by CPU-hotplug code.
+        */
+       ret = READ_ONCE(rcu_state.n_online_cpus) <= 1;
        preempt_enable();
        return ret;
 }
@@ -3628,7 +3687,7 @@ void synchronize_rcu(void)
                         lock_is_held(&rcu_sched_lock_map),
                         "Illegal synchronize_rcu() in RCU read-side critical section");
        if (rcu_blocking_is_gp())
-               return;
+               return;  // Context allows vacuous grace periods.
        if (rcu_gp_is_expedited())
                synchronize_rcu_expedited();
        else
@@ -3707,13 +3766,13 @@ static int rcu_pending(int user)
                return 1;
 
        /* Does this CPU have callbacks ready to invoke? */
-       if (rcu_segcblist_ready_cbs(&rdp->cblist))
+       if (!rcu_segcblist_is_offloaded(&rdp->cblist) &&
+           rcu_segcblist_ready_cbs(&rdp->cblist))
                return 1;
 
        /* Has RCU gone idle with this CPU needing another grace period? */
        if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
-           (!IS_ENABLED(CONFIG_RCU_NOCB_CPU) ||
-            !rcu_segcblist_is_offloaded(&rdp->cblist)) &&
+           !rcu_segcblist_is_offloaded(&rdp->cblist) &&
            !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
                return 1;
 
@@ -3969,6 +4028,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
        raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
        rcu_prepare_kthreads(cpu);
        rcu_spawn_cpu_nocb_kthread(cpu);
+       WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus + 1);
 
        return 0;
 }
@@ -4057,6 +4117,9 @@ void rcu_cpu_starting(unsigned int cpu)
 
        rnp = rdp->mynode;
        mask = rdp->grpmask;
+       WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+       WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
+       smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
        raw_spin_lock_irqsave_rcu_node(rnp, flags);
        WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
        newcpu = !(rnp->expmaskinitnext & mask);
@@ -4067,13 +4130,18 @@ void rcu_cpu_starting(unsigned int cpu)
        rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
        rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
        rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
-       if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
+
+       /* An incoming CPU should never be blocking a grace period. */
+       if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
                rcu_disable_urgency_upon_qs(rdp);
                /* Report QS -after- changing ->qsmaskinitnext! */
                rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
        } else {
                raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
        }
+       smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
+       WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+       WARN_ON_ONCE(rnp->ofl_seq & 0x1);
        smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
 }
 
@@ -4101,6 +4169,9 @@ void rcu_report_dead(unsigned int cpu)
 
        /* Remove outgoing CPU from mask in the leaf rcu_node structure. */
        mask = rdp->grpmask;
+       WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+       WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
+       smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
        raw_spin_lock(&rcu_state.ofl_lock);
        raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
        rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
@@ -4113,6 +4184,9 @@ void rcu_report_dead(unsigned int cpu)
        WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
        raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
        raw_spin_unlock(&rcu_state.ofl_lock);
+       smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
+       WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
+       WARN_ON_ONCE(rnp->ofl_seq & 0x1);
 
        rdp->cpu_started = false;
 }
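
The ->ofl_seq updates above in rcu_report_dead(), and the matching ones in
rcu_cpu_starting(), pair with the wait loop added to rcu_gp_init() earlier in
this patch: hotplug code makes the counter odd for the duration of the
operation and even again afterward, and grace-period initialization waits
(sleeping) until it observes an even value.  A simplified sketch of the
handshake, with hypothetical names:

	#include <linux/atomic.h>
	#include <linux/compiler.h>
	#include <linux/sched.h>

	struct hp_node {
		unsigned long seq;	/* even: idle, odd: hotplug operation in flight */
	};

	/* Hotplug side (simplified): bracket the bookkeeping update with an odd value. */
	static void hotplug_update(struct hp_node *np)
	{
		WRITE_ONCE(np->seq, np->seq + 1);	/* now odd: operation in flight */
		smp_mb();				/* order flag before the update */
		/* ... update hotplug bookkeeping ... */
		smp_mb();				/* order update before the flag */
		WRITE_ONCE(np->seq, np->seq + 1);	/* even again: operation done */
	}

	/* Grace-period-initialization side (simplified): wait out in-flight updates. */
	static void wait_for_hotplug(struct hp_node *np)
	{
		unsigned long seq;

		smp_mb();				/* pair with hotplug-side barriers */
		seq = READ_ONCE(np->seq);
		if (seq & 0x1)
			while (seq == READ_ONCE(np->seq))
				schedule_timeout_idle(1);
		smp_mb();
	}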
@@ -4449,24 +4523,14 @@ static void __init kfree_rcu_batch_init(void)
 
        for_each_possible_cpu(cpu) {
                struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-               struct kvfree_rcu_bulk_data *bnode;
 
                for (i = 0; i < KFREE_N_BATCHES; i++) {
                        INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
                        krcp->krw_arr[i].krcp = krcp;
                }
 
-               for (i = 0; i < rcu_min_cached_objs; i++) {
-                       bnode = (struct kvfree_rcu_bulk_data *)
-                               __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-
-                       if (bnode)
-                               put_cached_bnode(krcp, bnode);
-                       else
-                               pr_err("Failed to preallocate for %d CPU!\n", cpu);
-               }
-
                INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
+               INIT_WORK(&krcp->page_cache_work, fill_page_cache_func);
                krcp->initialized = true;
        }
        if (register_shrinker(&kfree_rcu_shrinker))
index e4f66b8..7708ed1 100644 (file)
@@ -56,6 +56,7 @@ struct rcu_node {
                                /*  Initialized from ->qsmaskinitnext at the */
                                /*  beginning of each grace period. */
        unsigned long qsmaskinitnext;
+       unsigned long ofl_seq;  /* CPU-hotplug operation sequence count. */
                                /* Online CPUs for next grace period. */
        unsigned long expmask;  /* CPUs or groups that need to check in */
                                /*  to allow the current expedited GP */
@@ -298,6 +299,7 @@ struct rcu_state {
                                                /* Hierarchy levels (+1 to */
                                                /*  shut bogus gcc warning) */
        int ncpus;                              /* # CPUs seen so far. */
+       int n_online_cpus;                      /* # CPUs online for RCU. */
 
        /* The following fields are guarded by the root rcu_node's lock. */
 
index fd8a52e..7e291ce 100644 (file)
@@ -628,7 +628,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
                        set_tsk_need_resched(current);
                        set_preempt_need_resched();
                        if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
-                           !rdp->defer_qs_iw_pending && exp) {
+                           !rdp->defer_qs_iw_pending && exp && cpu_online(rdp->cpu)) {
                                // Get scheduler to re-evaluate and call hooks.
                                // If !IRQ_WORK, FQS scan will eventually IPI.
                                init_irq_work(&rdp->defer_qs_iw,
index 0fde39b..70d48c5 100644 (file)
@@ -13,6 +13,7 @@
 
 /* panic() on RCU Stall sysctl. */
 int sysctl_panic_on_rcu_stall __read_mostly;
+int sysctl_max_rcu_stall_to_panic __read_mostly;
 
 #ifdef CONFIG_PROVE_RCU
 #define RCU_STALL_DELAY_DELTA          (5 * HZ)
@@ -106,6 +107,11 @@ early_initcall(check_cpu_stall_init);
 /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
 static void panic_on_rcu_stall(void)
 {
+       static int cpu_stall;
+
+       if (++cpu_stall < sysctl_max_rcu_stall_to_panic)
+               return;
+
        if (sysctl_panic_on_rcu_stall)
                panic("RCU Stall\n");
 }
@@ -249,13 +255,16 @@ static bool check_slow_task(struct task_struct *t, void *arg)
 
 /*
  * Scan the current list of tasks blocked within RCU read-side critical
- * sections, printing out the tid of each.
+ * sections, printing out the tid of each of the first few of them.
  */
-static int rcu_print_task_stall(struct rcu_node *rnp)
+static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
+       __releases(rnp->lock)
 {
+       int i = 0;
        int ndetected = 0;
        struct rcu_stall_chk_rdr rscr;
        struct task_struct *t;
+       struct task_struct *ts[8];
 
        if (!rcu_preempt_blocked_readers_cgp(rnp))
                return 0;
@@ -264,6 +273,14 @@ static int rcu_print_task_stall(struct rcu_node *rnp)
        t = list_entry(rnp->gp_tasks->prev,
                       struct task_struct, rcu_node_entry);
        list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) {
+               get_task_struct(t);
+               ts[i++] = t;
+               if (i >= ARRAY_SIZE(ts))
+                       break;
+       }
+       raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+       for (i--; i; i--) {
+               t = ts[i];
                if (!try_invoke_on_locked_down_task(t, check_slow_task, &rscr))
                        pr_cont(" P%d", t->pid);
                else
@@ -273,6 +290,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp)
                                ".q"[rscr.rs.b.need_qs],
                                ".e"[rscr.rs.b.exp_hint],
                                ".l"[rscr.on_blkd_list]);
+               put_task_struct(t);
                ndetected++;
        }
        pr_cont("\n");
@@ -293,8 +311,9 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
  * Because preemptible RCU does not exist, we never have to check for
  * tasks blocked within RCU read-side critical sections.
  */
-static int rcu_print_task_stall(struct rcu_node *rnp)
+static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
 {
+       raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
        return 0;
 }
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
@@ -472,7 +491,6 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
        pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
        rcu_for_each_leaf_node(rnp) {
                raw_spin_lock_irqsave_rcu_node(rnp, flags);
-               ndetected += rcu_print_task_stall(rnp);
                if (rnp->qsmask != 0) {
                        for_each_leaf_node_possible_cpu(rnp, cpu)
                                if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
@@ -480,7 +498,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
                                        ndetected++;
                                }
                }
-               raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+               ndetected += rcu_print_task_stall(rnp, flags); // Releases rnp->lock.
        }
 
        for_each_possible_cpu(cpu)
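
The reworked rcu_print_task_stall() above takes references to at most eight blocked tasks while the rcu_node lock is held, drops that lock (hence the new flags argument and the __releases() annotation), and only then does the slower printing, with print_other_cpu_stall() adjusted so the unlock now happens inside the callee.  A user-space sketch of that collect-then-release pattern, with illustrative types; in the kernel it is the get_task_struct()/put_task_struct() references that keep the snapshot safe after the lock is dropped:

#include <pthread.h>
#include <stdio.h>

#define MAX_SNAP 8

struct task {
	int pid;
	struct task *next;
};

static pthread_mutex_t blkd_lock = PTHREAD_MUTEX_INITIALIZER;
static struct task *blkd_list;		/* list of "blocked" tasks */

static void print_blocked_tasks(void)
{
	struct task *snap[MAX_SNAP];
	int i = 0;

	pthread_mutex_lock(&blkd_lock);
	for (struct task *t = blkd_list; t && i < MAX_SNAP; t = t->next)
		snap[i++] = t;		/* the kernel also takes a reference here */
	pthread_mutex_unlock(&blkd_lock);

	while (i)			/* slow printing done without the lock held */
		printf(" P%d", snap[--i]->pid);
	printf("\n");
}

int main(void)
{
	struct task t2 = { .pid = 12, .next = NULL };
	struct task t1 = { .pid = 7, .next = &t2 };

	blkd_list = &t1;
	print_blocked_tasks();		/* prints " P12 P7" */
	return 0;
}
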
diff --git a/kernel/scftorture.c b/kernel/scftorture.c
index 554a521..d55a9f8 100644 (file)
@@ -59,9 +59,10 @@ torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
 torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (s), 0=disable");
 torture_param(int, shutdown_secs, 0, "Shutdown time (ms), <= zero to disable.");
 torture_param(int, stat_interval, 60, "Number of seconds between stats printk()s.");
-torture_param(int, stutter_cpus, 5, "Number of jiffies to change CPUs under test, 0=disable");
+torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
 torture_param(bool, use_cpus_read_lock, 0, "Use cpus_read_lock() to exclude CPU hotplug.");
 torture_param(int, verbose, 0, "Enable verbose debugging printk()s");
+torture_param(int, weight_resched, -1, "Testing weight for resched_cpu() operations.");
 torture_param(int, weight_single, -1, "Testing weight for single-CPU no-wait operations.");
 torture_param(int, weight_single_wait, -1, "Testing weight for single-CPU operations.");
 torture_param(int, weight_many, -1, "Testing weight for multi-CPU no-wait operations.");
@@ -82,6 +83,7 @@ torture_param(bool, shutdown, SCFTORT_SHUTDOWN, "Shutdown at end of torture test
 struct scf_statistics {
        struct task_struct *task;
        int cpu;
+       long long n_resched;
        long long n_single;
        long long n_single_ofl;
        long long n_single_wait;
@@ -97,12 +99,15 @@ static struct task_struct *scf_torture_stats_task;
 static DEFINE_PER_CPU(long long, scf_invoked_count);
 
 // Data for random primitive selection
-#define SCF_PRIM_SINGLE                0
-#define SCF_PRIM_MANY          1
-#define SCF_PRIM_ALL           2
-#define SCF_NPRIMS             (2 * 3) // Need wait and no-wait versions of each.
+#define SCF_PRIM_RESCHED       0
+#define SCF_PRIM_SINGLE                1
+#define SCF_PRIM_MANY          2
+#define SCF_PRIM_ALL           3
+#define SCF_NPRIMS             7 // Need wait and no-wait versions of each,
+                                 //  except for SCF_PRIM_RESCHED.
 
 static char *scf_prim_name[] = {
+       "resched_cpu",
        "smp_call_function_single",
        "smp_call_function_many",
        "smp_call_function",
@@ -136,6 +141,8 @@ static char *bangstr = "";
 
 static DEFINE_TORTURE_RANDOM_PERCPU(scf_torture_rand);
 
+extern void resched_cpu(int cpu); // An alternative IPI vector.
+
 // Print torture statistics.  Caller must ensure serialization.
 static void scf_torture_stats_print(void)
 {
@@ -148,6 +155,7 @@ static void scf_torture_stats_print(void)
        for_each_possible_cpu(cpu)
                invoked_count += data_race(per_cpu(scf_invoked_count, cpu));
        for (i = 0; i < nthreads; i++) {
+               scfs.n_resched += scf_stats_p[i].n_resched;
                scfs.n_single += scf_stats_p[i].n_single;
                scfs.n_single_ofl += scf_stats_p[i].n_single_ofl;
                scfs.n_single_wait += scf_stats_p[i].n_single_wait;
@@ -160,8 +168,8 @@ static void scf_torture_stats_print(void)
        if (atomic_read(&n_errs) || atomic_read(&n_mb_in_errs) ||
            atomic_read(&n_mb_out_errs) || atomic_read(&n_alloc_errs))
                bangstr = "!!! ";
-       pr_alert("%s %sscf_invoked_count %s: %lld single: %lld/%lld single_ofl: %lld/%lld many: %lld/%lld all: %lld/%lld ",
-                SCFTORT_FLAG, bangstr, isdone ? "VER" : "ver", invoked_count,
+       pr_alert("%s %sscf_invoked_count %s: %lld resched: %lld single: %lld/%lld single_ofl: %lld/%lld many: %lld/%lld all: %lld/%lld ",
+                SCFTORT_FLAG, bangstr, isdone ? "VER" : "ver", invoked_count, scfs.n_resched,
                 scfs.n_single, scfs.n_single_wait, scfs.n_single_ofl, scfs.n_single_wait_ofl,
                 scfs.n_many, scfs.n_many_wait, scfs.n_all, scfs.n_all_wait);
        torture_onoff_stats();
@@ -314,6 +322,13 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
                }
        }
        switch (scfsp->scfs_prim) {
+       case SCF_PRIM_RESCHED:
+               if (IS_BUILTIN(CONFIG_SCF_TORTURE_TEST)) {
+                       cpu = torture_random(trsp) % nr_cpu_ids;
+                       scfp->n_resched++;
+                       resched_cpu(cpu);
+               }
+               break;
        case SCF_PRIM_SINGLE:
                cpu = torture_random(trsp) % nr_cpu_ids;
                if (scfsp->scfs_wait)
@@ -421,6 +436,7 @@ static int scftorture_invoker(void *arg)
                        was_offline = false;
                }
                cond_resched();
+               stutter_wait("scftorture_invoker");
        } while (!torture_must_stop());
 
        VERBOSE_SCFTORTOUT("scftorture_invoker %d ended", scfp->cpu);
@@ -433,8 +449,8 @@ static void
 scftorture_print_module_parms(const char *tag)
 {
        pr_alert(SCFTORT_FLAG
-                "--- %s:  verbose=%d holdoff=%d longwait=%d nthreads=%d onoff_holdoff=%d onoff_interval=%d shutdown_secs=%d stat_interval=%d stutter_cpus=%d use_cpus_read_lock=%d, weight_single=%d, weight_single_wait=%d, weight_many=%d, weight_many_wait=%d, weight_all=%d, weight_all_wait=%d\n", tag,
-                verbose, holdoff, longwait, nthreads, onoff_holdoff, onoff_interval, shutdown, stat_interval, stutter_cpus, use_cpus_read_lock, weight_single, weight_single_wait, weight_many, weight_many_wait, weight_all, weight_all_wait);
+                "--- %s:  verbose=%d holdoff=%d longwait=%d nthreads=%d onoff_holdoff=%d onoff_interval=%d shutdown_secs=%d stat_interval=%d stutter=%d use_cpus_read_lock=%d, weight_resched=%d, weight_single=%d, weight_single_wait=%d, weight_many=%d, weight_many_wait=%d, weight_all=%d, weight_all_wait=%d\n", tag,
+                verbose, holdoff, longwait, nthreads, onoff_holdoff, onoff_interval, shutdown, stat_interval, stutter, use_cpus_read_lock, weight_resched, weight_single, weight_single_wait, weight_many, weight_many_wait, weight_all, weight_all_wait);
 }
 
 static void scf_cleanup_handler(void *unused)
@@ -475,6 +491,7 @@ static int __init scf_torture_init(void)
 {
        long i;
        int firsterr = 0;
+       unsigned long weight_resched1 = weight_resched;
        unsigned long weight_single1 = weight_single;
        unsigned long weight_single_wait1 = weight_single_wait;
        unsigned long weight_many1 = weight_many;
@@ -487,9 +504,10 @@ static int __init scf_torture_init(void)
 
        scftorture_print_module_parms("Start of test");
 
-       if (weight_single == -1 && weight_single_wait == -1 &&
+       if (weight_resched == -1 && weight_single == -1 && weight_single_wait == -1 &&
            weight_many == -1 && weight_many_wait == -1 &&
            weight_all == -1 && weight_all_wait == -1) {
+               weight_resched1 = 2 * nr_cpu_ids;
                weight_single1 = 2 * nr_cpu_ids;
                weight_single_wait1 = 2 * nr_cpu_ids;
                weight_many1 = 2;
@@ -497,6 +515,8 @@ static int __init scf_torture_init(void)
                weight_all1 = 1;
                weight_all_wait1 = 1;
        } else {
+               if (weight_resched == -1)
+                       weight_resched1 = 0;
                if (weight_single == -1)
                        weight_single1 = 0;
                if (weight_single_wait == -1)
@@ -517,6 +537,10 @@ static int __init scf_torture_init(void)
                firsterr = -EINVAL;
                goto unwind;
        }
+       if (IS_BUILTIN(CONFIG_SCF_TORTURE_TEST))
+               scf_sel_add(weight_resched1, SCF_PRIM_RESCHED, false);
+       else if (weight_resched1)
+               VERBOSE_SCFTORTOUT_ERRSTRING("built as module, weight_resched ignored");
        scf_sel_add(weight_single1, SCF_PRIM_SINGLE, false);
        scf_sel_add(weight_single_wait1, SCF_PRIM_SINGLE, true);
        scf_sel_add(weight_many1, SCF_PRIM_MANY, false);
@@ -535,6 +559,11 @@ static int __init scf_torture_init(void)
                if (firsterr)
                        goto unwind;
        }
+       if (stutter > 0) {
+               firsterr = torture_stutter_init(stutter, stutter);
+               if (firsterr)
+                       goto unwind;
+       }
 
        // Worker tasks invoking smp_call_function().
        if (nthreads < 0)
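
The new weight_resched parameter feeds resched_cpu() into the same weighted random selection already used for the smp_call_function*() variants, and is skipped with a warning when scftorture is built as a module, presumably because resched_cpu() is not exported to modules.  A rough sketch of how a cumulative-weight selector of that general shape works (an assumption about the approach; the in-kernel scf_sel_add() details may differ):

#include <stdio.h>
#include <stdlib.h>

struct sel {
	unsigned long cumweight;
	const char *name;
};

static struct sel table[8];
static int nsel;
static unsigned long total;

static void sel_add(unsigned long weight, const char *name)
{
	if (!weight)
		return;				/* zero weight: never selected */
	total += weight;
	table[nsel].cumweight = total;
	table[nsel++].name = name;
}

static const char *sel_pick(void)
{
	unsigned long r = (unsigned long)rand() % total;

	for (int i = 0; i < nsel; i++)
		if (r < table[i].cumweight)
			return table[i].name;
	return table[nsel - 1].name;
}

int main(void)
{
	sel_add(2 * 4, "resched_cpu");			/* 2 * nr_cpu_ids, 4 CPUs assumed */
	sel_add(2 * 4, "smp_call_function_single");
	sel_add(2, "smp_call_function_many");
	sel_add(1, "smp_call_function");
	for (int i = 0; i < 5; i++)
		printf("%s\n", sel_pick());
	return 0;
}
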
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index afad085..c9fbdd8 100644 (file)
@@ -2650,6 +2650,17 @@ static struct ctl_table kern_table[] = {
                .extra2         = SYSCTL_ONE,
        },
 #endif
+#if defined(CONFIG_TREE_RCU)
+       {
+               .procname       = "max_rcu_stall_to_panic",
+               .data           = &sysctl_max_rcu_stall_to_panic,
+               .maxlen         = sizeof(sysctl_max_rcu_stall_to_panic),
+               .mode           = 0644,
+               .proc_handler   = proc_dointvec_minmax,
+               .extra1         = SYSCTL_ONE,
+               .extra2         = SYSCTL_INT_MAX,
+       },
+#endif
 #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
        {
                .procname       = "stack_erasing",
diff --git a/kernel/torture.c b/kernel/torture.c
index 1061492..8562ac1 100644 (file)
@@ -602,18 +602,29 @@ static int stutter_gap;
  */
 bool stutter_wait(const char *title)
 {
-       int spt;
+       ktime_t delay;
+       unsigned int i = 0;
        bool ret = false;
+       int spt;
 
        cond_resched_tasks_rcu_qs();
        spt = READ_ONCE(stutter_pause_test);
        for (; spt; spt = READ_ONCE(stutter_pause_test)) {
-               ret = true;
+               if (!ret) {
+                       sched_set_normal(current, MAX_NICE);
+                       ret = true;
+               }
                if (spt == 1) {
                        schedule_timeout_interruptible(1);
                } else if (spt == 2) {
-                       while (READ_ONCE(stutter_pause_test))
+                       while (READ_ONCE(stutter_pause_test)) {
+                               if (!(i++ & 0xffff)) {
+                                       set_current_state(TASK_INTERRUPTIBLE);
+                                       delay = 10 * NSEC_PER_USEC;
+                                       schedule_hrtimeout(&delay, HRTIMER_MODE_REL);
+                               }
                                cond_resched();
+                       }
                } else {
                        schedule_timeout_interruptible(round_jiffies_relative(HZ));
                }
@@ -629,20 +640,27 @@ EXPORT_SYMBOL_GPL(stutter_wait);
  */
 static int torture_stutter(void *arg)
 {
+       ktime_t delay;
+       DEFINE_TORTURE_RANDOM(rand);
        int wtime;
 
        VERBOSE_TOROUT_STRING("torture_stutter task started");
        do {
                if (!torture_must_stop() && stutter > 1) {
                        wtime = stutter;
-                       if (stutter > HZ + 1) {
+                       if (stutter > 2) {
                                WRITE_ONCE(stutter_pause_test, 1);
-                               wtime = stutter - HZ - 1;
-                               schedule_timeout_interruptible(wtime);
-                               wtime = HZ + 1;
+                               wtime = stutter - 3;
+                               delay = ktime_divns(NSEC_PER_SEC * wtime, HZ);
+                               delay += (torture_random(&rand) >> 3) % NSEC_PER_MSEC;
+                               set_current_state(TASK_INTERRUPTIBLE);
+                               schedule_hrtimeout(&delay, HRTIMER_MODE_REL);
+                               wtime = 2;
                        }
                        WRITE_ONCE(stutter_pause_test, 2);
-                       schedule_timeout_interruptible(wtime);
+                       delay = ktime_divns(NSEC_PER_SEC * wtime, HZ);
+                       set_current_state(TASK_INTERRUPTIBLE);
+                       schedule_hrtimeout(&delay, HRTIMER_MODE_REL);
                }
                WRITE_ONCE(stutter_pause_test, 0);
                if (!torture_must_stop())
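
The stutter rework above replaces jiffies-granularity schedule_timeout_interruptible() calls with hrtimer-based sleeps, so stutter intervals no longer need to be whole multiples of a tick and can carry a small random nanosecond offset.  A user-space sketch of sleeping at nanosecond rather than tick granularity (illustrative only; the kernel code uses schedule_hrtimeout() with a ktime_t):

#include <stdio.h>
#include <time.h>

/* Relative sleep for "ns" nanoseconds. */
static void sleep_ns(long long ns)
{
	struct timespec ts = {
		.tv_sec = (time_t)(ns / 1000000000LL),
		.tv_nsec = (long)(ns % 1000000000LL),
	};

	clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
}

int main(void)
{
	sleep_ns(10 * 1000LL);	/* 10 microseconds, like the 10 * NSEC_PER_USEC above */
	printf("done\n");
	return 0;
}
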
diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 2551e9b..e61d36c 100644 (file)
@@ -107,7 +107,7 @@ static int errno;
 #endif
 
 /* errno codes all ensure that they will not conflict with a valid pointer
- * because they all correspond to the highest addressable memry page.
+ * because they all correspond to the highest addressable memory page.
  */
 #define MAX_ERRNO 4095
 
@@ -231,7 +231,7 @@ struct rusage {
 #define DT_SOCK   12
 
 /* all the *at functions */
-#ifndef AT_FDWCD
+#ifndef AT_FDCWD
 #define AT_FDCWD             -100
 #endif
 
diff --git a/tools/testing/selftests/rcutorture/bin/console-badness.sh b/tools/testing/selftests/rcutorture/bin/console-badness.sh
index 0e4c0b2..80ae7f0 100755 (executable)
@@ -13,4 +13,5 @@
 egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
 grep -v 'ODEBUG: ' |
 grep -v 'This means that this is a DEBUG kernel and it is' |
-grep -v 'Warning: unable to open an initial console'
+grep -v 'Warning: unable to open an initial console' |
+grep -v 'NOHZ tick-stop error: Non-RCU local softirq work is pending, handler'
diff --git a/tools/testing/selftests/rcutorture/bin/functions.sh b/tools/testing/selftests/rcutorture/bin/functions.sh
index 51f3464..8266349 100644 (file)
@@ -169,6 +169,7 @@ identify_qemu () {
 # Output arguments for the qemu "-append" string based on CPU type
 # and the TORTURE_QEMU_INTERACTIVE environment variable.
 identify_qemu_append () {
+       echo debug_boot_weak_hash
        local console=ttyS0
        case "$1" in
        qemu-system-x86_64|qemu-system-i386)
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-check-branches.sh b/tools/testing/selftests/rcutorture/bin/kvm-check-branches.sh
index 6e65c13..370406b 100755 (executable)
@@ -52,8 +52,7 @@ echo Results directory: $resdir/$ds
 KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM
 PATH=${KVM}/bin:$PATH; export PATH
 . functions.sh
-cpus="`identify_qemu_vcpus`"
-echo Using up to $cpus CPUs.
+echo Using all `identify_qemu_vcpus` CPUs.
 
 # Each pass through this loop does one command-line argument.
 for gitbr in $@
@@ -74,7 +73,7 @@ do
                # Test the specified commit.
                git checkout $i > $resdir/$ds/$idir/git-checkout.out 2>&1
                echo git checkout return code: $? "(Commit $ntry: $i)"
-               kvm.sh --cpus $cpus --duration 3 --trust-make > $resdir/$ds/$idir/kvm.sh.out 2>&1
+               kvm.sh --allcpus --duration 3 --trust-make > $resdir/$ds/$idir/kvm.sh.out 2>&1
                ret=$?
                echo kvm.sh return code $ret for commit $i from branch $gitbr
 
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuscale.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuscale.sh
index aa74515..b582113 100755 (executable)
@@ -32,7 +32,7 @@ sed -e 's/^\[[^]]*]//' < $i/console.log |
 awk '
 /-scale: .* gps: .* batches:/ {
        ngps = $9;
-       nbatches = $11;
+       nbatches = 1;
 }
 
 /-scale: .*writer-duration/ {
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index 6dc2b49..3cd03d0 100755 (executable)
@@ -206,7 +206,10 @@ do
        kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
        if test -z "$qemu_pid" || kill -0 "$qemu_pid" > /dev/null 2>&1
        then
-               if test $kruntime -ge $seconds -o -f "$TORTURE_STOPFILE"
+               if test -n "$TORTURE_KCONFIG_GDB_ARG"
+               then
+                       :
+               elif test $kruntime -ge $seconds || test -f "$TORTURE_STOPFILE"
                then
                        break;
                fi
@@ -223,6 +226,20 @@ do
                                echo "ps -fp $killpid" >> $resdir/Warnings 2>&1
                                ps -fp $killpid >> $resdir/Warnings 2>&1
                        fi
+                       # Reduce probability of PID reuse by allowing a one-minute buffer
+                       if test $((kruntime + 60)) -lt $seconds && test -s "$resdir/../jitter_pids"
+                       then
+                               awk < "$resdir/../jitter_pids" '
+                               NF > 0 {
+                                       pidlist = pidlist " " $1;
+                                       n++;
+                               }
+                               END {
+                                       if (n > 0) {
+                                               print "kill " pidlist;
+                                       }
+                               }' | sh
+                       fi
                else
                        echo ' ---' `date`: "Kernel done"
                fi
diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh
index 6eb1d3f..45d07b7 100755 (executable)
@@ -58,7 +58,7 @@ usage () {
        echo "       --datestamp string"
        echo "       --defconfig string"
        echo "       --dryrun sched|script"
-       echo "       --duration minutes"
+       echo "       --duration minutes | <seconds>s | <hours>h | <days>d"
        echo "       --gdb"
        echo "       --help"
        echo "       --interactive"
@@ -93,7 +93,7 @@ do
                TORTURE_BOOT_IMAGE="$2"
                shift
                ;;
-       --buildonly)
+       --buildonly|--build-only)
                TORTURE_BUILDONLY=1
                ;;
        --configs|--config)
@@ -128,8 +128,20 @@ do
                shift
                ;;
        --duration)
-               checkarg --duration "(minutes)" $# "$2" '^[0-9]*$' '^error'
-               dur=$(($2*60))
+               checkarg --duration "(minutes)" $# "$2" '^[0-9][0-9]*\(s\|m\|h\|d\|\)$' '^error'
+               mult=60
+               if echo "$2" | grep -q 's$'
+               then
+                       mult=1
+               elif echo "$2" | grep -q 'h$'
+               then
+                       mult=3600
+               elif echo "$2" | grep -q 'd$'
+               then
+                       mult=86400
+               fi
+               ts=`echo $2 | sed -e 's/[smhd]$//'`
+               dur=$(($ts*mult))
                shift
                ;;
        --gdb)
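
The --duration change keeps plain numbers meaning minutes while accepting s, h, and d suffixes for seconds, hours, and days.  The same conversion, sketched as a small C helper (hypothetical, just to spell out the arithmetic):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static long duration_to_seconds(const char *arg)
{
	size_t len = strlen(arg);
	long mult = 60;				/* default unit: minutes */
	char unit = len ? arg[len - 1] : '\0';

	if (unit == 's')
		mult = 1;
	else if (unit == 'h')
		mult = 3600;
	else if (unit == 'd')
		mult = 86400;
	return atol(arg) * mult;		/* atol() stops at the suffix */
}

int main(void)
{
	printf("%ld %ld %ld\n", duration_to_seconds("10"),
	       duration_to_seconds("30s"), duration_to_seconds("2h"));
	return 0;				/* prints "600 30 7200" */
}
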
@@ -148,7 +160,7 @@ do
                jitter="$2"
                shift
                ;;
-       --kconfig)
+       --kconfig|--kconfigs)
                checkarg --kconfig "(Kconfig options)" $# "$2" '^CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\)\( CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\)\)*$' '^error$'
                TORTURE_KCONFIG_ARG="$2"
                shift
@@ -159,7 +171,7 @@ do
        --kcsan)
                TORTURE_KCONFIG_KCSAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y"; export TORTURE_KCONFIG_KCSAN_ARG
                ;;
-       --kmake-arg)
+       --kmake-arg|--kmake-args)
                checkarg --kmake-arg "(kernel make arguments)" $# "$2" '.*' '^error$'
                TORTURE_KMAKE_ARG="$2"
                shift
@@ -459,8 +471,11 @@ function dump(first, pastlast, batchnum)
        print "if test -n \"$needqemurun\""
        print "then"
        print "\techo ---- Starting kernels. `date` | tee -a " rd "log";
-       for (j = 0; j < njitter; j++)
+       print "\techo > " rd "jitter_pids"
+       for (j = 0; j < njitter; j++) {
                print "\tjitter.sh " j " " dur " " ja[2] " " ja[3] "&"
+               print "\techo $! >> " rd "jitter_pids"
+       }
        print "\twait"
        print "\techo ---- All kernel runs complete. `date` | tee -a " rd "log";
        print "else"
diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh
index e033380..263b1be 100755 (executable)
@@ -133,7 +133,7 @@ then
        then
                summary="$summary  Warnings: $n_warn"
        fi
-       n_bugs=`egrep -c 'BUG|Oops:' $file`
+       n_bugs=`egrep -c '\bBUG|Oops:' $file`
        if test "$n_bugs" -ne 0
        then
                summary="$summary  Bugs: $n_bugs"
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
index 6c78022..d6557c3 100644 (file)
@@ -4,7 +4,8 @@ CONFIG_PREEMPT_VOLUNTARY=n
 CONFIG_PREEMPT=n
 #CHECK#CONFIG_TINY_SRCU=y
 CONFIG_RCU_TRACE=n
-CONFIG_DEBUG_LOCK_ALLOC=n
+CONFIG_DEBUG_LOCK_ALLOC=y
+CONFIG_PROVE_LOCKING=y
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 CONFIG_DEBUG_ATOMIC_SLEEP=y
 #CHECK#CONFIG_PREEMPT_COUNT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
index c15ada8..6bc24e9 100644 (file)
@@ -4,7 +4,6 @@ CONFIG_PREEMPT_VOLUNTARY=n
 CONFIG_PREEMPT=n
 #CHECK#CONFIG_TINY_SRCU=y
 CONFIG_RCU_TRACE=n
-CONFIG_DEBUG_LOCK_ALLOC=y
-CONFIG_PROVE_LOCKING=y
+CONFIG_DEBUG_LOCK_ALLOC=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 CONFIG_PREEMPT_COUNT=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon b/tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon
index 87caa0e..90942bb 100644 (file)
@@ -1,2 +1,5 @@
 CONFIG_RCU_SCALE_TEST=y
 CONFIG_PRINTK_TIME=y
+CONFIG_TASKS_RCU_GENERIC=y
+CONFIG_TASKS_RCU=y
+CONFIG_TASKS_TRACE_RCU=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01 b/tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01
new file mode 100644 (file)
index 0000000..e6baa2f
--- /dev/null
@@ -0,0 +1,15 @@
+CONFIG_SMP=y
+CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT_VOLUNTARY=n
+CONFIG_PREEMPT=n
+CONFIG_HZ_PERIODIC=n
+CONFIG_NO_HZ_IDLE=y
+CONFIG_NO_HZ_FULL=n
+CONFIG_RCU_FAST_NO_HZ=n
+CONFIG_RCU_NOCB_CPU=n
+CONFIG_DEBUG_LOCK_ALLOC=n
+CONFIG_PROVE_LOCKING=n
+CONFIG_RCU_BOOST=n
+CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
+CONFIG_RCU_EXPERT=y
+CONFIG_RCU_TRACE=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01.boot b/tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01.boot
new file mode 100644 (file)
index 0000000..af0aff1
--- /dev/null
@@ -0,0 +1 @@
+rcuscale.scale_type=tasks-tracing