its next exit from idle.
Finally, the <tt>rcu_qs_ctr_snap</tt> field is used to detect
cases where a given operation has resulted in a quiescent state
-for all flavors of RCU, for example, <tt>cond_resched_rcu_qs()</tt>.
+for all flavors of RCU, for example, <tt>cond_resched()</tt>
+when RCU has indicated a need for quiescent states.
<h5>RCU Callback Handling</h5>
Its fields are as follows:
<pre>
- 1 int dynticks_nesting;
- 2 int dynticks_nmi_nesting;
+ 1 long dynticks_nesting;
+ 2 long dynticks_nmi_nesting;
3 atomic_t dynticks;
4 bool rcu_need_heavy_qs;
5 unsigned long rcu_qs_ctr;
</pre>
<p>The <tt>->dynticks_nesting</tt> field counts the
-nesting depth of normal interrupts.
-In addition, this counter is incremented when exiting dyntick-idle
-mode and decremented when entering it.
+nesting depth of process execution, so that in normal circumstances
+this counter has value zero or one.
+NMIs, irqs, and tracers are counted by the <tt>->dynticks_nmi_nesting</tt>
+field.
+Because NMIs cannot be masked, changes to this variable have to be
+undertaken carefully using an algorithm provided by Andy Lutomirski.
+The initial transition from idle adds one, and nested transitions
+add two, so that a nesting level of five is represented by a
+<tt>->dynticks_nmi_nesting</tt> value of nine.
This counter can therefore be thought of as counting the number
of reasons why this CPU cannot be permitted to enter dyntick-idle
-mode, aside from non-maskable interrupts (NMIs).
-NMIs are counted by the <tt>->dynticks_nmi_nesting</tt>
-field, except that NMIs that interrupt non-dyntick-idle execution
-are not counted.
+mode, aside from process-level transitions.
+
+<p>However, it turns out that when running in non-idle kernel context,
+the Linux kernel is fully capable of entering interrupt handlers that
+never exit and perhaps also of exiting handlers that were never entered.
+Therefore, whenever the <tt>->dynticks_nesting</tt> field is
+incremented up from zero, the <tt>->dynticks_nmi_nesting</tt> field
+is set to a large positive number, and whenever the
+<tt>->dynticks_nesting</tt> field is decremented down to zero,
+the <tt>->dynticks_nmi_nesting</tt> field is set to zero.
+Assuming that the number of misnested interrupts is not sufficient
+to overflow the counter, this approach corrects the
+<tt>->dynticks_nmi_nesting</tt> field every time the corresponding
+CPU enters the idle loop from process context.
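<p>For illustration only, this bookkeeping can be sketched as follows,
ignoring tracing, memory ordering, and the <tt>->dynticks</tt> updates
(the real code lives in <tt>rcu_eqs_enter()</tt>, <tt>rcu_eqs_exit()</tt>,
<tt>rcu_nmi_enter()</tt>, and <tt>rcu_nmi_exit()</tt>):

<pre>
 1 /* Process-level exit from idle (rcu_eqs_exit()). */
 2 rdtp->dynticks_nesting = 1;
 3 rdtp->dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE;  /* "Crowbar". */
 4
 5 /* Process-level entry to idle (rcu_eqs_enter()). */
 6 rdtp->dynticks_nmi_nesting = 0;                     /* "Crowbar". */
 7 rdtp->dynticks_nesting = 0;
 8
 9 /* irq/NMI entry (rcu_nmi_enter()). */
10 rdtp->dynticks_nmi_nesting += rcu_dynticks_curr_cpu_in_eqs() ? 1 : 2;
11
12 /* irq/NMI exit (rcu_nmi_exit()). */
13 rdtp->dynticks_nmi_nesting -= rdtp->dynticks_nmi_nesting == 1 ? 1 : 2;
</pre>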
</p><p>The <tt>->dynticks</tt> field counts the corresponding
CPU's transitions to and from dyntick-idle mode, so that this counter
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
- Why not just count all NMIs?
- Wouldn't that be simpler and less error prone?
+ Why not simply combine the <tt>->dynticks_nesting</tt>
+ and <tt>->dynticks_nmi_nesting</tt> counters into a
+ single counter that just counts the number of reasons that
+ the corresponding CPU is non-idle?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
- It seems simpler only until you think hard about how to go about
- updating the <tt>rcu_dynticks</tt> structure's
- <tt>->dynticks</tt> field.
+ Because this would fail in the presence of interrupts whose
+ handlers never return and of handlers that manage to return
+ from a made-up interrupt.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
DYNIX/ptx used an explicit memory barrier for publication, but had nothing
resembling <tt>rcu_dereference()</tt> for subscription, nor did it
have anything resembling the <tt>smp_read_barrier_depends()</tt>
-that was later subsumed into <tt>rcu_dereference()</tt>.
+that was later subsumed into <tt>rcu_dereference()</tt> and later
+still into <tt>READ_ONCE()</tt>.
The need for these operations made itself known quite suddenly at a
late-1990s meeting with the DEC Alpha architects, back in the days when
DEC was still a free-standing company.
executing in usermode (which is one use case for
<tt>CONFIG_NO_HZ_FULL=y</tt>) or in the kernel.
That said, CPU-bound loops in the kernel must execute
-<tt>cond_resched_rcu_qs()</tt> at least once per few tens of milliseconds
+<tt>cond_resched()</tt> at least once per few tens of milliseconds
in order to avoid receiving an IPI from RCU.
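<p>For example, such a loop might be structured as in the following
sketch, in which <tt>struct item</tt> and <tt>do_work_on()</tt> are
purely illustrative:

<pre>
 1 static void scan_all_items(struct list_head *head)
 2 {
 3   struct item *p;
 4
 5   list_for_each_entry(p, head, list) {
 6     do_work_on(p);   /* Potentially long-running work. */
 7     cond_resched();  /* Supply the quiescent state that RCU needs. */
 8   }
 9 }
</pre>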
<p>
is to have implicit
read-side critical sections that are delimited by voluntary context
switches, that is, calls to <tt>schedule()</tt>,
-<tt>cond_resched_rcu_qs()</tt>, and
+<tt>cond_resched()</tt>, and
<tt>synchronize_rcu_tasks()</tt>.
In addition, transitions to and from userspace execution also delimit
tasks-RCU read-side critical sections.
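<p>The canonical use case is freeing a tracing trampoline:  Once no new
calls into the trampoline can begin, waiting in
<tt>synchronize_rcu_tasks()</tt> guarantees that every task has since
passed through a voluntary context switch (or usermode execution), so
that none can still be executing within the trampoline.
A rough sketch, in which <tt>remove_all_references_to()</tt> and
<tt>tramp</tt> are illustrative only:

<pre>
 1 remove_all_references_to(tramp);  /* No new entries into the trampoline. */
 2 synchronize_rcu_tasks();          /* Wait for all tasks to leave it. */
 3 kfree(tramp);                     /* Now safe to free. */
</pre>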
Note that if checks for being within an RCU read-side
critical section are not required and the pointer is never
dereferenced, rcu_access_pointer() should be used in place
- of rcu_dereference(). The rcu_access_pointer() primitive
- does not require an enclosing read-side critical section,
- and also omits the smp_read_barrier_depends() included in
- rcu_dereference(), which in turn should provide a small
- performance gain in some CPUs (e.g., the DEC Alpha).
+ of rcu_dereference().
o The comparison is against a pointer that references memory
that was initialized "a long time ago." The reason
o A CPU looping with bottom halves disabled. This condition can
result in RCU-sched and RCU-bh stalls.
-o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the
- kernel without invoking schedule(). Note that cond_resched()
- does not necessarily prevent RCU CPU stall warnings. Therefore,
- if the looping in the kernel is really expected and desirable
- behavior, you might need to replace some of the cond_resched()
- calls with calls to cond_resched_rcu_qs().
+o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
+ without invoking schedule(). If the looping in the kernel is
+ really expected and desirable behavior, you might need to add
+ some calls to cond_resched().
o Booting Linux using a console connection that is too slow to
keep up with the boot-time console-message rate. For example,
#define rcu_dereference(p) \
({ \
- typeof(p) _________p1 = p; \
- smp_read_barrier_depends(); \
+ typeof(p) _________p1 = READ_ONCE(p); \
(_________p1); \
})
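A reader then uses this roughly as in the following sketch, where gp,
struct foo, and do_something_with() are illustrative only:

	rcu_read_lock();
	p = rcu_dereference(gp);
	if (p)
		do_something_with(p->a);
	rcu_read_unlock();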
Note the use of READ_ONCE() and smp_load_acquire() to read the
opposition index. This prevents the compiler from discarding and
-reloading its cached value - which some compilers will do across
-smp_read_barrier_depends(). This isn't strictly needed if you can
+reloading its cached value. This isn't strictly needed if you can
be sure that the opposition index will _only_ be used the once.
The smp_load_acquire() additionally forces the CPU to order against
subsequent memory references. Similarly, smp_store_release() is used
(*) On any given CPU, dependent memory accesses will be issued in order, with
respect to itself. This means that for:
- Q = READ_ONCE(P); smp_read_barrier_depends(); D = READ_ONCE(*Q);
+ Q = READ_ONCE(P); D = READ_ONCE(*Q);
the CPU will issue the following memory operations:
Q = LOAD P, D = LOAD *Q
- and always in that order. On most systems, smp_read_barrier_depends()
- does nothing, but it is required for DEC Alpha. The READ_ONCE()
- is required to prevent compiler mischief. Please note that you
- should normally use something like rcu_dereference() instead of
- open-coding smp_read_barrier_depends().
+ and always in that order. However, on DEC Alpha, READ_ONCE() also
+ emits a memory-barrier instruction, so that a DEC Alpha CPU will
+ instead issue the following memory operations:
+
+ Q = LOAD P, MEMORY_BARRIER, D = LOAD *Q, MEMORY_BARRIER
+
+ Whether on DEC Alpha or not, the READ_ONCE() also prevents compiler
+ mischief.
(*) Overlapping loads and stores within a particular CPU will appear to be
ordered within that CPU. This means that for:
GENERAL mb() smp_mb()
WRITE wmb() smp_wmb()
READ rmb() smp_rmb()
- DATA DEPENDENCY read_barrier_depends() smp_read_barrier_depends()
+ DATA DEPENDENCY READ_ONCE()
All memory barriers except the data dependency barriers imply a compiler
Other CPUs may also have split caches, but must coordinate between the various
cachelets for normal memory accesses. The semantics of the Alpha removes the
-need for coordination in the absence of memory barriers.
+need for hardware coordination in the absence of memory barriers, which
+permitted Alpha to sport higher CPU clock rates back in the day. However,
+please note that smp_read_barrier_depends() should not be used except in
+Alpha arch-specific code and within the READ_ONCE() macro.
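As a sketch of the recommended usage (gp, p, q, and do_something() are
illustrative, and the reader side would normally run within an RCU
read-side critical section):

	/* Publisher */
	p->a = 1;
	rcu_assign_pointer(gp, p);	/* Orders initialization before publication. */

	/* Reader */
	q = rcu_dereference(gp);	/* READ_ONCE() supplies the Alpha barrier. */
	if (q)
		do_something(q->a);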
CACHE COHERENCY VS DMA
return;
}
- smp_read_barrier_depends();
+ /* READ_ONCE() enforces dependency, but dangerous through integer!!! */
ch = port->rx_buffer[ix++];
st = port->rx_buffer[ix++];
smp_mb();
if (CIRC_CNT(port->rx_inp, ix, MNSC_BUFFER_SIZE) == 0)
return NO_POLL_CHAR;
- smp_read_barrier_depends();
+ /*
+ * READ_ONCE() enforces dependency, but dangerous
+ * through integer!!!
+ */
ch = port->rx_buffer[ix++];
st = port->rx_buffer[ix++];
smp_mb();
for (i = 0; i < active && !seen_current; i++) {
struct dma_async_tx_descriptor *tx;
- smp_read_barrier_depends();
prefetch(ioat_get_ring_ent(ioat_chan, idx + i + 1));
desc = ioat_get_ring_ent(ioat_chan, idx + i);
dump_desc_dbg(ioat_chan, desc);
for (i = 1; i < active; i++) {
struct dma_async_tx_descriptor *tx;
- smp_read_barrier_depends();
prefetch(ioat_get_ring_ent(ioat_chan, idx + i + 1));
desc = ioat_get_ring_ent(ioat_chan, idx + i);
depends on NET
depends on INET
depends on m || IPV6 != m
+ depends on !ALPHA
select IRQ_POLL
---help---
Core support for InfiniBand (IB). Make sure to also select
if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND))
goto bail;
/* We are in the error state, flush the work request. */
- smp_read_barrier_depends(); /* see post_one_send() */
if (qp->s_last == READ_ONCE(qp->s_head))
goto bail;
/* If DMAs are in progress, we can't flush immediately. */
newreq = 0;
if (qp->s_cur == qp->s_tail) {
/* Check if send work queue is empty. */
- smp_read_barrier_depends(); /* see post_one_send() */
if (qp->s_tail == READ_ONCE(qp->s_head)) {
clear_ahg(qp);
goto bail;
}
/* Ensure s_rdma_ack_cnt changes are committed */
- smp_read_barrier_depends();
if (qp->s_rdma_ack_cnt) {
hfi1_queue_rc_ack(qp, is_fecn);
return;
trace_hfi1_ack(qp, psn);
/* Ignore invalid responses. */
- smp_read_barrier_depends(); /* see post_one_send */
if (cmp_psn(psn, READ_ONCE(qp->s_next_psn)) >= 0)
goto ack_done;
sqp->s_flags |= RVT_S_BUSY;
again:
- smp_read_barrier_depends(); /* see post_one_send() */
if (sqp->s_last == READ_ONCE(sqp->s_head))
goto clr_busy;
wqe = rvt_get_swqe_ptr(sqp, sqp->s_last);
static inline struct sdma_txreq *get_txhead(struct sdma_engine *sde)
{
- smp_read_barrier_depends(); /* see sdma_update_tail() */
return sde->tx_ring[sde->tx_head & sde->sdma_mask];
}
if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND))
goto bail;
/* We are in the error state, flush the work request. */
- smp_read_barrier_depends(); /* see post_one_send() */
if (qp->s_last == READ_ONCE(qp->s_head))
goto bail;
/* If DMAs are in progress, we can't flush immediately. */
RVT_PROCESS_NEXT_SEND_OK))
goto bail;
/* Check if send work queue is empty. */
- smp_read_barrier_depends(); /* see post_one_send() */
if (qp->s_cur == READ_ONCE(qp->s_head)) {
clear_ahg(qp);
goto bail;
if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND))
goto bail;
/* We are in the error state, flush the work request. */
- smp_read_barrier_depends(); /* see post_one_send */
if (qp->s_last == READ_ONCE(qp->s_head))
goto bail;
/* If DMAs are in progress, we can't flush immediately. */
}
/* see post_one_send() */
- smp_read_barrier_depends();
if (qp->s_cur == READ_ONCE(qp->s_head))
goto bail;
if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND))
goto bail;
/* We are in the error state, flush the work request. */
- smp_read_barrier_depends(); /* see post_one_send() */
if (qp->s_last == READ_ONCE(qp->s_head))
goto bail;
/* If DMAs are in progress, we can't flush immediately. */
newreq = 0;
if (qp->s_cur == qp->s_tail) {
/* Check if send work queue is empty. */
- smp_read_barrier_depends(); /* see post_one_send() */
if (qp->s_tail == READ_ONCE(qp->s_head))
goto bail;
/*
goto ack_done;
/* Ignore invalid responses. */
- smp_read_barrier_depends(); /* see post_one_send */
if (qib_cmp24(psn, READ_ONCE(qp->s_next_psn)) >= 0)
goto ack_done;
sqp->s_flags |= RVT_S_BUSY;
again:
- smp_read_barrier_depends(); /* see post_one_send() */
if (sqp->s_last == READ_ONCE(sqp->s_head))
goto clr_busy;
wqe = rvt_get_swqe_ptr(sqp, sqp->s_last);
if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND))
goto bail;
/* We are in the error state, flush the work request. */
- smp_read_barrier_depends(); /* see post_one_send() */
if (qp->s_last == READ_ONCE(qp->s_head))
goto bail;
/* If DMAs are in progress, we can't flush immediately. */
RVT_PROCESS_NEXT_SEND_OK))
goto bail;
/* Check if send work queue is empty. */
- smp_read_barrier_depends(); /* see post_one_send() */
if (qp->s_cur == READ_ONCE(qp->s_head))
goto bail;
/*
if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND))
goto bail;
/* We are in the error state, flush the work request. */
- smp_read_barrier_depends(); /* see post_one_send */
if (qp->s_last == READ_ONCE(qp->s_head))
goto bail;
/* If DMAs are in progress, we can't flush immediately. */
}
/* see post_one_send() */
- smp_read_barrier_depends();
if (qp->s_cur == READ_ONCE(qp->s_head))
goto bail;
/* non-reserved operations */
if (likely(qp->s_avail))
return 0;
- smp_read_barrier_depends(); /* see rc.c */
slast = READ_ONCE(qp->s_last);
if (qp->s_head >= slast)
avail = qp->s_size - (qp->s_head - slast);
while (iter_cnt--) {
/* Validate we receive completion update */
- if (READ_ONCE(comp_done->done) == 1) {
- /* Read updated FW return value */
- smp_read_barrier_depends();
+ if (smp_load_acquire(&comp_done->done) == 1) { /* ^^^ */
if (p_fw_ret)
*p_fw_ret = comp_done->fw_return_code;
return 0;
return -1U;
/* Check they're not leading us off end of descriptors. */
- next = vhost16_to_cpu(vq, desc->next);
- /* Make sure compiler knows to grab that: we don't want it changing! */
- /* We will use the result as an index in an array, so most
- * architectures only need a compiler barrier here. */
- read_barrier_depends();
-
+ next = vhost16_to_cpu(vq, READ_ONCE(desc->next));
return next;
}
dname[name->len] = 0;
/* Make sure we always see the terminating NUL character */
- smp_wmb();
- dentry->d_name.name = dname;
+ smp_store_release(&dentry->d_name.name, dname); /* ^^^ */
dentry->d_lockref.count = 1;
dentry->d_flags = 0;
* retry it again when a d_move() does happen. So any garbage in the buffer
* due to mismatched pointer and length will be discarded.
*
- * Data dependency barrier is needed to make sure that we see that terminating
- * NUL. Alpha strikes again, film at 11...
+ * Load acquire is needed to make sure that we see that terminating NUL.
*/
static int prepend_name(char **buffer, int *buflen, const struct qstr *name)
{
- const char *dname = READ_ONCE(name->name);
+ const char *dname = smp_load_acquire(&name->name); /* ^^^ */
u32 dlen = READ_ONCE(name->len);
char *p;
- smp_read_barrier_depends();
-
*buflen -= dlen + 1;
if (*buflen < 0)
return -ENAMETOOLONG;
struct file * file = xchg(&fdt->fd[i], NULL);
if (file) {
filp_close(file, files);
- cond_resched_rcu_qs();
+ cond_resched();
}
}
i++;
* @p: The pointer to read, prior to dereferencing
*
* Return the value of the specified RCU-protected pointer, but omit
- * both the smp_read_barrier_depends() and the READ_ONCE(), because
- * caller holds genl mutex.
+ * the READ_ONCE(), because caller holds genl mutex.
*/
#define genl_dereference(p) \
rcu_dereference_protected(p, lockdep_genl_is_held())
* @ss: The nfnetlink subsystem ID
*
* Return the value of the specified RCU-protected pointer, but omit
- * both the smp_read_barrier_depends() and the READ_ONCE(), because
- * caller holds the NFNL subsystem mutex.
+ * the READ_ONCE(), because caller holds the NFNL subsystem mutex.
*/
#define nfnl_dereference(p, ss) \
rcu_dereference_protected(p, lockdep_nfnl_is_held(ss))
* when using it as a pointer, __PERCPU_REF_ATOMIC may be set in
* between contaminating the pointer value, meaning that
* READ_ONCE() is required when fetching it.
+ *
+ * The smp_read_barrier_depends() implied by READ_ONCE() pairs
+ * with smp_store_release() in __percpu_ref_switch_to_percpu().
*/
percpu_ptr = READ_ONCE(ref->percpu_count_ptr);
- /* paired with smp_store_release() in __percpu_ref_switch_to_percpu() */
- smp_read_barrier_depends();
-
/*
* Theoretically, the following could test just ATOMIC; however,
* then we'd have to mask off DEAD separately as DEAD may be
#define cond_resched_rcu_qs() \
do { \
if (!cond_resched()) \
- rcu_note_voluntary_context_switch(current); \
+ rcu_note_voluntary_context_switch_lite(current); \
} while (0)
/*
* @p: The pointer to read
*
* Return the value of the specified RCU-protected pointer, but omit the
- * smp_read_barrier_depends() and keep the READ_ONCE(). This is useful
- * when the value of this pointer is accessed, but the pointer is not
- * dereferenced, for example, when testing an RCU-protected pointer against
- * NULL. Although rcu_access_pointer() may also be used in cases where
- * update-side locks prevent the value of the pointer from changing, you
- * should instead use rcu_dereference_protected() for this use case.
+ * lockdep checks for being in an RCU read-side critical section. This is
+ * useful when the value of this pointer is accessed, but the pointer is
+ * not dereferenced, for example, when testing an RCU-protected pointer
+ * against NULL. Although rcu_access_pointer() may also be used in cases
+ * where update-side locks prevent the value of the pointer from changing,
+ * you should instead use rcu_dereference_protected() for this use case.
*
* It is also permissible to use rcu_access_pointer() when read-side
* access to the pointer was removed at least one grace period ago, as
* @c: The conditions under which the dereference will take place
*
* Return the value of the specified RCU-protected pointer, but omit
- * both the smp_read_barrier_depends() and the READ_ONCE(). This
- * is useful in cases where update-side locks prevent the value of the
- * pointer from changing. Please note that this primitive does *not*
- * prevent the compiler from repeating this reference or combining it
- * with other references, so it should not be used without protection
- * of appropriate locks.
+ * the READ_ONCE(). This is useful in cases where update-side locks
+ * prevent the value of the pointer from changing. Please note that this
+ * primitive does *not* prevent the compiler from repeating this reference
+ * or combining it with other references, so it should not be used without
+ * protection of appropriate locks.
*
* This function is only for update-side use. Using this function
* when protected only by rcu_read_lock() will result in infrequent
static inline void rcu_idle_enter(void) { }
static inline void rcu_idle_exit(void) { }
static inline void rcu_irq_enter(void) { }
-static inline bool rcu_irq_enter_disabled(void) { return false; }
static inline void rcu_irq_exit_irqson(void) { }
static inline void rcu_irq_enter_irqson(void) { }
static inline void rcu_irq_exit(void) { }
void rcu_irq_exit(void);
void rcu_irq_enter_irqson(void);
void rcu_irq_exit_irqson(void);
-bool rcu_irq_enter_disabled(void);
void exit_rcu(void);
* @p: The pointer to read, prior to dereferencing
*
* Return the value of the specified RCU-protected pointer, but omit
- * both the smp_read_barrier_depends() and the READ_ONCE(), because
- * caller holds RTNL.
+ * the READ_ONCE(), because caller holds RTNL.
*/
#define rtnl_dereference(p) \
rcu_dereference_protected(p, lockdep_rtnl_is_held())
static inline int raw_read_seqcount_latch(seqcount_t *s)
{
- int seq = READ_ONCE(s->sequence);
/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
- smp_read_barrier_depends();
+ int seq = READ_ONCE(s->sequence); /* ^^^ */
return seq;
}
unsigned long srcu_unlock_count[2]; /* Unlocks per CPU. */
/* Update-side state. */
- raw_spinlock_t __private lock ____cacheline_internodealigned_in_smp;
+ spinlock_t __private lock ____cacheline_internodealigned_in_smp;
struct rcu_segcblist srcu_cblist; /* List of callbacks.*/
unsigned long srcu_gp_seq_needed; /* Furthest future GP needed. */
unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */
* Node in SRCU combining tree, similar in function to rcu_data.
*/
struct srcu_node {
- raw_spinlock_t __private lock;
+ spinlock_t __private lock;
unsigned long srcu_have_cbs[4]; /* GP seq for children */
/* having CBs, but only */
/* is > ->srcu_gq_seq. */
struct srcu_node *level[RCU_NUM_LVLS + 1];
/* First node at each level. */
struct mutex srcu_cb_mutex; /* Serialize CB preparation. */
- raw_spinlock_t __private lock; /* Protect counters */
+ spinlock_t __private lock; /* Protect counters */
struct mutex srcu_gp_mutex; /* Serialize GP work. */
unsigned int srcu_idx; /* Current rdr array element. */
unsigned long srcu_gp_seq; /* Grace-period seq #. */
#define __SRCU_STRUCT_INIT(name) \
{ \
.sda = &name##_srcu_data, \
- .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \
+ .lock = __SPIN_LOCK_UNLOCKED(name.lock), \
.srcu_gp_seq_needed = 0 - 1, \
__SRCU_DEP_MAP_INIT(name) \
}
\
if (!(cond)) \
return; \
- if (rcucheck) { \
- if (WARN_ON_ONCE(rcu_irq_enter_disabled())) \
- return; \
+ if (rcucheck) \
rcu_irq_enter_irqson(); \
- } \
rcu_read_lock_sched_notrace(); \
it_func_ptr = rcu_dereference_sched((tp)->funcs); \
if (it_func_ptr) { \
__entry->grphi, __entry->gpevent)
);
+#ifdef CONFIG_RCU_NOCB_CPU
/*
* Tracepoint for RCU no-CBs CPU callback handoffs. This event is intended
* to assist debugging of these handoffs.
TP_printk("%s %d %s", __entry->rcuname, __entry->cpu, __entry->reason)
);
+#endif
/*
* Tracepoint for tasks blocking within preemptible-RCU read-side
/*
* Tracepoint for dyntick-idle entry/exit events. These take a string
- * as argument: "Start" for entering dyntick-idle mode, "End" for
- * leaving it, "--=" for events moving towards idle, and "++=" for events
- * moving away from idle. "Error on entry: not idle task" and "Error on
- * exit: not idle task" indicate that a non-idle task is erroneously
- * toying with the idle loop.
+ * as argument: "Start" for entering dyntick-idle mode, "Startirq" for
+ * entering it from irq/NMI, "End" for leaving it, "Endirq" for leaving it
+ * to irq/NMI, "--=" for events moving towards idle, and "++=" for events
+ * moving away from idle.
*
* These events also take a pair of numbers, which indicate the nesting
- * depth before and after the event of interest. Note that task-related
- * events use the upper bits of each number, while interrupt-related
- * events use the lower bits.
+ * depth before and after the event of interest, and a third number that is
+ * the ->dynticks counter. Note that task-related and interrupt-related
+ * events use two separate counters, and that the "++=" and "--=" events
+ * for irq/NMI will change the counter by two, otherwise by one.
*/
TRACE_EVENT(rcu_dyntick,
- TP_PROTO(const char *polarity, long long oldnesting, long long newnesting),
+ TP_PROTO(const char *polarity, long oldnesting, long newnesting, atomic_t dynticks),
- TP_ARGS(polarity, oldnesting, newnesting),
+ TP_ARGS(polarity, oldnesting, newnesting, dynticks),
TP_STRUCT__entry(
__field(const char *, polarity)
- __field(long long, oldnesting)
- __field(long long, newnesting)
+ __field(long, oldnesting)
+ __field(long, newnesting)
+ __field(int, dynticks)
),
TP_fast_assign(
__entry->polarity = polarity;
__entry->oldnesting = oldnesting;
__entry->newnesting = newnesting;
+ __entry->dynticks = atomic_read(&dynticks);
),
- TP_printk("%s %llx %llx", __entry->polarity,
- __entry->oldnesting, __entry->newnesting)
-);
-
-/*
- * Tracepoint for RCU preparation for idle, the goal being to get RCU
- * processing done so that the current CPU can shut off its scheduling
- * clock and enter dyntick-idle mode. One way to accomplish this is
- * to drain all RCU callbacks from this CPU, and the other is to have
- * done everything RCU requires for the current grace period. In this
- * latter case, the CPU will be awakened at the end of the current grace
- * period in order to process the remainder of its callbacks.
- *
- * These tracepoints take a string as argument:
- *
- * "No callbacks": Nothing to do, no callbacks on this CPU.
- * "In holdoff": Nothing to do, holding off after unsuccessful attempt.
- * "Begin holdoff": Attempt failed, don't retry until next jiffy.
- * "Dyntick with callbacks": Entering dyntick-idle despite callbacks.
- * "Dyntick with lazy callbacks": Entering dyntick-idle w/lazy callbacks.
- * "More callbacks": Still more callbacks, try again to clear them out.
- * "Callbacks drained": All callbacks processed, off to dyntick idle!
- * "Timer": Timer fired to cause CPU to continue processing callbacks.
- * "Demigrate": Timer fired on wrong CPU, woke up correct CPU.
- * "Cleanup after idle": Idle exited, timer canceled.
- */
-TRACE_EVENT(rcu_prep_idle,
-
- TP_PROTO(const char *reason),
-
- TP_ARGS(reason),
-
- TP_STRUCT__entry(
- __field(const char *, reason)
- ),
-
- TP_fast_assign(
- __entry->reason = reason;
- ),
-
- TP_printk("%s", __entry->reason)
+ TP_printk("%s %lx %lx %#3x", __entry->polarity,
+ __entry->oldnesting, __entry->newnesting,
+ __entry->dynticks & 0xfff)
);
/*
grplo, grphi, gp_tasks) do { } \
while (0)
#define trace_rcu_fqs(rcuname, gpnum, cpu, qsevent) do { } while (0)
-#define trace_rcu_dyntick(polarity, oldnesting, newnesting) do { } while (0)
-#define trace_rcu_prep_idle(reason) do { } while (0)
+#define trace_rcu_dyntick(polarity, oldnesting, newnesting, dyntick) do { } while (0)
#define trace_rcu_callback(rcuname, rhp, qlen_lazy, qlen) do { } while (0)
#define trace_rcu_kfree_callback(rcuname, rhp, offset, qlen_lazy, qlen) \
do { } while (0)
}
ret = 0;
- smp_wmb(); /* pairs with get_xol_area() */
- mm->uprobes_state.xol_area = area;
+ /* pairs with get_xol_area() */
+ smp_store_release(&mm->uprobes_state.xol_area, area); /* ^^^ */
fail:
up_write(&mm->mmap_sem);
if (!mm->uprobes_state.xol_area)
__create_xol_area(0);
- area = mm->uprobes_state.xol_area;
- smp_read_barrier_depends(); /* pairs with wmb in xol_add_vma() */
+ /* Pairs with xol_add_vma() smp_store_release() */
+ area = READ_ONCE(mm->uprobes_state.xol_area); /* ^^^ */
return area;
}
struct xol_area *area;
unsigned long trampoline_vaddr = -1;
- area = current->mm->uprobes_state.xol_area;
- smp_read_barrier_depends();
+ /* Pairs with xol_add_vma() smp_store_release() */
+ area = READ_ONCE(current->mm->uprobes_state.xol_area); /* ^^^ */
if (area)
trampoline_vaddr = area->vaddr;
* @tail : The new queue tail code word
* Return: The previous queue tail code word
*
- * xchg(lock, tail)
+ * xchg(lock, tail), which heads an address dependency
*
* p,*,* -> n,*,* ; prev = xchg(lock, node)
*/
if (old & _Q_TAIL_MASK) {
prev = decode_tail(old);
/*
- * The above xchg_tail() is also a load of @lock which generates,
- * through decode_tail(), a pointer.
- *
- * The address dependency matches the RELEASE of xchg_tail()
- * such that the access to @prev must happen after.
+ * The above xchg_tail() is also a load of @lock which
+ * generates, through decode_tail(), a pointer. The address
+ * dependency matches the RELEASE of xchg_tail() such that
+ * the subsequent access to @prev happens after.
*/
- smp_read_barrier_depends();
WRITE_ONCE(prev->next, node);
#define RCU_TRACE(stmt)
#endif /* #else #ifdef CONFIG_RCU_TRACE */
-/*
- * Process-level increment to ->dynticks_nesting field. This allows for
- * architectures that use half-interrupts and half-exceptions from
- * process context.
- *
- * DYNTICK_TASK_NEST_MASK defines a field of width DYNTICK_TASK_NEST_WIDTH
- * that counts the number of process-based reasons why RCU cannot
- * consider the corresponding CPU to be idle, and DYNTICK_TASK_NEST_VALUE
- * is the value used to increment or decrement this field.
- *
- * The rest of the bits could in principle be used to count interrupts,
- * but this would mean that a negative-one value in the interrupt
- * field could incorrectly zero out the DYNTICK_TASK_NEST_MASK field.
- * We therefore provide a two-bit guard field defined by DYNTICK_TASK_MASK
- * that is set to DYNTICK_TASK_FLAG upon initial exit from idle.
- * The DYNTICK_TASK_EXIT_IDLE value is thus the combined value used upon
- * initial exit from idle.
- */
-#define DYNTICK_TASK_NEST_WIDTH 7
-#define DYNTICK_TASK_NEST_VALUE ((LLONG_MAX >> DYNTICK_TASK_NEST_WIDTH) + 1)
-#define DYNTICK_TASK_NEST_MASK (LLONG_MAX - DYNTICK_TASK_NEST_VALUE + 1)
-#define DYNTICK_TASK_FLAG ((DYNTICK_TASK_NEST_VALUE / 8) * 2)
-#define DYNTICK_TASK_MASK ((DYNTICK_TASK_NEST_VALUE / 8) * 3)
-#define DYNTICK_TASK_EXIT_IDLE (DYNTICK_TASK_NEST_VALUE + \
- DYNTICK_TASK_FLAG)
+/* Offset to allow for unmatched rcu_irq_{enter,exit}(). */
+#define DYNTICK_IRQ_NONIDLE ((LONG_MAX / 2) + 1)
/*
static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay);
static void process_srcu(struct work_struct *work);
+/* Wrappers for lock acquisition and release, see raw_spin_lock_rcu_node(). */
+#define spin_lock_rcu_node(p) \
+do { \
+ spin_lock(&ACCESS_PRIVATE(p, lock)); \
+ smp_mb__after_unlock_lock(); \
+} while (0)
+
+#define spin_unlock_rcu_node(p) spin_unlock(&ACCESS_PRIVATE(p, lock))
+
+#define spin_lock_irq_rcu_node(p) \
+do { \
+ spin_lock_irq(&ACCESS_PRIVATE(p, lock)); \
+ smp_mb__after_unlock_lock(); \
+} while (0)
+
+#define spin_unlock_irq_rcu_node(p) \
+ spin_unlock_irq(&ACCESS_PRIVATE(p, lock))
+
+#define spin_lock_irqsave_rcu_node(p, flags) \
+do { \
+ spin_lock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \
+ smp_mb__after_unlock_lock(); \
+} while (0)
+
+#define spin_unlock_irqrestore_rcu_node(p, flags) \
+	spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags)
+
/*
* Initialize SRCU combining tree. Note that statically allocated
* srcu_struct structures might already have srcu_read_lock() and
/* Each pass through this loop initializes one srcu_node structure. */
rcu_for_each_node_breadth_first(sp, snp) {
- raw_spin_lock_init(&ACCESS_PRIVATE(snp, lock));
+ spin_lock_init(&ACCESS_PRIVATE(snp, lock));
WARN_ON_ONCE(ARRAY_SIZE(snp->srcu_have_cbs) !=
ARRAY_SIZE(snp->srcu_data_have_cbs));
for (i = 0; i < ARRAY_SIZE(snp->srcu_have_cbs); i++) {
snp_first = sp->level[level];
for_each_possible_cpu(cpu) {
sdp = per_cpu_ptr(sp->sda, cpu);
- raw_spin_lock_init(&ACCESS_PRIVATE(sdp, lock));
+ spin_lock_init(&ACCESS_PRIVATE(sdp, lock));
rcu_segcblist_init(&sdp->srcu_cblist);
sdp->srcu_cblist_invoking = false;
sdp->srcu_gp_seq_needed = sp->srcu_gp_seq;
/* Don't re-initialize a lock while it is held. */
debug_check_no_locks_freed((void *)sp, sizeof(*sp));
lockdep_init_map(&sp->dep_map, name, key, 0);
- raw_spin_lock_init(&ACCESS_PRIVATE(sp, lock));
+ spin_lock_init(&ACCESS_PRIVATE(sp, lock));
return init_srcu_struct_fields(sp, false);
}
EXPORT_SYMBOL_GPL(__init_srcu_struct);
*/
int init_srcu_struct(struct srcu_struct *sp)
{
- raw_spin_lock_init(&ACCESS_PRIVATE(sp, lock));
+ spin_lock_init(&ACCESS_PRIVATE(sp, lock));
return init_srcu_struct_fields(sp, false);
}
EXPORT_SYMBOL_GPL(init_srcu_struct);
/* The smp_load_acquire() pairs with the smp_store_release(). */
if (!rcu_seq_state(smp_load_acquire(&sp->srcu_gp_seq_needed))) /*^^^*/
return; /* Already initialized. */
- raw_spin_lock_irqsave_rcu_node(sp, flags);
+ spin_lock_irqsave_rcu_node(sp, flags);
if (!rcu_seq_state(sp->srcu_gp_seq_needed)) {
- raw_spin_unlock_irqrestore_rcu_node(sp, flags);
+ spin_unlock_irqrestore_rcu_node(sp, flags);
return;
}
init_srcu_struct_fields(sp, true);
- raw_spin_unlock_irqrestore_rcu_node(sp, flags);
+ spin_unlock_irqrestore_rcu_node(sp, flags);
}
/*
mutex_lock(&sp->srcu_cb_mutex);
/* End the current grace period. */
- raw_spin_lock_irq_rcu_node(sp);
+ spin_lock_irq_rcu_node(sp);
idx = rcu_seq_state(sp->srcu_gp_seq);
WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
cbdelay = srcu_get_delay(sp);
gpseq = rcu_seq_current(&sp->srcu_gp_seq);
if (ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, gpseq))
sp->srcu_gp_seq_needed_exp = gpseq;
- raw_spin_unlock_irq_rcu_node(sp);
+ spin_unlock_irq_rcu_node(sp);
mutex_unlock(&sp->srcu_gp_mutex);
/* A new grace period can start at this point. But only one. */
idx = rcu_seq_ctr(gpseq) % ARRAY_SIZE(snp->srcu_have_cbs);
idxnext = (idx + 1) % ARRAY_SIZE(snp->srcu_have_cbs);
rcu_for_each_node_breadth_first(sp, snp) {
- raw_spin_lock_irq_rcu_node(snp);
+ spin_lock_irq_rcu_node(snp);
cbs = false;
if (snp >= sp->level[rcu_num_lvls - 1])
cbs = snp->srcu_have_cbs[idx] == gpseq;
snp->srcu_gp_seq_needed_exp = gpseq;
mask = snp->srcu_data_have_cbs[idx];
snp->srcu_data_have_cbs[idx] = 0;
- raw_spin_unlock_irq_rcu_node(snp);
+ spin_unlock_irq_rcu_node(snp);
if (cbs)
srcu_schedule_cbs_snp(sp, snp, mask, cbdelay);
if (!(gpseq & counter_wrap_check))
for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
sdp = per_cpu_ptr(sp->sda, cpu);
- raw_spin_lock_irqsave_rcu_node(sdp, flags);
+ spin_lock_irqsave_rcu_node(sdp, flags);
if (ULONG_CMP_GE(gpseq,
sdp->srcu_gp_seq_needed + 100))
sdp->srcu_gp_seq_needed = gpseq;
- raw_spin_unlock_irqrestore_rcu_node(sdp, flags);
+ spin_unlock_irqrestore_rcu_node(sdp, flags);
}
}
mutex_unlock(&sp->srcu_cb_mutex);
/* Start a new grace period if needed. */
- raw_spin_lock_irq_rcu_node(sp);
+ spin_lock_irq_rcu_node(sp);
gpseq = rcu_seq_current(&sp->srcu_gp_seq);
if (!rcu_seq_state(gpseq) &&
ULONG_CMP_LT(gpseq, sp->srcu_gp_seq_needed)) {
srcu_gp_start(sp);
- raw_spin_unlock_irq_rcu_node(sp);
+ spin_unlock_irq_rcu_node(sp);
/* Throttle expedited grace periods: Should be rare! */
srcu_reschedule(sp, rcu_seq_ctr(gpseq) & 0x3ff
? 0 : SRCU_INTERVAL);
} else {
- raw_spin_unlock_irq_rcu_node(sp);
+ spin_unlock_irq_rcu_node(sp);
}
}
if (rcu_seq_done(&sp->srcu_gp_seq, s) ||
ULONG_CMP_GE(READ_ONCE(snp->srcu_gp_seq_needed_exp), s))
return;
- raw_spin_lock_irqsave_rcu_node(snp, flags);
+ spin_lock_irqsave_rcu_node(snp, flags);
if (ULONG_CMP_GE(snp->srcu_gp_seq_needed_exp, s)) {
- raw_spin_unlock_irqrestore_rcu_node(snp, flags);
+ spin_unlock_irqrestore_rcu_node(snp, flags);
return;
}
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
- raw_spin_unlock_irqrestore_rcu_node(snp, flags);
+ spin_unlock_irqrestore_rcu_node(snp, flags);
}
- raw_spin_lock_irqsave_rcu_node(sp, flags);
+ spin_lock_irqsave_rcu_node(sp, flags);
if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
sp->srcu_gp_seq_needed_exp = s;
- raw_spin_unlock_irqrestore_rcu_node(sp, flags);
+ spin_unlock_irqrestore_rcu_node(sp, flags);
}
/*
for (; snp != NULL; snp = snp->srcu_parent) {
if (rcu_seq_done(&sp->srcu_gp_seq, s) && snp != sdp->mynode)
return; /* GP already done and CBs recorded. */
- raw_spin_lock_irqsave_rcu_node(snp, flags);
+ spin_lock_irqsave_rcu_node(snp, flags);
if (ULONG_CMP_GE(snp->srcu_have_cbs[idx], s)) {
snp_seq = snp->srcu_have_cbs[idx];
if (snp == sdp->mynode && snp_seq == s)
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
- raw_spin_unlock_irqrestore_rcu_node(snp, flags);
+ spin_unlock_irqrestore_rcu_node(snp, flags);
if (snp == sdp->mynode && snp_seq != s) {
srcu_schedule_cbs_sdp(sdp, do_norm
? SRCU_INTERVAL
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s))
snp->srcu_gp_seq_needed_exp = s;
- raw_spin_unlock_irqrestore_rcu_node(snp, flags);
+ spin_unlock_irqrestore_rcu_node(snp, flags);
}
/* Top of tree, must ensure the grace period will be started. */
- raw_spin_lock_irqsave_rcu_node(sp, flags);
+ spin_lock_irqsave_rcu_node(sp, flags);
if (ULONG_CMP_LT(sp->srcu_gp_seq_needed, s)) {
/*
* Record need for grace period s. Pair with load
queue_delayed_work(system_power_efficient_wq, &sp->work,
srcu_get_delay(sp));
}
- raw_spin_unlock_irqrestore_rcu_node(sp, flags);
+ spin_unlock_irqrestore_rcu_node(sp, flags);
}
/*
rhp->func = func;
local_irq_save(flags);
sdp = this_cpu_ptr(sp->sda);
- raw_spin_lock_rcu_node(sdp);
+ spin_lock_rcu_node(sdp);
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp, false);
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&sp->srcu_gp_seq));
sdp->srcu_gp_seq_needed_exp = s;
needexp = true;
}
- raw_spin_unlock_irqrestore_rcu_node(sdp, flags);
+ spin_unlock_irqrestore_rcu_node(sdp, flags);
if (needgp)
srcu_funnel_gp_start(sp, sdp, s, do_norm);
else if (needexp)
/*
* Make sure that later code is ordered after the SRCU grace
- * period. This pairs with the raw_spin_lock_irq_rcu_node()
+ * period. This pairs with the spin_lock_irq_rcu_node()
* in srcu_invoke_callbacks(). Unlike Tree RCU, this is needed
* because the current CPU might have been totally uninvolved with
* (and thus unordered against) that grace period.
*/
for_each_possible_cpu(cpu) {
sdp = per_cpu_ptr(sp->sda, cpu);
- raw_spin_lock_irq_rcu_node(sdp);
+ spin_lock_irq_rcu_node(sdp);
atomic_inc(&sp->srcu_barrier_cpu_cnt);
sdp->srcu_barrier_head.func = srcu_barrier_cb;
debug_rcu_head_queue(&sdp->srcu_barrier_head);
debug_rcu_head_unqueue(&sdp->srcu_barrier_head);
atomic_dec(&sp->srcu_barrier_cpu_cnt);
}
- raw_spin_unlock_irq_rcu_node(sdp);
+ spin_unlock_irq_rcu_node(sdp);
}
/* Remove the initial count, at which point reaching zero can happen. */
*/
idx = rcu_seq_state(smp_load_acquire(&sp->srcu_gp_seq)); /* ^^^ */
if (idx == SRCU_STATE_IDLE) {
- raw_spin_lock_irq_rcu_node(sp);
+ spin_lock_irq_rcu_node(sp);
if (ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed)) {
WARN_ON_ONCE(rcu_seq_state(sp->srcu_gp_seq));
- raw_spin_unlock_irq_rcu_node(sp);
+ spin_unlock_irq_rcu_node(sp);
mutex_unlock(&sp->srcu_gp_mutex);
return;
}
idx = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq));
if (idx == SRCU_STATE_IDLE)
srcu_gp_start(sp);
- raw_spin_unlock_irq_rcu_node(sp);
+ spin_unlock_irq_rcu_node(sp);
if (idx != SRCU_STATE_IDLE) {
mutex_unlock(&sp->srcu_gp_mutex);
return; /* Someone else started the grace period. */
sdp = container_of(work, struct srcu_data, work.work);
sp = sdp->sp;
rcu_cblist_init(&ready_cbs);
- raw_spin_lock_irq_rcu_node(sdp);
+ spin_lock_irq_rcu_node(sdp);
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&sp->srcu_gp_seq));
if (sdp->srcu_cblist_invoking ||
!rcu_segcblist_ready_cbs(&sdp->srcu_cblist)) {
- raw_spin_unlock_irq_rcu_node(sdp);
+ spin_unlock_irq_rcu_node(sdp);
return; /* Someone else on the job or nothing to do. */
}
/* We are on the job! Extract and invoke ready callbacks. */
sdp->srcu_cblist_invoking = true;
rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs);
- raw_spin_unlock_irq_rcu_node(sdp);
+ spin_unlock_irq_rcu_node(sdp);
rhp = rcu_cblist_dequeue(&ready_cbs);
for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
debug_rcu_head_unqueue(rhp);
* Update counts, accelerate new callbacks, and if needed,
* schedule another round of callback invocation.
*/
- raw_spin_lock_irq_rcu_node(sdp);
+ spin_lock_irq_rcu_node(sdp);
rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
rcu_seq_snap(&sp->srcu_gp_seq));
sdp->srcu_cblist_invoking = false;
more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist);
- raw_spin_unlock_irq_rcu_node(sdp);
+ spin_unlock_irq_rcu_node(sdp);
if (more)
srcu_schedule_cbs_sdp(sdp, 0);
}
{
bool pushgp = true;
- raw_spin_lock_irq_rcu_node(sp);
+ spin_lock_irq_rcu_node(sp);
if (ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed)) {
if (!WARN_ON_ONCE(rcu_seq_state(sp->srcu_gp_seq))) {
/* All requests fulfilled, time to go idle. */
/* Outstanding request and no GP. Start one. */
srcu_gp_start(sp);
}
- raw_spin_unlock_irq_rcu_node(sp);
+ spin_unlock_irq_rcu_node(sp);
if (pushgp)
queue_delayed_work(system_power_efficient_wq, &sp->work, delay);
#endif
static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
- .dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
+ .dynticks_nesting = 1,
+ .dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
};
-/*
- * There's a few places, currently just in the tracing infrastructure,
- * that uses rcu_irq_enter() to make sure RCU is watching. But there's
- * a small location where that will not even work. In those cases
- * rcu_irq_enter_disabled() needs to be checked to make sure rcu_irq_enter()
- * can be called.
- */
-static DEFINE_PER_CPU(bool, disable_rcu_irq_enter);
-
-bool rcu_irq_enter_disabled(void)
-{
- return this_cpu_read(disable_rcu_irq_enter);
-}
-
/*
* Record entry into an extended quiescent state. This is only to be
* called when not already in an extended quiescent state.
}
/*
- * rcu_eqs_enter_common - current CPU is entering an extended quiescent state
+ * Enter an RCU extended quiescent state, which can be either the
+ * idle loop or adaptive-tickless usermode execution.
*
- * Enter idle, doing appropriate accounting. The caller must have
- * disabled interrupts.
+ * We crowbar the ->dynticks_nmi_nesting field to zero to allow for
+ * the possibility of usermode upcalls having messed up our count
+ * of interrupt nesting level during the prior busy period.
*/
-static void rcu_eqs_enter_common(bool user)
+static void rcu_eqs_enter(bool user)
{
struct rcu_state *rsp;
struct rcu_data *rdp;
- struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
+ struct rcu_dynticks *rdtp;
- lockdep_assert_irqs_disabled();
- trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0);
- if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
- !user && !is_idle_task(current)) {
- struct task_struct *idle __maybe_unused =
- idle_task(smp_processor_id());
-
- trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0);
- rcu_ftrace_dump(DUMP_ORIG);
- WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
- current->pid, current->comm,
- idle->pid, idle->comm); /* must be idle task! */
+ rdtp = this_cpu_ptr(&rcu_dynticks);
+ WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0);
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
+ rdtp->dynticks_nesting == 0);
+ if (rdtp->dynticks_nesting != 1) {
+ rdtp->dynticks_nesting--;
+ return;
}
+
+ lockdep_assert_irqs_disabled();
+ trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0, rdtp->dynticks);
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
for_each_rcu_flavor(rsp) {
rdp = this_cpu_ptr(rsp->rda);
do_nocb_deferred_wakeup(rdp);
}
rcu_prepare_for_idle();
- __this_cpu_inc(disable_rcu_irq_enter);
- rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */
- rcu_dynticks_eqs_enter(); /* After this, tracing works again. */
- __this_cpu_dec(disable_rcu_irq_enter);
+ WRITE_ONCE(rdtp->dynticks_nesting, 0); /* Avoid irq-access tearing. */
+ rcu_dynticks_eqs_enter();
rcu_dynticks_task_enter();
-
- /*
- * It is illegal to enter an extended quiescent state while
- * in an RCU read-side critical section.
- */
- RCU_LOCKDEP_WARN(lock_is_held(&rcu_lock_map),
- "Illegal idle entry in RCU read-side critical section.");
- RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map),
- "Illegal idle entry in RCU-bh read-side critical section.");
- RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map),
- "Illegal idle entry in RCU-sched read-side critical section.");
-}
-
-/*
- * Enter an RCU extended quiescent state, which can be either the
- * idle loop or adaptive-tickless usermode execution.
- */
-static void rcu_eqs_enter(bool user)
-{
- struct rcu_dynticks *rdtp;
-
- rdtp = this_cpu_ptr(&rcu_dynticks);
- WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
- (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
- if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE)
- rcu_eqs_enter_common(user);
- else
- rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
}
/**
* critical sections can occur in irq handlers in idle, a possibility
* handled by irq_enter() and irq_exit().)
*
- * We crowbar the ->dynticks_nesting field to zero to allow for
- * the possibility of usermode upcalls having messed up our count
- * of interrupt nesting level during the prior busy period.
- *
* If you add or remove a call to rcu_idle_enter(), be sure to test with
* CONFIG_RCU_EQS_DEBUG=y.
*/
}
#endif /* CONFIG_NO_HZ_FULL */
+/**
+ * rcu_nmi_exit - inform RCU of exit from NMI context
+ *
+ * If we are returning from the outermost NMI handler that interrupted an
+ * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
+ * to let the RCU grace-period handling know that the CPU is back to
+ * being RCU-idle.
+ *
+ * If you add or remove a call to rcu_nmi_exit(), be sure to test
+ * with CONFIG_RCU_EQS_DEBUG=y.
+ */
+void rcu_nmi_exit(void)
+{
+ struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
+
+ /*
+ * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
+ * (We are exiting an NMI handler, so RCU better be paying attention
+ * to us!)
+ */
+ WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0);
+ WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
+
+ /*
+ * If the nesting level is not 1, the CPU wasn't RCU-idle, so
+ * leave it in non-RCU-idle state.
+ */
+ if (rdtp->dynticks_nmi_nesting != 1) {
+ trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nmi_nesting, rdtp->dynticks_nmi_nesting - 2, rdtp->dynticks);
+ WRITE_ONCE(rdtp->dynticks_nmi_nesting, /* No store tearing. */
+ rdtp->dynticks_nmi_nesting - 2);
+ return;
+ }
+
+ /* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
+ trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nmi_nesting, 0, rdtp->dynticks);
+ WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
+ rcu_dynticks_eqs_enter();
+}
+
/**
* rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle
*
*
* This code assumes that the idle loop never does anything that might
* result in unbalanced calls to irq_enter() and irq_exit(). If your
- * architecture violates this assumption, RCU will give you what you
- * deserve, good and hard. But very infrequently and irreproducibly.
+ * architecture's idle loop violates this assumption, RCU will give you what
+ * you deserve, good and hard. But very infrequently and irreproducibly.
*
* Use things like work queues to work around this limitation.
*
*/
void rcu_irq_exit(void)
{
- struct rcu_dynticks *rdtp;
+ struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
lockdep_assert_irqs_disabled();
- rdtp = this_cpu_ptr(&rcu_dynticks);
-
- /* Page faults can happen in NMI handlers, so check... */
- if (rdtp->dynticks_nmi_nesting)
- return;
-
- WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
- rdtp->dynticks_nesting < 1);
- if (rdtp->dynticks_nesting <= 1) {
- rcu_eqs_enter_common(true);
- } else {
- trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1);
- rdtp->dynticks_nesting--;
- }
+ if (rdtp->dynticks_nmi_nesting == 1)
+ rcu_prepare_for_idle();
+ rcu_nmi_exit();
+ if (rdtp->dynticks_nmi_nesting == 0)
+ rcu_dynticks_task_enter();
}
/*
local_irq_restore(flags);
}
-/*
- * rcu_eqs_exit_common - current CPU moving away from extended quiescent state
- *
- * If the new value of the ->dynticks_nesting counter was previously zero,
- * we really have exited idle, and must do the appropriate accounting.
- * The caller must have disabled interrupts.
- */
-static void rcu_eqs_exit_common(long long oldval, int user)
-{
- RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);)
-
- rcu_dynticks_task_exit();
- rcu_dynticks_eqs_exit();
- rcu_cleanup_after_idle();
- trace_rcu_dyntick(TPS("End"), oldval, rdtp->dynticks_nesting);
- if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
- !user && !is_idle_task(current)) {
- struct task_struct *idle __maybe_unused =
- idle_task(smp_processor_id());
-
- trace_rcu_dyntick(TPS("Error on exit: not idle task"),
- oldval, rdtp->dynticks_nesting);
- rcu_ftrace_dump(DUMP_ORIG);
- WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
- current->pid, current->comm,
- idle->pid, idle->comm); /* must be idle task! */
- }
-}
-
/*
* Exit an RCU extended quiescent state, which can be either the
* idle loop or adaptive-tickless usermode execution.
+ *
+ * We crowbar the ->dynticks_nmi_nesting field to DYNTICK_IRQ_NONIDLE to
+ * allow for the possibility of usermode upcalls messing up our count of
+ * interrupt nesting level during the busy period that is just now starting.
*/
static void rcu_eqs_exit(bool user)
{
struct rcu_dynticks *rdtp;
- long long oldval;
+ long oldval;
lockdep_assert_irqs_disabled();
rdtp = this_cpu_ptr(&rcu_dynticks);
oldval = rdtp->dynticks_nesting;
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
- if (oldval & DYNTICK_TASK_NEST_MASK) {
- rdtp->dynticks_nesting += DYNTICK_TASK_NEST_VALUE;
- } else {
- __this_cpu_inc(disable_rcu_irq_enter);
- rdtp->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
- rcu_eqs_exit_common(oldval, user);
- __this_cpu_dec(disable_rcu_irq_enter);
+ if (oldval) {
+ rdtp->dynticks_nesting++;
+ return;
}
+ rcu_dynticks_task_exit();
+ rcu_dynticks_eqs_exit();
+ rcu_cleanup_after_idle();
+ trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks);
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
+ WRITE_ONCE(rdtp->dynticks_nesting, 1);
+ WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);
}
/**
* Exit idle mode, in other words, -enter- the mode in which RCU
* read-side critical sections can occur.
*
- * We crowbar the ->dynticks_nesting field to DYNTICK_TASK_NEST to
- * allow for the possibility of usermode upcalls messing up our count
- * of interrupt nesting level during the busy period that is just
- * now starting.
- *
* If you add or remove a call to rcu_idle_exit(), be sure to test with
* CONFIG_RCU_EQS_DEBUG=y.
*/
}
#endif /* CONFIG_NO_HZ_FULL */
-/**
- * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
- *
- * Enter an interrupt handler, which might possibly result in exiting
- * idle mode, in other words, entering the mode in which read-side critical
- * sections can occur. The caller must have disabled interrupts.
- *
- * Note that the Linux kernel is fully capable of entering an interrupt
- * handler that it never exits, for example when doing upcalls to
- * user mode! This code assumes that the idle loop never does upcalls to
- * user mode. If your architecture does do upcalls from the idle loop (or
- * does anything else that results in unbalanced calls to the irq_enter()
- * and irq_exit() functions), RCU will give you what you deserve, good
- * and hard. But very infrequently and irreproducibly.
- *
- * Use things like work queues to work around this limitation.
- *
- * You have been warned.
- *
- * If you add or remove a call to rcu_irq_enter(), be sure to test with
- * CONFIG_RCU_EQS_DEBUG=y.
- */
-void rcu_irq_enter(void)
-{
- struct rcu_dynticks *rdtp;
- long long oldval;
-
- lockdep_assert_irqs_disabled();
- rdtp = this_cpu_ptr(&rcu_dynticks);
-
- /* Page faults can happen in NMI handlers, so check... */
- if (rdtp->dynticks_nmi_nesting)
- return;
-
- oldval = rdtp->dynticks_nesting;
- rdtp->dynticks_nesting++;
- WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
- rdtp->dynticks_nesting == 0);
- if (oldval)
- trace_rcu_dyntick(TPS("++="), oldval, rdtp->dynticks_nesting);
- else
- rcu_eqs_exit_common(oldval, true);
-}
-
-/*
- * Wrapper for rcu_irq_enter() where interrupts are enabled.
- *
- * If you add or remove a call to rcu_irq_enter_irqson(), be sure to test
- * with CONFIG_RCU_EQS_DEBUG=y.
- */
-void rcu_irq_enter_irqson(void)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- rcu_irq_enter();
- local_irq_restore(flags);
-}
-
/**
* rcu_nmi_enter - inform RCU of entry to NMI context
*
void rcu_nmi_enter(void)
{
struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
- int incby = 2;
+ long incby = 2;
/* Complain about underflow. */
WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 0);
rcu_dynticks_eqs_exit();
incby = 1;
}
- rdtp->dynticks_nmi_nesting += incby;
+ trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
+ rdtp->dynticks_nmi_nesting,
+ rdtp->dynticks_nmi_nesting + incby, rdtp->dynticks);
+ WRITE_ONCE(rdtp->dynticks_nmi_nesting, /* Prevent store tearing. */
+ rdtp->dynticks_nmi_nesting + incby);
barrier();
}
/**
- * rcu_nmi_exit - inform RCU of exit from NMI context
+ * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
*
- * If we are returning from the outermost NMI handler that interrupted an
- * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
- * to let the RCU grace-period handling know that the CPU is back to
- * being RCU-idle.
+ * Enter an interrupt handler, which might possibly result in exiting
+ * idle mode, in other words, entering the mode in which read-side critical
+ * sections can occur. The caller must have disabled interrupts.
*
- * If you add or remove a call to rcu_nmi_exit(), be sure to test
- * with CONFIG_RCU_EQS_DEBUG=y.
+ * Note that the Linux kernel is fully capable of entering an interrupt
+ * handler that it never exits, for example when doing upcalls to user mode!
+ * This code assumes that the idle loop never does upcalls to user mode.
+ * If your architecture's idle loop does do upcalls to user mode (or does
+ * anything else that results in unbalanced calls to the irq_enter() and
+ * irq_exit() functions), RCU will give you what you deserve, good and hard.
+ * But very infrequently and irreproducibly.
+ *
+ * Use things like work queues to work around this limitation.
+ *
+ * You have been warned.
+ *
+ * If you add or remove a call to rcu_irq_enter(), be sure to test with
+ * CONFIG_RCU_EQS_DEBUG=y.
*/
-void rcu_nmi_exit(void)
+void rcu_irq_enter(void)
{
struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
- /*
- * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
- * (We are exiting an NMI handler, so RCU better be paying attention
- * to us!)
- */
- WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0);
- WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs());
+ lockdep_assert_irqs_disabled();
+ if (rdtp->dynticks_nmi_nesting == 0)
+ rcu_dynticks_task_exit();
+ rcu_nmi_enter();
+ if (rdtp->dynticks_nmi_nesting == 1)
+ rcu_cleanup_after_idle();
+}
- /*
- * If the nesting level is not 1, the CPU wasn't RCU-idle, so
- * leave it in non-RCU-idle state.
- */
- if (rdtp->dynticks_nmi_nesting != 1) {
- rdtp->dynticks_nmi_nesting -= 2;
- return;
- }
+/*
+ * Wrapper for rcu_irq_enter() where interrupts are enabled.
+ *
+ * If you add or remove a call to rcu_irq_enter_irqson(), be sure to test
+ * with CONFIG_RCU_EQS_DEBUG=y.
+ */
+void rcu_irq_enter_irqson(void)
+{
+ unsigned long flags;
- /* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
- rdtp->dynticks_nmi_nesting = 0;
- rcu_dynticks_eqs_enter();
+ local_irq_save(flags);
+ rcu_irq_enter();
+ local_irq_restore(flags);
}
/**
*/
static int rcu_is_cpu_rrupt_from_idle(void)
{
- return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 1;
+ return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 0 &&
+ __this_cpu_read(rcu_dynticks.dynticks_nmi_nesting) <= 1;
}
/*
rdp->n_force_qs_snap = rsp->n_force_qs;
} else if (count < rdp->qlen_last_fqs_check - qhimark)
rdp->qlen_last_fqs_check = count;
+
+ /*
+ * The following usually indicates a double call_rcu(). To track
+ * this down, try building with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y.
+ */
WARN_ON_ONCE(rcu_segcblist_empty(&rdp->cblist) != (count == 0));
local_irq_restore(flags);
raw_spin_lock_irqsave_rcu_node(rnp, flags);
rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu);
rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
- WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE);
+ WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != 1);
WARN_ON_ONCE(rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp->dynticks)));
rdp->cpu = cpu;
rdp->rsp = rsp;
if (rcu_segcblist_empty(&rdp->cblist) && /* No early-boot CBs? */
!init_nocb_callback_list(rdp))
rcu_segcblist_init(&rdp->cblist); /* Re-enable callbacks. */
- rdp->dynticks->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
+ rdp->dynticks->dynticks_nesting = 1; /* CPU not up, no tearing. */
rcu_dynticks_eqs_online();
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
* Dynticks per-CPU state.
*/
struct rcu_dynticks {
- long long dynticks_nesting; /* Track irq/process nesting level. */
- /* Process level is worth LLONG_MAX/2. */
- int dynticks_nmi_nesting; /* Track NMI nesting level. */
+ long dynticks_nesting; /* Track process nesting level. */
+ long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */
atomic_t dynticks; /* Even value for idle, else odd. */
bool rcu_need_heavy_qs; /* GP old, need heavy quiescent state. */
unsigned long rcu_qs_ctr; /* Light universal quiescent state ctr. */
#ifdef CONFIG_RCU_NOCB_CPU
static cpumask_var_t rcu_nocb_mask; /* CPUs to have callbacks offloaded. */
-static bool have_rcu_nocb_mask; /* Was rcu_nocb_mask allocated? */
static bool __read_mostly rcu_nocb_poll; /* Offload kthread are to poll. */
#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
}
print_cpu_stall_fast_no_hz(fast_no_hz, cpu);
delta = rdp->mynode->gpnum - rdp->rcu_iw_gpnum;
- pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%llx/%d softirq=%u/%u fqs=%ld %s\n",
+ pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%ld softirq=%u/%u fqs=%ld %s\n",
cpu,
"O."[!!cpu_online(cpu)],
"o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)],
static int __init rcu_nocb_setup(char *str)
{
alloc_bootmem_cpumask_var(&rcu_nocb_mask);
- have_rcu_nocb_mask = true;
cpulist_parse(str, rcu_nocb_mask);
return 1;
}
/* Is the specified CPU a no-CBs CPU? */
bool rcu_is_nocb_cpu(int cpu)
{
- if (have_rcu_nocb_mask)
+ if (cpumask_available(rcu_nocb_mask))
return cpumask_test_cpu(cpu, rcu_nocb_mask);
return false;
}
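
The separate have_rcu_nocb_mask flag can be dropped because cpumask_available()
reports directly whether a cpumask_var_t has storage behind it (and is
unconditionally true when CONFIG_CPUMASK_OFFSTACK is not set).  A hedged sketch
of the resulting idiom, with a made-up my_mask:

    #include <linux/cpumask.h>

    static cpumask_var_t my_mask; /* Hypothetical mask, possibly never allocated. */

    static bool my_mask_has_cpu(int cpu)
    {
            if (cpumask_available(my_mask)) /* Storage allocated (or on-stack variant)? */
                    return cpumask_test_cpu(cpu, my_mask);
            return false; /* Never allocated: treat the mask as empty. */
    }
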
need_rcu_nocb_mask = true;
#endif /* #if defined(CONFIG_NO_HZ_FULL) */
- if (!have_rcu_nocb_mask && need_rcu_nocb_mask) {
+ if (!cpumask_available(rcu_nocb_mask) && need_rcu_nocb_mask) {
if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
return;
}
- have_rcu_nocb_mask = true;
}
- if (!have_rcu_nocb_mask)
+ if (!cpumask_available(rcu_nocb_mask))
return;
#if defined(CONFIG_NO_HZ_FULL)
struct rcu_data *rdp_leader = NULL; /* Suppress misguided gcc warn. */
struct rcu_data *rdp_prev = NULL;
- if (!have_rcu_nocb_mask)
+ if (!cpumask_available(rcu_nocb_mask))
return;
if (ls == -1) {
ls = int_sqrt(nr_cpu_ids);
unsigned long flags;
raw_spin_lock_irqsave(&rq->lock, flags);
- resched_curr(rq);
+ if (cpu_online(cpu) || cpu == smp_processor_id())
+ resched_curr(rq);
raw_spin_unlock_irqrestore(&rq->lock, flags);
}
if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
queue_push_tasks(rq);
#endif /* CONFIG_SMP */
- if (p->prio < rq->curr->prio)
+ if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq)))
resched_curr(rq);
}
}
*/
__do_softirq();
local_irq_enable();
- cond_resched_rcu_qs();
+ cond_resched();
return;
}
local_irq_enable();
if (unlikely(in_nmi()))
return;
- /*
- * It is possible that a function is being traced in a
- * location that RCU is not watching. A call to
- * rcu_irq_enter() will make sure that it is, but there's
- * a few internal rcu functions that could be traced
- * where that wont work either. In those cases, we just
- * do nothing.
- */
- if (unlikely(rcu_irq_enter_disabled()))
- return;
-
rcu_irq_enter_irqson();
__ftrace_trace_stack(buffer, flags, skip, pc, NULL);
rcu_irq_exit_irqson();
* this thread will never voluntarily schedule which would
* block synchronize_rcu_tasks() indefinitely.
*/
- cond_resched_rcu_qs();
+ cond_resched();
}
return 0;
}
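
Several hunks in this series make the same substitution: cond_resched_rcu_qs()
becomes plain cond_resched(), which now also reports a quiescent state when the
grace-period machinery has asked for one.  A hedged sketch of the resulting
calling convention in a long-running kernel loop (process_one_item() and
nr_items are invented for the example):

    #include <linux/sched.h>

    static int nr_items; /* Hypothetical amount of work. */

    static void process_one_item(int i) /* Hypothetical per-item processing. */
    {
    }

    static void process_all_items(void)
    {
            int i;

            for (i = 0; i < nr_items; i++) {
                    process_one_item(i);
                    cond_resched(); /* Yield if needed; also supplies an RCU QS on request. */
            }
    }
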
/*
- * rcu_assign_pointer has a smp_wmb() which makes sure that the new
- * probe callbacks array is consistent before setting a pointer to it.
- * This array is referenced by __DO_TRACE from
- * include/linux/tracepoints.h. A matching smp_read_barrier_depends()
- * is used.
+ * rcu_assign_pointer has an smp_store_release() which makes sure
+ * that the new probe callbacks array is consistent before setting
+ * a pointer to it. This array is referenced by __DO_TRACE from
+ * include/linux/tracepoint.h using rcu_dereference_sched().
*/
rcu_assign_pointer(tp->funcs, tp_funcs);
if (!static_key_enabled(&tp->key))
* stop_machine. At the same time, report a quiescent RCU state so
* the same condition doesn't freeze RCU.
*/
- cond_resched_rcu_qs();
+ cond_resched();
spin_lock_irq(&pool->lock);
if (assoc_array_ptr_is_shortcut(cursor)) {
/* Descend through a shortcut */
shortcut = assoc_array_ptr_to_shortcut(cursor);
- smp_read_barrier_depends();
- cursor = READ_ONCE(shortcut->next_node);
+ cursor = READ_ONCE(shortcut->next_node); /* Address dependency. */
}
node = assoc_array_ptr_to_node(cursor);
- smp_read_barrier_depends();
slot = 0;
/* We perform two passes of each node.
*/
has_meta = 0;
for (; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
- ptr = READ_ONCE(node->slots[slot]);
+ ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */
has_meta |= (unsigned long)ptr;
if (ptr && assoc_array_ptr_is_leaf(ptr)) {
- /* We need a barrier between the read of the pointer
- * and dereferencing the pointer - but only if we are
- * actually going to dereference it.
+			/* The barrier needed between the read of the pointer and
+			 * its dereference is supplied by the above READ_ONCE().
*/
- smp_read_barrier_depends();
-
/* Invoke the callback */
ret = iterator(assoc_array_ptr_to_leaf(ptr),
iterator_data);
continue_node:
node = assoc_array_ptr_to_node(cursor);
- smp_read_barrier_depends();
-
for (; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
- ptr = READ_ONCE(node->slots[slot]);
+ ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */
if (assoc_array_ptr_is_meta(ptr)) {
cursor = ptr;
goto begin_node;
finished_node:
/* Move up to the parent (may need to skip back over a shortcut) */
- parent = READ_ONCE(node->back_pointer);
+ parent = READ_ONCE(node->back_pointer); /* Address dependency. */
slot = node->parent_slot;
if (parent == stop)
return 0;
if (assoc_array_ptr_is_shortcut(parent)) {
shortcut = assoc_array_ptr_to_shortcut(parent);
- smp_read_barrier_depends();
cursor = parent;
- parent = READ_ONCE(shortcut->back_pointer);
+ parent = READ_ONCE(shortcut->back_pointer); /* Address dependency. */
slot = shortcut->parent_slot;
if (parent == stop)
return 0;
void *iterator_data),
void *iterator_data)
{
- struct assoc_array_ptr *root = READ_ONCE(array->root);
+ struct assoc_array_ptr *root = READ_ONCE(array->root); /* Address dependency. */
if (!root)
return 0;
pr_devel("-->%s()\n", __func__);
- cursor = READ_ONCE(array->root);
+ cursor = READ_ONCE(array->root); /* Address dependency. */
if (!cursor)
return assoc_array_walk_tree_empty;
consider_node:
node = assoc_array_ptr_to_node(cursor);
- smp_read_barrier_depends();
-
slot = segments >> (level & ASSOC_ARRAY_KEY_CHUNK_MASK);
slot &= ASSOC_ARRAY_FAN_MASK;
- ptr = READ_ONCE(node->slots[slot]);
+ ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */
pr_devel("consider slot %x [ix=%d type=%lu]\n",
slot, level, (unsigned long)ptr & 3);
cursor = ptr;
follow_shortcut:
shortcut = assoc_array_ptr_to_shortcut(cursor);
- smp_read_barrier_depends();
pr_devel("shortcut to %d\n", shortcut->skip_to_level);
sc_level = level + ASSOC_ARRAY_LEVEL_STEP;
BUG_ON(sc_level > shortcut->skip_to_level);
} while (sc_level < shortcut->skip_to_level);
/* The shortcut matches the leaf's index to this point. */
- cursor = READ_ONCE(shortcut->next_node);
+ cursor = READ_ONCE(shortcut->next_node); /* Address dependency. */
if (((level ^ sc_level) & ~ASSOC_ARRAY_KEY_CHUNK_MASK) != 0) {
level = sc_level;
goto jumped;
return NULL;
node = result.terminal_node.node;
- smp_read_barrier_depends();
	/* If the target key is available to us, it has to be pointed to by
* the terminal node.
*/
for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
- ptr = READ_ONCE(node->slots[slot]);
+ ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */
if (ptr && assoc_array_ptr_is_leaf(ptr)) {
			/* The barrier needed between the read of the pointer and
			 * its dereference is supplied by the above READ_ONCE().
*/
leaf = assoc_array_ptr_to_leaf(ptr);
- smp_read_barrier_depends();
if (ops->compare_object(leaf, index_key))
return (void *)leaf;
}
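
The smp_read_barrier_depends() removals above all rely on the same fact:
READ_ONCE(), and therefore rcu_dereference(), now includes the ordering that
DEC Alpha needs, so a load of a pointer followed by a load through that pointer
is ordered on every architecture.  A hedged, self-contained sketch of the
publish/subscribe pairing (struct blob, gp, publisher(), and reader() are
invented for the example):

    #include <linux/rcupdate.h>

    struct blob {
            int a;
    };

    static struct blob __rcu *gp; /* Hypothetical RCU-protected pointer. */

    static void publisher(struct blob *p)
    {
            p->a = 42;                 /* Initialize first... */
            rcu_assign_pointer(gp, p); /* ...then publish with release semantics. */
    }

    static int reader(void)
    {
            struct blob *p;
            int ret = -1;

            rcu_read_lock();
            p = rcu_dereference(gp);   /* Includes READ_ONCE(): heads the address dependency. */
            if (p)
                    ret = p->a;        /* Dependent load, ordered after the load of p. */
            rcu_read_unlock();
            return ret;
    }
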
atomic_long_add(PERCPU_COUNT_BIAS, &ref->count);
/*
- * Restore per-cpu operation. smp_store_release() is paired with
- * smp_read_barrier_depends() in __ref_is_percpu() and guarantees
- * that the zeroing is visible to all percpu accesses which can see
- * the following __PERCPU_REF_ATOMIC clearing.
+ * Restore per-cpu operation. smp_store_release() is paired
+ * with READ_ONCE() in __ref_is_percpu() and guarantees that the
+ * zeroing is visible to all percpu accesses which can see the
+ * following __PERCPU_REF_ATOMIC clearing.
*/
for_each_possible_cpu(cpu)
*per_cpu_ptr(percpu_count, cpu) = 0;
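
The rewritten comment above describes the same idea from the update side: the
per-CPU counters are zeroed first, and only then is the mode flag (kept in the
low bits of the counter pointer) cleared with smp_store_release(), so any reader
whose READ_ONCE() sees the flag clear also sees the zeroed counters through the
resulting address dependency.  A hedged sketch with invented names (struct
my_ref, MY_ATOMIC_FLAG, and the two functions):

    #include <linux/percpu.h>
    #include <linux/atomic.h>

    #define MY_ATOMIC_FLAG 1UL /* Mode flag stored in bit 0 of the pointer value. */

    struct my_ref {
            unsigned long percpu_count_ptr; /* per-CPU counter pointer plus flag bit. */
    };

    static void my_switch_to_percpu(struct my_ref *ref)
    {
            unsigned long __percpu *pc;
            int cpu;

            pc = (unsigned long __percpu *)(ref->percpu_count_ptr & ~MY_ATOMIC_FLAG);
            for_each_possible_cpu(cpu)
                    *per_cpu_ptr(pc, cpu) = 0;
            /* Pairs with READ_ONCE() below: zeroing is visible before the flag clears. */
            smp_store_release(&ref->percpu_count_ptr,
                              ref->percpu_count_ptr & ~MY_ATOMIC_FLAG);
    }

    static bool my_is_percpu(struct my_ref *ref, unsigned long __percpu **pcp)
    {
            unsigned long v = READ_ONCE(ref->percpu_count_ptr);

            if (v & MY_ATOMIC_FLAG)
                    return false;
            *pcp = (unsigned long __percpu *)v; /* Dependent accesses are ordered. */
            return true;
    }
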
expected_mapping = (void *)((unsigned long)stable_node |
PAGE_MAPPING_KSM);
again:
- kpfn = READ_ONCE(stable_node->kpfn);
+ kpfn = READ_ONCE(stable_node->kpfn); /* Address dependency. */
page = pfn_to_page(kpfn);
-
- /*
- * page is computed from kpfn, so on most architectures reading
- * page->mapping is naturally ordered after reading node->kpfn,
- * but on Alpha we need to be more careful.
- */
- smp_read_barrier_depends();
if (READ_ONCE(page->mapping) != expected_mapping)
goto stale;
/* Ignore errors */
mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
- cond_resched_rcu_qs();
+ cond_resched();
}
out:
return 0;
local_bh_disable();
addend = xt_write_recseq_begin();
- private = table->private;
+ private = READ_ONCE(table->private); /* Address dependency. */
cpu = smp_processor_id();
- /*
- * Ensure we load private-> members after we've fetched the base
- * pointer.
- */
- smp_read_barrier_depends();
table_base = private->entries;
jumpstack = (struct arpt_entry **)private->jumpstack[cpu];
WARN_ON(!(table->valid_hooks & (1 << hook)));
local_bh_disable();
addend = xt_write_recseq_begin();
- private = table->private;
+ private = READ_ONCE(table->private); /* Address dependency. */
cpu = smp_processor_id();
- /*
- * Ensure we load private-> members after we've fetched the base
- * pointer.
- */
- smp_read_barrier_depends();
table_base = private->entries;
jumpstack = (struct ipt_entry **)private->jumpstack[cpu];
local_bh_disable();
addend = xt_write_recseq_begin();
- private = table->private;
- /*
- * Ensure we load private-> members after we've fetched the base
- * pointer.
- */
- smp_read_barrier_depends();
+ private = READ_ONCE(table->private); /* Address dependency. */
cpu = smp_processor_id();
table_base = private->entries;
jumpstack = (struct ip6t_entry **)private->jumpstack[cpu];
* we will just continue with next hash slot.
*/
rcu_read_unlock();
- cond_resched_rcu_qs();
+ cond_resched();
} while (++buckets < goal);
if (gc_work->exiting)
}
}
+# check for smp_read_barrier_depends and read_barrier_depends
+ if (!$file && $line =~ /\b(smp_|)read_barrier_depends\s*\(/) {
+ WARN("READ_BARRIER_DEPENDS",
+ "$1read_barrier_depends should only be used in READ_ONCE or DEC Alpha code\n" . $herecurr);
+ }
+
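
For reference, the construct that the new checkpatch rule flags looks like the
following (old_style_reader() and shared_ptr are invented for the example);
after this series the explicit barrier is expected only inside the READ_ONCE()
implementation itself or in DEC Alpha architecture code:

    #include <linux/compiler.h>
    #include <asm/barrier.h>

    static int *shared_ptr; /* Hypothetical pointer published elsewhere. */

    static int old_style_reader(void)
    {
            int *p = READ_ONCE(shared_ptr);

            smp_read_barrier_depends(); /* Now redundant: READ_ONCE() already orders this. */
            return p ? *p : -1;
    }
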
# check of hardware specific defines
if ($line =~ m@^.\s*\#\s*if.*\b(__i386__|__powerpc64__|__sun__|__s390x__)\b@ && $realfile !~ m@include/asm-@) {
CHK("ARCH_DEFINES",
* doesn't contain any keyring pointers.
*/
shortcut = assoc_array_ptr_to_shortcut(ptr);
- smp_read_barrier_depends();
if ((shortcut->index_key[0] & ASSOC_ARRAY_FAN_MASK) != 0)
goto not_this_keyring;
}
node = assoc_array_ptr_to_node(ptr);
- smp_read_barrier_depends();
-
ptr = node->slots[0];
if (!assoc_array_ptr_is_meta(ptr))
goto begin_node;
kdebug("descend");
if (assoc_array_ptr_is_shortcut(ptr)) {
shortcut = assoc_array_ptr_to_shortcut(ptr);
- smp_read_barrier_depends();
ptr = READ_ONCE(shortcut->next_node);
BUG_ON(!assoc_array_ptr_is_node(ptr));
}
begin_node:
kdebug("begin_node");
- smp_read_barrier_depends();
slot = 0;
ascend_to_node:
/* Go through the slots in a node */
if (ptr && assoc_array_ptr_is_shortcut(ptr)) {
shortcut = assoc_array_ptr_to_shortcut(ptr);
- smp_read_barrier_depends();
ptr = READ_ONCE(shortcut->back_pointer);
slot = shortcut->parent_slot;
}
if (!ptr)
goto not_this_keyring;
node = assoc_array_ptr_to_node(ptr);
- smp_read_barrier_depends();
slot++;
/* If we've ascended to the root (zero backpointer), we must have just