.. SPDX-License-Identifier: GPL-2.0

.. _kernel_hacking_locktypes:

==========================
Lock types and their rules
==========================
Introduction
============

The kernel provides a variety of locking primitives which can be divided
into two categories:

 - Sleeping locks
 - Spinning locks

This document conceptually describes these lock types and provides rules
for their nesting, including the rules for use under PREEMPT_RT.
Lock categories
===============

Sleeping locks
--------------

Sleeping locks can only be acquired in preemptible task context.

Although implementations allow try_lock() from other contexts, it is
necessary to carefully evaluate the safety of unlock() as well as of
try_lock().  Furthermore, it is also necessary to evaluate the debugging
versions of these primitives.  In short, don't acquire sleeping locks from
other contexts unless there is no other option.
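As a brief illustration of this rule, here is a minimal sketch of
sleeping-lock usage in preemptible task context; the my_data structure and
my_data_update() function are hypothetical, not part of any kernel API::

  #include <linux/mutex.h>

  struct my_data {
          struct mutex lock;
          int value;
  };

  /* Called only from preemptible task context. */
  static void my_data_update(struct my_data *d, int value)
  {
          mutex_lock(&d->lock);           /* may sleep while contended */
          d->value = value;
          mutex_unlock(&d->lock);         /* released by the acquiring task */
  }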
Sleeping lock types:

 - mutex
 - rt_mutex
 - semaphore
 - rw_semaphore
 - ww_mutex
 - percpu_rw_semaphore

On PREEMPT_RT kernels, these lock types are converted to sleeping locks:

 - spinlock_t
 - rwlock_t

Spinning locks
--------------

 - raw_spinlock_t
 - bit spinlocks

On non-PREEMPT_RT kernels, these lock types are also spinning locks:

 - spinlock_t
 - rwlock_t
Spinning locks implicitly disable preemption and the lock / unlock functions
can have suffixes which apply further protections:

 ===================  ====================================================
 _bh()                Disable / enable bottom halves (soft interrupts)
 _irq()               Disable / enable interrupts
 _irqsave/restore()   Save and disable / restore interrupt disabled state
 ===================  ====================================================
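As a hedged sketch of how these suffixes are used (the lock and counter
names are hypothetical): use _bh() when the data is also touched from
softirq context, and _irqsave() when it is also touched from hard
interrupt context::

  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(stats_lock);     /* hypothetical lock */
  static unsigned long stats_count;

  /* Data shared with softirq context: block bottom halves. */
  static void stats_inc_bh(void)
  {
          spin_lock_bh(&stats_lock);
          stats_count++;
          spin_unlock_bh(&stats_lock);
  }

  /* Data shared with hard interrupt context: save and restore the
     interrupt disabled state. */
  static void stats_inc_irq(void)
  {
          unsigned long flags;

          spin_lock_irqsave(&stats_lock, flags);
          stats_count++;
          spin_unlock_irqrestore(&stats_lock, flags);
  }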
Owner semantics
===============

The aforementioned lock types except semaphores have strict owner
semantics:

  The context (task) that acquired the lock must release it.

rw_semaphores have a special interface which allows non-owner release for
readers.
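One such interface is the down_read_non_owner() / up_read_non_owner()
pair, which allows the reader lock to be released by a task other than the
acquirer.  A hedged sketch, with hypothetical lock and function names::

  #include <linux/rwsem.h>

  static DECLARE_RWSEM(data_sem);         /* hypothetical rw_semaphore */

  /* Task A takes the reader lock but does not release it itself,
     e.g. because completion happens in another context. */
  static void start_io(void)
  {
          down_read_non_owner(&data_sem);
          /* hand the protected data off to another task */
  }

  /* Task B (a different task) releases the reader lock. */
  static void finish_io(void)
  {
          up_read_non_owner(&data_sem);
  }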
rtmutex
=======

RT-mutexes are mutexes with support for priority inheritance (PI).

PI has limitations on non-PREEMPT_RT kernels due to preemption and
interrupt disabled sections.
PI clearly cannot preempt preemption-disabled or interrupt-disabled
regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
execute most such regions of code in preemptible task context, especially
interrupt handlers and soft interrupts.  This conversion allows spinlock_t
and rwlock_t to be implemented via RT-mutexes.
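For reference, a minimal in-kernel rt_mutex usage sketch; the lock and
function names are hypothetical.  A task blocking on the rt_mutex boosts
the priority of the current owner::

  #include <linux/rtmutex.h>

  static DEFINE_RT_MUTEX(ctrl_lock);      /* hypothetical rt_mutex */

  static void update_ctrl_state(void)
  {
          rt_mutex_lock(&ctrl_lock);      /* owner inherits waiters' priority */
          /* critical section runs in preemptible task context */
          rt_mutex_unlock(&ctrl_lock);
  }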
semaphore
=========

semaphore is a counting semaphore implementation.

Semaphores are often used for both serialization and waiting, but new use
cases should instead use separate serialization and wait mechanisms, such
as mutexes and completions.
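For example, code that would previously have used a semaphore to wait for
an event can use a completion for the waiting part.  A hedged sketch with
hypothetical names::

  #include <linux/completion.h>

  static DECLARE_COMPLETION(setup_done);  /* hypothetical completion */

  /* Waiting side: blocks until the event is signalled. */
  static void wait_for_setup(void)
  {
          wait_for_completion(&setup_done);
  }

  /* Signalling side: wakes up a waiter. */
  static void signal_setup_done(void)
  {
          complete(&setup_done);
  }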
semaphores and PREEMPT_RT
-------------------------

PREEMPT_RT does not change the semaphore implementation because counting
semaphores have no concept of owners, thus preventing PREEMPT_RT from
providing priority inheritance for semaphores.  After all, an unknown
owner cannot be boosted.  As a consequence, blocking on semaphores can
result in priority inversion.
rw_semaphore
============

rw_semaphore is a multiple readers and single writer lock mechanism.

On non-PREEMPT_RT kernels the implementation is fair, thus preventing
writer starvation.

rw_semaphore complies by default with the strict owner semantics, but there
exist special-purpose interfaces that allow non-owner release for readers.
These interfaces work independent of the kernel configuration.
rw_semaphore and PREEMPT_RT
---------------------------

PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
implementation, thus changing the fairness:

 Because an rw_semaphore writer cannot grant its priority to multiple
 readers, a preempted low-priority reader will continue holding its lock,
 thus starving even high-priority writers.  In contrast, because readers
 can grant their priority to a writer, a preempted low-priority writer will
 have its priority boosted until it releases the lock, thus preventing that
 writer from starving readers.
raw_spinlock_t and spinlock_t
=============================

raw_spinlock_t
--------------

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
core code, low-level interrupt handling and places where disabling
preemption or interrupts is required, for example, to safely access
hardware state.  raw_spinlock_t can sometimes also be used when the
critical section is tiny, thus avoiding RT-mutex overhead.
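A hedged sketch of the tiny-critical-section case; the lock, the shadow
register and hw_update() are hypothetical.  raw_spin_lock_irqsave() spins
with interrupts disabled on all configurations, so the section must stay
minimal::

  #include <linux/spinlock.h>

  static DEFINE_RAW_SPINLOCK(hw_lock);    /* hypothetical low-level lock */
  static u32 hw_shadow_reg;

  /* Tiny critical section guarding hypothetical hardware state. */
  static void hw_update(u32 val)
  {
          unsigned long flags;

          raw_spin_lock_irqsave(&hw_lock, flags);
          hw_shadow_reg = val;            /* must not sleep or allocate here */
          raw_spin_unlock_irqrestore(&hw_lock, flags);
  }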
spinlock_t
----------

The semantics of spinlock_t change with the state of PREEMPT_RT.

On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t and has
exactly the same semantics.
spinlock_t and PREEMPT_RT
-------------------------

On a PREEMPT_RT kernel spinlock_t is mapped to a separate implementation
based on rt_mutex which changes the semantics:

 - Preemption is not disabled.

 - The hard interrupt related suffixes for spin_lock / spin_unlock
   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
   interrupt disabled state.

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.

   Non-PREEMPT_RT kernels disable preemption to get this effect.

   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
   preemption disabled.  The lock disables softirq handlers and also
   prevents reentrancy due to task preemption.
PREEMPT_RT kernels preserve all other spinlock_t semantics:

 - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
   disable migration, which ensures that pointers to per-CPU variables
   remain valid even if the task is preempted.

 - Task state is preserved across spinlock acquisition, ensuring that the
   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
   kernels leave task state untouched.  However, PREEMPT_RT must change
   task state if the task blocks during acquisition.  Therefore, it saves
   the current task state before blocking and the corresponding lock wakeup
   restores it, as shown below::
    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                       lock wakeup
                                         task->state = task->saved_state
   Other types of wakeups would normally unconditionally set the task state
   to RUNNING, but that does not work here because the task must remain
   blocked until the lock becomes available.  Therefore, when a non-lock
   wakeup attempts to awaken a task blocked waiting for a spinlock, it
   instead sets the saved state to RUNNING.  Then, when the lock
   acquisition completes, the lock wakeup sets the task state to the saved
   state, in this case setting it to RUNNING::
    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                       non lock wakeup
                                         task->saved_state = TASK_RUNNING

                                       lock wakeup
                                         task->state = task->saved_state

   This ensures that the real wakeup cannot be lost.
rwlock_t
========

rwlock_t is a multiple readers and single writer lock mechanism.

Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
suffix rules of spinlock_t apply accordingly.  The implementation is fair,
thus preventing writer starvation.
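A hedged usage sketch with hypothetical names: multiple readers may hold
the lock concurrently, while a writer requires exclusive access::

  #include <linux/spinlock.h>

  static DEFINE_RWLOCK(table_lock);       /* hypothetical rwlock_t */
  static int table[16];

  static int table_read(int idx)
  {
          int val;

          read_lock(&table_lock);         /* concurrent readers allowed */
          val = table[idx];
          read_unlock(&table_lock);
          return val;
  }

  static void table_write(int idx, int val)
  {
          write_lock(&table_lock);        /* exclusive writer access */
          table[idx] = val;
          write_unlock(&table_lock);
  }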
rwlock_t and PREEMPT_RT
-----------------------

PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
implementation, thus changing semantics:

 - All the spinlock_t changes also apply to rwlock_t.

 - Because an rwlock_t writer cannot grant its priority to multiple
   readers, a preempted low-priority reader will continue holding its lock,
   thus starving even high-priority writers.  In contrast, because readers
   can grant their priority to a writer, a preempted low-priority writer
   will have its priority boosted until it releases the lock, thus
   preventing that writer from starving readers.
PREEMPT_RT caveats
==================

spinlock_t and rwlock_t
-----------------------

These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications.  For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::

   local_irq_disable();
   spin_lock(&lock);

and is fully equivalent to::

   spin_lock_irq(&lock);

The same applies to rwlock_t and the _irqsave() suffix variants.
On a PREEMPT_RT kernel this code sequence breaks because RT-mutex requires
a fully preemptible context.  Instead, use spin_lock_irq() or
spin_lock_irqsave() and their unlock counterparts.  In cases where the
interrupt disabling and locking must remain separate, PREEMPT_RT offers a
local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.
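A hedged sketch of the local_lock approach; the cpu_stats structure and
stats_add() are hypothetical.  The per-CPU data is protected by a
local_lock_t instead of a bare local_irq_disable()::

  #include <linux/percpu.h>
  #include <linux/local_lock.h>

  struct cpu_stats {                      /* hypothetical per-CPU data */
          local_lock_t lock;
          unsigned long count;
  };

  static DEFINE_PER_CPU(struct cpu_stats, cpu_stats) = {
          .lock = INIT_LOCAL_LOCK(lock),
  };

  static void stats_add(unsigned long n)
  {
          unsigned long flags;

          /* On non-PREEMPT_RT this disables interrupts; on PREEMPT_RT it
             acquires a per-CPU lock and pins the task to the CPU. */
          local_lock_irqsave(&cpu_stats.lock, flags);
          this_cpu_add(cpu_stats.count, n);
          local_unlock_irqrestore(&cpu_stats.lock, flags);
  }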
raw_spinlock_t on RT
--------------------

Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
spinlock_t or rwlock_t, for example, the critical section must avoid
allocating memory.  Thus, on a non-PREEMPT_RT kernel the following code
works perfectly::

  raw_spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);
But this code fails on PREEMPT_RT kernels because the memory allocator is
fully preemptible and therefore cannot be invoked from truly atomic
contexts.  However, it is perfectly fine to invoke the memory allocator
while holding normal non-raw spinlocks because they do not disable
preemption on PREEMPT_RT kernels::

  spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);
bit spinlocks
-------------

PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
small to accommodate an RT-mutex.  Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
caveats also apply to bit spinlocks.
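A hedged sketch with a hypothetical state word and lock bit:
bit_spin_lock() spins on a single bit of the word and, like
raw_spinlock_t, keeps the critical section atomic even on PREEMPT_RT::

  #include <linux/bit_spinlock.h>

  #define MY_LOCK_BIT     0               /* hypothetical lock bit */

  static unsigned long my_state;          /* bit 0 used as a lock */

  static void state_update(void)
  {
          bit_spin_lock(MY_LOCK_BIT, &my_state);
          /* atomic context even on PREEMPT_RT: no sleeping locks,
             no memory allocation */
          bit_spin_unlock(MY_LOCK_BIT, &my_state);
  }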
Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
usage-site changes are not needed for the spinlock_t substitution.
Instead, conditionals in header files and the core locking implementation
enable the compiler to do the substitution transparently.
Lock type nesting rules
=======================

The most basic rules are:

 - Lock types of the same lock category (sleeping, spinning) can nest
   arbitrarily as long as they respect the general lock ordering rules to
   prevent deadlocks.

 - Sleeping lock types cannot nest inside spinning lock types.

 - Spinning lock types can nest inside sleeping lock types.

These constraints apply both in PREEMPT_RT and otherwise.
The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping means that they cannot be acquired while
holding a raw spinlock.  This results in the following nesting ordering:

  1) Sleeping locks
  2) spinlock_t and rwlock_t
  3) raw_spinlock_t and bit spinlocks

Lockdep will complain if these constraints are violated, both in
PREEMPT_RT and otherwise.
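For illustration, a hedged sketch with hypothetical locks that respects
this ordering: a sleeping lock is taken first, then a spinlock_t, then a
raw_spinlock_t::

  #include <linux/mutex.h>
  #include <linux/spinlock.h>

  static DEFINE_MUTEX(cfg_mutex);         /* level 1: sleeping lock */
  static DEFINE_SPINLOCK(list_lock);      /* level 2: spinlock_t */
  static DEFINE_RAW_SPINLOCK(hw_lock);    /* level 3: raw_spinlock_t */

  static void valid_nesting(void)
  {
          mutex_lock(&cfg_mutex);
          spin_lock(&list_lock);          /* spinning inside sleeping: OK */
          raw_spin_lock(&hw_lock);        /* raw inside spinlock_t: OK */
          raw_spin_unlock(&hw_lock);
          spin_unlock(&list_lock);
          mutex_unlock(&cfg_mutex);

          /* Invalid would be the reverse, e.g. taking spinlock_t (a
             sleeping lock on PREEMPT_RT) inside a raw_spinlock_t
             critical section. */
  }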