Documentation/locking/seqlock.rst

   1 ======================================
   2 Sequence counters and sequential locks
   3 ======================================
   4
   5 Introduction
   6 ============
   7
   8 Sequence counters are a reader-writer consistency mechanism with
   9 lockless readers (read-only retry loops), and no writer starvation. They
  10 are used for data that's rarely written to (e.g. system time), where the
  11 reader wants a consistent set of information and is willing to retry if
  12 that information changes.
  13
  14 A data set is consistent when the sequence count at the beginning of the
  15 read side critical section is even and the same sequence count value is
  16 read again at the end of the critical section. The data in the set must
  17 be copied out inside the read side critical section. If the sequence
  18 count has changed between the start and the end of the critical section,
  19 the reader must retry.
  20
  21 Writers increment the sequence count at the start and the end of their
  22 critical section. After starting the critical section the sequence count
  23 is odd and indicates to the readers that an update is in progress. At
  24 the end of the write side critical section the sequence count becomes
  25 even again which lets readers make progress.
  26
  27 A sequence counter write side critical section must never be preempted
  28 or interrupted by read side sections. Otherwise the reader will spin for
  29 the entire scheduler tick due to the odd sequence count value and the
  30 interrupted writer. If that reader belongs to a real-time scheduling
  31 class, it can spin forever and the kernel will livelock.
  32
  33 This mechanism cannot be used if the protected data contains pointers,
  34 as the writer can invalidate a pointer that the reader is following.
  35
  36
  37 .. _seqcount_t:
  38
  39 Sequence counters (``seqcount_t``)
  40 ==================================
  41
  42 This is the the raw counting mechanism, which does not protect against
  43 multiple writers.  Write side critical sections must thus be serialized
  44 by an external lock.
  45
  46 If the write serialization primitive is not implicitly disabling
  47 preemption, preemption must be explicitly disabled before entering the
  48 write side section. If the read section can be invoked from hardirq or
  49 softirq contexts, interrupts or bottom halves must also be respectively
  50 disabled before entering the write section.
  51
  52 If it's desired to automatically handle the sequence counter
  53 requirements of writer serialization and non-preemptibility, use
  54 :ref:`seqlock_t` instead.
  55
  56 Initialization::
  57
  58         /* dynamic */
  59         seqcount_t foo_seqcount;
  60         seqcount_init(&foo_seqcount);
  61
  62         /* static */
  63         static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
  64
  65         /* C99 struct init */
  66         struct {
  67                 .seq   = SEQCNT_ZERO(foo.seq),
  68         } foo;
  69
  70 Write path::
  71
  72         /* Serialized context with disabled preemption */
  73
  74         write_seqcount_begin(&foo_seqcount);
  75
  76         /* ... [[write-side critical section]] ... */
  77
  78         write_seqcount_end(&foo_seqcount);
  79
  80 Read path::
  81
  82         do {
  83                 seq = read_seqcount_begin(&foo_seqcount);
  84
  85                 /* ... [[read-side critical section]] ... */
  86
  87         } while (read_seqcount_retry(&foo_seqcount, seq));
  88
  89
  90 .. _seqcount_locktype_t:
  91
  92 Sequence counters with associated locks (``seqcount_LOCKTYPE_t``)
  93 -----------------------------------------------------------------
  94
  95 As discussed at :ref:`seqcount_t`, sequence count write side critical
  96 sections must be serialized and non-preemptible. This variant of
  97 sequence counters associate the lock used for writer serialization at
  98 initialization time, which enables lockdep to validate that the write
  99 side critical sections are properly serialized.
 100
 101 This lock association is a NOOP if lockdep is disabled and has neither
 102 storage nor runtime overhead. If lockdep is enabled, the lock pointer is
 103 stored in struct seqcount and lockdep's "lock is held" assertions are
 104 injected at the beginning of the write side critical section to validate
 105 that it is properly protected.
 106
 107 For lock types which do not implicitly disable preemption, preemption
 108 protection is enforced in the write side function.
 109
 110 The following sequence counters with associated locks are defined:
 111
 112   - ``seqcount_spinlock_t``
 113   - ``seqcount_raw_spinlock_t``
 114   - ``seqcount_rwlock_t``
 115   - ``seqcount_mutex_t``
 116   - ``seqcount_ww_mutex_t``
 117
 118 The plain seqcount read and write APIs branch out to the specific
 119 seqcount_LOCKTYPE_t implementation at compile-time. This avoids kernel
 120 API explosion per each new seqcount LOCKTYPE.
 121
 122 Initialization (replace "LOCKTYPE" with one of the supported locks)::
 123
 124         /* dynamic */
 125         seqcount_LOCKTYPE_t foo_seqcount;
 126         seqcount_LOCKTYPE_init(&foo_seqcount, &lock);
 127
 128         /* static */
 129         static seqcount_LOCKTYPE_t foo_seqcount =
 130                 SEQCNT_LOCKTYPE_ZERO(foo_seqcount, &lock);
 131
 132         /* C99 struct init */
 133         struct {
 134                 .seq   = SEQCNT_LOCKTYPE_ZERO(foo.seq, &lock),
 135         } foo;
 136
 137 Write path: same as in :ref:`seqcount_t`, while running from a context
 138 with the associated LOCKTYPE lock acquired.
 139
 140 Read path: same as in :ref:`seqcount_t`.
 141
 142 .. _seqlock_t:
 143
 144 Sequential locks (``seqlock_t``)
 145 ================================
 146
 147 This contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
 148 embedded spinlock for writer serialization and non-preemptibility.
 149
 150 If the read side section can be invoked from hardirq or softirq context,
 151 use the write side function variants which disable interrupts or bottom
 152 halves respectively.
 153
 154 Initialization::
 155
 156         /* dynamic */
 157         seqlock_t foo_seqlock;
 158         seqlock_init(&foo_seqlock);
 159
 160         /* static */
 161         static DEFINE_SEQLOCK(foo_seqlock);
 162
 163         /* C99 struct init */
 164         struct {
 165                 .seql   = __SEQLOCK_UNLOCKED(foo.seql)
 166         } foo;
 167
 168 Write path::
 169
 170         write_seqlock(&foo_seqlock);
 171
 172         /* ... [[write-side critical section]] ... */
 173
 174         write_sequnlock(&foo_seqlock);
 175
 176 Read path, three categories:
 177
 178 1. Normal Sequence readers which never block a writer but they must
 179    retry if a writer is in progress by detecting change in the sequence
 180    number.  Writers do not wait for a sequence reader::
 181
 182         do {
 183                 seq = read_seqbegin(&foo_seqlock);
 184
 185                 /* ... [[read-side critical section]] ... */
 186
 187         } while (read_seqretry(&foo_seqlock, seq));
 188
 189 2. Locking readers which will wait if a writer or another locking reader
 190    is in progress. A locking reader in progress will also block a writer
 191    from entering its critical section. This read lock is
 192    exclusive. Unlike rwlock_t, only one locking reader can acquire it::
 193
 194         read_seqlock_excl(&foo_seqlock);
 195
 196         /* ... [[read-side critical section]] ... */
 197
 198         read_sequnlock_excl(&foo_seqlock);
 199
 200 3. Conditional lockless reader (as in 1), or locking reader (as in 2),
 201    according to a passed marker. This is used to avoid lockless readers
 202    starvation (too much retry loops) in case of a sharp spike in write
 203    activity. First, a lockless read is tried (even marker passed). If
 204    that trial fails (odd sequence counter is returned, which is used as
 205    the next iteration marker), the lockless read is transformed to a
 206    full locking read and no retry loop is necessary::
 207
 208         /* marker; even initialization */
 209         int seq = 0;
 210         do {
 211                 read_seqbegin_or_lock(&foo_seqlock, &seq);
 212
 213                 /* ... [[read-side critical section]] ... */
 214
 215         } while (need_seqretry(&foo_seqlock, seq));
 216         done_seqretry(&foo_seqlock, seq);
 217
 218
 219 API documentation
 220 =================
 221
 222 .. kernel-doc:: include/linux/seqlock.h