Microarchitectural Data Sampling (MDS) mitigation
=================================================

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
 - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS.

It is assumed that attack code resides in user space or in a guest, with
one exception. The rationale behind this assumption is that the code
construct needed for exploiting MDS requires:

 - to control the load to trigger a fault or assist

 - to have a disclosure gadget which exposes the speculatively accessed
   data for consumption through a side channel.

 - to control the pointer through which the disclosure gadget exposes the
   data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but it needs to be thoroughly investigated
whether it can be used to create such a construct.

All variants have the same mitigation strategy at least for the single CPU
thread case (SMT off): Force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

For virtualization there are two ways to achieve CPU buffer clearing:
either via the modified VERW instruction or via the L1D Flush command. The
latter is issued when L1TF mitigation is enabled so the extra VERW can be
avoided. If the CPU is not affected by L1TF then VERW needs to be issued.
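
The rule for virtualization can be sketched as a small decision function.
This is only an illustration of the paragraph above, not kernel code: the
enum, the function name and both parameters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum vmentry_clear {
	VMENTRY_CLEAR_NONE,      /* CPU is not affected by MDS */
	VMENTRY_CLEAR_L1D_FLUSH, /* the L1D flush issued for L1TF is sufficient */
	VMENTRY_CLEAR_VERW,      /* an explicit VERW is needed */
};

/* Pick the buffer clearing method to use before entering a guest. */
static enum vmentry_clear pick_vmentry_clear(bool mds_affected,
					     bool l1d_flush_active)
{
	if (!mds_affected)
		return VMENTRY_CLEAR_NONE;
	if (l1d_flush_active)
		return VMENTRY_CLEAR_L1D_FLUSH; /* extra VERW avoided */
	return VMENTRY_CLEAR_VERW;
}
```

The sketch assumes the caller already knows whether the L1D flush is
issued on this entry; how that is determined is outside its scope.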

If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update there is no side effect
other than a small number of pointlessly wasted CPU cycles.

This does not protect against cross Hyper-Thread attacks except for MSBDS
which is only exploitable cross Hyper-Thread when one of the Hyper-Threads
enters a C-state.

The kernel provides a function to invoke the buffer clearing:

    mds_clear_cpu_buffers()
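
What such a function boils down to can be sketched in user space. This is
a minimal illustration, not the kernel's implementation: the function name
is hypothetical, and the null selector value is an assumption that stands
in for the segment selector the kernel supplies. On non-x86 builds the
sketch does nothing.

```c
#include <stdint.h>

/*
 * Execute VERW with a memory operand holding a segment selector.
 * Returns 1 if VERW was executed, 0 if the build target has no VERW.
 * Whether the CPU buffers are actually cleared depends on the microcode.
 */
static int clear_cpu_buffers_sketch(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	/* Assumption: a null selector used as a placeholder operand. */
	static const uint16_t sel = 0;

	/* VERW only modifies ZF, hence the "cc" clobber. */
	__asm__ volatile("verw %[sel]" : : [sel] "m" (sel) : "cc");
	return 1;
#else
	return 0;
#endif
}
```

Passing the selector as a memory operand keeps the sequence free of
register clobbers, so it can be placed late in an exit path.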

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
transitions.

As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. The state is reflected
accordingly.

According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.

Kernel internal mitigation modes
--------------------------------

======= ============================================================
off     Mitigation is disabled. Either the CPU is not affected or
        mds=off is supplied on the kernel command line.

full    Mitigation is enabled. CPU is affected and MD_CLEAR is
        advertised in CPUID.

vmwerv  Mitigation is enabled. CPU is affected and MD_CLEAR is not
        advertised in CPUID. That is mainly for virtualization
        scenarios where the host has the updated microcode but the
        hypervisor does not expose MD_CLEAR in CPUID. It's a best
        effort approach without guarantee.
======= ============================================================

If the CPU is affected and mds=off is not supplied on the kernel command
line then the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.
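
The selection rule can be written down as a short function. This is a
sketch that follows the table above, not the kernel's code: the enum and
function names are hypothetical. The MD_CLEAR bit position (CPUID leaf 7,
subleaf 0, EDX bit 10) comes from Intel's CPUID enumeration and is not
stated in this document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names mirroring the three modes in the table. */
enum mds_mode { MDS_MODE_OFF, MDS_MODE_FULL, MDS_MODE_VMWERV };

/* MD_CLEAR is enumerated in CPUID.(EAX=7,ECX=0):EDX bit 10. */
static bool has_md_clear(uint32_t cpuid7_edx)
{
	return (cpuid7_edx >> 10) & 1u;
}

static enum mds_mode select_mds_mode(bool cpu_affected, bool cmdline_off,
				     uint32_t cpuid7_edx)
{
	if (!cpu_affected || cmdline_off)
		return MDS_MODE_OFF;
	if (has_md_clear(cpuid7_edx))
		return MDS_MODE_FULL;
	/* Best effort: issue VERW although MD_CLEAR is not advertised. */
	return MDS_MODE_VMWERV;
}
```

Taking the raw EDX value as a parameter keeps the sketch portable; a real
caller would obtain it from the CPUID instruction.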

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

When transitioning from kernel to user space the CPU buffers are flushed
on affected CPUs when the mitigation is not disabled on the kernel
command line. The mitigation is enabled through a static key.

The mitigation is invoked in prepare_exit_to_usermode() which covers
all but one of the kernel to user space transitions. The exception
is when we return from a Non Maskable Interrupt (NMI), which is
handled directly in do_nmi().

(The reason that NMI is special is that prepare_exit_to_usermode() can
enable IRQs. In NMI context, NMIs are blocked, and we don't want to
enable IRQs with NMIs blocked.)
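
The two exit paths can be modeled in plain C. This is a toy user-space
model under stated assumptions, not kernel code: a boolean stands in for
the static key, a counter stands in for the actual buffer clear, and all
names are hypothetical. It also assumes the NMI path only clears when the
NMI returns to user space.

```c
#include <stdbool.h>

static bool user_clear_enabled;   /* stands in for the static key */
static unsigned long clear_count; /* counts simulated buffer clears */

/* Placeholder for the real buffer clear. */
static void clear_buffers_stub(void)
{
	clear_count++;
}

/* Common exit path: covers all transitions except return from NMI. */
static void exit_to_user_sketch(void)
{
	if (user_clear_enabled)
		clear_buffers_stub();
}

/* NMI exit path: handled separately, without enabling IRQs. */
static void nmi_exit_sketch(bool returning_to_user)
{
	if (returning_to_user && user_clear_enabled)
		clear_buffers_stub();
}
```

With the switch off neither path clears anything, which matches the
disabled mitigation mode.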

2. C-State transition
^^^^^^^^^^^^^^^^^^^^^

When a CPU goes idle and enters a C-State the CPU buffers need to be
cleared on affected CPUs when SMT is active. This addresses the
repartitioning of the store buffer when one of the Hyper-Threads enters
a C-State.

When SMT is inactive, i.e. either the CPU does not support it or all
sibling threads are offline, CPU buffer clearing is not required.

The idle clearing is enabled on CPUs which are only affected by MSBDS
and not by any other MDS variant. The other MDS variants cannot be
protected against cross Hyper-Thread attacks because the Fill Buffer and
the Load Ports are shared. So on CPUs affected by other variants, the
idle clearing would be a window dressing exercise and is therefore not
enabled.

The invocation is controlled by the static key mds_idle_clear which is
switched depending on the chosen mitigation mode and the SMT state of
the system.
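
The conditions for idle clearing can be summarized as a single predicate.
This is a sketch of the rules in this section, not the kernel's logic:
the function name and parameters are hypothetical, and the mitigation
mode is reduced to a plain on/off flag.

```c
#include <stdbool.h>

/*
 * Decide whether buffers should be cleared before entering a C-State.
 *
 * mitigation_on: a mitigation mode other than "off" was selected
 * smt_active:    SMT is supported and a sibling thread is online
 * msbds_only:    the CPU is affected by MSBDS and no other MDS variant
 */
static bool idle_clear_wanted(bool mitigation_on, bool smt_active,
			      bool msbds_only)
{
	if (!mitigation_on)
		return false;
	if (!smt_active)
		return false; /* no sibling to leak to */
	/* Shared fill buffers and load ports would leak anyway. */
	return msbds_only;
}
```

Because the SMT state can change at runtime, a result like this would
need to be re-evaluated whenever sibling threads go online or offline.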

The buffer clear is only invoked before entering the C-State to prevent
stale data from the idling CPU from spilling to the Hyper-Thread sibling
after the store buffer has been repartitioned and all entries are
available to the non-idle sibling.

When coming out of idle the store buffer is partitioned again so each
sibling has half of it available. The CPU returning from idle could then
be speculatively exposed to contents of the sibling. The buffers are
flushed either on exit to user space or on VMENTER so malicious code
in user space or the guest cannot speculatively access them.

The mitigation is hooked into all variants of halt()/mwait(), but does
not cover the legacy ACPI IO-Port mechanism because the intel_idle driver
superseded the ACPI idle driver around 2010 and is preferred on all
affected CPUs which are expected to gain the MD_CLEAR functionality in
microcode. Aside from that, the IO-Port mechanism is a legacy interface
which is only used on older systems which are either not affected or do
not receive microcode updates anymore.