Documentation/bpf/instruction-set.rst


====================
eBPF Instruction Set
====================

Registers and calling convention
================================

eBPF has 10 general purpose registers and a read-only frame pointer register,
all of which are 64 bits wide.

The eBPF calling convention is defined as:

 * R0: return value from function calls, and exit value for eBPF programs
 * R1 - R5: arguments for function calls
 * R6 - R9: callee saved registers that function calls will preserve
 * R10: read-only frame pointer to access stack

R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
necessary across calls.

Instruction encoding
====================

eBPF uses 64-bit instructions with the following encoding:

 =============  =======  ===============  ====================  ============
 32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
 =============  =======  ===============  ====================  ============
 immediate      offset   source register  destination register  opcode
 =============  =======  ===============  ====================  ============

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
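
As a rough illustration of the layout above, the following sketch packs and
unpacks the five fields as one 64-bit value. The ``struct insn_fields`` type
and both helper names are hypothetical, and the signed types chosen for
``off`` and ``imm`` are an assumption that the table itself does not state.
The sketch covers the logical value only, not its byte order in memory.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical container for the fields of one instruction. */
    struct insn_fields {
        uint8_t opcode;   /* 8 bits (LSB) */
        uint8_t dst_reg;  /* 4 bits */
        uint8_t src_reg;  /* 4 bits */
        int16_t off;      /* 16 bits, assumed signed */
        int32_t imm;      /* 32 bits (MSB), assumed signed */
    };

    /* Pack the fields in the order given by the encoding table. */
    static uint64_t insn_pack(struct insn_fields f)
    {
        return (uint64_t)f.opcode
             | ((uint64_t)(f.dst_reg & 0x0f) << 8)
             | ((uint64_t)(f.src_reg & 0x0f) << 12)
             | ((uint64_t)(uint16_t)f.off << 16)
             | ((uint64_t)(uint32_t)f.imm << 32);
    }

    /* Split a 64-bit value back into its fields. */
    static struct insn_fields insn_unpack(uint64_t w)
    {
        struct insn_fields f;

        f.opcode  = (uint8_t)(w & 0xff);
        f.dst_reg = (uint8_t)((w >> 8) & 0x0f);
        f.src_reg = (uint8_t)((w >> 12) & 0x0f);
        f.off     = (int16_t)(uint16_t)(w >> 16);
        f.imm     = (int32_t)(uint32_t)(w >> 32);
        return f;
    }

Fields that an instruction does not use are simply left at zero before
packing, which matches the rule stated above.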

Instruction classes
-------------------

The three least significant bits of the 'opcode' field store the instruction
class:

  =========  =====  ===============================
  class      value  description
  =========  =====  ===============================
  BPF_LD     0x00   non-standard load operations
  BPF_LDX    0x01   load into register operations
  BPF_ST     0x02   store from immediate operations
  BPF_STX    0x03   store from register operations
  BPF_ALU    0x04   32-bit arithmetic operations
  BPF_JMP    0x05   64-bit jump operations
  BPF_JMP32  0x06   32-bit jump operations
  BPF_ALU64  0x07   64-bit arithmetic operations
  =========  =====  ===============================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:

  ==============  ======  =================
  4 bits (MSB)    1 bit   3 bits (LSB)
  ==============  ======  =================
  operation code  source  instruction class
  ==============  ======  =================

The 4th bit (mask 0x08) encodes the source operand:

  ======  =====  ========================================
  source  value  description
  ======  =====  ========================================
  BPF_K   0x00   use 32-bit immediate as source operand
  BPF_X   0x08   use 'src_reg' register as source operand
  ======  =====  ========================================

The four most significant bits store the operation code.
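
The three-way split can be sketched with plain bit masks. The helper names
below are hypothetical and chosen for this example only; the mask values
follow directly from the field widths in the table above.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical helpers that split an ALU/JMP opcode byte. */
    static uint8_t opcode_class(uint8_t opcode)  /* 3 bits (LSB) */
    {
        return opcode & 0x07;
    }

    static uint8_t opcode_source(uint8_t opcode) /* 1 bit: 0x00 or 0x08 */
    {
        return opcode & 0x08;
    }

    static uint8_t opcode_op(uint8_t opcode)     /* 4 bits (MSB) */
    {
        return opcode & 0xf0;
    }

For instance, the opcode byte 0x0f splits into operation code 0x00, source
0x08 (BPF_X) and class 0x07 (BPF_ALU64).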


Arithmetic instructions
-----------------------

BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:

  ========  =====  ==========================
  code      value  description
  ========  =====  ==========================
  BPF_ADD   0x00   dst += src
  BPF_SUB   0x10   dst -= src
  BPF_MUL   0x20   dst \*= src
  BPF_DIV   0x30   dst /= src
  BPF_OR    0x40   dst \|= src
  BPF_AND   0x50   dst &= src
  BPF_LSH   0x60   dst <<= src
  BPF_RSH   0x70   dst >>= src
  BPF_NEG   0x80   dst = -dst
  BPF_MOD   0x90   dst %= src
  BPF_XOR   0xa0   dst ^= src
  BPF_MOV   0xb0   dst = src
  BPF_ARSH  0xc0   sign extending shift right
  BPF_END   0xd0   endianness conversion
  ========  =====  ==========================

BPF_ADD | BPF_X | BPF_ALU means::

  dst_reg = (u32) dst_reg + (u32) src_reg

BPF_ADD | BPF_X | BPF_ALU64 means::

  dst_reg = dst_reg + src_reg

BPF_XOR | BPF_K | BPF_ALU means::

  dst_reg = (u32) dst_reg ^ (u32) imm32

BPF_XOR | BPF_K | BPF_ALU64 means::

  dst_reg = dst_reg ^ imm32
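
The difference between the two operand widths can be sketched in C as
follows. The function names are hypothetical. Two details are assumptions
that the text above does not spell out: the 32-bit result is zero-extended
into the 64-bit register, and the 32-bit immediate is sign-extended before a
64-bit operation.

.. code-block:: c

    #include <stdint.h>

    /* BPF_ADD | BPF_X | BPF_ALU: 32-bit add, upper 32 bits become zero. */
    static uint64_t alu32_add(uint64_t dst_reg, uint64_t src_reg)
    {
        return (uint32_t)((uint32_t)dst_reg + (uint32_t)src_reg);
    }

    /* BPF_ADD | BPF_X | BPF_ALU64: full 64-bit add. */
    static uint64_t alu64_add(uint64_t dst_reg, uint64_t src_reg)
    {
        return dst_reg + src_reg;
    }

    /* BPF_XOR | BPF_K | BPF_ALU: 32-bit xor with the immediate. */
    static uint64_t alu32_xor_imm(uint64_t dst_reg, int32_t imm32)
    {
        return (uint32_t)dst_reg ^ (uint32_t)imm32;
    }

    /* BPF_XOR | BPF_K | BPF_ALU64: immediate assumed sign-extended. */
    static uint64_t alu64_xor_imm(uint64_t dst_reg, int32_t imm32)
    {
        return dst_reg ^ (uint64_t)(int64_t)imm32;
    }

With these helpers, adding 1 to 0xffffffff wraps to 0 in the 32-bit form and
gives 0x100000000 in the 64-bit form.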


Jump instructions
-----------------

BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:

  ========  =====  =========================  ============
  code      value  description                notes
  ========  =====  =========================  ============
  BPF_JA    0x00   PC += off                  BPF_JMP only
  BPF_JEQ   0x10   PC += off if dst == src
  BPF_JGT   0x20   PC += off if dst > src     unsigned
  BPF_JGE   0x30   PC += off if dst >= src    unsigned
  BPF_JSET  0x40   PC += off if dst & src
  BPF_JNE   0x50   PC += off if dst != src
  BPF_JSGT  0x60   PC += off if dst > src     signed
  BPF_JSGE  0x70   PC += off if dst >= src    signed
  BPF_CALL  0x80   function call
  BPF_EXIT  0x90   function / program return  BPF_JMP only
  BPF_JLT   0xa0   PC += off if dst < src     unsigned
  BPF_JLE   0xb0   PC += off if dst <= src    unsigned
  BPF_JSLT  0xc0   PC += off if dst < src     signed
  BPF_JSLE  0xd0   PC += off if dst <= src    signed
  ========  =====  =========================  ============

The eBPF program needs to store the return value into register R0 before doing a
BPF_EXIT.
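
The signed and unsigned variants differ only in how the same register bits
are interpreted. A minimal sketch of a few conditions from the table, with
hypothetical function names, might look like this:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* BPF_JGT in class BPF_JMP: unsigned 64-bit comparison. */
    static bool cond_jgt(uint64_t dst, uint64_t src)
    {
        return dst > src;
    }

    /* BPF_JSGT in class BPF_JMP: same bits, compared as signed values. */
    static bool cond_jsgt(uint64_t dst, uint64_t src)
    {
        return (int64_t)dst > (int64_t)src;
    }

    /* BPF_JSET: taken when dst and src share at least one set bit. */
    static bool cond_jset(uint64_t dst, uint64_t src)
    {
        return (dst & src) != 0;
    }

    /* BPF_JGT in class BPF_JMP32: only the lower 32 bits take part. */
    static bool cond_jgt32(uint64_t dst, uint64_t src)
    {
        return (uint32_t)dst > (uint32_t)src;
    }

A register holding all ones is the largest unsigned value but equals -1 when
read as signed, so ``cond_jgt`` and ``cond_jsgt`` disagree when it is
compared against 1.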


Load and store instructions
===========================

For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
8-bit 'opcode' field is divided as:

  ============  ======  =================
  3 bits (MSB)  2 bits  3 bits (LSB)
  ============  ======  =================
  mode          size    instruction class
  ============  ======  =================

The size modifier is one of:

  =============  =====  =====================
  size modifier  value  description
  =============  =====  =====================
  BPF_W          0x00   word        (4 bytes)
  BPF_H          0x08   half word   (2 bytes)
  BPF_B          0x10   byte
  BPF_DW         0x18   double word (8 bytes)
  =============  =====  =====================

The mode modifier is one of:

  =============  =====  ====================================
  mode modifier  value  description
  =============  =====  ====================================
  BPF_IMM        0x00   used for 64-bit mov
  BPF_ABS        0x20   legacy BPF packet access
  BPF_IND        0x40   legacy BPF packet access
  BPF_MEM        0x60   all normal load and store operations
  BPF_ATOMIC     0xc0   atomic operations
  =============  =====  ====================================

BPF_MEM | <size> | BPF_STX means::

  *(size *) (dst_reg + off) = src_reg

BPF_MEM | <size> | BPF_ST means::

  *(size *) (dst_reg + off) = imm32

BPF_MEM | <size> | BPF_LDX means::

  dst_reg = *(size *) (src_reg + off)

Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW.
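
The following sketch maps the size modifier to a byte count and models
``BPF_STX`` and ``BPF_LDX`` on an ordinary byte buffer. All names are
hypothetical. It assumes that stores truncate the register to the access
size, that loads zero-extend into the 64-bit register, and that values are
kept in host byte order; none of this is stated in the text above.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    /* Byte count selected by the 2-bit size field (mask 0x18). */
    static int size_bytes(uint8_t opcode)
    {
        switch (opcode & 0x18) {
        case 0x00: return 4;  /* BPF_W */
        case 0x08: return 2;  /* BPF_H */
        case 0x10: return 1;  /* BPF_B */
        default:   return 8;  /* BPF_DW */
        }
    }

    /* *(size *) (dst_reg + off) = src_reg, truncated to nbytes. */
    static void mem_store(uint8_t *dst_reg, int16_t off, uint64_t src_reg,
                          int nbytes)
    {
        uint8_t  b = (uint8_t)src_reg;
        uint16_t h = (uint16_t)src_reg;
        uint32_t w = (uint32_t)src_reg;

        switch (nbytes) {
        case 1:  memcpy(dst_reg + off, &b, 1); break;
        case 2:  memcpy(dst_reg + off, &h, 2); break;
        case 4:  memcpy(dst_reg + off, &w, 4); break;
        default: memcpy(dst_reg + off, &src_reg, 8); break;
        }
    }

    /* dst_reg = *(size *) (src_reg + off), zero-extended to 64 bits. */
    static uint64_t mem_load(const uint8_t *src_reg, int16_t off, int nbytes)
    {
        uint8_t b; uint16_t h; uint32_t w; uint64_t dw;

        switch (nbytes) {
        case 1:  memcpy(&b, src_reg + off, 1); return b;
        case 2:  memcpy(&h, src_reg + off, 2); return h;
        case 4:  memcpy(&w, src_reg + off, 4); return w;
        default: memcpy(&dw, src_reg + off, 8); return dw;
        }
    }

Storing 0x12345678 with a half-word access and loading it back with the same
size yields 0x5678, because only the low two bytes are written.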

Atomic operations
-----------------

eBPF includes atomic operations, which use the immediate field for extra
encoding::

   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W  | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg

The basic atomic operations supported are::

    BPF_ADD
    BPF_AND
    BPF_OR
    BPF_XOR

Each has semantics equivalent to the ``BPF_ADD`` example, that is: the
memory location addressed by ``dst_reg + off`` is atomically modified, with
``src_reg`` as the other operand. If the ``BPF_FETCH`` flag is set in the
immediate, then these operations also overwrite ``src_reg`` with the
value that was in memory before it was modified.

The more special operations are::

    BPF_XCHG

This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg +
off``. ::

    BPF_CMPXCHG

This atomically compares the value addressed by ``dst_reg + off`` with
``R0``. If they match it is replaced with ``src_reg``. In either case, the
value that was there before is zero-extended and loaded back to ``R0``.

Note that 1 and 2 byte atomic operations are not supported.
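
The semantics described above have close counterparts in C11
``<stdatomic.h>``. The sketch below is an analogy for the 64-bit case, not
the kernel implementation, and the ``bpf_*`` function names are hypothetical.
Each function returns the value that the instruction would write back to
``src_reg`` (or to ``R0`` for the compare-and-exchange).

.. code-block:: c

    #include <stdatomic.h>
    #include <stdint.h>

    /* BPF_ADD with BPF_FETCH: add src_reg, return the old memory value. */
    static uint64_t bpf_fetch_add64(_Atomic uint64_t *mem, uint64_t src_reg)
    {
        return atomic_fetch_add(mem, src_reg);
    }

    /* BPF_XCHG: store src_reg, return the old memory value. */
    static uint64_t bpf_xchg64(_Atomic uint64_t *mem, uint64_t src_reg)
    {
        return atomic_exchange(mem, src_reg);
    }

    /*
     * BPF_CMPXCHG: if *mem equals r0, store src_reg. The value that was in
     * memory before the operation is returned in either case.
     */
    static uint64_t bpf_cmpxchg64(_Atomic uint64_t *mem, uint64_t r0,
                                  uint64_t src_reg)
    {
        uint64_t expected = r0;

        /* On failure, expected is updated to the current memory value. */
        atomic_compare_exchange_strong(mem, &expected, src_reg);
        return expected;
    }

The ``BPF_AND``, ``BPF_OR`` and ``BPF_XOR`` forms would map to
``atomic_fetch_and``, ``atomic_fetch_or`` and ``atomic_fetch_xor`` in the
same way.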

Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
the atomics features, while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.

You may encounter ``BPF_XADD`` - this is a legacy name for ``BPF_ATOMIC``,
referring to the exclusive-add operation encoded when the immediate field is
zero.

16-byte instructions
--------------------

eBPF has one 16-byte instruction: ``BPF_LD | BPF_DW | BPF_IMM``, which consists
of two consecutive ``struct bpf_insn`` 8-byte blocks and is interpreted as a
single instruction that loads a 64-bit immediate value into dst_reg.
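
A minimal sketch of how the 64-bit value could be assembled from the two
32-bit immediate fields is shown below. The function name is hypothetical,
and the split used here (lower half in the first block, upper half in the
second) is an assumption, since the text above does not say which block
carries which half.

.. code-block:: c

    #include <stdint.h>

    /*
     * Combine the immediates of the two 8-byte blocks. Each half is cast to
     * an unsigned type first so that a negative lower half does not
     * sign-extend into the upper 32 bits.
     */
    static uint64_t ld_imm64_value(int32_t first_imm, int32_t second_imm)
    {
        return (uint64_t)(uint32_t)first_imm
             | ((uint64_t)(uint32_t)second_imm << 32);
    }

For example, immediates 0x55667788 and 0x11223344 combine into
0x1122334455667788 under this assumption.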

Packet access instructions
--------------------------

eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD) which are used to access packet data.

They had to be carried over from classic BPF to preserve the strong performance
of socket filters running in the eBPF interpreter. These instructions can only
be used when the interpreter context is a pointer to ``struct sk_buff``, and
they have seven implicit operands. Register R6 is an implicit input that must
contain a pointer to the sk_buff. Register R0 is an implicit output which
contains the data fetched from the packet. Registers R1-R5 are scratch
registers and must not be used to store data across BPF_ABS | BPF_LD or
BPF_IND | BPF_LD instructions.

These instructions have an implicit program exit condition as well. When
an eBPF program tries to access data beyond the packet boundary,
the interpreter will abort the execution of the program. JIT compilers
therefore must preserve this property. The src_reg and imm32 fields are
explicit inputs to these instructions.

For example, BPF_IND | BPF_W | BPF_LD means::

  R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))

and R1 - R5 are clobbered.
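
The bounds check and the byte-order conversion can be sketched on a plain
buffer. ``struct pkt`` is a hypothetical stand-in for ``struct sk_buff``, and
``ld_ind_w`` is a hypothetical name. As a simplification, the sketch treats
``src_reg`` as a signed 32-bit offset and rejects every negative position.
A ``false`` return models the implicit program exit.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the packet data reachable through R6. */
    struct pkt {
        const uint8_t *data;
        size_t len;
    };

    /*
     * Model of BPF_IND | BPF_W | BPF_LD. Returns false when the 4-byte
     * access would fall outside the packet, in which case *r0 is untouched.
     */
    static bool ld_ind_w(const struct pkt *p, uint64_t src_reg, int32_t imm32,
                         uint64_t *r0)
    {
        int64_t pos = (int64_t)(int32_t)src_reg + imm32;
        const uint8_t *b;

        if (pos < 0 || (uint64_t)pos + 4 > p->len)
            return false;

        /* Read in network (big-endian) order, like ntohl() on the raw word. */
        b = p->data + pos;
        *r0 = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
              ((uint32_t)b[2] << 8) | (uint32_t)b[3];
        return true;
    }

A caller that models the interpreter would stop executing the program as
soon as ``ld_ind_w`` returns ``false``.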