Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
author     Jakub Kicinski <kuba@kernel.org>
           Fri, 14 Jul 2023 02:13:24 +0000 (19:13 -0700)
committer  Jakub Kicinski <kuba@kernel.org>
           Fri, 14 Jul 2023 02:13:24 +0000 (19:13 -0700)
Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-07-13

We've added 67 non-merge commits during the last 15 day(s) which contain
a total of 106 files changed, 4444 insertions(+), 619 deletions(-).

The main changes are:

1) Fix bpftool build in presence of stale vmlinux.h,
   from Alexander Lobakin.

2) Introduce bpf_me_mcache_free_rcu() and fix OOM under stress,
   from Alexei Starovoitov.

3) Teach verifier actual bounds of bpf_get_smp_processor_id()
   and fix perf+libbpf issue related to custom section handling,
   from Andrii Nakryiko.

4) Introduce bpf map element count, from Anton Protopopov.

5) Check skb ownership against full socket, from Kui-Feng Lee.

6) Support for up to 12 arguments in BPF trampoline, from Menglong Dong.

7) Export rcu_request_urgent_qs_task, from Paul E. McKenney.

8) Fix BTF walking of unions, from Yafang Shao.

9) Extend link_info for kprobe_multi and perf_event links,
   from Yafang Shao.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (67 commits)
  selftests/bpf: Add selftest for PTR_UNTRUSTED
  bpf: Fix an error in verifying a field in a union
  selftests/bpf: Add selftests for nested_trust
  bpf: Fix an error around PTR_UNTRUSTED
  selftests/bpf: add testcase for TRACING with 6+ arguments
  bpf, x86: allow function arguments up to 12 for TRACING
  bpf, x86: save/restore regs with BPF_DW size
  bpftool: Use "fallthrough;" keyword instead of comments
  bpf: Add object leak check.
  bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu.
  bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().
  selftests/bpf: Improve test coverage of bpf_mem_alloc.
  rcu: Export rcu_request_urgent_qs_task()
  bpf: Allow reuse from waiting_for_gp_ttrace list.
  bpf: Add a hint to allocated objects.
  bpf: Change bpf_mem_cache draining process.
  bpf: Further refactor alloc_bulk().
  bpf: Factor out inc/dec of active flag into helpers.
  bpf: Refactor alloc_bulk().
  bpf: Let free_all() return the number of freed elements.
  ...
====================

Link: https://lore.kernel.org/r/20230714020910.80794-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
108 files changed:
Documentation/bpf/bpf_devel_QA.rst
Documentation/bpf/btf.rst
Documentation/bpf/index.rst
Documentation/bpf/instruction-set.rst [deleted file]
Documentation/bpf/linux-notes.rst [deleted file]
Documentation/bpf/llvm_reloc.rst
Documentation/bpf/standardization/index.rst [new file with mode: 0644]
Documentation/bpf/standardization/instruction-set.rst [new file with mode: 0644]
Documentation/bpf/standardization/linux-notes.rst [new file with mode: 0644]
MAINTAINERS
arch/x86/net/bpf_jit_comp.c
drivers/hid/bpf/entrypoints/Makefile
include/linux/bpf-cgroup.h
include/linux/bpf.h
include/linux/bpf_mem_alloc.h
include/linux/rcutiny.h
include/linux/rcutree.h
include/linux/trace_events.h
include/uapi/linux/bpf.h
kernel/bpf/btf.c
kernel/bpf/cpumask.c
kernel/bpf/hashtab.c
kernel/bpf/map_iter.c
kernel/bpf/memalloc.c
kernel/bpf/preload/iterators/Makefile
kernel/bpf/preload/iterators/iterators.bpf.c
kernel/bpf/preload/iterators/iterators.lskel-little-endian.h
kernel/bpf/ringbuf.c
kernel/bpf/syscall.c
kernel/bpf/verifier.c
kernel/rcu/rcu.h
kernel/trace/bpf_trace.c
kernel/trace/trace_kprobe.c
kernel/trace/trace_uprobe.c
lib/test_bpf.c
net/bpf/test_run.c
samples/bpf/Makefile
samples/bpf/gnu/stubs.h
samples/bpf/syscall_tp_kern.c
samples/bpf/test_lwt_bpf.sh
samples/hid/Makefile
tools/bpf/bpftool/Documentation/bpftool-gen.rst
tools/bpf/bpftool/Makefile
tools/bpf/bpftool/btf_dumper.c
tools/bpf/bpftool/feature.c
tools/bpf/bpftool/link.c
tools/bpf/bpftool/skeleton/pid_iter.bpf.c
tools/bpf/bpftool/skeleton/profiler.bpf.c
tools/bpf/bpftool/xlated_dumper.c
tools/bpf/bpftool/xlated_dumper.h
tools/bpf/runqslower/Makefile
tools/build/feature/Makefile
tools/include/uapi/linux/bpf.h
tools/lib/bpf/bpf.c
tools/lib/bpf/bpf.h
tools/lib/bpf/hashmap.h
tools/lib/bpf/libbpf.c
tools/lib/bpf/libbpf.h
tools/lib/bpf/libbpf.map
tools/lib/bpf/usdt.c
tools/testing/selftests/bpf/DENYLIST.aarch64
tools/testing/selftests/bpf/Makefile
tools/testing/selftests/bpf/bench.c
tools/testing/selftests/bpf/benchs/bench_htab_mem.c [new file with mode: 0644]
tools/testing/selftests/bpf/benchs/bench_ringbufs.c
tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh [new file with mode: 0755]
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
tools/testing/selftests/bpf/cgroup_helpers.c
tools/testing/selftests/bpf/cgroup_helpers.h
tools/testing/selftests/bpf/cgroup_tcp_skb.h [new file with mode: 0644]
tools/testing/selftests/bpf/gnu/stubs.h
tools/testing/selftests/bpf/map_tests/map_percpu_stats.c [new file with mode: 0644]
tools/testing/selftests/bpf/prog_tests/bpf_nf.c
tools/testing/selftests/bpf/prog_tests/cgroup_tcp_skb.c [new file with mode: 0644]
tools/testing/selftests/bpf/prog_tests/fentry_test.c
tools/testing/selftests/bpf/prog_tests/fexit_test.c
tools/testing/selftests/bpf/prog_tests/get_func_args_test.c
tools/testing/selftests/bpf/prog_tests/global_map_resize.c
tools/testing/selftests/bpf/prog_tests/modify_return.c
tools/testing/selftests/bpf/prog_tests/netfilter_link_attach.c [new file with mode: 0644]
tools/testing/selftests/bpf/prog_tests/ptr_untrusted.c [new file with mode: 0644]
tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
tools/testing/selftests/bpf/prog_tests/tracing_struct.c
tools/testing/selftests/bpf/prog_tests/trampoline_count.c
tools/testing/selftests/bpf/prog_tests/verifier.c
tools/testing/selftests/bpf/progs/cgroup_tcp_skb.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/fentry_many_args.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/fexit_many_args.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/htab_mem_bench.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/linked_list.c
tools/testing/selftests/bpf/progs/map_percpu_stats.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/modify_return.c
tools/testing/selftests/bpf/progs/nested_trust_failure.c
tools/testing/selftests/bpf/progs/nested_trust_success.c
tools/testing/selftests/bpf/progs/test_global_map_resize.c
tools/testing/selftests/bpf/progs/test_netfilter_link_attach.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/test_ptr_untrusted.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/tracing_struct.c
tools/testing/selftests/bpf/progs/verifier_typedef.c [new file with mode: 0644]
tools/testing/selftests/bpf/trace_helpers.c
tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
tools/testing/selftests/bpf/verifier/ctx_skb.c
tools/testing/selftests/bpf/verifier/jmp32.c
tools/testing/selftests/bpf/verifier/map_kptr.c
tools/testing/selftests/bpf/verifier/precise.c
tools/testing/selftests/hid/Makefile
tools/testing/selftests/net/Makefile
tools/testing/selftests/tc-testing/Makefile

index 609b71f..de27e16 100644 (file)
@@ -635,12 +635,12 @@ test coverage.
 
 Q: clang flag for target bpf?
 -----------------------------
-Q: In some cases clang flag ``-target bpf`` is used but in other cases the
+Q: In some cases clang flag ``--target=bpf`` is used but in other cases the
 default clang target, which matches the underlying architecture, is used.
 What is the difference and when should I use which?
 
 A: Although LLVM IR generation and optimization try to stay architecture
-independent, ``-target <arch>`` still has some impact on generated code:
+independent, ``--target=<arch>`` still has some impact on generated code:
 
 - BPF program may recursively include header file(s) with file scope
   inline assembly codes. The default target can handle this well,
@@ -658,7 +658,7 @@ independent, ``-target <arch>`` still has some impact on generated code:
   The clang option ``-fno-jump-tables`` can be used to disable
   switch table generation.
 
-- For clang ``-target bpf``, it is guaranteed that pointer or long /
+- For clang ``--target=bpf``, it is guaranteed that pointer or long /
   unsigned long types will always have a width of 64 bit, no matter
   whether underlying clang binary or default target (or kernel) is
   32 bit. However, when native clang target is used, then it will
@@ -668,7 +668,7 @@ independent, ``-target <arch>`` still has some impact on generated code:
   while the BPF LLVM back end still operates in 64 bit. The native
   target is mostly needed in tracing for the case of walking ``pt_regs``
   or other kernel structures where CPU's register width matters.
-  Otherwise, ``clang -target bpf`` is generally recommended.
+  Otherwise, ``clang --target=bpf`` is generally recommended.
 
 You should use default target when:
 
@@ -685,7 +685,7 @@ when:
   into these structures is verified by the BPF verifier and may result
   in verification failures if the native architecture is not aligned with
   the BPF architecture, e.g. 64-bit. An example of this is
-  BPF_PROG_TYPE_SK_MSG require ``-target bpf``
+  BPF_PROG_TYPE_SK_MSG requires ``--target=bpf``.
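+  A typical invocation of the BPF target (an illustrative sketch; the source
+  file name is hypothetical) is
+  ``clang -O2 -g --target=bpf -c prog.bpf.c -o prog.bpf.o``.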
 
 
 .. Links
index 7cd7c54..f32db1f 100644 (file)
@@ -990,7 +990,7 @@ format.::
     } g2;
     int main() { return 0; }
     int test() { return 0; }
-    -bash-4.4$ clang -c -g -O2 -target bpf t2.c
+    -bash-4.4$ clang -c -g -O2 --target=bpf t2.c
     -bash-4.4$ readelf -S t2.o
       ......
       [ 8] .BTF              PROGBITS         0000000000000000  00000247
@@ -1000,7 +1000,7 @@ format.::
       [10] .rel.BTF.ext      REL              0000000000000000  000007e0
            0000000000000040  0000000000000010          16     9     8
       ......
-    -bash-4.4$ clang -S -g -O2 -target bpf t2.c
+    -bash-4.4$ clang -S -g -O2 --target=bpf t2.c
     -bash-4.4$ cat t2.s
       ......
             .section        .BTF,"",@progbits
index dbb39e8..1ff177b 100644 (file)
@@ -12,9 +12,9 @@ that goes into great technical depth about the BPF Architecture.
 .. toctree::
    :maxdepth: 1
 
-   instruction-set
    verifier
    libbpf/index
+   standardization/index
    btf
    faq
    syscall_api
@@ -29,7 +29,6 @@ that goes into great technical depth about the BPF Architecture.
    bpf_licensing
    test_debug
    clang-notes
-   linux-notes
    other
    redirect
 
diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
deleted file mode 100644 (file)
index 6644842..0000000
+++ /dev/null
@@ -1,478 +0,0 @@
-.. contents::
-.. sectnum::
-
-========================================
-eBPF Instruction Set Specification, v1.0
-========================================
-
-This document specifies version 1.0 of the eBPF instruction set.
-
-Documentation conventions
-=========================
-
-For brevity, this document uses the type notion "u64", "u32", etc.
-to mean an unsigned integer whose width is the specified number of bits,
-and "s32", etc. to mean a signed integer of the specified number of bits.
-
-Registers and calling convention
-================================
-
-eBPF has 10 general purpose registers and a read-only frame pointer register,
-all of which are 64-bits wide.
-
-The eBPF calling convention is defined as:
-
-* R0: return value from function calls, and exit value for eBPF programs
-* R1 - R5: arguments for function calls
-* R6 - R9: callee saved registers that function calls will preserve
-* R10: read-only frame pointer to access stack
-
-R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
-necessary across calls.
-
-Instruction encoding
-====================
-
-eBPF has two instruction encodings:
-
-* the basic instruction encoding, which uses 64 bits to encode an instruction
-* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
-  constant) value after the basic instruction for a total of 128 bits.
-
-The fields conforming an encoded basic instruction are stored in the
-following order::
-
-  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
-  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.
-
-**imm**
-  signed integer immediate value
-
-**offset**
-  signed integer offset used with pointer arithmetic
-
-**src_reg**
-  the source register number (0-10), except where otherwise specified
-  (`64-bit immediate instructions`_ reuse this field for other purposes)
-
-**dst_reg**
-  destination register number (0-10)
-
-**opcode**
-  operation to perform
-
-Note that the contents of multi-byte fields ('imm' and 'offset') are
-stored using big-endian byte ordering in big-endian BPF and
-little-endian byte ordering in little-endian BPF.
-
-For example::
-
-  opcode                  offset imm          assembly
-         src_reg dst_reg
-  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
-         dst_reg src_reg
-  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
-
-Note that most instructions do not use all of the fields.
-Unused fields shall be cleared to zero.
-
-As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
-instruction uses a 64-bit immediate value that is constructed as follows.
-The 64 bits following the basic instruction contain a pseudo instruction
-using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
-and imm containing the high 32 bits of the immediate value.
-
-This is depicted in the following figure::
-
-        basic_instruction
-  .-----------------------------.
-  |                             |
-  code:8 regs:8 offset:16 imm:32 unused:32 imm:32
-                                 |              |
-                                 '--------------'
-                                pseudo instruction
-
-Thus the 64-bit immediate value is constructed as follows:
-
-  imm64 = (next_imm << 32) | imm
-
-where 'next_imm' refers to the imm value of the pseudo instruction
-following the basic instruction.  The unused bytes in the pseudo
-instruction are reserved and shall be cleared to zero.
-
-Instruction classes
--------------------
-
-The three LSB bits of the 'opcode' field store the instruction class:
-
-=========  =====  ===============================  ===================================
-class      value  description                      reference
-=========  =====  ===============================  ===================================
-BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
-BPF_LDX    0x01   load into register operations    `Load and store instructions`_
-BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
-BPF_STX    0x03   store from register operations   `Load and store instructions`_
-BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
-BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
-BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
-BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
-=========  =====  ===============================  ===================================
-
-Arithmetic and jump instructions
-================================
-
-For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
-``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:
-
-==============  ======  =================
-4 bits (MSB)    1 bit   3 bits (LSB)
-==============  ======  =================
-code            source  instruction class
-==============  ======  =================
-
-**code**
-  the operation code, whose meaning varies by instruction class
-
-**source**
-  the source operand location, which unless otherwise specified is one of:
-
-  ======  =====  ==============================================
-  source  value  description
-  ======  =====  ==============================================
-  BPF_K   0x00   use 32-bit 'imm' value as source operand
-  BPF_X   0x08   use 'src_reg' register value as source operand
-  ======  =====  ==============================================
-
-**instruction class**
-  the instruction class (see `Instruction classes`_)
-
-Arithmetic instructions
------------------------
-
-``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
-otherwise identical operations.
-The 'code' field encodes the operation as below, where 'src' and 'dst' refer
-to the values of the source and destination registers, respectively.
-
-========  =====  ==========================================================
-code      value  description
-========  =====  ==========================================================
-BPF_ADD   0x00   dst += src
-BPF_SUB   0x10   dst -= src
-BPF_MUL   0x20   dst \*= src
-BPF_DIV   0x30   dst = (src != 0) ? (dst / src) : 0
-BPF_OR    0x40   dst \|= src
-BPF_AND   0x50   dst &= src
-BPF_LSH   0x60   dst <<= (src & mask)
-BPF_RSH   0x70   dst >>= (src & mask)
-BPF_NEG   0x80   dst = ~src
-BPF_MOD   0x90   dst = (src != 0) ? (dst % src) : dst
-BPF_XOR   0xa0   dst ^= src
-BPF_MOV   0xb0   dst = src
-BPF_ARSH  0xc0   sign extending dst >>= (src & mask)
-BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
-========  =====  ==========================================================
-
-Underflow and overflow are allowed during arithmetic operations, meaning
-the 64-bit or 32-bit value will wrap. If eBPF program execution would
-result in division by zero, the destination register is instead set to zero.
-If execution would result in modulo by zero, for ``BPF_ALU64`` the value of
-the destination register is unchanged whereas for ``BPF_ALU`` the upper
-32 bits of the destination register are zeroed.
-
-``BPF_ADD | BPF_X | BPF_ALU`` means::
-
-  dst = (u32) ((u32) dst + (u32) src)
-
-where '(u32)' indicates that the upper 32 bits are zeroed.
-
-``BPF_ADD | BPF_X | BPF_ALU64`` means::
-
-  dst = dst + src
-
-``BPF_XOR | BPF_K | BPF_ALU`` means::
-
-  dst = (u32) dst ^ (u32) imm32
-
-``BPF_XOR | BPF_K | BPF_ALU64`` means::
-
-  dst = dst ^ imm32
-
-Also note that the division and modulo operations are unsigned. Thus, for
-``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
-for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result
-interpreted as an unsigned 64-bit value. There are no instructions for
-signed division or modulo.
-
-Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
-for 32-bit operations.
-
-Byte swap instructions
-~~~~~~~~~~~~~~~~~~~~~~
-
-The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
-'code' field of ``BPF_END``.
-
-The byte swap instructions operate on the destination register
-only and do not use a separate source register or immediate value.
-
-The 1-bit source operand field in the opcode is used to select what byte
-order the operation convert from or to:
-
-=========  =====  =================================================
-source     value  description
-=========  =====  =================================================
-BPF_TO_LE  0x00   convert between host byte order and little endian
-BPF_TO_BE  0x08   convert between host byte order and big endian
-=========  =====  =================================================
-
-The 'imm' field encodes the width of the swap operations.  The following widths
-are supported: 16, 32 and 64.
-
-Examples:
-
-``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
-
-  dst = htole16(dst)
-
-``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
-
-  dst = htobe64(dst)
-
-Jump instructions
------------------
-
-``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
-otherwise identical operations.
-The 'code' field encodes the operation as below:
-
-========  =====  ===  ===========================================  =========================================
-code      value  src  description                                  notes
-========  =====  ===  ===========================================  =========================================
-BPF_JA    0x0    0x0  PC += offset                                 BPF_JMP only
-BPF_JEQ   0x1    any  PC += offset if dst == src
-BPF_JGT   0x2    any  PC += offset if dst > src                    unsigned
-BPF_JGE   0x3    any  PC += offset if dst >= src                   unsigned
-BPF_JSET  0x4    any  PC += offset if dst & src
-BPF_JNE   0x5    any  PC += offset if dst != src
-BPF_JSGT  0x6    any  PC += offset if dst > src                    signed
-BPF_JSGE  0x7    any  PC += offset if dst >= src                   signed
-BPF_CALL  0x8    0x0  call helper function by address              see `Helper functions`_
-BPF_CALL  0x8    0x1  call PC += offset                            see `Program-local functions`_
-BPF_CALL  0x8    0x2  call helper function by BTF ID               see `Helper functions`_
-BPF_EXIT  0x9    0x0  return                                       BPF_JMP only
-BPF_JLT   0xa    any  PC += offset if dst < src                    unsigned
-BPF_JLE   0xb    any  PC += offset if dst <= src                   unsigned
-BPF_JSLT  0xc    any  PC += offset if dst < src                    signed
-BPF_JSLE  0xd    any  PC += offset if dst <= src                   signed
-========  =====  ===  ===========================================  =========================================
-
-The eBPF program needs to store the return value into register R0 before doing a
-``BPF_EXIT``.
-
-Example:
-
-``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::
-
-  if (s32)dst s>= (s32)src goto +offset
-
-where 's>=' indicates a signed '>=' comparison.
-
-Helper functions
-~~~~~~~~~~~~~~~~
-
-Helper functions are a concept whereby BPF programs can call into a
-set of function calls exposed by the underlying platform.
-
-Historically, each helper function was identified by an address
-encoded in the imm field.  The available helper functions may differ
-for each program type, but address values are unique across all program types.
-
-Platforms that support the BPF Type Format (BTF) support identifying
-a helper function by a BTF ID encoded in the imm field, where the BTF ID
-identifies the helper name and type.
-
-Program-local functions
-~~~~~~~~~~~~~~~~~~~~~~~
-Program-local functions are functions exposed by the same BPF program as the
-caller, and are referenced by offset from the call instruction, similar to
-``BPF_JA``.  A ``BPF_EXIT`` within the program-local function will return to
-the caller.
-
-Load and store instructions
-===========================
-
-For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
-8-bit 'opcode' field is divided as:
-
-============  ======  =================
-3 bits (MSB)  2 bits  3 bits (LSB)
-============  ======  =================
-mode          size    instruction class
-============  ======  =================
-
-The mode modifier is one of:
-
-  =============  =====  ====================================  =============
-  mode modifier  value  description                           reference
-  =============  =====  ====================================  =============
-  BPF_IMM        0x00   64-bit immediate instructions         `64-bit immediate instructions`_
-  BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
-  BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
-  BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
-  BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
-  =============  =====  ====================================  =============
-
-The size modifier is one of:
-
-  =============  =====  =====================
-  size modifier  value  description
-  =============  =====  =====================
-  BPF_W          0x00   word        (4 bytes)
-  BPF_H          0x08   half word   (2 bytes)
-  BPF_B          0x10   byte
-  BPF_DW         0x18   double word (8 bytes)
-  =============  =====  =====================
-
-Regular load and store operations
----------------------------------
-
-The ``BPF_MEM`` mode modifier is used to encode regular load and store
-instructions that transfer data between a register and memory.
-
-``BPF_MEM | <size> | BPF_STX`` means::
-
-  *(size *) (dst + offset) = src
-
-``BPF_MEM | <size> | BPF_ST`` means::
-
-  *(size *) (dst + offset) = imm32
-
-``BPF_MEM | <size> | BPF_LDX`` means::
-
-  dst = *(size *) (src + offset)
-
-Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
-
-Atomic operations
------------------
-
-Atomic operations are operations that operate on memory and can not be
-interrupted or corrupted by other access to the same memory region
-by other eBPF programs or means outside of this specification.
-
-All atomic operations supported by eBPF are encoded as store operations
-that use the ``BPF_ATOMIC`` mode modifier as follows:
-
-* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
-* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
-* 8-bit and 16-bit wide atomic operations are not supported.
-
-The 'imm' field is used to encode the actual atomic operation.
-Simple atomic operation use a subset of the values defined to encode
-arithmetic operations in the 'imm' field to encode the atomic operation:
-
-========  =====  ===========
-imm       value  description
-========  =====  ===========
-BPF_ADD   0x00   atomic add
-BPF_OR    0x40   atomic or
-BPF_AND   0x50   atomic and
-BPF_XOR   0xa0   atomic xor
-========  =====  ===========
-
-
-``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
-
-  *(u32 *)(dst + offset) += src
-
-``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
-
-  *(u64 *)(dst + offset) += src
-
-In addition to the simple atomic operations, there also is a modifier and
-two complex atomic operations:
-
-===========  ================  ===========================
-imm          value             description
-===========  ================  ===========================
-BPF_FETCH    0x01              modifier: return old value
-BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
-BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
-===========  ================  ===========================
-
-The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
-always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
-is set, then the operation also overwrites ``src`` with the value that
-was in memory before it was modified.
-
-The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
-addressed by ``dst + offset``.
-
-The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
-``dst + offset`` with ``R0``. If they match, the value addressed by
-``dst + offset`` is replaced with ``src``. In either case, the
-value that was at ``dst + offset`` before the operation is zero-extended
-and loaded back to ``R0``.
-
-64-bit immediate instructions
------------------------------
-
-Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
-encoding defined in `Instruction encoding`_, and use the 'src' field of the
-basic instruction to hold an opcode subtype.
-
-The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
-with opcode subtypes in the 'src' field, using new terms such as "map"
-defined further below:
-
-=========================  ======  ===  =========================================  ===========  ==============
-opcode construction        opcode  src  pseudocode                                 imm type     dst type
-=========================  ======  ===  =========================================  ===========  ==============
-BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = imm64                                integer      integer
-BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                       map fd       map
-BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
-BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
-BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                       integer      code pointer
-BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                      map index    map
-BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
-=========================  ======  ===  =========================================  ===========  ==============
-
-where
-
-* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
-* map_by_idx(imm) means to convert a 32-bit index into an address of a map
-* map_val(map) gets the address of the first value in a given map
-* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
-* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
-* the 'imm type' can be used by disassemblers for display
-* the 'dst type' can be used for verification and JIT compilation purposes
-
-Maps
-~~~~
-
-Maps are shared memory regions accessible by eBPF programs on some platforms.
-A map can have various semantics as defined in a separate document, and may or
-may not have a single contiguous memory region, but the 'map_val(map)' is
-currently only defined for maps that do have a single contiguous memory region.
-
-Each map can have a file descriptor (fd) if supported by the platform, where
-'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
-BPF program can also be defined to use a set of maps associated with the
-program at load time, and 'map_by_idx(imm)' means to get the map with the given
-index in the set associated with the BPF program containing the instruction.
-
-Platform Variables
-~~~~~~~~~~~~~~~~~~
-
-Platform variables are memory regions, identified by integer ids, exposed by
-the runtime and accessible by BPF programs on some platforms.  The
-'var_addr(imm)' operation means to get the address of the memory region
-identified by the given id.
-
-Legacy BPF Packet access instructions
--------------------------------------
-
-eBPF previously introduced special instructions for access to packet data that were
-carried over from classic BPF. However, these instructions are
-deprecated and should no longer be used.
diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/linux-notes.rst
deleted file mode 100644 (file)
index 508d009..0000000
+++ /dev/null
@@ -1,83 +0,0 @@
-.. contents::
-.. sectnum::
-
-==========================
-Linux implementation notes
-==========================
-
-This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
-
-Byte swap instructions
-======================
-
-``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and ``BPF_TO_BE`` respectively.
-
-Jump instructions
-=================
-
-``BPF_CALL | BPF_X | BPF_JMP`` (0x8d), where the helper function
-integer would be read from a specified register, is not currently supported
-by the verifier.  Any programs with this instruction will fail to load
-until such support is added.
-
-Maps
-====
-
-Linux only supports the 'map_val(map)' operation on array maps with a single element.
-
-Linux uses an fd_array to store maps associated with a BPF program. Thus,
-map_by_idx(imm) uses the fd at that index in the array.
-
-Variables
-=========
-
-The following 64-bit immediate instruction specifies that a variable address,
-which corresponds to some integer stored in the 'imm' field, should be loaded:
-
-=========================  ======  ===  =========================================  ===========  ==============
-opcode construction        opcode  src  pseudocode                                 imm type     dst type
-=========================  ======  ===  =========================================  ===========  ==============
-BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
-=========================  ======  ===  =========================================  ===========  ==============
-
-On Linux, this integer is a BTF ID.
-
-Legacy BPF Packet access instructions
-=====================================
-
-As mentioned in the `ISA standard documentation <instruction-set.rst#legacy-bpf-packet-access-instructions>`_,
-Linux has special eBPF instructions for access to packet data that have been
-carried over from classic BPF to retain the performance of legacy socket
-filters running in the eBPF interpreter.
-
-The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
-``BPF_IND | <size> | BPF_LD``.
-
-These instructions are used to access packet data and can only be used when
-the program context is a pointer to a networking packet.  ``BPF_ABS``
-accesses packet data at an absolute offset specified by the immediate data
-and ``BPF_IND`` access packet data at an offset that includes the value of
-a register in addition to the immediate data.
-
-These instructions have seven implicit operands:
-
-* Register R6 is an implicit input that must contain a pointer to a
-  struct sk_buff.
-* Register R0 is an implicit output which contains the data fetched from
-  the packet.
-* Registers R1-R5 are scratch registers that are clobbered by the
-  instruction.
-
-These instructions have an implicit program exit condition as well. If an
-eBPF program attempts access data beyond the packet boundary, the
-program execution will be aborted.
-
-``BPF_ABS | BPF_W | BPF_LD`` (0x20) means::
-
-  R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + imm))
-
-where ``ntohl()`` converts a 32-bit value from network byte order to host byte order.
-
-``BPF_IND | BPF_W | BPF_LD`` (0x40) means::
-
-  R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + src + imm))
index e4a777a..450e640 100644 (file)
@@ -28,7 +28,7 @@ For example, for the following code::
     return g1 + g2 + l1 + l2;
   }
 
-Compiled with ``clang -target bpf -O2 -c test.c``, the following is
+Compiled with ``clang --target=bpf -O2 -c test.c``, the following is
 the code with ``llvm-objdump -dr test.o``::
 
        0:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
@@ -157,7 +157,7 @@ and ``call`` instructions. For example::
     return gfunc(a, b) +  lfunc(a, b) + global;
   }
 
-Compiled with ``clang -target bpf -O2 -c test.c``, we will have
+Compiled with ``clang --target=bpf -O2 -c test.c``, we will have the
 following code with ``llvm-objdump -dr test.o``::
 
   Disassembly of section .text:
@@ -203,7 +203,7 @@ The following is an example to show how R_BPF_64_ABS64 could be generated::
   int global() { return 0; }
   struct t { void *g; } gbl = { global };
 
-Compiled with ``clang -target bpf -O2 -g -c test.c``, we will see a
+Compiled with ``clang --target=bpf -O2 -g -c test.c``, we will see a
 relocation below in ``.data`` section with command
 ``llvm-readelf -r test.o``::
 
diff --git a/Documentation/bpf/standardization/index.rst b/Documentation/bpf/standardization/index.rst
new file mode 100644 (file)
index 0000000..09c6ba0
--- /dev/null
@@ -0,0 +1,18 @@
+.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+===================
+BPF Standardization
+===================
+
+This directory contains documents that are being iterated on as part of the BPF
+standardization effort with the IETF. See the `IETF BPF Working Group`_ page
+for the working group charter, documents, and more.
+
+.. toctree::
+   :maxdepth: 1
+
+   instruction-set
+   linux-notes
+
+.. Links:
+.. _IETF BPF Working Group: https://datatracker.ietf.org/wg/bpf/about/
diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
new file mode 100644 (file)
index 0000000..751e657
--- /dev/null
@@ -0,0 +1,478 @@
+.. contents::
+.. sectnum::
+
+========================================
+eBPF Instruction Set Specification, v1.0
+========================================
+
+This document specifies version 1.0 of the eBPF instruction set.
+
+Documentation conventions
+=========================
+
+For brevity, this document uses the type notation "u64", "u32", etc.
+to mean an unsigned integer whose width is the specified number of bits,
+and "s32", etc. to mean a signed integer of the specified number of bits.
+
+Registers and calling convention
+================================
+
+eBPF has 10 general purpose registers and a read-only frame pointer register,
+all of which are 64-bits wide.
+
+The eBPF calling convention is defined as:
+
+* R0: return value from function calls, and exit value for eBPF programs
+* R1 - R5: arguments for function calls
+* R6 - R9: callee saved registers that function calls will preserve
+* R10: read-only frame pointer to access stack
+
+R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
+necessary across calls.
+
+Instruction encoding
+====================
+
+eBPF has two instruction encodings:
+
+* the basic instruction encoding, which uses 64 bits to encode an instruction
+* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
+  constant) value after the basic instruction for a total of 128 bits.
+
+The fields comprising an encoded basic instruction are stored in the
+following order::
+
+  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
+  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.
+
+**imm**
+  signed integer immediate value
+
+**offset**
+  signed integer offset used with pointer arithmetic
+
+**src_reg**
+  the source register number (0-10), except where otherwise specified
+  (`64-bit immediate instructions`_ reuse this field for other purposes)
+
+**dst_reg**
+  destination register number (0-10)
+
+**opcode**
+  operation to perform
+
+Note that the contents of multi-byte fields ('imm' and 'offset') are
+stored using big-endian byte ordering in big-endian BPF and
+little-endian byte ordering in little-endian BPF.
+
+For example::
+
+  opcode                  offset imm          assembly
+         src_reg dst_reg
+  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
+         dst_reg src_reg
+  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
+
+Note that most instructions do not use all of the fields.
+Unused fields shall be cleared to zero.
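+
+For illustration only (not part of this specification), the Linux kernel
+describes the basic instruction with a C structure along the following
+lines (``struct bpf_insn`` in the Linux UAPI headers); the placement of the
+two register nibbles within their byte follows the layouts shown above::
+
+  struct bpf_insn {
+          __u8    code;           /* opcode */
+          __u8    dst_reg:4;      /* destination register number */
+          __u8    src_reg:4;      /* source register number */
+          __s16   off;            /* signed offset */
+          __s32   imm;            /* signed immediate constant */
+  };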
+
+As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
+instruction uses a 64-bit immediate value that is constructed as follows.
+The 64 bits following the basic instruction contain a pseudo instruction
+using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
+and imm containing the high 32 bits of the immediate value.
+
+This is depicted in the following figure::
+
+        basic_instruction
+  .-----------------------------.
+  |                             |
+  code:8 regs:8 offset:16 imm:32 unused:32 imm:32
+                                 |              |
+                                 '--------------'
+                                pseudo instruction
+
+Thus the 64-bit immediate value is constructed as follows::
+
+  imm64 = (next_imm << 32) | imm
+
+where 'next_imm' refers to the imm value of the pseudo instruction
+following the basic instruction.  The unused bytes in the pseudo
+instruction are reserved and shall be cleared to zero.
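+
+As a C sketch (illustrative; 'insn' is a hypothetical two-element array
+holding the basic instruction and its pseudo instruction), the value is
+reconstructed as::
+
+  __u64 imm64 = ((__u64)(__u32)insn[1].imm << 32) | (__u32)insn[0].imm;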
+
+Instruction classes
+-------------------
+
+The three least significant bits of the 'opcode' field store the instruction class:
+
+=========  =====  ===============================  ===================================
+class      value  description                      reference
+=========  =====  ===============================  ===================================
+BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
+BPF_LDX    0x01   load into register operations    `Load and store instructions`_
+BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
+BPF_STX    0x03   store from register operations   `Load and store instructions`_
+BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
+BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
+BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
+BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
+=========  =====  ===============================  ===================================
+
+Arithmetic and jump instructions
+================================
+
+For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
+``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:
+
+==============  ======  =================
+4 bits (MSB)    1 bit   3 bits (LSB)
+==============  ======  =================
+code            source  instruction class
+==============  ======  =================
+
+**code**
+  the operation code, whose meaning varies by instruction class
+
+**source**
+  the source operand location, which unless otherwise specified is one of:
+
+  ======  =====  ==============================================
+  source  value  description
+  ======  =====  ==============================================
+  BPF_K   0x00   use 32-bit 'imm' value as source operand
+  BPF_X   0x08   use 'src_reg' register value as source operand
+  ======  =====  ==============================================
+
+**instruction class**
+  the instruction class (see `Instruction classes`_)
+
+Arithmetic instructions
+-----------------------
+
+``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
+otherwise identical operations.
+The 'code' field encodes the operation as below, where 'src' and 'dst' refer
+to the values of the source and destination registers, respectively.
+
+========  =====  ==========================================================
+code      value  description
+========  =====  ==========================================================
+BPF_ADD   0x00   dst += src
+BPF_SUB   0x10   dst -= src
+BPF_MUL   0x20   dst \*= src
+BPF_DIV   0x30   dst = (src != 0) ? (dst / src) : 0
+BPF_OR    0x40   dst \|= src
+BPF_AND   0x50   dst &= src
+BPF_LSH   0x60   dst <<= (src & mask)
+BPF_RSH   0x70   dst >>= (src & mask)
+BPF_NEG   0x80   dst = -dst
+BPF_MOD   0x90   dst = (src != 0) ? (dst % src) : dst
+BPF_XOR   0xa0   dst ^= src
+BPF_MOV   0xb0   dst = src
+BPF_ARSH  0xc0   sign extending dst >>= (src & mask)
+BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
+========  =====  ==========================================================
+
+Underflow and overflow are allowed during arithmetic operations, meaning
+the 64-bit or 32-bit value will wrap. If eBPF program execution would
+result in division by zero, the destination register is instead set to zero.
+If execution would result in modulo by zero, for ``BPF_ALU64`` the value of
+the destination register is unchanged whereas for ``BPF_ALU`` the upper
+32 bits of the destination register are zeroed.
+
+``BPF_ADD | BPF_X | BPF_ALU`` means::
+
+  dst = (u32) ((u32) dst + (u32) src)
+
+where '(u32)' indicates that the upper 32 bits are zeroed.
+
+``BPF_ADD | BPF_X | BPF_ALU64`` means::
+
+  dst = dst + src
+
+``BPF_XOR | BPF_K | BPF_ALU`` means::
+
+  dst = (u32) dst ^ (u32) imm32
+
+``BPF_XOR | BPF_K | BPF_ALU64`` means::
+
+  dst = dst ^ imm32
+
+Also note that the division and modulo operations are unsigned. Thus, for
+``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
+for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result
+interpreted as an unsigned 64-bit value. There are no instructions for
+signed division or modulo.
+
+Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
+for 32-bit operations.
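+
+For example (illustrative), ``BPF_LSH | BPF_K | BPF_ALU64`` with imm = 66
+shifts 'dst' left by 2 bit positions, since 66 & 0x3F is 2.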
+
+Byte swap instructions
+~~~~~~~~~~~~~~~~~~~~~~
+
+The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
+'code' field of ``BPF_END``.
+
+The byte swap instructions operate on the destination register
+only and do not use a separate source register or immediate value.
+
+The 1-bit source operand field in the opcode is used to select the byte
+order the operation converts from or to:
+
+=========  =====  =================================================
+source     value  description
+=========  =====  =================================================
+BPF_TO_LE  0x00   convert between host byte order and little endian
+BPF_TO_BE  0x08   convert between host byte order and big endian
+=========  =====  =================================================
+
+The 'imm' field encodes the width of the swap operations.  The following widths
+are supported: 16, 32 and 64.
+
+Examples:
+
+``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
+
+  dst = htole16(dst)
+
+``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
+
+  dst = htobe64(dst)
+
+Jump instructions
+-----------------
+
+``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
+otherwise identical operations.
+The 'code' field encodes the operation as below:
+
+========  =====  ===  ===========================================  =========================================
+code      value  src  description                                  notes
+========  =====  ===  ===========================================  =========================================
+BPF_JA    0x0    0x0  PC += offset                                 BPF_JMP only
+BPF_JEQ   0x1    any  PC += offset if dst == src
+BPF_JGT   0x2    any  PC += offset if dst > src                    unsigned
+BPF_JGE   0x3    any  PC += offset if dst >= src                   unsigned
+BPF_JSET  0x4    any  PC += offset if dst & src
+BPF_JNE   0x5    any  PC += offset if dst != src
+BPF_JSGT  0x6    any  PC += offset if dst > src                    signed
+BPF_JSGE  0x7    any  PC += offset if dst >= src                   signed
+BPF_CALL  0x8    0x0  call helper function by address              see `Helper functions`_
+BPF_CALL  0x8    0x1  call PC += offset                            see `Program-local functions`_
+BPF_CALL  0x8    0x2  call helper function by BTF ID               see `Helper functions`_
+BPF_EXIT  0x9    0x0  return                                       BPF_JMP only
+BPF_JLT   0xa    any  PC += offset if dst < src                    unsigned
+BPF_JLE   0xb    any  PC += offset if dst <= src                   unsigned
+BPF_JSLT  0xc    any  PC += offset if dst < src                    signed
+BPF_JSLE  0xd    any  PC += offset if dst <= src                   signed
+========  =====  ===  ===========================================  =========================================
+
+The eBPF program needs to store the return value into register R0 before doing a
+``BPF_EXIT``.
+
+Example:
+
+``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::
+
+  if (s32)dst s>= (s32)src goto +offset
+
+where 's>=' indicates a signed '>=' comparison.
+
+Helper functions
+~~~~~~~~~~~~~~~~
+
+Helper functions are a concept whereby BPF programs can call into a
+set of functions exposed by the underlying platform.
+
+Historically, each helper function was identified by an address
+encoded in the imm field.  The available helper functions may differ
+for each program type, but address values are unique across all program types.
+
+Platforms that support the BPF Type Format (BTF) support identifying
+a helper function by a BTF ID encoded in the imm field, where the BTF ID
+identifies the helper name and type.
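+
+For example (illustrative and Linux-specific rather than part of this
+specification), the Linux helper with id 1 is ``bpf_map_lookup_elem()``, so
+``BPF_JMP | BPF_CALL`` with src = 0x0 and imm = 1 calls that helper.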
+
+Program-local functions
+~~~~~~~~~~~~~~~~~~~~~~~
+Program-local functions are functions exposed by the same BPF program as the
+caller, and are referenced by offset from the call instruction, similar to
+``BPF_JA``.  A ``BPF_EXIT`` within the program-local function will return to
+the caller.
+
+Load and store instructions
+===========================
+
+For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
+8-bit 'opcode' field is divided as:
+
+============  ======  =================
+3 bits (MSB)  2 bits  3 bits (LSB)
+============  ======  =================
+mode          size    instruction class
+============  ======  =================
+
+The mode modifier is one of:
+
+  =============  =====  ====================================  =============
+  mode modifier  value  description                           reference
+  =============  =====  ====================================  =============
+  BPF_IMM        0x00   64-bit immediate instructions         `64-bit immediate instructions`_
+  BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
+  BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
+  BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
+  BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
+  =============  =====  ====================================  =============
+
+The size modifier is one of:
+
+  =============  =====  =====================
+  size modifier  value  description
+  =============  =====  =====================
+  BPF_W          0x00   word        (4 bytes)
+  BPF_H          0x08   half word   (2 bytes)
+  BPF_B          0x10   byte
+  BPF_DW         0x18   double word (8 bytes)
+  =============  =====  =====================
+
+Regular load and store operations
+---------------------------------
+
+The ``BPF_MEM`` mode modifier is used to encode regular load and store
+instructions that transfer data between a register and memory.
+
+``BPF_MEM | <size> | BPF_STX`` means::
+
+  *(size *) (dst + offset) = src
+
+``BPF_MEM | <size> | BPF_ST`` means::
+
+  *(size *) (dst + offset) = imm32
+
+``BPF_MEM | <size> | BPF_LDX`` means::
+
+  dst = *(size *) (src + offset)
+
+Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
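+
+For example, ``BPF_MEM | BPF_H | BPF_LDX`` means::
+
+  dst = *(u16 *) (src + offset)
+
+that is, a half word (2-byte) load from memory into 'dst'.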
+
+Atomic operations
+-----------------
+
+Atomic operations are operations on memory that cannot be interrupted or
+corrupted by other accesses to the same memory region, whether by other
+eBPF programs or by means outside of this specification.
+
+All atomic operations supported by eBPF are encoded as store operations
+that use the ``BPF_ATOMIC`` mode modifier as follows:
+
+* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
+* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
+* 8-bit and 16-bit wide atomic operations are not supported.
+
+The 'imm' field is used to encode the actual atomic operation.
+Simple atomic operations use a subset of the values defined to encode
+arithmetic operations in the 'imm' field to encode the atomic operation:
+
+========  =====  ===========
+imm       value  description
+========  =====  ===========
+BPF_ADD   0x00   atomic add
+BPF_OR    0x40   atomic or
+BPF_AND   0x50   atomic and
+BPF_XOR   0xa0   atomic xor
+========  =====  ===========
+
+
+``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::
+
+  *(u32 *)(dst + offset) += src
+
+``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::
+
+  *(u64 *)(dst + offset) += src
+
+In addition to the simple atomic operations, there is also a modifier and
+two complex atomic operations:
+
+===========  ================  ===========================
+imm          value             description
+===========  ================  ===========================
+BPF_FETCH    0x01              modifier: return old value
+BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
+BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
+===========  ================  ===========================
+
+The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
+always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
+is set, then the operation also overwrites ``src`` with the value that
+was in memory before it was modified.
+
+The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
+addressed by ``dst + offset``.
+
+The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
+``dst + offset`` with ``R0``. If they match, the value addressed by
+``dst + offset`` is replaced with ``src``. In either case, the
+value that was at ``dst + offset`` before the operation is zero-extended
+and loaded back to ``R0``.
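+
+Expressed as a C sketch (illustrative only; the whole sequence executes as a
+single atomic operation), ``BPF_CMPXCHG`` with ``BPF_DW`` behaves like::
+
+  __u64 old = *(__u64 *)(dst + offset);
+  if (old == R0)
+          *(__u64 *)(dst + offset) = src;
+  R0 = old;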
+
+64-bit immediate instructions
+-----------------------------
+
+Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
+encoding defined in `Instruction encoding`_, and use the 'src' field of the
+basic instruction to hold an opcode subtype.
+
+The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
+with opcode subtypes in the 'src' field, using new terms such as "map"
+defined further below:
+
+=========================  ======  ===  =========================================  ===========  ==============
+opcode construction        opcode  src  pseudocode                                 imm type     dst type
+=========================  ======  ===  =========================================  ===========  ==============
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = imm64                                integer      integer
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                       map fd       map
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                       integer      code pointer
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                      map index    map
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
+=========================  ======  ===  =========================================  ===========  ==============
+
+where
+
+* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
+* map_by_idx(imm) means to convert a 32-bit index into an address of a map
+* map_val(map) gets the address of the first value in a given map
+* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
+* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
+* the 'imm type' can be used by disassemblers for display
+* the 'dst type' can be used for verification and JIT compilation purposes
+
+Maps
+~~~~
+
+Maps are shared memory regions accessible by eBPF programs on some platforms.
+A map can have various semantics as defined in a separate document, and may or
+may not have a single contiguous memory region, but the 'map_val(map)' is
+currently only defined for maps that do have a single contiguous memory region.
+
+Each map can have a file descriptor (fd) if supported by the platform, where
+'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
+BPF program can also be defined to use a set of maps associated with the
+program at load time, and 'map_by_idx(imm)' means to get the map with the given
+index in the set associated with the BPF program containing the instruction.
+
+Platform Variables
+~~~~~~~~~~~~~~~~~~
+
+Platform variables are memory regions, identified by integer ids, exposed by
+the runtime and accessible by BPF programs on some platforms.  The
+'var_addr(imm)' operation means to get the address of the memory region
+identified by the given id.
+
+Legacy BPF Packet access instructions
+-------------------------------------
+
+eBPF previously introduced special instructions for access to packet data that were
+carried over from classic BPF. However, these instructions are
+deprecated and should no longer be used.
diff --git a/Documentation/bpf/standardization/linux-notes.rst b/Documentation/bpf/standardization/linux-notes.rst
new file mode 100644 (file)
index 0000000..00d2693
--- /dev/null
@@ -0,0 +1,84 @@
+.. contents::
+.. sectnum::
+
+==========================
+Linux implementation notes
+==========================
+
+This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
+
+Byte swap instructions
+======================
+
+``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and ``BPF_TO_BE`` respectively.
+
+Jump instructions
+=================
+
+``BPF_CALL | BPF_X | BPF_JMP`` (0x8d), where the helper function
+integer would be read from a specified register, is not currently supported
+by the verifier.  Any programs with this instruction will fail to load
+until such support is added.
+
+Maps
+====
+
+Linux only supports the 'map_val(map)' operation on array maps with a single element.
+
+Linux uses an fd_array to store maps associated with a BPF program. Thus,
+map_by_idx(imm) uses the fd at that index in the array.
+
+Variables
+=========
+
+The following 64-bit immediate instruction specifies that a variable address,
+which corresponds to some integer stored in the 'imm' field, should be loaded:
+
+=========================  ======  ===  =========================================  ===========  ==============
+opcode construction        opcode  src  pseudocode                                 imm type     dst type
+=========================  ======  ===  =========================================  ===========  ==============
+BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
+=========================  ======  ===  =========================================  ===========  ==============
+
+On Linux, this integer is a BTF ID.
+
+Legacy BPF Packet access instructions
+=====================================
+
+As mentioned in the `ISA standard documentation
+<instruction-set.html#legacy-bpf-packet-access-instructions>`_,
+Linux has special eBPF instructions for access to packet data that have been
+carried over from classic BPF to retain the performance of legacy socket
+filters running in the eBPF interpreter.
+
+The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
+``BPF_IND | <size> | BPF_LD``.
+
+These instructions are used to access packet data and can only be used when
+the program context is a pointer to a networking packet.  ``BPF_ABS``
+accesses packet data at an absolute offset specified by the immediate data
+and ``BPF_IND`` accesses packet data at an offset that includes the value of
+a register in addition to the immediate data.
+
+These instructions have seven implicit operands:
+
+* Register R6 is an implicit input that must contain a pointer to a
+  struct sk_buff.
+* Register R0 is an implicit output which contains the data fetched from
+  the packet.
+* Registers R1-R5 are scratch registers that are clobbered by the
+  instruction.
+
+These instructions have an implicit program exit condition as well. If an
+eBPF program attempts to access data beyond the packet boundary, the
+program execution will be aborted.
+
+``BPF_ABS | BPF_W | BPF_LD`` (0x20) means::
+
+  R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + imm))
+
+where ``ntohl()`` converts a 32-bit value from network byte order to host byte order.
+
+``BPF_IND | BPF_W | BPF_LD`` (0x40) means::
+
+  R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + src + imm))
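
As a hedged, self-contained illustration (editorial note, not part of the patch): these legacy loads are what classic BPF socket filters turn into when they run in the eBPF interpreter. A minimal classic BPF filter that accepts only IPv4 frames, using the BPF_STMT/BPF_JUMP macros from the UAPI headers::

  #include <linux/filter.h>
  #include <linux/if_ether.h>

  static struct sock_filter ipv4_only[] = {
          /* A = EtherType, loaded from absolute offset 12 (BPF_ABS form) */
          BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),
          /* if (A == ETH_P_IP) accept the whole packet, else drop it */
          BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETH_P_IP, 0, 1),
          BPF_STMT(BPF_RET | BPF_K, 0xffffffff),
          BPF_STMT(BPF_RET | BPF_K, 0),
  };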
index dfbb271..d1fc5d2 100644 (file)
@@ -3694,7 +3694,7 @@ R:        David Vernet <void@manifault.com>
 L:     bpf@vger.kernel.org
 L:     bpf@ietf.org
 S:     Maintained
-F:     Documentation/bpf/instruction-set.rst
+F:     Documentation/bpf/standardization/
 
 BPF [GENERAL] (Safe Dynamic Programs and Tools)
 M:     Alexei Starovoitov <ast@kernel.org>
index 438adb6..5ab531b 100644 (file)
@@ -1857,59 +1857,177 @@ emit_jmp:
        return proglen;
 }
 
-static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
-                     int stack_size)
+static void clean_stack_garbage(const struct btf_func_model *m,
+                               u8 **pprog, int nr_stack_slots,
+                               int stack_size)
 {
-       int i, j, arg_size;
-       bool next_same_struct = false;
+       int arg_size, off;
+       u8 *prog;
+
+       /* Generally speaking, the compiler passes on-stack arguments with
+        * a "push" instruction, which occupies 8 bytes on the stack. In
+        * that case there are no garbage values when we copy the arguments
+        * from the origin stack frame to the current one in BPF_DW chunks.
+        *
+        * However, the compiler sometimes allocates only 4 bytes on the
+        * stack for an argument. For now, this can only happen when there
+        * is a single on-stack argument and its size is at most 4 bytes.
+        * In that case the upper 4 bytes of the slot where we store the
+        * argument in the current stack frame contain garbage.
+        *
+        * arguments on origin stack:
+        *
+        * stack_arg_1(4-byte) xxx(4-byte)
+        *
+        * what we copy:
+        *
+        * stack_arg_1(8-byte): stack_arg_1(origin) xxx
+        *
+        * and the xxx is the garbage values which we should clean here.
+        */
+       if (nr_stack_slots != 1)
+               return;
+
+       /* the size of the last argument */
+       arg_size = m->arg_size[m->nr_args - 1];
+       if (arg_size <= 4) {
+               off = -(stack_size - 4);
+               prog = *pprog;
+               /* mov DWORD PTR [rbp + off], 0 */
+               if (!is_imm8(off))
+                       EMIT2_off32(0xC7, 0x85, off);
+               else
+                       EMIT3(0xC7, 0x45, off);
+               EMIT(0, 4);
+               *pprog = prog;
+       }
+}
+
+/* get the count of the regs that are used to pass arguments */
+static int get_nr_used_regs(const struct btf_func_model *m)
+{
+       int i, arg_regs, nr_used_regs = 0;
+
+       for (i = 0; i < min_t(int, m->nr_args, MAX_BPF_FUNC_ARGS); i++) {
+               arg_regs = (m->arg_size[i] + 7) / 8;
+               if (nr_used_regs + arg_regs <= 6)
+                       nr_used_regs += arg_regs;
+
+               if (nr_used_regs >= 6)
+                       break;
+       }
+
+       return nr_used_regs;
+}
+
+static void save_args(const struct btf_func_model *m, u8 **prog,
+                     int stack_size, bool for_call_origin)
+{
+       int arg_regs, first_off, nr_regs = 0, nr_stack_slots = 0;
+       int i, j;
 
        /* Store function arguments to stack.
         * For a function that accepts two pointers the sequence will be:
         * mov QWORD PTR [rbp-0x10],rdi
         * mov QWORD PTR [rbp-0x8],rsi
         */
-       for (i = 0, j = 0; i < min(nr_regs, 6); i++) {
-               /* The arg_size is at most 16 bytes, enforced by the verifier. */
-               arg_size = m->arg_size[j];
-               if (arg_size > 8) {
-                       arg_size = 8;
-                       next_same_struct = !next_same_struct;
-               }
+       for (i = 0; i < min_t(int, m->nr_args, MAX_BPF_FUNC_ARGS); i++) {
+               arg_regs = (m->arg_size[i] + 7) / 8;
 
-               emit_stx(prog, bytes_to_bpf_size(arg_size),
-                        BPF_REG_FP,
-                        i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
-                        -(stack_size - i * 8));
+               /* According to Yonghong's research, the members of a struct
+                * argument are passed either all in registers or all on the
+                * stack. Meanwhile, the compiler passes an argument in regs
+                * only if the remaining regs can hold the whole argument.
+                *
+                * The args can therefore end up out of order. For example:
+                *
+                * struct foo_struct {
+                *     long a;
+                *     int b;
+                * };
+                * int foo(char, char, char, char, char, struct foo_struct,
+                *         char);
+                *
+                * args 1-5 and arg 7 will be passed in regs, and arg 6 will
+                * be passed on the stack.
+                */
+               if (nr_regs + arg_regs > 6) {
+                       /* copy function arguments from origin stack frame
+                        * into current stack frame.
+                        *
+                        * The starting address of the arguments on-stack
+                        * is:
+                        *   rbp + 8(push rbp) +
+                        *   8(return addr of origin call) +
+                        *   8(return addr of the caller)
+                        * which means: rbp + 24
+                        */
+                       for (j = 0; j < arg_regs; j++) {
+                               emit_ldx(prog, BPF_DW, BPF_REG_0, BPF_REG_FP,
+                                        nr_stack_slots * 8 + 0x18);
+                               emit_stx(prog, BPF_DW, BPF_REG_FP, BPF_REG_0,
+                                        -stack_size);
+
+                               if (!nr_stack_slots)
+                                       first_off = stack_size;
+                               stack_size -= 8;
+                               nr_stack_slots++;
+                       }
+               } else {
+                       /* Only copy the on-stack arguments to the current
+                        * 'stack_size' and ignore the regs; this is used to
+                        * prepare the on-stack arguments for the origin call.
+                        */
+                       if (for_call_origin) {
+                               nr_regs += arg_regs;
+                               continue;
+                       }
 
-               j = next_same_struct ? j : j + 1;
+                       /* copy the arguments from regs into stack */
+                       for (j = 0; j < arg_regs; j++) {
+                               emit_stx(prog, BPF_DW, BPF_REG_FP,
+                                        nr_regs == 5 ? X86_REG_R9 : BPF_REG_1 + nr_regs,
+                                        -stack_size);
+                               stack_size -= 8;
+                               nr_regs++;
+                       }
+               }
        }
+
+       clean_stack_garbage(m, prog, nr_stack_slots, first_off);
 }
 
-static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_regs,
+static void restore_regs(const struct btf_func_model *m, u8 **prog,
                         int stack_size)
 {
-       int i, j, arg_size;
-       bool next_same_struct = false;
+       int i, j, arg_regs, nr_regs = 0;
 
        /* Restore function arguments from stack.
         * For a function that accepts two pointers the sequence will be:
         * EMIT4(0x48, 0x8B, 0x7D, 0xF0); mov rdi,QWORD PTR [rbp-0x10]
         * EMIT4(0x48, 0x8B, 0x75, 0xF8); mov rsi,QWORD PTR [rbp-0x8]
+        *
+        * The logic here is similar to what we do in save_args()
         */
-       for (i = 0, j = 0; i < min(nr_regs, 6); i++) {
-               /* The arg_size is at most 16 bytes, enforced by the verifier. */
-               arg_size = m->arg_size[j];
-               if (arg_size > 8) {
-                       arg_size = 8;
-                       next_same_struct = !next_same_struct;
+       for (i = 0; i < min_t(int, m->nr_args, MAX_BPF_FUNC_ARGS); i++) {
+               arg_regs = (m->arg_size[i] + 7) / 8;
+               if (nr_regs + arg_regs <= 6) {
+                       for (j = 0; j < arg_regs; j++) {
+                               emit_ldx(prog, BPF_DW,
+                                        nr_regs == 5 ? X86_REG_R9 : BPF_REG_1 + nr_regs,
+                                        BPF_REG_FP,
+                                        -stack_size);
+                               stack_size -= 8;
+                               nr_regs++;
+                       }
+               } else {
+                       stack_size -= 8 * arg_regs;
                }
 
-               emit_ldx(prog, bytes_to_bpf_size(arg_size),
-                        i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
-                        BPF_REG_FP,
-                        -(stack_size - i * 8));
-
-               j = next_same_struct ? j : j + 1;
+               if (nr_regs >= 6)
+                       break;
        }
 }
 
@@ -1938,7 +2056,10 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
        /* arg1: mov rdi, progs[i] */
        emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p);
        /* arg2: lea rsi, [rbp - ctx_cookie_off] */
-       EMIT4(0x48, 0x8D, 0x75, -run_ctx_off);
+       if (!is_imm8(-run_ctx_off))
+               EMIT3_off32(0x48, 0x8D, 0xB5, -run_ctx_off);
+       else
+               EMIT4(0x48, 0x8D, 0x75, -run_ctx_off);
 
        if (emit_rsb_call(&prog, bpf_trampoline_enter(p), prog))
                return -EINVAL;
@@ -1954,7 +2075,10 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
        emit_nops(&prog, 2);
 
        /* arg1: lea rdi, [rbp - stack_size] */
-       EMIT4(0x48, 0x8D, 0x7D, -stack_size);
+       if (!is_imm8(-stack_size))
+               EMIT3_off32(0x48, 0x8D, 0xBD, -stack_size);
+       else
+               EMIT4(0x48, 0x8D, 0x7D, -stack_size);
        /* arg2: progs[i]->insnsi for interpreter */
        if (!p->jited)
                emit_mov_imm64(&prog, BPF_REG_2,
@@ -1984,7 +2108,10 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
        /* arg2: mov rsi, rbx <- start time in nsec */
        emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
        /* arg3: lea rdx, [rbp - run_ctx_off] */
-       EMIT4(0x48, 0x8D, 0x55, -run_ctx_off);
+       if (!is_imm8(-run_ctx_off))
+               EMIT3_off32(0x48, 0x8D, 0x95, -run_ctx_off);
+       else
+               EMIT4(0x48, 0x8D, 0x55, -run_ctx_off);
        if (emit_rsb_call(&prog, bpf_trampoline_exit(p), prog))
                return -EINVAL;
 
@@ -2136,7 +2263,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
                                void *func_addr)
 {
        int i, ret, nr_regs = m->nr_args, stack_size = 0;
-       int regs_off, nregs_off, ip_off, run_ctx_off;
+       int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
        struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
        struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
        struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
@@ -2150,8 +2277,10 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
                if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
                        nr_regs += (m->arg_size[i] + 7) / 8 - 1;
 
-       /* x86-64 supports up to 6 arguments. 7+ can be added in the future */
-       if (nr_regs > 6)
+       /* x86-64 supports up to MAX_BPF_FUNC_ARGS arguments. Args 1-6
+        * are passed through regs, the rest are passed on the stack.
+        */
+       if (nr_regs > MAX_BPF_FUNC_ARGS)
                return -ENOTSUPP;
 
        /* Generated trampoline stack layout:
@@ -2170,7 +2299,14 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
         *
         * RBP - ip_off    [ traced function ]  BPF_TRAMP_F_IP_ARG flag
         *
+        * RBP - rbx_off   [ rbx value       ]  always
+        *
         * RBP - run_ctx_off [ bpf_tramp_run_ctx ]
+        *
+        *                     [ stack_argN ]  BPF_TRAMP_F_CALL_ORIG
+        *                     [ ...        ]
+        *                     [ stack_arg2 ]
+        * RBP - arg_stack_off [ stack_arg1 ]
         */
 
        /* room for return value of orig_call or fentry prog */
@@ -2190,9 +2326,26 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 
        ip_off = stack_size;
 
+       stack_size += 8;
+       rbx_off = stack_size;
+
        stack_size += (sizeof(struct bpf_tramp_run_ctx) + 7) & ~0x7;
        run_ctx_off = stack_size;
 
+       if (nr_regs > 6 && (flags & BPF_TRAMP_F_CALL_ORIG)) {
+               /* the space used to pass arguments on the stack */
+               stack_size += (nr_regs - get_nr_used_regs(m)) * 8;
+               /* make sure the stack pointer is 16-byte aligned if we
+                * need to pass arguments on the stack, which means
+                *  [stack_size + 8(rbp) + 8(rip) + 8(origin rip)]
+                * should be 16-byte aligned. The following code depends on
+                * stack_size already being 8-byte aligned.
+                */
+               stack_size += (stack_size % 16) ? 0 : 8;
+       }
+
+       arg_stack_off = stack_size;
+
        if (flags & BPF_TRAMP_F_SKIP_FRAME) {
                /* skip patched call instruction and point orig_call to actual
                 * body of the kernel function.
@@ -2212,8 +2365,14 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
        x86_call_depth_emit_accounting(&prog, NULL);
        EMIT1(0x55);             /* push rbp */
        EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
-       EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
-       EMIT1(0x53);             /* push rbx */
+       if (!is_imm8(stack_size))
+               /* sub rsp, stack_size */
+               EMIT3_off32(0x48, 0x81, 0xEC, stack_size);
+       else
+               /* sub rsp, stack_size */
+               EMIT4(0x48, 0x83, 0xEC, stack_size);
+       /* mov QWORD PTR [rbp - rbx_off], rbx */
+       emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
 
        /* Store number of argument registers of the traced function:
         *   mov rax, nr_regs
@@ -2231,7 +2390,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
                emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -ip_off);
        }
 
-       save_regs(m, &prog, nr_regs, regs_off);
+       save_args(m, &prog, regs_off, false);
 
        if (flags & BPF_TRAMP_F_CALL_ORIG) {
                /* arg1: mov rdi, im */
@@ -2261,7 +2420,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
        }
 
        if (flags & BPF_TRAMP_F_CALL_ORIG) {
-               restore_regs(m, &prog, nr_regs, regs_off);
+               restore_regs(m, &prog, regs_off);
+               save_args(m, &prog, arg_stack_off, true);
 
                if (flags & BPF_TRAMP_F_ORIG_STACK) {
                        emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
@@ -2302,7 +2462,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
                }
 
        if (flags & BPF_TRAMP_F_RESTORE_REGS)
-               restore_regs(m, &prog, nr_regs, regs_off);
+               restore_regs(m, &prog, regs_off);
 
        /* This needs to be done regardless. If there were fmod_ret programs,
         * the return value is only updated on the stack and still needs to be
@@ -2321,7 +2481,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
        if (save_ret)
                emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
 
-       EMIT1(0x5B); /* pop rbx */
+       emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
        EMIT1(0xC9); /* leave */
        if (flags & BPF_TRAMP_F_SKIP_FRAME)
                /* skip our return address and return to parent */
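
A hedged illustration of what this enables on the BPF side (editorial note; the attach target below is a placeholder, not from this patch): with arguments 7+ spilled to the trampoline's stack, a tracing program can attach to kernel functions with more than six arguments and unpack them with BPF_PROG()::

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  SEC("fexit/placeholder_func_with_seven_args")
  int BPF_PROG(trace_many_args,
               int a, int b, int c, int d, int e, int f, int g, int ret)
  {
          /* arg 7 ('g') is read from the traced function's stack area */
          bpf_printk("arg7=%d ret=%d", g, ret);
          return 0;
  }

  char _license[] SEC("license") = "GPL";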
index a12edcf..43b99b5 100644 (file)
@@ -58,7 +58,7 @@ entrypoints.lskel.h: $(OUTPUT)/entrypoints.bpf.o | $(BPFTOOL)
 
 $(OUTPUT)/entrypoints.bpf.o: entrypoints.bpf.c $(OUTPUT)/vmlinux.h $(BPFOBJ) | $(OUTPUT)
        $(call msg,BPF,$@)
-       $(Q)$(CLANG) -g -O2 -target bpf $(INCLUDES)                           \
+       $(Q)$(CLANG) -g -O2 --target=bpf $(INCLUDES)                          \
                 -c $(filter %.c,$^) -o $@ &&                                 \
        $(LLVM_STRIP) -g $@
 
index 57e9e10..8506690 100644 (file)
@@ -199,9 +199,9 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb)                              \
 ({                                                                            \
        int __ret = 0;                                                         \
-       if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk && sk == skb->sk) { \
+       if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk) {                    \
                typeof(sk) __sk = sk_to_full_sk(sk);                           \
-               if (sk_fullsock(__sk) &&                                       \
+               if (sk_fullsock(__sk) && __sk == skb_to_full_sk(skb) &&        \
                    cgroup_bpf_sock_enabled(__sk, CGROUP_INET_EGRESS))         \
                        __ret = __cgroup_bpf_run_filter_skb(__sk, skb,         \
                                                      CGROUP_INET_EGRESS); \
index f588958..360433f 100644 (file)
@@ -275,6 +275,7 @@ struct bpf_map {
        } owner;
        bool bypass_spec_v1;
        bool frozen; /* write-once; write-protected by freeze_mutex */
+       s64 __percpu *elem_count;
 };
 
 static inline const char *btf_field_type_name(enum btf_field_type type)
@@ -2040,6 +2041,35 @@ bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, size_t align,
 }
 #endif
 
+static inline int
+bpf_map_init_elem_count(struct bpf_map *map)
+{
+       size_t size = sizeof(*map->elem_count), align = size;
+       gfp_t flags = GFP_USER | __GFP_NOWARN;
+
+       map->elem_count = bpf_map_alloc_percpu(map, size, align, flags);
+       if (!map->elem_count)
+               return -ENOMEM;
+
+       return 0;
+}
+
+static inline void
+bpf_map_free_elem_count(struct bpf_map *map)
+{
+       free_percpu(map->elem_count);
+}
+
+static inline void bpf_map_inc_elem_count(struct bpf_map *map)
+{
+       this_cpu_inc(*map->elem_count);
+}
+
+static inline void bpf_map_dec_elem_count(struct bpf_map *map)
+{
+       this_cpu_dec(*map->elem_count);
+}
+
 extern int sysctl_unprivileged_bpf_disabled;
 
 static inline bool bpf_allow_ptr_leaks(void)
index 3929be5..d644bbb 100644 (file)
@@ -27,10 +27,12 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
 /* kmalloc/kfree equivalent: */
 void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size);
 void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr);
+void bpf_mem_free_rcu(struct bpf_mem_alloc *ma, void *ptr);
 
 /* kmem_cache_alloc/free equivalent: */
 void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma);
 void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr);
+void bpf_mem_cache_free_rcu(struct bpf_mem_alloc *ma, void *ptr);
 void bpf_mem_cache_raw_free(void *ptr);
 void *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags);
 
index 7f17acf..7b94929 100644 (file)
@@ -138,6 +138,8 @@ static inline int rcu_needs_cpu(void)
        return 0;
 }
 
+static inline void rcu_request_urgent_qs_task(struct task_struct *t) { }
+
 /*
  * Take advantage of the fact that there is only one CPU, which
  * allows us to ignore virtualization-based context switches.
index 56bccb5..126f6b4 100644 (file)
@@ -21,6 +21,7 @@ void rcu_softirq_qs(void);
 void rcu_note_context_switch(bool preempt);
 int rcu_needs_cpu(void);
 void rcu_cpu_stall_reset(void);
+void rcu_request_urgent_qs_task(struct task_struct *t);
 
 /*
  * Note a virtualization-based context switch.  This is simply a
index 3930e67..e66d04d 100644 (file)
@@ -867,7 +867,8 @@ extern int  perf_uprobe_init(struct perf_event *event,
 extern void perf_uprobe_destroy(struct perf_event *event);
 extern int bpf_get_uprobe_info(const struct perf_event *event,
                               u32 *fd_type, const char **filename,
-                              u64 *probe_offset, bool perf_type_tracepoint);
+                              u64 *probe_offset, u64 *probe_addr,
+                              bool perf_type_tracepoint);
 #endif
 extern int  ftrace_profile_set_filter(struct perf_event *event, int event_id,
                                     char *filter_str);
index 60a9d59..600d0ca 100644 (file)
@@ -1057,6 +1057,16 @@ enum bpf_link_type {
        MAX_BPF_LINK_TYPE,
 };
 
+enum bpf_perf_event_type {
+       BPF_PERF_EVENT_UNSPEC = 0,
+       BPF_PERF_EVENT_UPROBE = 1,
+       BPF_PERF_EVENT_URETPROBE = 2,
+       BPF_PERF_EVENT_KPROBE = 3,
+       BPF_PERF_EVENT_KRETPROBE = 4,
+       BPF_PERF_EVENT_TRACEPOINT = 5,
+       BPF_PERF_EVENT_EVENT = 6,
+};
+
 /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
  *
  * NONE(default): No further bpf programs allowed in the subtree.
@@ -6439,6 +6449,36 @@ struct bpf_link_info {
                        __s32 priority;
                        __u32 flags;
                } netfilter;
+               struct {
+                       __aligned_u64 addrs;
+                       __u32 count; /* in/out: kprobe_multi function count */
+                       __u32 flags;
+               } kprobe_multi;
+               struct {
+                       __u32 type; /* enum bpf_perf_event_type */
+                       __u32 :32;
+                       union {
+                               struct {
+                                       __aligned_u64 file_name; /* in/out */
+                                       __u32 name_len;
+                                       __u32 offset; /* offset from file_name */
+                               } uprobe; /* BPF_PERF_EVENT_UPROBE, BPF_PERF_EVENT_URETPROBE */
+                               struct {
+                                       __aligned_u64 func_name; /* in/out */
+                                       __u32 name_len;
+                                       __u32 offset; /* offset from func_name */
+                                       __u64 addr;
+                               } kprobe; /* BPF_PERF_EVENT_KPROBE, BPF_PERF_EVENT_KRETPROBE */
+                               struct {
+                                       __aligned_u64 tp_name;   /* in/out */
+                                       __u32 name_len;
+                               } tracepoint; /* BPF_PERF_EVENT_TRACEPOINT */
+                               struct {
+                                       __u64 config;
+                                       __u32 type;
+                               } event; /* BPF_PERF_EVENT_EVENT */
+                       };
+               } perf_event;
        };
 } __attribute__((aligned(8)));
 
index 817204d..ef9581a 100644 (file)
@@ -6133,8 +6133,9 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
        const char *tname, *mname, *tag_value;
        u32 vlen, elem_id, mid;
 
-       *flag = 0;
 again:
+       if (btf_type_is_modifier(t))
+               t = btf_type_skip_modifiers(btf, t->type, NULL);
        tname = __btf_name_by_offset(btf, t->name_off);
        if (!btf_type_is_struct(t)) {
                bpf_log(log, "Type '%s' is not a struct\n", tname);
@@ -6142,6 +6143,14 @@ again:
        }
 
        vlen = btf_type_vlen(t);
+       if (BTF_INFO_KIND(t->info) == BTF_KIND_UNION && vlen != 1 && !(*flag & PTR_UNTRUSTED))
+               /*
+                * walking unions yields untrusted pointers
+                * with exception of __bpf_md_ptr and other
+                * unions with a single member
+                */
+               *flag |= PTR_UNTRUSTED;
+
        if (off + size > t->size) {
                /* If the last element is a variable size array, we may
                 * need to relax the rule.
@@ -6302,15 +6311,6 @@ error:
                 * of this field or inside of this struct
                 */
                if (btf_type_is_struct(mtype)) {
-                       if (BTF_INFO_KIND(mtype->info) == BTF_KIND_UNION &&
-                           btf_type_vlen(mtype) != 1)
-                               /*
-                                * walking unions yields untrusted pointers
-                                * with exception of __bpf_md_ptr and other
-                                * unions with a single member
-                                */
-                               *flag |= PTR_UNTRUSTED;
-
                        /* our field must be inside that union or struct */
                        t = mtype;
 
@@ -6368,7 +6368,7 @@ error:
                 * that also allows using an array of int as a scratch
                 * space. e.g. skb->cb[].
                 */
-               if (off + size > mtrue_end) {
+               if (off + size > mtrue_end && !(*flag & PTR_UNTRUSTED)) {
                        bpf_log(log,
                                "access beyond the end of member %s (mend:%u) in struct %s with off %u size %u\n",
                                mname, mtrue_end, tname, off, size);
@@ -6476,7 +6476,7 @@ bool btf_struct_ids_match(struct bpf_verifier_log *log,
                          bool strict)
 {
        const struct btf_type *type;
-       enum bpf_type_flag flag;
+       enum bpf_type_flag flag = 0;
        int err;
 
        /* Are we already done? */
index 938a60f..6983af8 100644 (file)
@@ -9,7 +9,6 @@
 /**
  * struct bpf_cpumask - refcounted BPF cpumask wrapper structure
  * @cpumask:   The actual cpumask embedded in the struct.
- * @rcu:       The RCU head used to free the cpumask with RCU safety.
  * @usage:     Object reference counter. When the refcount goes to 0, the
  *             memory is released back to the BPF allocator, which provides
  *             RCU safety.
@@ -25,7 +24,6 @@
  */
 struct bpf_cpumask {
        cpumask_t cpumask;
-       struct rcu_head rcu;
        refcount_t usage;
 };
 
@@ -82,16 +80,6 @@ __bpf_kfunc struct bpf_cpumask *bpf_cpumask_acquire(struct bpf_cpumask *cpumask)
        return cpumask;
 }
 
-static void cpumask_free_cb(struct rcu_head *head)
-{
-       struct bpf_cpumask *cpumask;
-
-       cpumask = container_of(head, struct bpf_cpumask, rcu);
-       migrate_disable();
-       bpf_mem_cache_free(&bpf_cpumask_ma, cpumask);
-       migrate_enable();
-}
-
 /**
  * bpf_cpumask_release() - Release a previously acquired BPF cpumask.
  * @cpumask: The cpumask being released.
@@ -102,8 +90,12 @@ static void cpumask_free_cb(struct rcu_head *head)
  */
 __bpf_kfunc void bpf_cpumask_release(struct bpf_cpumask *cpumask)
 {
-       if (refcount_dec_and_test(&cpumask->usage))
-               call_rcu(&cpumask->rcu, cpumask_free_cb);
+       if (!refcount_dec_and_test(&cpumask->usage))
+               return;
+
+       migrate_disable();
+       bpf_mem_cache_free_rcu(&bpf_cpumask_ma, cpumask);
+       migrate_enable();
 }
 
 /**
index 56d3da7..a8c7e1c 100644 (file)
@@ -302,6 +302,7 @@ static struct htab_elem *prealloc_lru_pop(struct bpf_htab *htab, void *key,
        struct htab_elem *l;
 
        if (node) {
+               bpf_map_inc_elem_count(&htab->map);
                l = container_of(node, struct htab_elem, lru_node);
                memcpy(l->key, key, htab->map.key_size);
                return l;
@@ -510,12 +511,16 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
            htab->n_buckets > U32_MAX / sizeof(struct bucket))
                goto free_htab;
 
+       err = bpf_map_init_elem_count(&htab->map);
+       if (err)
+               goto free_htab;
+
        err = -ENOMEM;
        htab->buckets = bpf_map_area_alloc(htab->n_buckets *
                                           sizeof(struct bucket),
                                           htab->map.numa_node);
        if (!htab->buckets)
-               goto free_htab;
+               goto free_elem_count;
 
        for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) {
                htab->map_locked[i] = bpf_map_alloc_percpu(&htab->map,
@@ -593,6 +598,8 @@ free_map_locked:
        bpf_map_area_free(htab->buckets);
        bpf_mem_alloc_destroy(&htab->pcpu_ma);
        bpf_mem_alloc_destroy(&htab->ma);
+free_elem_count:
+       bpf_map_free_elem_count(&htab->map);
 free_htab:
        lockdep_unregister_key(&htab->lockdep_key);
        bpf_map_area_free(htab);
@@ -804,6 +811,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
                if (l == tgt_l) {
                        hlist_nulls_del_rcu(&l->hash_node);
                        check_and_free_fields(htab, l);
+                       bpf_map_dec_elem_count(&htab->map);
                        break;
                }
 
@@ -900,6 +908,8 @@ static bool is_map_full(struct bpf_htab *htab)
 
 static void inc_elem_count(struct bpf_htab *htab)
 {
+       bpf_map_inc_elem_count(&htab->map);
+
        if (htab->use_percpu_counter)
                percpu_counter_add_batch(&htab->pcount, 1, PERCPU_COUNTER_BATCH);
        else
@@ -908,6 +918,8 @@ static void inc_elem_count(struct bpf_htab *htab)
 
 static void dec_elem_count(struct bpf_htab *htab)
 {
+       bpf_map_dec_elem_count(&htab->map);
+
        if (htab->use_percpu_counter)
                percpu_counter_add_batch(&htab->pcount, -1, PERCPU_COUNTER_BATCH);
        else
@@ -920,6 +932,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
        htab_put_fd_value(htab, l);
 
        if (htab_is_prealloc(htab)) {
+               bpf_map_dec_elem_count(&htab->map);
                check_and_free_fields(htab, l);
                __pcpu_freelist_push(&htab->freelist, &l->fnode);
        } else {
@@ -1000,6 +1013,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
                        if (!l)
                                return ERR_PTR(-E2BIG);
                        l_new = container_of(l, struct htab_elem, fnode);
+                       bpf_map_inc_elem_count(&htab->map);
                }
        } else {
                if (is_map_full(htab))
@@ -1168,6 +1182,7 @@ err:
 static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
 {
        check_and_free_fields(htab, elem);
+       bpf_map_dec_elem_count(&htab->map);
        bpf_lru_push_free(&htab->lru, &elem->lru_node);
 }
 
@@ -1357,8 +1372,10 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 err:
        htab_unlock_bucket(htab, b, hash, flags);
 err_lock_bucket:
-       if (l_new)
+       if (l_new) {
+               bpf_map_dec_elem_count(&htab->map);
                bpf_lru_push_free(&htab->lru, &l_new->lru_node);
+       }
        return ret;
 }
 
@@ -1523,6 +1540,7 @@ static void htab_map_free(struct bpf_map *map)
                prealloc_destroy(htab);
        }
 
+       bpf_map_free_elem_count(map);
        free_percpu(htab->extra_elems);
        bpf_map_area_free(htab->buckets);
        bpf_mem_alloc_destroy(&htab->pcpu_ma);
index b0fa190..d06d3b7 100644 (file)
@@ -93,7 +93,7 @@ static struct bpf_iter_reg bpf_map_reg_info = {
        .ctx_arg_info_size      = 1,
        .ctx_arg_info           = {
                { offsetof(struct bpf_iter__bpf_map, map),
-                 PTR_TO_BTF_ID_OR_NULL },
+                 PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
        },
        .seq_info               = &bpf_map_seq_info,
 };
@@ -193,3 +193,40 @@ static int __init bpf_map_iter_init(void)
 }
 
 late_initcall(bpf_map_iter_init);
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+                 "Global functions as their definitions will be in vmlinux BTF");
+
+__bpf_kfunc s64 bpf_map_sum_elem_count(struct bpf_map *map)
+{
+       s64 *pcount;
+       s64 ret = 0;
+       int cpu;
+
+       if (!map || !map->elem_count)
+               return 0;
+
+       for_each_possible_cpu(cpu) {
+               pcount = per_cpu_ptr(map->elem_count, cpu);
+               ret += READ_ONCE(*pcount);
+       }
+       return ret;
+}
+
+__diag_pop();
+
+BTF_SET8_START(bpf_map_iter_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_map_sum_elem_count, KF_TRUSTED_ARGS)
+BTF_SET8_END(bpf_map_iter_kfunc_ids)
+
+static const struct btf_kfunc_id_set bpf_map_iter_kfunc_set = {
+       .owner = THIS_MODULE,
+       .set   = &bpf_map_iter_kfunc_ids,
+};
+
+static int init_subsystem(void)
+{
+       return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_map_iter_kfunc_set);
+}
+late_initcall(init_subsystem);
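
A hedged BPF-side sketch of calling the new kfunc from an iterator program (the in-tree iterators.bpf.c update further below uses the same declaration)::

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  extern __s64 bpf_map_sum_elem_count(struct bpf_map *map) __ksym;

  SEC("iter/bpf_map")
  int dump_map_elem_count(struct bpf_iter__bpf_map *ctx)
  {
          struct bpf_map *map = ctx->map;

          if (map)
                  BPF_SEQ_PRINTF(ctx->meta->seq, "%s: %lld\n",
                                 map->name, bpf_map_sum_elem_count(map));
          return 0;
  }

  char _license[] SEC("license") = "GPL";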
index 0668bcd..51d6389 100644 (file)
@@ -98,11 +98,23 @@ struct bpf_mem_cache {
        int free_cnt;
        int low_watermark, high_watermark, batch;
        int percpu_size;
+       bool draining;
+       struct bpf_mem_cache *tgt;
 
-       struct rcu_head rcu;
+       /* list of objects to be freed after RCU GP */
        struct llist_head free_by_rcu;
+       struct llist_node *free_by_rcu_tail;
        struct llist_head waiting_for_gp;
+       struct llist_node *waiting_for_gp_tail;
+       struct rcu_head rcu;
        atomic_t call_rcu_in_progress;
+       struct llist_head free_llist_extra_rcu;
+
+       /* list of objects to be freed after RCU tasks trace GP */
+       struct llist_head free_by_rcu_ttrace;
+       struct llist_head waiting_for_gp_ttrace;
+       struct rcu_head rcu_ttrace;
+       atomic_t call_rcu_ttrace_in_progress;
 };
 
 struct bpf_mem_caches {
@@ -153,59 +165,83 @@ static struct mem_cgroup *get_memcg(const struct bpf_mem_cache *c)
 #endif
 }
 
+static void inc_active(struct bpf_mem_cache *c, unsigned long *flags)
+{
+       if (IS_ENABLED(CONFIG_PREEMPT_RT))
+               /* In RT irq_work runs in per-cpu kthread, so disable
+                * interrupts to avoid preemption and interrupts and
+                * reduce the chance of bpf prog executing on this cpu
+                * when active counter is busy.
+                */
+               local_irq_save(*flags);
+       /* alloc_bulk runs from irq_work which will not preempt a bpf
+        * program that does unit_alloc/unit_free since IRQs are
+        * disabled there. There is no race to increment 'active'
+        * counter. It protects free_llist from corruption in case NMI
+        * bpf prog preempted this loop.
+        */
+       WARN_ON_ONCE(local_inc_return(&c->active) != 1);
+}
+
+static void dec_active(struct bpf_mem_cache *c, unsigned long flags)
+{
+       local_dec(&c->active);
+       if (IS_ENABLED(CONFIG_PREEMPT_RT))
+               local_irq_restore(flags);
+}
+
+static void add_obj_to_free_list(struct bpf_mem_cache *c, void *obj)
+{
+       unsigned long flags;
+
+       inc_active(c, &flags);
+       __llist_add(obj, &c->free_llist);
+       c->free_cnt++;
+       dec_active(c, flags);
+}
+
 /* Mostly runs from irq_work except __init phase. */
 static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
 {
        struct mem_cgroup *memcg = NULL, *old_memcg;
-       unsigned long flags;
        void *obj;
        int i;
 
-       memcg = get_memcg(c);
-       old_memcg = set_active_memcg(memcg);
        for (i = 0; i < cnt; i++) {
                /*
-                * free_by_rcu is only manipulated by irq work refill_work().
-                * IRQ works on the same CPU are called sequentially, so it is
-                * safe to use __llist_del_first() here. If alloc_bulk() is
-                * invoked by the initial prefill, there will be no running
-                * refill_work(), so __llist_del_first() is fine as well.
-                *
-                * In most cases, objects on free_by_rcu are from the same CPU.
-                * If some objects come from other CPUs, it doesn't incur any
-                * harm because NUMA_NO_NODE means the preference for current
-                * numa node and it is not a guarantee.
+                * For every 'c' llist_del_first(&c->free_by_rcu_ttrace); is
+                * done only by one CPU == current CPU. Other CPUs might
+                * llist_add() and llist_del_all() in parallel.
                 */
-               obj = __llist_del_first(&c->free_by_rcu);
-               if (!obj) {
-                       /* Allocate, but don't deplete atomic reserves that typical
-                        * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
-                        * will allocate from the current numa node which is what we
-                        * want here.
-                        */
-                       obj = __alloc(c, node, GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT);
-                       if (!obj)
-                               break;
-               }
-               if (IS_ENABLED(CONFIG_PREEMPT_RT))
-                       /* In RT irq_work runs in per-cpu kthread, so disable
-                        * interrupts to avoid preemption and interrupts and
-                        * reduce the chance of bpf prog executing on this cpu
-                        * when active counter is busy.
-                        */
-                       local_irq_save(flags);
-               /* alloc_bulk runs from irq_work which will not preempt a bpf
-                * program that does unit_alloc/unit_free since IRQs are
-                * disabled there. There is no race to increment 'active'
-                * counter. It protects free_llist from corruption in case NMI
-                * bpf prog preempted this loop.
+               obj = llist_del_first(&c->free_by_rcu_ttrace);
+               if (!obj)
+                       break;
+               add_obj_to_free_list(c, obj);
+       }
+       if (i >= cnt)
+               return;
+
+       for (; i < cnt; i++) {
+               obj = llist_del_first(&c->waiting_for_gp_ttrace);
+               if (!obj)
+                       break;
+               add_obj_to_free_list(c, obj);
+       }
+       if (i >= cnt)
+               return;
+
+       memcg = get_memcg(c);
+       old_memcg = set_active_memcg(memcg);
+       for (; i < cnt; i++) {
+               /* Allocate, but don't deplete atomic reserves that typical
+                * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc
+                * will allocate from the current numa node which is what we
+                * want here.
                 */
-               WARN_ON_ONCE(local_inc_return(&c->active) != 1);
-               __llist_add(obj, &c->free_llist);
-               c->free_cnt++;
-               local_dec(&c->active);
-               if (IS_ENABLED(CONFIG_PREEMPT_RT))
-                       local_irq_restore(flags);
+               obj = __alloc(c, node, GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT);
+               if (!obj)
+                       break;
+               add_obj_to_free_list(c, obj);
        }
        set_active_memcg(old_memcg);
        mem_cgroup_put(memcg);
@@ -222,20 +258,24 @@ static void free_one(void *obj, bool percpu)
        kfree(obj);
 }
 
-static void free_all(struct llist_node *llnode, bool percpu)
+static int free_all(struct llist_node *llnode, bool percpu)
 {
        struct llist_node *pos, *t;
+       int cnt = 0;
 
-       llist_for_each_safe(pos, t, llnode)
+       llist_for_each_safe(pos, t, llnode) {
                free_one(pos, percpu);
+               cnt++;
+       }
+       return cnt;
 }
 
 static void __free_rcu(struct rcu_head *head)
 {
-       struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
+       struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu_ttrace);
 
-       free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
-       atomic_set(&c->call_rcu_in_progress, 0);
+       free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
+       atomic_set(&c->call_rcu_ttrace_in_progress, 0);
 }
 
 static void __free_rcu_tasks_trace(struct rcu_head *head)
@@ -254,60 +294,128 @@ static void enque_to_free(struct bpf_mem_cache *c, void *obj)
        struct llist_node *llnode = obj;
 
        /* bpf_mem_cache is a per-cpu object. Freeing happens in irq_work.
-        * Nothing races to add to free_by_rcu list.
+        * Nothing races to add to free_by_rcu_ttrace list.
         */
-       __llist_add(llnode, &c->free_by_rcu);
+       llist_add(llnode, &c->free_by_rcu_ttrace);
 }
 
-static void do_call_rcu(struct bpf_mem_cache *c)
+static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
 {
        struct llist_node *llnode, *t;
 
-       if (atomic_xchg(&c->call_rcu_in_progress, 1))
+       if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) {
+               if (unlikely(READ_ONCE(c->draining))) {
+                       llnode = llist_del_all(&c->free_by_rcu_ttrace);
+                       free_all(llnode, !!c->percpu_size);
+               }
                return;
+       }
+
+       WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp_ttrace));
+       llist_for_each_safe(llnode, t, llist_del_all(&c->free_by_rcu_ttrace))
+               llist_add(llnode, &c->waiting_for_gp_ttrace);
+
+       if (unlikely(READ_ONCE(c->draining))) {
+               __free_rcu(&c->rcu_ttrace);
+               return;
+       }
 
-       WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
-       llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
-               /* There is no concurrent __llist_add(waiting_for_gp) access.
-                * It doesn't race with llist_del_all either.
-                * But there could be two concurrent llist_del_all(waiting_for_gp):
-                * from __free_rcu() and from drain_mem_cache().
-                */
-               __llist_add(llnode, &c->waiting_for_gp);
        /* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
         * If RCU Tasks Trace grace period implies RCU grace period, free
         * these elements directly, else use call_rcu() to wait for normal
         * progs to finish and finally do free_one() on each element.
         */
-       call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
+       call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace);
 }
 
 static void free_bulk(struct bpf_mem_cache *c)
 {
+       struct bpf_mem_cache *tgt = c->tgt;
        struct llist_node *llnode, *t;
        unsigned long flags;
        int cnt;
 
+       WARN_ON_ONCE(tgt->unit_size != c->unit_size);
+
        do {
-               if (IS_ENABLED(CONFIG_PREEMPT_RT))
-                       local_irq_save(flags);
-               WARN_ON_ONCE(local_inc_return(&c->active) != 1);
+               inc_active(c, &flags);
                llnode = __llist_del_first(&c->free_llist);
                if (llnode)
                        cnt = --c->free_cnt;
                else
                        cnt = 0;
-               local_dec(&c->active);
-               if (IS_ENABLED(CONFIG_PREEMPT_RT))
-                       local_irq_restore(flags);
+               dec_active(c, flags);
                if (llnode)
-                       enque_to_free(c, llnode);
+                       enque_to_free(tgt, llnode);
        } while (cnt > (c->high_watermark + c->low_watermark) / 2);
 
        /* and drain free_llist_extra */
        llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra))
-               enque_to_free(c, llnode);
-       do_call_rcu(c);
+               enque_to_free(tgt, llnode);
+       do_call_rcu_ttrace(tgt);
+}
+
+static void __free_by_rcu(struct rcu_head *head)
+{
+       struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
+       struct bpf_mem_cache *tgt = c->tgt;
+       struct llist_node *llnode;
+
+       llnode = llist_del_all(&c->waiting_for_gp);
+       if (!llnode)
+               goto out;
+
+       llist_add_batch(llnode, c->waiting_for_gp_tail, &tgt->free_by_rcu_ttrace);
+
+       /* Objects went through regular RCU GP. Send them to RCU tasks trace */
+       do_call_rcu_ttrace(tgt);
+out:
+       atomic_set(&c->call_rcu_in_progress, 0);
+}
+
+static void check_free_by_rcu(struct bpf_mem_cache *c)
+{
+       struct llist_node *llnode, *t;
+       unsigned long flags;
+
+       /* drain free_llist_extra_rcu */
+       if (unlikely(!llist_empty(&c->free_llist_extra_rcu))) {
+               inc_active(c, &flags);
+               llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra_rcu))
+                       if (__llist_add(llnode, &c->free_by_rcu))
+                               c->free_by_rcu_tail = llnode;
+               dec_active(c, flags);
+       }
+
+       if (llist_empty(&c->free_by_rcu))
+               return;
+
+       if (atomic_xchg(&c->call_rcu_in_progress, 1)) {
+               /*
+                * Instead of kmalloc-ing a new rcu_head and triggering 10k
+                * call_rcu()-s to hit rcutree.qhimark and force RCU to notice
+                * the overload, just ask RCU to hurry up. There could be many
+                * objects in the free_by_rcu list.
+                * This hint reduces memory consumption for an artificial
+                * benchmark from 2 Gbyte to 150 Mbyte.
+                */
+               rcu_request_urgent_qs_task(current);
+               return;
+       }
+
+       WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
+
+       inc_active(c, &flags);
+       WRITE_ONCE(c->waiting_for_gp.first, __llist_del_all(&c->free_by_rcu));
+       c->waiting_for_gp_tail = c->free_by_rcu_tail;
+       dec_active(c, flags);
+
+       if (unlikely(READ_ONCE(c->draining))) {
+               free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
+               atomic_set(&c->call_rcu_in_progress, 0);
+       } else {
+               call_rcu_hurry(&c->rcu, __free_by_rcu);
+       }
 }
 
 static void bpf_mem_refill(struct irq_work *work)
@@ -324,6 +432,8 @@ static void bpf_mem_refill(struct irq_work *work)
                alloc_bulk(c, c->batch, NUMA_NO_NODE);
        else if (cnt > c->high_watermark)
                free_bulk(c);
+
+       check_free_by_rcu(c);
 }
 
 static void notrace irq_work_raise(struct bpf_mem_cache *c)
@@ -406,6 +516,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
                        c->unit_size = unit_size;
                        c->objcg = objcg;
                        c->percpu_size = percpu_size;
+                       c->tgt = c;
                        prefill_mem_cache(c, cpu);
                }
                ma->cache = pc;
@@ -428,6 +539,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
                        c = &cc->cache[i];
                        c->unit_size = sizes[i];
                        c->objcg = objcg;
+                       c->tgt = c;
                        prefill_mem_cache(c, cpu);
                }
        }
@@ -441,19 +553,57 @@ static void drain_mem_cache(struct bpf_mem_cache *c)
 
        /* No progs are using this bpf_mem_cache, but htab_map_free() called
         * bpf_mem_cache_free() for all remaining elements and they can be in
-        * free_by_rcu or in waiting_for_gp lists, so drain those lists now.
+        * free_by_rcu_ttrace or in waiting_for_gp_ttrace lists, so drain those lists now.
         *
-        * Except for waiting_for_gp list, there are no concurrent operations
+        * Except for waiting_for_gp_ttrace list, there are no concurrent operations
         * on these lists, so it is safe to use __llist_del_all().
         */
-       free_all(__llist_del_all(&c->free_by_rcu), percpu);
-       free_all(llist_del_all(&c->waiting_for_gp), percpu);
+       free_all(llist_del_all(&c->free_by_rcu_ttrace), percpu);
+       free_all(llist_del_all(&c->waiting_for_gp_ttrace), percpu);
        free_all(__llist_del_all(&c->free_llist), percpu);
        free_all(__llist_del_all(&c->free_llist_extra), percpu);
+       free_all(__llist_del_all(&c->free_by_rcu), percpu);
+       free_all(__llist_del_all(&c->free_llist_extra_rcu), percpu);
+       free_all(llist_del_all(&c->waiting_for_gp), percpu);
+}
+
+static void check_mem_cache(struct bpf_mem_cache *c)
+{
+       WARN_ON_ONCE(!llist_empty(&c->free_by_rcu_ttrace));
+       WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp_ttrace));
+       WARN_ON_ONCE(!llist_empty(&c->free_llist));
+       WARN_ON_ONCE(!llist_empty(&c->free_llist_extra));
+       WARN_ON_ONCE(!llist_empty(&c->free_by_rcu));
+       WARN_ON_ONCE(!llist_empty(&c->free_llist_extra_rcu));
+       WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
+}
+
+static void check_leaked_objs(struct bpf_mem_alloc *ma)
+{
+       struct bpf_mem_caches *cc;
+       struct bpf_mem_cache *c;
+       int cpu, i;
+
+       if (ma->cache) {
+               for_each_possible_cpu(cpu) {
+                       c = per_cpu_ptr(ma->cache, cpu);
+                       check_mem_cache(c);
+               }
+       }
+       if (ma->caches) {
+               for_each_possible_cpu(cpu) {
+                       cc = per_cpu_ptr(ma->caches, cpu);
+                       for (i = 0; i < NUM_CACHES; i++) {
+                               c = &cc->cache[i];
+                               check_mem_cache(c);
+                       }
+               }
+       }
 }
 
 static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
 {
+       check_leaked_objs(ma);
        free_percpu(ma->cache);
        free_percpu(ma->caches);
        ma->cache = NULL;
@@ -462,8 +612,8 @@ static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
 
 static void free_mem_alloc(struct bpf_mem_alloc *ma)
 {
-       /* waiting_for_gp lists was drained, but __free_rcu might
-        * still execute. Wait for it now before we freeing percpu caches.
+       /* waiting_for_gp[_ttrace] lists were drained, but RCU callbacks
+        * might still execute. Wait for them.
         *
         * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(),
         * but rcu_barrier_tasks_trace() and rcu_barrier() below are only used
@@ -472,7 +622,8 @@ static void free_mem_alloc(struct bpf_mem_alloc *ma)
         * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by
         * using rcu_trace_implies_rcu_gp() as well.
         */
-       rcu_barrier_tasks_trace();
+       rcu_barrier(); /* wait for __free_by_rcu */
+       rcu_barrier_tasks_trace(); /* wait for __free_rcu */
        if (!rcu_trace_implies_rcu_gp())
                rcu_barrier();
        free_mem_alloc_no_barrier(ma);
@@ -498,7 +649,7 @@ static void destroy_mem_alloc(struct bpf_mem_alloc *ma, int rcu_in_progress)
                return;
        }
 
-       copy = kmalloc(sizeof(*ma), GFP_KERNEL);
+       copy = kmemdup(ma, sizeof(*ma), GFP_KERNEL);
        if (!copy) {
                /* Slow path with inline barrier-s */
                free_mem_alloc(ma);
@@ -506,10 +657,7 @@ static void destroy_mem_alloc(struct bpf_mem_alloc *ma, int rcu_in_progress)
        }
 
        /* Defer barriers into worker to let the rest of map memory to be freed */
-       copy->cache = ma->cache;
-       ma->cache = NULL;
-       copy->caches = ma->caches;
-       ma->caches = NULL;
+       memset(ma, 0, sizeof(*ma));
        INIT_WORK(&copy->work, free_mem_alloc_deferred);
        queue_work(system_unbound_wq, &copy->work);
 }
@@ -524,17 +672,10 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
                rcu_in_progress = 0;
                for_each_possible_cpu(cpu) {
                        c = per_cpu_ptr(ma->cache, cpu);
-                       /*
-                        * refill_work may be unfinished for PREEMPT_RT kernel
-                        * in which irq work is invoked in a per-CPU RT thread.
-                        * It is also possible for kernel with
-                        * arch_irq_work_has_interrupt() being false and irq
-                        * work is invoked in timer interrupt. So waiting for
-                        * the completion of irq work to ease the handling of
-                        * concurrency.
-                        */
+                       WRITE_ONCE(c->draining, true);
                        irq_work_sync(&c->refill_work);
                        drain_mem_cache(c);
+                       rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
                        rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
                }
                /* objcg is the same across cpus */
@@ -548,8 +689,10 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
                        cc = per_cpu_ptr(ma->caches, cpu);
                        for (i = 0; i < NUM_CACHES; i++) {
                                c = &cc->cache[i];
+                               WRITE_ONCE(c->draining, true);
                                irq_work_sync(&c->refill_work);
                                drain_mem_cache(c);
+                               rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
                                rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
                        }
                }
@@ -581,8 +724,10 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
        local_irq_save(flags);
        if (local_inc_return(&c->active) == 1) {
                llnode = __llist_del_first(&c->free_llist);
-               if (llnode)
+               if (llnode) {
                        cnt = --c->free_cnt;
+                       *(struct bpf_mem_cache **)llnode = c;
+               }
        }
        local_dec(&c->active);
        local_irq_restore(flags);
@@ -606,6 +751,12 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
 
        BUILD_BUG_ON(LLIST_NODE_SZ > 8);
 
+       /*
+        * Remember bpf_mem_cache that allocated this object.
+        * The hint is not accurate.
+        */
+       c->tgt = *(struct bpf_mem_cache **)llnode;
+
        local_irq_save(flags);
        if (local_inc_return(&c->active) == 1) {
                __llist_add(llnode, &c->free_llist);
@@ -627,6 +778,27 @@ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr)
                irq_work_raise(c);
 }
 
+static void notrace unit_free_rcu(struct bpf_mem_cache *c, void *ptr)
+{
+       struct llist_node *llnode = ptr - LLIST_NODE_SZ;
+       unsigned long flags;
+
+       c->tgt = *(struct bpf_mem_cache **)llnode;
+
+       local_irq_save(flags);
+       if (local_inc_return(&c->active) == 1) {
+               if (__llist_add(llnode, &c->free_by_rcu))
+                       c->free_by_rcu_tail = llnode;
+       } else {
+               llist_add(llnode, &c->free_llist_extra_rcu);
+       }
+       local_dec(&c->active);
+       local_irq_restore(flags);
+
+       if (!atomic_read(&c->call_rcu_in_progress))
+               irq_work_raise(c);
+}
+
 /* Called from BPF program or from sys_bpf syscall.
  * In both cases migration is disabled.
  */
@@ -660,6 +832,20 @@ void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
        unit_free(this_cpu_ptr(ma->caches)->cache + idx, ptr);
 }
 
+void notrace bpf_mem_free_rcu(struct bpf_mem_alloc *ma, void *ptr)
+{
+       int idx;
+
+       if (!ptr)
+               return;
+
+       idx = bpf_mem_cache_idx(ksize(ptr - LLIST_NODE_SZ));
+       if (idx < 0)
+               return;
+
+       unit_free_rcu(this_cpu_ptr(ma->caches)->cache + idx, ptr);
+}
+
 void notrace *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma)
 {
        void *ret;
@@ -676,6 +862,14 @@ void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr)
        unit_free(this_cpu_ptr(ma->cache), ptr);
 }
 
+void notrace bpf_mem_cache_free_rcu(struct bpf_mem_alloc *ma, void *ptr)
+{
+       if (!ptr)
+               return;
+
+       unit_free_rcu(this_cpu_ptr(ma->cache), ptr);
+}
+
 /* Directly does a kfree() without putting 'ptr' back to the free_llist
  * for reuse and without waiting for a rcu_tasks_trace gp.
  * The caller must first go through the rcu_tasks_trace gp for 'ptr'
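
The memalloc.c changes above add bpf_mem_free_rcu() and bpf_mem_cache_free_rcu(): freed objects are parked on free_by_rcu so they are not reused before an RCU grace period and not handed back to the slab before an additional RCU tasks trace grace period. A minimal sketch of a fixed-size cache user follows; 'struct my_node' and 'my_ma' are illustrative names, not from this series, and the allocator is assumed to be set up elsewhere with bpf_mem_alloc_init(&my_ma, sizeof(struct my_node), false).

        /* Hedged sketch; names are illustrative, not from this series. */
        #include <linux/types.h>
        #include <linux/bpf_mem_alloc.h>

        struct my_node {
                u64 key;
                u64 payload;
        };

        /* set up once with bpf_mem_alloc_init(&my_ma, sizeof(struct my_node), false) */
        static struct bpf_mem_alloc my_ma;

        static struct my_node *my_node_new(void)
        {
                /* may return NULL when the per-cpu free list is empty and refill has not run */
                return bpf_mem_cache_alloc(&my_ma);
        }

        static void my_node_del(struct my_node *n)
        {
                /* not reused before an RCU GP; not returned to slab before a tasks-trace GP */
                bpf_mem_cache_free_rcu(&my_ma, n);
        }
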
index 8937dc6..b83c2f5 100644 (file)
@@ -50,7 +50,7 @@ iterators.lskel-%.h: $(OUTPUT)/%/iterators.bpf.o | $(BPFTOOL)
 $(OUTPUT)/%/iterators.bpf.o: iterators.bpf.c $(BPFOBJ) | $(OUTPUT)
        $(call msg,BPF,$@)
        $(Q)mkdir -p $(@D)
-       $(Q)$(CLANG) -g -O2 -target bpf -m$* $(INCLUDES)                      \
+       $(Q)$(CLANG) -g -O2 --target=bpf -m$* $(INCLUDES)                     \
                 -c $(filter %.c,$^) -o $@ &&                                 \
        $(LLVM_STRIP) -g $@
 
index 03af863..b78968b 100644 (file)
@@ -73,6 +73,8 @@ static const char *get_name(struct btf *btf, long btf_id, const char *fallback)
        return str + name_off;
 }
 
+__s64 bpf_map_sum_elem_count(struct bpf_map *map) __ksym;
+
 SEC("iter/bpf_map")
 int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
 {
@@ -84,9 +86,12 @@ int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
                return 0;
 
        if (seq_num == 0)
-               BPF_SEQ_PRINTF(seq, "  id name             max_entries\n");
+               BPF_SEQ_PRINTF(seq, "  id name             max_entries  cur_entries\n");
+
+       BPF_SEQ_PRINTF(seq, "%4u %-16s  %10d   %10lld\n",
+                      map->id, map->name, map->max_entries,
+                      bpf_map_sum_elem_count(map));
 
-       BPF_SEQ_PRINTF(seq, "%4u %-16s%6d\n", map->id, map->name, map->max_entries);
        return 0;
 }
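
bpf_map_sum_elem_count() is a kfunc, and the extern-style declaration tagged __ksym above is how the iterators.bpf.c program binds to it at load time. For reference, a stripped-down iter/bpf_map program built around just that kfunc might look like the sketch below; it assumes vmlinux.h plus libbpf's bpf_helpers.h/bpf_tracing.h, and omits the header line and the bpf_prog iterator.

        /* Hedged sketch, not part of this patch: a minimal iterator using the same kfunc. */
        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_tracing.h>

        __s64 bpf_map_sum_elem_count(struct bpf_map *map) __ksym;

        SEC("iter/bpf_map")
        int count_only(struct bpf_iter__bpf_map *ctx)
        {
                struct seq_file *seq = ctx->meta->seq;
                struct bpf_map *map = ctx->map;

                if (!map)
                        return 0;

                BPF_SEQ_PRINTF(seq, "%4u %10lld\n", map->id,
                               bpf_map_sum_elem_count(map));
                return 0;
        }

        char LICENSE[] SEC("license") = "GPL";
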
 
index 70f236a..5b98ab0 100644 (file)
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
-/* THIS FILE IS AUTOGENERATED! */
+/* THIS FILE IS AUTOGENERATED BY BPFTOOL! */
 #ifndef __ITERATORS_BPF_SKEL_H__
 #define __ITERATORS_BPF_SKEL_H__
 
@@ -18,8 +18,6 @@ struct iterators_bpf {
                int dump_bpf_map_fd;
                int dump_bpf_prog_fd;
        } links;
-       struct iterators_bpf__rodata {
-       } *rodata;
 };
 
 static inline int
@@ -68,7 +66,6 @@ iterators_bpf__destroy(struct iterators_bpf *skel)
        iterators_bpf__detach(skel);
        skel_closenz(skel->progs.dump_bpf_map.prog_fd);
        skel_closenz(skel->progs.dump_bpf_prog.prog_fd);
-       skel_free_map_data(skel->rodata, skel->maps.rodata.initial_value, 4096);
        skel_closenz(skel->maps.rodata.map_fd);
        skel_free(skel);
 }
@@ -81,15 +78,6 @@ iterators_bpf__open(void)
        if (!skel)
                goto cleanup;
        skel->ctx.sz = (void *)&skel->links - (void *)skel;
-       skel->rodata = skel_prep_map_data((void *)"\
-\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\
-\x20\x20\x20\x6d\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\x0a\0\x25\x34\x75\x20\
-\x25\x2d\x31\x36\x73\x25\x36\x64\x0a\0\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\x20\
-\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x61\x74\x74\x61\x63\x68\x65\
-\x64\x0a\0\x25\x34\x75\x20\x25\x2d\x31\x36\x73\x20\x25\x73\x20\x25\x73\x0a\0", 4096, 98);
-       if (!skel->rodata)
-               goto cleanup;
-       skel->maps.rodata.initial_value = (__u64) (long) skel->rodata;
        return skel;
 cleanup:
        iterators_bpf__destroy(skel);
@@ -103,7 +91,7 @@ iterators_bpf__load(struct iterators_bpf *skel)
        int err;
 
        opts.ctx = (struct bpf_loader_ctx *)skel;
-       opts.data_sz = 6056;
+       opts.data_sz = 6208;
        opts.data = (void *)"\
 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
@@ -138,190 +126,197 @@ iterators_bpf__load(struct iterators_bpf *skel)
 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x9f\xeb\x01\0\
-\x18\0\0\0\0\0\0\0\x1c\x04\0\0\x1c\x04\0\0\xf9\x04\0\0\0\0\0\0\0\0\0\x02\x02\0\
+\x18\0\0\0\0\0\0\0\x80\x04\0\0\x80\x04\0\0\x31\x05\0\0\0\0\0\0\0\0\0\x02\x02\0\
 \0\0\x01\0\0\0\x02\0\0\x04\x10\0\0\0\x13\0\0\0\x03\0\0\0\0\0\0\0\x18\0\0\0\x04\
 \0\0\0\x40\0\0\0\0\0\0\0\0\0\0\x02\x08\0\0\0\0\0\0\0\0\0\0\x02\x0d\0\0\0\0\0\0\
 \0\x01\0\0\x0d\x06\0\0\0\x1c\0\0\0\x01\0\0\0\x20\0\0\0\0\0\0\x01\x04\0\0\0\x20\
-\0\0\x01\x24\0\0\0\x01\0\0\x0c\x05\0\0\0\xa3\0\0\0\x03\0\0\x04\x18\0\0\0\xb1\0\
-\0\0\x09\0\0\0\0\0\0\0\xb5\0\0\0\x0b\0\0\0\x40\0\0\0\xc0\0\0\0\x0b\0\0\0\x80\0\
-\0\0\0\0\0\0\0\0\0\x02\x0a\0\0\0\xc8\0\0\0\0\0\0\x07\0\0\0\0\xd1\0\0\0\0\0\0\
-\x08\x0c\0\0\0\xd7\0\0\0\0\0\0\x01\x08\0\0\0\x40\0\0\0\x94\x01\0\0\x03\0\0\x04\
-\x18\0\0\0\x9c\x01\0\0\x0e\0\0\0\0\0\0\0\x9f\x01\0\0\x11\0\0\0\x20\0\0\0\xa4\
-\x01\0\0\x0e\0\0\0\xa0\0\0\0\xb0\x01\0\0\0\0\0\x08\x0f\0\0\0\xb6\x01\0\0\0\0\0\
-\x01\x04\0\0\0\x20\0\0\0\xc3\x01\0\0\0\0\0\x01\x01\0\0\0\x08\0\0\x01\0\0\0\0\0\
-\0\0\x03\0\0\0\0\x10\0\0\0\x12\0\0\0\x10\0\0\0\xc8\x01\0\0\0\0\0\x01\x04\0\0\0\
-\x20\0\0\0\0\0\0\0\0\0\0\x02\x14\0\0\0\x2c\x02\0\0\x02\0\0\x04\x10\0\0\0\x13\0\
-\0\0\x03\0\0\0\0\0\0\0\x3f\x02\0\0\x15\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\x02\x18\0\
-\0\0\0\0\0\0\x01\0\0\x0d\x06\0\0\0\x1c\0\0\0\x13\0\0\0\x44\x02\0\0\x01\0\0\x0c\
-\x16\0\0\0\x90\x02\0\0\x01\0\0\x04\x08\0\0\0\x99\x02\0\0\x19\0\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\x02\x1a\0\0\0\xea\x02\0\0\x06\0\0\x04\x38\0\0\0\x9c\x01\0\0\x0e\0\0\
-\0\0\0\0\0\x9f\x01\0\0\x11\0\0\0\x20\0\0\0\xf7\x02\0\0\x1b\0\0\0\xc0\0\0\0\x08\
-\x03\0\0\x15\0\0\0\0\x01\0\0\x11\x03\0\0\x1d\0\0\0\x40\x01\0\0\x1b\x03\0\0\x1e\
-\0\0\0\x80\x01\0\0\0\0\0\0\0\0\0\x02\x1c\0\0\0\0\0\0\0\0\0\0\x0a\x10\0\0\0\0\0\
-\0\0\0\0\0\x02\x1f\0\0\0\0\0\0\0\0\0\0\x02\x20\0\0\0\x65\x03\0\0\x02\0\0\x04\
-\x08\0\0\0\x73\x03\0\0\x0e\0\0\0\0\0\0\0\x7c\x03\0\0\x0e\0\0\0\x20\0\0\0\x1b\
-\x03\0\0\x03\0\0\x04\x18\0\0\0\x86\x03\0\0\x1b\0\0\0\0\0\0\0\x8e\x03\0\0\x21\0\
-\0\0\x40\0\0\0\x94\x03\0\0\x23\0\0\0\x80\0\0\0\0\0\0\0\0\0\0\x02\x22\0\0\0\0\0\
-\0\0\0\0\0\x02\x24\0\0\0\x98\x03\0\0\x01\0\0\x04\x04\0\0\0\xa3\x03\0\0\x0e\0\0\
-\0\0\0\0\0\x0c\x04\0\0\x01\0\0\x04\x04\0\0\0\x15\x04\0\0\x0e\0\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\x03\0\0\0\0\x1c\0\0\0\x12\0\0\0\x23\0\0\0\x8b\x04\0\0\0\0\0\x0e\x25\
-\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x1c\0\0\0\x12\0\0\0\x0e\0\0\0\x9f\x04\
-\0\0\0\0\0\x0e\x27\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x1c\0\0\0\x12\0\0\0\
-\x20\0\0\0\xb5\x04\0\0\0\0\0\x0e\x29\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\
-\x1c\0\0\0\x12\0\0\0\x11\0\0\0\xca\x04\0\0\0\0\0\x0e\x2b\0\0\0\0\0\0\0\0\0\0\0\
-\0\0\0\x03\0\0\0\0\x10\0\0\0\x12\0\0\0\x04\0\0\0\xe1\x04\0\0\0\0\0\x0e\x2d\0\0\
-\0\x01\0\0\0\xe9\x04\0\0\x04\0\0\x0f\x62\0\0\0\x26\0\0\0\0\0\0\0\x23\0\0\0\x28\
-\0\0\0\x23\0\0\0\x0e\0\0\0\x2a\0\0\0\x31\0\0\0\x20\0\0\0\x2c\0\0\0\x51\0\0\0\
-\x11\0\0\0\xf1\x04\0\0\x01\0\0\x0f\x04\0\0\0\x2e\0\0\0\0\0\0\0\x04\0\0\0\0\x62\
-\x70\x66\x5f\x69\x74\x65\x72\x5f\x5f\x62\x70\x66\x5f\x6d\x61\x70\0\x6d\x65\x74\
-\x61\0\x6d\x61\x70\0\x63\x74\x78\0\x69\x6e\x74\0\x64\x75\x6d\x70\x5f\x62\x70\
-\x66\x5f\x6d\x61\x70\0\x69\x74\x65\x72\x2f\x62\x70\x66\x5f\x6d\x61\x70\0\x30\
-\x3a\x30\0\x2f\x77\x2f\x6e\x65\x74\x2d\x6e\x65\x78\x74\x2f\x6b\x65\x72\x6e\x65\
-\x6c\x2f\x62\x70\x66\x2f\x70\x72\x65\x6c\x6f\x61\x64\x2f\x69\x74\x65\x72\x61\
-\x74\x6f\x72\x73\x2f\x69\x74\x65\x72\x61\x74\x6f\x72\x73\x2e\x62\x70\x66\x2e\
-\x63\0\x09\x73\x74\x72\x75\x63\x74\x20\x73\x65\x71\x5f\x66\x69\x6c\x65\x20\x2a\
-\x73\x65\x71\x20\x3d\x20\x63\x74\x78\x2d\x3e\x6d\x65\x74\x61\x2d\x3e\x73\x65\
-\x71\x3b\0\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x6d\x65\x74\x61\0\x73\x65\x71\0\
-\x73\x65\x73\x73\x69\x6f\x6e\x5f\x69\x64\0\x73\x65\x71\x5f\x6e\x75\x6d\0\x73\
-\x65\x71\x5f\x66\x69\x6c\x65\0\x5f\x5f\x75\x36\x34\0\x75\x6e\x73\x69\x67\x6e\
-\x65\x64\x20\x6c\x6f\x6e\x67\x20\x6c\x6f\x6e\x67\0\x30\x3a\x31\0\x09\x73\x74\
-\x72\x75\x63\x74\x20\x62\x70\x66\x5f\x6d\x61\x70\x20\x2a\x6d\x61\x70\x20\x3d\
-\x20\x63\x74\x78\x2d\x3e\x6d\x61\x70\x3b\0\x09\x69\x66\x20\x28\x21\x6d\x61\x70\
-\x29\0\x09\x5f\x5f\x75\x36\x34\x20\x73\x65\x71\x5f\x6e\x75\x6d\x20\x3d\x20\x63\
-\x74\x78\x2d\x3e\x6d\x65\x74\x61\x2d\x3e\x73\x65\x71\x5f\x6e\x75\x6d\x3b\0\x30\
-\x3a\x32\0\x09\x69\x66\x20\x28\x73\x65\x71\x5f\x6e\x75\x6d\x20\x3d\x3d\x20\x30\
-\x29\0\x09\x09\x42\x50\x46\x5f\x53\x45\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\
-\x65\x71\x2c\x20\x22\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\
-\x20\x20\x20\x20\x20\x20\x20\x20\x6d\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\
-\x5c\x6e\x22\x29\x3b\0\x62\x70\x66\x5f\x6d\x61\x70\0\x69\x64\0\x6e\x61\x6d\x65\
-\0\x6d\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\0\x5f\x5f\x75\x33\x32\0\x75\x6e\
-\x73\x69\x67\x6e\x65\x64\x20\x69\x6e\x74\0\x63\x68\x61\x72\0\x5f\x5f\x41\x52\
-\x52\x41\x59\x5f\x53\x49\x5a\x45\x5f\x54\x59\x50\x45\x5f\x5f\0\x09\x42\x50\x46\
-\x5f\x53\x45\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\x71\x2c\x20\x22\x25\
-\x34\x75\x20\x25\x2d\x31\x36\x73\x25\x36\x64\x5c\x6e\x22\x2c\x20\x6d\x61\x70\
-\x2d\x3e\x69\x64\x2c\x20\x6d\x61\x70\x2d\x3e\x6e\x61\x6d\x65\x2c\x20\x6d\x61\
-\x70\x2d\x3e\x6d\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\x29\x3b\0\x7d\0\x62\
-\x70\x66\x5f\x69\x74\x65\x72\x5f\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x70\x72\
-\x6f\x67\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x69\x74\x65\
-\x72\x2f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x09\x73\x74\x72\x75\x63\x74\x20\x62\
-\x70\x66\x5f\x70\x72\x6f\x67\x20\x2a\x70\x72\x6f\x67\x20\x3d\x20\x63\x74\x78\
-\x2d\x3e\x70\x72\x6f\x67\x3b\0\x09\x69\x66\x20\x28\x21\x70\x72\x6f\x67\x29\0\
-\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x61\x75\x78\0\x09\x61\x75\x78\x20\x3d\x20\
-\x70\x72\x6f\x67\x2d\x3e\x61\x75\x78\x3b\0\x09\x09\x42\x50\x46\x5f\x53\x45\x51\
-\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\x71\x2c\x20\x22\x20\x20\x69\x64\x20\
-\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x61\x74\
-\x74\x61\x63\x68\x65\x64\x5c\x6e\x22\x29\x3b\0\x62\x70\x66\x5f\x70\x72\x6f\x67\
-\x5f\x61\x75\x78\0\x61\x74\x74\x61\x63\x68\x5f\x66\x75\x6e\x63\x5f\x6e\x61\x6d\
-\x65\0\x64\x73\x74\x5f\x70\x72\x6f\x67\0\x66\x75\x6e\x63\x5f\x69\x6e\x66\x6f\0\
-\x62\x74\x66\0\x09\x42\x50\x46\x5f\x53\x45\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\
-\x73\x65\x71\x2c\x20\x22\x25\x34\x75\x20\x25\x2d\x31\x36\x73\x20\x25\x73\x20\
-\x25\x73\x5c\x6e\x22\x2c\x20\x61\x75\x78\x2d\x3e\x69\x64\x2c\0\x30\x3a\x34\0\
-\x30\x3a\x35\0\x09\x69\x66\x20\x28\x21\x62\x74\x66\x29\0\x62\x70\x66\x5f\x66\
-\x75\x6e\x63\x5f\x69\x6e\x66\x6f\0\x69\x6e\x73\x6e\x5f\x6f\x66\x66\0\x74\x79\
-\x70\x65\x5f\x69\x64\0\x30\0\x73\x74\x72\x69\x6e\x67\x73\0\x74\x79\x70\x65\x73\
-\0\x68\x64\x72\0\x62\x74\x66\x5f\x68\x65\x61\x64\x65\x72\0\x73\x74\x72\x5f\x6c\
-\x65\x6e\0\x09\x74\x79\x70\x65\x73\x20\x3d\x20\x62\x74\x66\x2d\x3e\x74\x79\x70\
-\x65\x73\x3b\0\x09\x62\x70\x66\x5f\x70\x72\x6f\x62\x65\x5f\x72\x65\x61\x64\x5f\
-\x6b\x65\x72\x6e\x65\x6c\x28\x26\x74\x2c\x20\x73\x69\x7a\x65\x6f\x66\x28\x74\
-\x29\x2c\x20\x74\x79\x70\x65\x73\x20\x2b\x20\x62\x74\x66\x5f\x69\x64\x29\x3b\0\
-\x09\x73\x74\x72\x20\x3d\x20\x62\x74\x66\x2d\x3e\x73\x74\x72\x69\x6e\x67\x73\
-\x3b\0\x62\x74\x66\x5f\x74\x79\x70\x65\0\x6e\x61\x6d\x65\x5f\x6f\x66\x66\0\x09\
-\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x20\x3d\x20\x42\x50\x46\x5f\x43\x4f\x52\x45\
-\x5f\x52\x45\x41\x44\x28\x74\x2c\x20\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x29\x3b\0\
-\x30\x3a\x32\x3a\x30\0\x09\x69\x66\x20\x28\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x20\
-\x3e\x3d\x20\x62\x74\x66\x2d\x3e\x68\x64\x72\x2e\x73\x74\x72\x5f\x6c\x65\x6e\
-\x29\0\x09\x72\x65\x74\x75\x72\x6e\x20\x73\x74\x72\x20\x2b\x20\x6e\x61\x6d\x65\
-\x5f\x6f\x66\x66\x3b\0\x30\x3a\x33\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\
-\x61\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\
-\x61\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\x2e\x31\0\x64\x75\x6d\x70\x5f\x62\x70\x66\
-\x5f\x70\x72\x6f\x67\x2e\x5f\x5f\x5f\x66\x6d\x74\0\x64\x75\x6d\x70\x5f\x62\x70\
-\x66\x5f\x70\x72\x6f\x67\x2e\x5f\x5f\x5f\x66\x6d\x74\x2e\x32\0\x4c\x49\x43\x45\
-\x4e\x53\x45\0\x2e\x72\x6f\x64\x61\x74\x61\0\x6c\x69\x63\x65\x6e\x73\x65\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x2d\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x02\0\0\
-\0\x04\0\0\0\x62\0\0\0\x01\0\0\0\x80\x04\0\0\0\0\0\0\0\0\0\0\x69\x74\x65\x72\
-\x61\x74\x6f\x72\x2e\x72\x6f\x64\x61\x74\x61\0\0\0\0\0\0\0\0\0\0\0\0\0\x2f\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\0\0\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\
-\x20\x20\x20\x20\x20\x20\x20\x20\x20\x6d\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\
-\x73\x0a\0\x25\x34\x75\x20\x25\x2d\x31\x36\x73\x25\x36\x64\x0a\0\x20\x20\x69\
-\x64\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\
-\x61\x74\x74\x61\x63\x68\x65\x64\x0a\0\x25\x34\x75\x20\x25\x2d\x31\x36\x73\x20\
-\x25\x73\x20\x25\x73\x0a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x47\x50\x4c\0\0\0\0\0\
-\x79\x12\0\0\0\0\0\0\x79\x26\0\0\0\0\0\0\x79\x17\x08\0\0\0\0\0\x15\x07\x1b\0\0\
-\0\0\0\x79\x11\0\0\0\0\0\0\x79\x11\x10\0\0\0\0\0\x55\x01\x08\0\0\0\0\0\xbf\xa4\
-\0\0\0\0\0\0\x07\x04\0\0\xe8\xff\xff\xff\xbf\x61\0\0\0\0\0\0\x18\x62\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\xb7\x03\0\0\x23\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x7e\0\0\
-\0\x61\x71\0\0\0\0\0\0\x7b\x1a\xe8\xff\0\0\0\0\xb7\x01\0\0\x04\0\0\0\xbf\x72\0\
-\0\0\0\0\0\x0f\x12\0\0\0\0\0\0\x7b\x2a\xf0\xff\0\0\0\0\x61\x71\x14\0\0\0\0\0\
-\x7b\x1a\xf8\xff\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\xe8\xff\xff\xff\xbf\
-\x61\0\0\0\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\x23\0\0\0\xb7\x03\0\0\x0e\0\0\0\
-\xb7\x05\0\0\x18\0\0\0\x85\0\0\0\x7e\0\0\0\xb7\0\0\0\0\0\0\0\x95\0\0\0\0\0\0\0\
-\0\0\0\0\x07\0\0\0\0\0\0\0\x42\0\0\0\x7b\0\0\0\x1e\x3c\x01\0\x01\0\0\0\x42\0\0\
-\0\x7b\0\0\0\x24\x3c\x01\0\x02\0\0\0\x42\0\0\0\xee\0\0\0\x1d\x44\x01\0\x03\0\0\
-\0\x42\0\0\0\x0f\x01\0\0\x06\x4c\x01\0\x04\0\0\0\x42\0\0\0\x1a\x01\0\0\x17\x40\
-\x01\0\x05\0\0\0\x42\0\0\0\x1a\x01\0\0\x1d\x40\x01\0\x06\0\0\0\x42\0\0\0\x43\
-\x01\0\0\x06\x58\x01\0\x08\0\0\0\x42\0\0\0\x56\x01\0\0\x03\x5c\x01\0\x0f\0\0\0\
-\x42\0\0\0\xdc\x01\0\0\x02\x64\x01\0\x1f\0\0\0\x42\0\0\0\x2a\x02\0\0\x01\x6c\
-\x01\0\0\0\0\0\x02\0\0\0\x3e\0\0\0\0\0\0\0\x08\0\0\0\x08\0\0\0\x3e\0\0\0\0\0\0\
-\0\x10\0\0\0\x02\0\0\0\xea\0\0\0\0\0\0\0\x20\0\0\0\x02\0\0\0\x3e\0\0\0\0\0\0\0\
-\x28\0\0\0\x08\0\0\0\x3f\x01\0\0\0\0\0\0\x78\0\0\0\x0d\0\0\0\x3e\0\0\0\0\0\0\0\
-\x88\0\0\0\x0d\0\0\0\xea\0\0\0\0\0\0\0\xa8\0\0\0\x0d\0\0\0\x3f\x01\0\0\0\0\0\0\
-\x1a\0\0\0\x21\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\x61\x70\0\0\0\0\
-\0\0\0\0\x1c\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\x10\0\0\0\0\0\0\
-\0\0\0\0\0\x0a\0\0\0\x01\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
-\0\x10\0\0\0\0\0\0\0\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x62\x70\x66\x5f\x6d\
-\x61\x70\0\0\0\0\0\0\0\0\x47\x50\x4c\0\0\0\0\0\x79\x12\0\0\0\0\0\0\x79\x26\0\0\
-\0\0\0\0\x79\x12\x08\0\0\0\0\0\x15\x02\x3c\0\0\0\0\0\x79\x11\0\0\0\0\0\0\x79\
-\x27\0\0\0\0\0\0\x79\x11\x10\0\0\0\0\0\x55\x01\x08\0\0\0\0\0\xbf\xa4\0\0\0\0\0\
-\0\x07\x04\0\0\xd0\xff\xff\xff\xbf\x61\0\0\0\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\
-\x31\0\0\0\xb7\x03\0\0\x20\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x7e\0\0\0\x7b\
-\x6a\xc8\xff\0\0\0\0\x61\x71\0\0\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xb7\x03\0\0\
-\x04\0\0\0\xbf\x79\0\0\0\0\0\0\x0f\x39\0\0\0\0\0\0\x79\x71\x28\0\0\0\0\0\x79\
-\x78\x30\0\0\0\0\0\x15\x08\x18\0\0\0\0\0\xb7\x02\0\0\0\0\0\0\x0f\x21\0\0\0\0\0\
-\0\x61\x11\x04\0\0\0\0\0\x79\x83\x08\0\0\0\0\0\x67\x01\0\0\x03\0\0\0\x0f\x13\0\
-\0\0\0\0\0\x79\x86\0\0\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\xf8\xff\xff\xff\
-\xb7\x02\0\0\x08\0\0\0\x85\0\0\0\x71\0\0\0\xb7\x01\0\0\0\0\0\0\x79\xa3\xf8\xff\
-\0\0\0\0\x0f\x13\0\0\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\xf4\xff\xff\xff\
-\xb7\x02\0\0\x04\0\0\0\x85\0\0\0\x71\0\0\0\xb7\x03\0\0\x04\0\0\0\x61\xa1\xf4\
-\xff\0\0\0\0\x61\x82\x10\0\0\0\0\0\x3d\x21\x02\0\0\0\0\0\x0f\x16\0\0\0\0\0\0\
-\xbf\x69\0\0\0\0\0\0\x7b\x9a\xd8\xff\0\0\0\0\x79\x71\x18\0\0\0\0\0\x7b\x1a\xe0\
-\xff\0\0\0\0\x79\x71\x20\0\0\0\0\0\x79\x11\0\0\0\0\0\0\x0f\x31\0\0\0\0\0\0\x7b\
-\x1a\xe8\xff\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\xd0\xff\xff\xff\x79\xa1\
-\xc8\xff\0\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\x51\0\0\0\xb7\x03\0\0\x11\0\0\0\
-\xb7\x05\0\0\x20\0\0\0\x85\0\0\0\x7e\0\0\0\xb7\0\0\0\0\0\0\0\x95\0\0\0\0\0\0\0\
-\0\0\0\0\x17\0\0\0\0\0\0\0\x42\0\0\0\x7b\0\0\0\x1e\x80\x01\0\x01\0\0\0\x42\0\0\
-\0\x7b\0\0\0\x24\x80\x01\0\x02\0\0\0\x42\0\0\0\x60\x02\0\0\x1f\x88\x01\0\x03\0\
-\0\0\x42\0\0\0\x84\x02\0\0\x06\x94\x01\0\x04\0\0\0\x42\0\0\0\x1a\x01\0\0\x17\
-\x84\x01\0\x05\0\0\0\x42\0\0\0\x9d\x02\0\0\x0e\xa0\x01\0\x06\0\0\0\x42\0\0\0\
-\x1a\x01\0\0\x1d\x84\x01\0\x07\0\0\0\x42\0\0\0\x43\x01\0\0\x06\xa4\x01\0\x09\0\
-\0\0\x42\0\0\0\xaf\x02\0\0\x03\xa8\x01\0\x11\0\0\0\x42\0\0\0\x1f\x03\0\0\x02\
-\xb0\x01\0\x18\0\0\0\x42\0\0\0\x5a\x03\0\0\x06\x04\x01\0\x1b\0\0\0\x42\0\0\0\0\
-\0\0\0\0\0\0\0\x1c\0\0\0\x42\0\0\0\xab\x03\0\0\x0f\x10\x01\0\x1d\0\0\0\x42\0\0\
-\0\xc0\x03\0\0\x2d\x14\x01\0\x1f\0\0\0\x42\0\0\0\xf7\x03\0\0\x0d\x0c\x01\0\x21\
-\0\0\0\x42\0\0\0\0\0\0\0\0\0\0\0\x22\0\0\0\x42\0\0\0\xc0\x03\0\0\x02\x14\x01\0\
-\x25\0\0\0\x42\0\0\0\x1e\x04\0\0\x0d\x18\x01\0\x28\0\0\0\x42\0\0\0\0\0\0\0\0\0\
-\0\0\x29\0\0\0\x42\0\0\0\x1e\x04\0\0\x0d\x18\x01\0\x2c\0\0\0\x42\0\0\0\x1e\x04\
-\0\0\x0d\x18\x01\0\x2d\0\0\0\x42\0\0\0\x4c\x04\0\0\x1b\x1c\x01\0\x2e\0\0\0\x42\
-\0\0\0\x4c\x04\0\0\x06\x1c\x01\0\x2f\0\0\0\x42\0\0\0\x6f\x04\0\0\x0d\x24\x01\0\
-\x31\0\0\0\x42\0\0\0\x1f\x03\0\0\x02\xb0\x01\0\x40\0\0\0\x42\0\0\0\x2a\x02\0\0\
-\x01\xc0\x01\0\0\0\0\0\x14\0\0\0\x3e\0\0\0\0\0\0\0\x08\0\0\0\x08\0\0\0\x3e\0\0\
-\0\0\0\0\0\x10\0\0\0\x14\0\0\0\xea\0\0\0\0\0\0\0\x20\0\0\0\x14\0\0\0\x3e\0\0\0\
-\0\0\0\0\x28\0\0\0\x18\0\0\0\x3e\0\0\0\0\0\0\0\x30\0\0\0\x08\0\0\0\x3f\x01\0\0\
-\0\0\0\0\x88\0\0\0\x1a\0\0\0\x3e\0\0\0\0\0\0\0\x98\0\0\0\x1a\0\0\0\xea\0\0\0\0\
-\0\0\0\xb0\0\0\0\x1a\0\0\0\x52\x03\0\0\0\0\0\0\xb8\0\0\0\x1a\0\0\0\x56\x03\0\0\
-\0\0\0\0\xc8\0\0\0\x1f\0\0\0\x84\x03\0\0\0\0\0\0\xe0\0\0\0\x20\0\0\0\xea\0\0\0\
-\0\0\0\0\xf8\0\0\0\x20\0\0\0\x3e\0\0\0\0\0\0\0\x20\x01\0\0\x24\0\0\0\x3e\0\0\0\
-\0\0\0\0\x58\x01\0\0\x1a\0\0\0\xea\0\0\0\0\0\0\0\x68\x01\0\0\x20\0\0\0\x46\x04\
-\0\0\0\0\0\0\x90\x01\0\0\x1a\0\0\0\x3f\x01\0\0\0\0\0\0\xa0\x01\0\0\x1a\0\0\0\
-\x87\x04\0\0\0\0\0\0\xa8\x01\0\0\x18\0\0\0\x3e\0\0\0\0\0\0\0\x1a\0\0\0\x42\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
-\0\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\0\0\0\0\0\0\x1c\0\0\
-\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\x10\0\0\0\0\0\0\0\0\0\0\0\x1a\0\
-\0\0\x01\0\0\0\0\0\0\0\x13\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x10\0\0\0\0\0\
-\0\0\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\0\0\
-\0\0\0\0";
-       opts.insns_sz = 2216;
+\0\0\x01\x24\0\0\0\x01\0\0\x0c\x05\0\0\0\xb0\0\0\0\x03\0\0\x04\x18\0\0\0\xbe\0\
+\0\0\x09\0\0\0\0\0\0\0\xc2\0\0\0\x0b\0\0\0\x40\0\0\0\xcd\0\0\0\x0b\0\0\0\x80\0\
+\0\0\0\0\0\0\0\0\0\x02\x0a\0\0\0\xd5\0\0\0\0\0\0\x07\0\0\0\0\xde\0\0\0\0\0\0\
+\x08\x0c\0\0\0\xe4\0\0\0\0\0\0\x01\x08\0\0\0\x40\0\0\0\xae\x01\0\0\x03\0\0\x04\
+\x18\0\0\0\xb6\x01\0\0\x0e\0\0\0\0\0\0\0\xb9\x01\0\0\x11\0\0\0\x20\0\0\0\xbe\
+\x01\0\0\x0e\0\0\0\xa0\0\0\0\xca\x01\0\0\0\0\0\x08\x0f\0\0\0\xd0\x01\0\0\0\0\0\
+\x01\x04\0\0\0\x20\0\0\0\xdd\x01\0\0\0\0\0\x01\x01\0\0\0\x08\0\0\x01\0\0\0\0\0\
+\0\0\x03\0\0\0\0\x10\0\0\0\x12\0\0\0\x10\0\0\0\xe2\x01\0\0\0\0\0\x01\x04\0\0\0\
+\x20\0\0\0\0\0\0\0\x01\0\0\x0d\x14\0\0\0\x26\x05\0\0\x04\0\0\0\x2b\x02\0\0\0\0\
+\0\x08\x15\0\0\0\x31\x02\0\0\0\0\0\x01\x08\0\0\0\x40\0\0\x01\x3b\x02\0\0\x01\0\
+\0\x0c\x13\0\0\0\0\0\0\0\0\0\0\x02\x18\0\0\0\x52\x02\0\0\x02\0\0\x04\x10\0\0\0\
+\x13\0\0\0\x03\0\0\0\0\0\0\0\x65\x02\0\0\x19\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\x02\
+\x1c\0\0\0\0\0\0\0\x01\0\0\x0d\x06\0\0\0\x1c\0\0\0\x17\0\0\0\x6a\x02\0\0\x01\0\
+\0\x0c\x1a\0\0\0\xb6\x02\0\0\x01\0\0\x04\x08\0\0\0\xbf\x02\0\0\x1d\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\x02\x1e\0\0\0\x10\x03\0\0\x06\0\0\x04\x38\0\0\0\xb6\x01\0\0\
+\x0e\0\0\0\0\0\0\0\xb9\x01\0\0\x11\0\0\0\x20\0\0\0\x1d\x03\0\0\x1f\0\0\0\xc0\0\
+\0\0\x2e\x03\0\0\x19\0\0\0\0\x01\0\0\x37\x03\0\0\x21\0\0\0\x40\x01\0\0\x41\x03\
+\0\0\x22\0\0\0\x80\x01\0\0\0\0\0\0\0\0\0\x02\x20\0\0\0\0\0\0\0\0\0\0\x0a\x10\0\
+\0\0\0\0\0\0\0\0\0\x02\x23\0\0\0\0\0\0\0\0\0\0\x02\x24\0\0\0\x8b\x03\0\0\x02\0\
+\0\x04\x08\0\0\0\x99\x03\0\0\x0e\0\0\0\0\0\0\0\xa2\x03\0\0\x0e\0\0\0\x20\0\0\0\
+\x41\x03\0\0\x03\0\0\x04\x18\0\0\0\xac\x03\0\0\x1f\0\0\0\0\0\0\0\xb4\x03\0\0\
+\x25\0\0\0\x40\0\0\0\xba\x03\0\0\x27\0\0\0\x80\0\0\0\0\0\0\0\0\0\0\x02\x26\0\0\
+\0\0\0\0\0\0\0\0\x02\x28\0\0\0\xbe\x03\0\0\x01\0\0\x04\x04\0\0\0\xc9\x03\0\0\
+\x0e\0\0\0\0\0\0\0\x32\x04\0\0\x01\0\0\x04\x04\0\0\0\x3b\x04\0\0\x0e\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x20\0\0\0\x12\0\0\0\x30\0\0\0\xb1\x04\0\0\0\0\0\
+\x0e\x29\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x20\0\0\0\x12\0\0\0\x1a\0\0\0\
+\xc5\x04\0\0\0\0\0\x0e\x2b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x20\0\0\0\
+\x12\0\0\0\x20\0\0\0\xdb\x04\0\0\0\0\0\x0e\x2d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\
+\0\0\0\0\x20\0\0\0\x12\0\0\0\x11\0\0\0\xf0\x04\0\0\0\0\0\x0e\x2f\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\x03\0\0\0\0\x10\0\0\0\x12\0\0\0\x04\0\0\0\x07\x05\0\0\0\0\0\x0e\
+\x31\0\0\0\x01\0\0\0\x0f\x05\0\0\x01\0\0\x0f\x04\0\0\0\x36\0\0\0\0\0\0\0\x04\0\
+\0\0\x16\x05\0\0\x04\0\0\x0f\x7b\0\0\0\x2a\0\0\0\0\0\0\0\x30\0\0\0\x2c\0\0\0\
+\x30\0\0\0\x1a\0\0\0\x2e\0\0\0\x4a\0\0\0\x20\0\0\0\x30\0\0\0\x6a\0\0\0\x11\0\0\
+\0\x1e\x05\0\0\x01\0\0\x0f\x04\0\0\0\x32\0\0\0\0\0\0\0\x04\0\0\0\x26\x05\0\0\0\
+\0\0\x0e\x06\0\0\0\x01\0\0\0\0\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x5f\x62\x70\
+\x66\x5f\x6d\x61\x70\0\x6d\x65\x74\x61\0\x6d\x61\x70\0\x63\x74\x78\0\x69\x6e\
+\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\x61\x70\0\x69\x74\x65\x72\x2f\
+\x62\x70\x66\x5f\x6d\x61\x70\0\x30\x3a\x30\0\x2f\x68\x6f\x6d\x65\x2f\x61\x73\
+\x70\x73\x6b\x2f\x73\x72\x63\x2f\x62\x70\x66\x2d\x6e\x65\x78\x74\x2f\x6b\x65\
+\x72\x6e\x65\x6c\x2f\x62\x70\x66\x2f\x70\x72\x65\x6c\x6f\x61\x64\x2f\x69\x74\
+\x65\x72\x61\x74\x6f\x72\x73\x2f\x69\x74\x65\x72\x61\x74\x6f\x72\x73\x2e\x62\
+\x70\x66\x2e\x63\0\x09\x73\x74\x72\x75\x63\x74\x20\x73\x65\x71\x5f\x66\x69\x6c\
+\x65\x20\x2a\x73\x65\x71\x20\x3d\x20\x63\x74\x78\x2d\x3e\x6d\x65\x74\x61\x2d\
+\x3e\x73\x65\x71\x3b\0\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x6d\x65\x74\x61\0\
+\x73\x65\x71\0\x73\x65\x73\x73\x69\x6f\x6e\x5f\x69\x64\0\x73\x65\x71\x5f\x6e\
+\x75\x6d\0\x73\x65\x71\x5f\x66\x69\x6c\x65\0\x5f\x5f\x75\x36\x34\0\x75\x6e\x73\
+\x69\x67\x6e\x65\x64\x20\x6c\x6f\x6e\x67\x20\x6c\x6f\x6e\x67\0\x30\x3a\x31\0\
+\x09\x73\x74\x72\x75\x63\x74\x20\x62\x70\x66\x5f\x6d\x61\x70\x20\x2a\x6d\x61\
+\x70\x20\x3d\x20\x63\x74\x78\x2d\x3e\x6d\x61\x70\x3b\0\x09\x69\x66\x20\x28\x21\
+\x6d\x61\x70\x29\0\x30\x3a\x32\0\x09\x5f\x5f\x75\x36\x34\x20\x73\x65\x71\x5f\
+\x6e\x75\x6d\x20\x3d\x20\x63\x74\x78\x2d\x3e\x6d\x65\x74\x61\x2d\x3e\x73\x65\
+\x71\x5f\x6e\x75\x6d\x3b\0\x09\x69\x66\x20\x28\x73\x65\x71\x5f\x6e\x75\x6d\x20\
+\x3d\x3d\x20\x30\x29\0\x09\x09\x42\x50\x46\x5f\x53\x45\x51\x5f\x50\x52\x49\x4e\
+\x54\x46\x28\x73\x65\x71\x2c\x20\x22\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\x20\
+\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x6d\x61\x78\x5f\x65\x6e\x74\
+\x72\x69\x65\x73\x20\x20\x63\x75\x72\x5f\x65\x6e\x74\x72\x69\x65\x73\x5c\x6e\
+\x22\x29\x3b\0\x62\x70\x66\x5f\x6d\x61\x70\0\x69\x64\0\x6e\x61\x6d\x65\0\x6d\
+\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\0\x5f\x5f\x75\x33\x32\0\x75\x6e\x73\
+\x69\x67\x6e\x65\x64\x20\x69\x6e\x74\0\x63\x68\x61\x72\0\x5f\x5f\x41\x52\x52\
+\x41\x59\x5f\x53\x49\x5a\x45\x5f\x54\x59\x50\x45\x5f\x5f\0\x09\x42\x50\x46\x5f\
+\x53\x45\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\x71\x2c\x20\x22\x25\x34\
+\x75\x20\x25\x2d\x31\x36\x73\x20\x20\x25\x31\x30\x64\x20\x20\x20\x25\x31\x30\
+\x6c\x6c\x64\x5c\x6e\x22\x2c\0\x7d\0\x5f\x5f\x73\x36\x34\0\x6c\x6f\x6e\x67\x20\
+\x6c\x6f\x6e\x67\0\x62\x70\x66\x5f\x6d\x61\x70\x5f\x73\x75\x6d\x5f\x65\x6c\x65\
+\x6d\x5f\x63\x6f\x75\x6e\x74\0\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x5f\x62\x70\
+\x66\x5f\x70\x72\x6f\x67\0\x70\x72\x6f\x67\0\x64\x75\x6d\x70\x5f\x62\x70\x66\
+\x5f\x70\x72\x6f\x67\0\x69\x74\x65\x72\x2f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\
+\x09\x73\x74\x72\x75\x63\x74\x20\x62\x70\x66\x5f\x70\x72\x6f\x67\x20\x2a\x70\
+\x72\x6f\x67\x20\x3d\x20\x63\x74\x78\x2d\x3e\x70\x72\x6f\x67\x3b\0\x09\x69\x66\
+\x20\x28\x21\x70\x72\x6f\x67\x29\0\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x61\x75\
+\x78\0\x09\x61\x75\x78\x20\x3d\x20\x70\x72\x6f\x67\x2d\x3e\x61\x75\x78\x3b\0\
+\x09\x09\x42\x50\x46\x5f\x53\x45\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\
+\x71\x2c\x20\x22\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\x20\
+\x20\x20\x20\x20\x20\x20\x20\x61\x74\x74\x61\x63\x68\x65\x64\x5c\x6e\x22\x29\
+\x3b\0\x62\x70\x66\x5f\x70\x72\x6f\x67\x5f\x61\x75\x78\0\x61\x74\x74\x61\x63\
+\x68\x5f\x66\x75\x6e\x63\x5f\x6e\x61\x6d\x65\0\x64\x73\x74\x5f\x70\x72\x6f\x67\
+\0\x66\x75\x6e\x63\x5f\x69\x6e\x66\x6f\0\x62\x74\x66\0\x09\x42\x50\x46\x5f\x53\
+\x45\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\x71\x2c\x20\x22\x25\x34\x75\
+\x20\x25\x2d\x31\x36\x73\x20\x25\x73\x20\x25\x73\x5c\x6e\x22\x2c\x20\x61\x75\
+\x78\x2d\x3e\x69\x64\x2c\0\x30\x3a\x34\0\x30\x3a\x35\0\x09\x69\x66\x20\x28\x21\
+\x62\x74\x66\x29\0\x62\x70\x66\x5f\x66\x75\x6e\x63\x5f\x69\x6e\x66\x6f\0\x69\
+\x6e\x73\x6e\x5f\x6f\x66\x66\0\x74\x79\x70\x65\x5f\x69\x64\0\x30\0\x73\x74\x72\
+\x69\x6e\x67\x73\0\x74\x79\x70\x65\x73\0\x68\x64\x72\0\x62\x74\x66\x5f\x68\x65\
+\x61\x64\x65\x72\0\x73\x74\x72\x5f\x6c\x65\x6e\0\x09\x74\x79\x70\x65\x73\x20\
+\x3d\x20\x62\x74\x66\x2d\x3e\x74\x79\x70\x65\x73\x3b\0\x09\x62\x70\x66\x5f\x70\
+\x72\x6f\x62\x65\x5f\x72\x65\x61\x64\x5f\x6b\x65\x72\x6e\x65\x6c\x28\x26\x74\
+\x2c\x20\x73\x69\x7a\x65\x6f\x66\x28\x74\x29\x2c\x20\x74\x79\x70\x65\x73\x20\
+\x2b\x20\x62\x74\x66\x5f\x69\x64\x29\x3b\0\x09\x73\x74\x72\x20\x3d\x20\x62\x74\
+\x66\x2d\x3e\x73\x74\x72\x69\x6e\x67\x73\x3b\0\x62\x74\x66\x5f\x74\x79\x70\x65\
+\0\x6e\x61\x6d\x65\x5f\x6f\x66\x66\0\x09\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x20\
+\x3d\x20\x42\x50\x46\x5f\x43\x4f\x52\x45\x5f\x52\x45\x41\x44\x28\x74\x2c\x20\
+\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x29\x3b\0\x30\x3a\x32\x3a\x30\0\x09\x69\x66\
+\x20\x28\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x20\x3e\x3d\x20\x62\x74\x66\x2d\x3e\
+\x68\x64\x72\x2e\x73\x74\x72\x5f\x6c\x65\x6e\x29\0\x09\x72\x65\x74\x75\x72\x6e\
+\x20\x73\x74\x72\x20\x2b\x20\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x3b\0\x30\x3a\x33\
+\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\x61\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\
+\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\x61\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\
+\x2e\x31\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\x2e\x5f\x5f\x5f\
+\x66\x6d\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\x2e\x5f\x5f\
+\x5f\x66\x6d\x74\x2e\x32\0\x4c\x49\x43\x45\x4e\x53\x45\0\x2e\x6b\x73\x79\x6d\
+\x73\0\x2e\x72\x6f\x64\x61\x74\x61\0\x6c\x69\x63\x65\x6e\x73\x65\0\x64\x75\x6d\
+\x6d\x79\x5f\x6b\x73\x79\x6d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\xc9\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x02\0\0\0\x04\0\0\0\x7b\0\0\0\x01\0\0\0\
+\x80\0\0\0\0\0\0\0\0\0\0\0\x69\x74\x65\x72\x61\x74\x6f\x72\x2e\x72\x6f\x64\x61\
+\x74\x61\0\0\0\0\0\0\0\0\0\0\0\0\0\x34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x20\x20\
+\x69\x64\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\
+\x20\x6d\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\x20\x20\x63\x75\x72\x5f\x65\
+\x6e\x74\x72\x69\x65\x73\x0a\0\x25\x34\x75\x20\x25\x2d\x31\x36\x73\x20\x20\x25\
+\x31\x30\x64\x20\x20\x20\x25\x31\x30\x6c\x6c\x64\x0a\0\x20\x20\x69\x64\x20\x6e\
+\x61\x6d\x65\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x61\x74\x74\
+\x61\x63\x68\x65\x64\x0a\0\x25\x34\x75\x20\x25\x2d\x31\x36\x73\x20\x25\x73\x20\
+\x25\x73\x0a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x47\x50\x4c\0\0\0\0\0\x79\x12\0\0\0\
+\0\0\0\x79\x26\0\0\0\0\0\0\x79\x17\x08\0\0\0\0\0\x15\x07\x1d\0\0\0\0\0\x79\x21\
+\x10\0\0\0\0\0\x55\x01\x08\0\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\xe0\xff\
+\xff\xff\xbf\x61\0\0\0\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xb7\x03\0\0\
+\x30\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x7e\0\0\0\x61\x71\0\0\0\0\0\0\x7b\x1a\
+\xe0\xff\0\0\0\0\xb7\x01\0\0\x04\0\0\0\xbf\x72\0\0\0\0\0\0\x0f\x12\0\0\0\0\0\0\
+\x7b\x2a\xe8\xff\0\0\0\0\x61\x71\x14\0\0\0\0\0\x7b\x1a\xf0\xff\0\0\0\0\xbf\x71\
+\0\0\0\0\0\0\x85\x20\0\0\0\0\0\0\x7b\x0a\xf8\xff\0\0\0\0\xbf\xa4\0\0\0\0\0\0\
+\x07\x04\0\0\xe0\xff\xff\xff\xbf\x61\0\0\0\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\
+\x30\0\0\0\xb7\x03\0\0\x1a\0\0\0\xb7\x05\0\0\x20\0\0\0\x85\0\0\0\x7e\0\0\0\xb7\
+\0\0\0\0\0\0\0\x95\0\0\0\0\0\0\0\0\0\0\0\x07\0\0\0\0\0\0\0\x42\0\0\0\x88\0\0\0\
+\x1e\x44\x01\0\x01\0\0\0\x42\0\0\0\x88\0\0\0\x24\x44\x01\0\x02\0\0\0\x42\0\0\0\
+\xfb\0\0\0\x1d\x4c\x01\0\x03\0\0\0\x42\0\0\0\x1c\x01\0\0\x06\x54\x01\0\x04\0\0\
+\0\x42\0\0\0\x2b\x01\0\0\x1d\x48\x01\0\x05\0\0\0\x42\0\0\0\x50\x01\0\0\x06\x60\
+\x01\0\x07\0\0\0\x42\0\0\0\x63\x01\0\0\x03\x64\x01\0\x0e\0\0\0\x42\0\0\0\xf6\
+\x01\0\0\x02\x6c\x01\0\x21\0\0\0\x42\0\0\0\x29\x02\0\0\x01\x80\x01\0\0\0\0\0\
+\x02\0\0\0\x3e\0\0\0\0\0\0\0\x08\0\0\0\x08\0\0\0\x3e\0\0\0\0\0\0\0\x10\0\0\0\
+\x02\0\0\0\xf7\0\0\0\0\0\0\0\x20\0\0\0\x08\0\0\0\x27\x01\0\0\0\0\0\0\x70\0\0\0\
+\x0d\0\0\0\x3e\0\0\0\0\0\0\0\x80\0\0\0\x0d\0\0\0\xf7\0\0\0\0\0\0\0\xa0\0\0\0\
+\x0d\0\0\0\x27\x01\0\0\0\0\0\0\x1a\0\0\0\x23\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x64\x75\x6d\x70\x5f\x62\
+\x70\x66\x5f\x6d\x61\x70\0\0\0\0\0\0\0\0\x1c\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\
+\0\0\0\x01\0\0\0\x10\0\0\0\0\0\0\0\0\0\0\0\x09\0\0\0\x01\0\0\0\0\0\0\0\x07\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x62\x70\x66\x5f\x69\x74\
+\x65\x72\x5f\x62\x70\x66\x5f\x6d\x61\x70\0\0\0\0\0\0\0\0\x62\x70\x66\x5f\x6d\
+\x61\x70\x5f\x73\x75\x6d\x5f\x65\x6c\x65\x6d\x5f\x63\x6f\x75\x6e\x74\0\0\x47\
+\x50\x4c\0\0\0\0\0\x79\x12\0\0\0\0\0\0\x79\x26\0\0\0\0\0\0\x79\x11\x08\0\0\0\0\
+\0\x15\x01\x3b\0\0\0\0\0\x79\x17\0\0\0\0\0\0\x79\x21\x10\0\0\0\0\0\x55\x01\x08\
+\0\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\xd0\xff\xff\xff\xbf\x61\0\0\0\0\0\0\
+\x18\x62\0\0\0\0\0\0\0\0\0\0\x4a\0\0\0\xb7\x03\0\0\x20\0\0\0\xb7\x05\0\0\0\0\0\
+\0\x85\0\0\0\x7e\0\0\0\x7b\x6a\xc8\xff\0\0\0\0\x61\x71\0\0\0\0\0\0\x7b\x1a\xd0\
+\xff\0\0\0\0\xb7\x03\0\0\x04\0\0\0\xbf\x79\0\0\0\0\0\0\x0f\x39\0\0\0\0\0\0\x79\
+\x71\x28\0\0\0\0\0\x79\x78\x30\0\0\0\0\0\x15\x08\x18\0\0\0\0\0\xb7\x02\0\0\0\0\
+\0\0\x0f\x21\0\0\0\0\0\0\x61\x11\x04\0\0\0\0\0\x79\x83\x08\0\0\0\0\0\x67\x01\0\
+\0\x03\0\0\0\x0f\x13\0\0\0\0\0\0\x79\x86\0\0\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\
+\x01\0\0\xf8\xff\xff\xff\xb7\x02\0\0\x08\0\0\0\x85\0\0\0\x71\0\0\0\xb7\x01\0\0\
+\0\0\0\0\x79\xa3\xf8\xff\0\0\0\0\x0f\x13\0\0\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\
+\x01\0\0\xf4\xff\xff\xff\xb7\x02\0\0\x04\0\0\0\x85\0\0\0\x71\0\0\0\xb7\x03\0\0\
+\x04\0\0\0\x61\xa1\xf4\xff\0\0\0\0\x61\x82\x10\0\0\0\0\0\x3d\x21\x02\0\0\0\0\0\
+\x0f\x16\0\0\0\0\0\0\xbf\x69\0\0\0\0\0\0\x7b\x9a\xd8\xff\0\0\0\0\x79\x71\x18\0\
+\0\0\0\0\x7b\x1a\xe0\xff\0\0\0\0\x79\x71\x20\0\0\0\0\0\x79\x11\0\0\0\0\0\0\x0f\
+\x31\0\0\0\0\0\0\x7b\x1a\xe8\xff\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\xd0\
+\xff\xff\xff\x79\xa1\xc8\xff\0\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\x6a\0\0\0\xb7\
+\x03\0\0\x11\0\0\0\xb7\x05\0\0\x20\0\0\0\x85\0\0\0\x7e\0\0\0\xb7\0\0\0\0\0\0\0\
+\x95\0\0\0\0\0\0\0\0\0\0\0\x1b\0\0\0\0\0\0\0\x42\0\0\0\x88\0\0\0\x1e\x94\x01\0\
+\x01\0\0\0\x42\0\0\0\x88\0\0\0\x24\x94\x01\0\x02\0\0\0\x42\0\0\0\x86\x02\0\0\
+\x1f\x9c\x01\0\x03\0\0\0\x42\0\0\0\xaa\x02\0\0\x06\xa8\x01\0\x04\0\0\0\x42\0\0\
+\0\xc3\x02\0\0\x0e\xb4\x01\0\x05\0\0\0\x42\0\0\0\x2b\x01\0\0\x1d\x98\x01\0\x06\
+\0\0\0\x42\0\0\0\x50\x01\0\0\x06\xb8\x01\0\x08\0\0\0\x42\0\0\0\xd5\x02\0\0\x03\
+\xbc\x01\0\x10\0\0\0\x42\0\0\0\x45\x03\0\0\x02\xc4\x01\0\x17\0\0\0\x42\0\0\0\
+\x80\x03\0\0\x06\x04\x01\0\x1a\0\0\0\x42\0\0\0\x45\x03\0\0\x02\xc4\x01\0\x1b\0\
+\0\0\x42\0\0\0\xd1\x03\0\0\x0f\x10\x01\0\x1c\0\0\0\x42\0\0\0\xe6\x03\0\0\x2d\
+\x14\x01\0\x1e\0\0\0\x42\0\0\0\x1d\x04\0\0\x0d\x0c\x01\0\x20\0\0\0\x42\0\0\0\
+\x45\x03\0\0\x02\xc4\x01\0\x21\0\0\0\x42\0\0\0\xe6\x03\0\0\x02\x14\x01\0\x24\0\
+\0\0\x42\0\0\0\x44\x04\0\0\x0d\x18\x01\0\x27\0\0\0\x42\0\0\0\x45\x03\0\0\x02\
+\xc4\x01\0\x28\0\0\0\x42\0\0\0\x44\x04\0\0\x0d\x18\x01\0\x2b\0\0\0\x42\0\0\0\
+\x44\x04\0\0\x0d\x18\x01\0\x2c\0\0\0\x42\0\0\0\x72\x04\0\0\x1b\x1c\x01\0\x2d\0\
+\0\0\x42\0\0\0\x72\x04\0\0\x06\x1c\x01\0\x2e\0\0\0\x42\0\0\0\x95\x04\0\0\x0d\
+\x24\x01\0\x30\0\0\0\x42\0\0\0\x45\x03\0\0\x02\xc4\x01\0\x3f\0\0\0\x42\0\0\0\
+\x29\x02\0\0\x01\xd4\x01\0\0\0\0\0\x18\0\0\0\x3e\0\0\0\0\0\0\0\x08\0\0\0\x08\0\
+\0\0\x3e\0\0\0\0\0\0\0\x10\0\0\0\x18\0\0\0\xf7\0\0\0\0\0\0\0\x20\0\0\0\x1c\0\0\
+\0\x3e\0\0\0\0\0\0\0\x28\0\0\0\x08\0\0\0\x27\x01\0\0\0\0\0\0\x80\0\0\0\x1e\0\0\
+\0\x3e\0\0\0\0\0\0\0\x90\0\0\0\x1e\0\0\0\xf7\0\0\0\0\0\0\0\xa8\0\0\0\x1e\0\0\0\
+\x78\x03\0\0\0\0\0\0\xb0\0\0\0\x1e\0\0\0\x7c\x03\0\0\0\0\0\0\xc0\0\0\0\x23\0\0\
+\0\xaa\x03\0\0\0\0\0\0\xd8\0\0\0\x24\0\0\0\xf7\0\0\0\0\0\0\0\xf0\0\0\0\x24\0\0\
+\0\x3e\0\0\0\0\0\0\0\x18\x01\0\0\x28\0\0\0\x3e\0\0\0\0\0\0\0\x50\x01\0\0\x1e\0\
+\0\0\xf7\0\0\0\0\0\0\0\x60\x01\0\0\x24\0\0\0\x6c\x04\0\0\0\0\0\0\x88\x01\0\0\
+\x1e\0\0\0\x27\x01\0\0\0\0\0\0\x98\x01\0\0\x1e\0\0\0\xad\x04\0\0\0\0\0\0\xa0\
+\x01\0\0\x1c\0\0\0\x3e\0\0\0\0\0\0\0\x1a\0\0\0\x41\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x64\x75\x6d\x70\x5f\
+\x62\x70\x66\x5f\x70\x72\x6f\x67\0\0\0\0\0\0\0\x1c\0\0\0\0\0\0\0\x08\0\0\0\0\0\
+\0\0\0\0\0\0\x01\0\0\0\x10\0\0\0\0\0\0\0\0\0\0\0\x19\0\0\0\x01\0\0\0\0\0\0\0\
+\x12\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x62\x70\x66\x5f\
+\x69\x74\x65\x72\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\0\0\0\0\0\0";
+       opts.insns_sz = 2456;
        opts.insns = (void *)"\
 \xbf\x16\0\0\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\x78\xff\xff\xff\xb7\x02\0\
 \0\x88\0\0\0\xb7\x03\0\0\0\0\0\0\x85\0\0\0\x71\0\0\0\x05\0\x14\0\0\0\0\0\x61\
@@ -331,79 +326,83 @@ iterators_bpf__load(struct iterators_bpf *skel)
 \0\0\0\x85\0\0\0\xa8\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x61\x01\0\0\0\0\
 \0\0\xd5\x01\x02\0\0\0\0\0\xbf\x19\0\0\0\0\0\0\x85\0\0\0\xa8\0\0\0\xbf\x70\0\0\
 \0\0\0\0\x95\0\0\0\0\0\0\0\x61\x60\x08\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\
-\x48\x0e\0\0\x63\x01\0\0\0\0\0\0\x61\x60\x0c\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\
-\0\0\x44\x0e\0\0\x63\x01\0\0\0\0\0\0\x79\x60\x10\0\0\0\0\0\x18\x61\0\0\0\0\0\0\
-\0\0\0\0\x38\x0e\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\0\x05\0\0\
-\x18\x61\0\0\0\0\0\0\0\0\0\0\x30\x0e\0\0\x7b\x01\0\0\0\0\0\0\xb7\x01\0\0\x12\0\
-\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\x30\x0e\0\0\xb7\x03\0\0\x1c\0\0\0\x85\0\0\0\
+\xe8\x0e\0\0\x63\x01\0\0\0\0\0\0\x61\x60\x0c\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\
+\0\0\xe4\x0e\0\0\x63\x01\0\0\0\0\0\0\x79\x60\x10\0\0\0\0\0\x18\x61\0\0\0\0\0\0\
+\0\0\0\0\xd8\x0e\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\0\x05\0\0\
+\x18\x61\0\0\0\0\0\0\0\0\0\0\xd0\x0e\0\0\x7b\x01\0\0\0\0\0\0\xb7\x01\0\0\x12\0\
+\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\xd0\x0e\0\0\xb7\x03\0\0\x1c\0\0\0\x85\0\0\0\
 \xa6\0\0\0\xbf\x07\0\0\0\0\0\0\xc5\x07\xd4\xff\0\0\0\0\x63\x7a\x78\xff\0\0\0\0\
-\x61\xa0\x78\xff\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x80\x0e\0\0\x63\x01\0\0\0\
+\x61\xa0\x78\xff\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x20\x0f\0\0\x63\x01\0\0\0\
 \0\0\0\x61\x60\x1c\0\0\0\0\0\x15\0\x03\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\
-\x5c\x0e\0\0\x63\x01\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\
-\0\x50\x0e\0\0\xb7\x03\0\0\x48\0\0\0\x85\0\0\0\xa6\0\0\0\xbf\x07\0\0\0\0\0\0\
+\xfc\x0e\0\0\x63\x01\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\
+\0\xf0\x0e\0\0\xb7\x03\0\0\x48\0\0\0\x85\0\0\0\xa6\0\0\0\xbf\x07\0\0\0\0\0\0\
 \xc5\x07\xc3\xff\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x63\x71\0\0\0\0\0\
-\0\x79\x63\x20\0\0\0\0\0\x15\x03\x08\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x98\
-\x0e\0\0\xb7\x02\0\0\x62\0\0\0\x61\x60\x04\0\0\0\0\0\x45\0\x02\0\x01\0\0\0\x85\
+\0\x79\x63\x20\0\0\0\0\0\x15\x03\x08\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x38\
+\x0f\0\0\xb7\x02\0\0\x7b\0\0\0\x61\x60\x04\0\0\0\0\0\x45\0\x02\0\x01\0\0\0\x85\
 \0\0\0\x94\0\0\0\x05\0\x01\0\0\0\0\0\x85\0\0\0\x71\0\0\0\x18\x62\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\x61\x20\0\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x08\x0f\0\0\x63\
-\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\0\x0f\0\0\x18\x61\0\0\0\0\0\0\0\0\
-\0\0\x10\x0f\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x98\x0e\0\0\
-\x18\x61\0\0\0\0\0\0\0\0\0\0\x18\x0f\0\0\x7b\x01\0\0\0\0\0\0\xb7\x01\0\0\x02\0\
-\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\x08\x0f\0\0\xb7\x03\0\0\x20\0\0\0\x85\0\0\0\
+\0\0\0\0\0\0\0\x61\x20\0\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xc0\x0f\0\0\x63\
+\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\xb8\x0f\0\0\x18\x61\0\0\0\0\0\0\0\
+\0\0\0\xc8\x0f\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x38\x0f\0\0\
+\x18\x61\0\0\0\0\0\0\0\0\0\0\xd0\x0f\0\0\x7b\x01\0\0\0\0\0\0\xb7\x01\0\0\x02\0\
+\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\xc0\x0f\0\0\xb7\x03\0\0\x20\0\0\0\x85\0\0\0\
 \xa6\0\0\0\xbf\x07\0\0\0\0\0\0\xc5\x07\x9f\xff\0\0\0\0\x18\x62\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\x61\x20\0\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x28\x0f\0\0\x63\
-\x01\0\0\0\0\0\0\xb7\x01\0\0\x16\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\x28\x0f\0\0\
+\0\0\0\0\0\0\x61\x20\0\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xe0\x0f\0\0\x63\
+\x01\0\0\0\0\0\0\xb7\x01\0\0\x16\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\xe0\x0f\0\0\
 \xb7\x03\0\0\x04\0\0\0\x85\0\0\0\xa6\0\0\0\xbf\x07\0\0\0\0\0\0\xc5\x07\x92\xff\
-\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x30\x0f\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\
-\x78\x11\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x38\x0f\0\0\x18\
-\x61\0\0\0\0\0\0\0\0\0\0\x70\x11\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\
-\0\0\0\x40\x10\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xb8\x11\0\0\x7b\x01\0\0\0\0\0\0\
-\x18\x60\0\0\0\0\0\0\0\0\0\0\x48\x10\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xc8\x11\0\
-\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\xe8\x10\0\0\x18\x61\0\0\0\0\
-\0\0\0\0\0\0\xe8\x11\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\0\0\0\
-\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xe0\x11\0\0\x7b\x01\0\0\0\0\0\0\x61\x60\x08\0\0\
-\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x80\x11\0\0\x63\x01\0\0\0\0\0\0\x61\x60\x0c\
-\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x84\x11\0\0\x63\x01\0\0\0\0\0\0\x79\x60\
-\x10\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x88\x11\0\0\x7b\x01\0\0\0\0\0\0\x61\
-\xa0\x78\xff\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xb0\x11\0\0\x63\x01\0\0\0\0\0\
-\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xf8\x11\0\0\xb7\x02\0\0\x11\0\0\0\xb7\x03\0\0\
+\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\xe8\x0f\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\
+\x20\x12\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\xf0\x0f\0\0\x18\
+\x61\0\0\0\0\0\0\0\0\0\0\x18\x12\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\
+\0\0\0\x08\x11\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x60\x12\0\0\x7b\x01\0\0\0\0\0\0\
+\x18\x60\0\0\0\0\0\0\0\0\0\0\x10\x11\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x70\x12\0\
+\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\xa0\x11\0\0\x18\x61\0\0\0\0\
+\0\0\0\0\0\0\x90\x12\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x88\x12\0\0\x7b\x01\0\0\0\0\0\0\x61\x60\x08\0\0\
+\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x28\x12\0\0\x63\x01\0\0\0\0\0\0\x61\x60\x0c\
+\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x2c\x12\0\0\x63\x01\0\0\0\0\0\0\x79\x60\
+\x10\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x30\x12\0\0\x7b\x01\0\0\0\0\0\0\x61\
+\xa0\x78\xff\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x58\x12\0\0\x63\x01\0\0\0\0\0\
+\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xa0\x12\0\0\xb7\x02\0\0\x11\0\0\0\xb7\x03\0\0\
 \x0c\0\0\0\xb7\x04\0\0\0\0\0\0\x85\0\0\0\xa7\0\0\0\xbf\x07\0\0\0\0\0\0\xc5\x07\
-\x5c\xff\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x68\x11\0\0\x63\x70\x6c\0\0\0\0\0\
-\x77\x07\0\0\x20\0\0\0\x63\x70\x70\0\0\0\0\0\xb7\x01\0\0\x05\0\0\0\x18\x62\0\0\
-\0\0\0\0\0\0\0\0\x68\x11\0\0\xb7\x03\0\0\x8c\0\0\0\x85\0\0\0\xa6\0\0\0\xbf\x07\
-\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\xd8\x11\0\0\x61\x01\0\0\0\0\0\0\xd5\
-\x01\x02\0\0\0\0\0\xbf\x19\0\0\0\0\0\0\x85\0\0\0\xa8\0\0\0\xc5\x07\x4a\xff\0\0\
-\0\0\x63\x7a\x80\xff\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x10\x12\0\0\x18\x61\0\
-\0\0\0\0\0\0\0\0\0\x10\x17\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\
-\x18\x12\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x08\x17\0\0\x7b\x01\0\0\0\0\0\0\x18\
-\x60\0\0\0\0\0\0\0\0\0\0\x28\x14\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x50\x17\0\0\
-\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x30\x14\0\0\x18\x61\0\0\0\0\0\
-\0\0\0\0\0\x60\x17\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\xd0\x15\
-\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x80\x17\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x78\x17\0\0\x7b\x01\0\0\0\0\
-\0\0\x61\x60\x08\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x18\x17\0\0\x63\x01\0\0\
-\0\0\0\0\x61\x60\x0c\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x1c\x17\0\0\x63\x01\
-\0\0\0\0\0\0\x79\x60\x10\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x20\x17\0\0\x7b\
-\x01\0\0\0\0\0\0\x61\xa0\x78\xff\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x48\x17\0\
-\0\x63\x01\0\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x90\x17\0\0\xb7\x02\0\0\x12\
-\0\0\0\xb7\x03\0\0\x0c\0\0\0\xb7\x04\0\0\0\0\0\0\x85\0\0\0\xa7\0\0\0\xbf\x07\0\
-\0\0\0\0\0\xc5\x07\x13\xff\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\0\x17\0\0\x63\
-\x70\x6c\0\0\0\0\0\x77\x07\0\0\x20\0\0\0\x63\x70\x70\0\0\0\0\0\xb7\x01\0\0\x05\
-\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\0\x17\0\0\xb7\x03\0\0\x8c\0\0\0\x85\0\0\0\
-\xa6\0\0\0\xbf\x07\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x70\x17\0\0\x61\x01\
-\0\0\0\0\0\0\xd5\x01\x02\0\0\0\0\0\xbf\x19\0\0\0\0\0\0\x85\0\0\0\xa8\0\0\0\xc5\
-\x07\x01\xff\0\0\0\0\x63\x7a\x84\xff\0\0\0\0\x61\xa1\x78\xff\0\0\0\0\xd5\x01\
-\x02\0\0\0\0\0\xbf\x19\0\0\0\0\0\0\x85\0\0\0\xa8\0\0\0\x61\xa0\x80\xff\0\0\0\0\
-\x63\x06\x28\0\0\0\0\0\x61\xa0\x84\xff\0\0\0\0\x63\x06\x2c\0\0\0\0\0\x18\x61\0\
-\0\0\0\0\0\0\0\0\0\0\0\0\0\x61\x10\0\0\0\0\0\0\x63\x06\x18\0\0\0\0\0\xb7\0\0\0\
-\0\0\0\0\x95\0\0\0\0\0\0\0";
+\x5c\xff\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x10\x12\0\0\x63\x70\x6c\0\0\0\0\0\
+\x77\x07\0\0\x20\0\0\0\x63\x70\x70\0\0\0\0\0\x18\x68\0\0\0\0\0\0\0\0\0\0\xa8\
+\x10\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xb8\x12\0\0\xb7\x02\0\0\x17\0\0\0\xb7\x03\
+\0\0\x0c\0\0\0\xb7\x04\0\0\0\0\0\0\x85\0\0\0\xa7\0\0\0\xbf\x07\0\0\0\0\0\0\xc5\
+\x07\x4d\xff\0\0\0\0\x75\x07\x03\0\0\0\0\0\x62\x08\x04\0\0\0\0\0\x6a\x08\x02\0\
+\0\0\0\0\x05\0\x0a\0\0\0\0\0\x63\x78\x04\0\0\0\0\0\xbf\x79\0\0\0\0\0\0\x77\x09\
+\0\0\x20\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\x63\x90\0\0\0\0\0\0\x55\
+\x09\x02\0\0\0\0\0\x6a\x08\x02\0\0\0\0\0\x05\0\x01\0\0\0\0\0\x6a\x08\x02\0\x40\
+\0\0\0\xb7\x01\0\0\x05\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\x10\x12\0\0\xb7\x03\0\
+\0\x8c\0\0\0\x85\0\0\0\xa6\0\0\0\xbf\x07\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\
+\0\0\x01\0\0\x61\x01\0\0\0\0\0\0\xd5\x01\x02\0\0\0\0\0\xbf\x19\0\0\0\0\0\0\x85\
+\0\0\0\xa8\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x80\x12\0\0\x61\x01\0\0\0\0\0\0\
+\xd5\x01\x02\0\0\0\0\0\xbf\x19\0\0\0\0\0\0\x85\0\0\0\xa8\0\0\0\xc5\x07\x2c\xff\
+\0\0\0\0\x63\x7a\x80\xff\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\xd0\x12\0\0\x18\
+\x61\0\0\0\0\0\0\0\0\0\0\xa8\x17\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\
+\0\0\0\xd8\x12\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xa0\x17\0\0\x7b\x01\0\0\0\0\0\0\
+\x18\x60\0\0\0\0\0\0\0\0\0\0\xe0\x14\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xe8\x17\0\
+\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\xe8\x14\0\0\x18\x61\0\0\0\0\
+\0\0\0\0\0\0\xf8\x17\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x78\
+\x16\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x18\x18\0\0\x7b\x01\0\0\0\0\0\0\x18\x60\0\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x10\x18\0\0\x7b\x01\0\0\
+\0\0\0\0\x61\x60\x08\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xb0\x17\0\0\x63\x01\
+\0\0\0\0\0\0\x61\x60\x0c\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xb4\x17\0\0\x63\
+\x01\0\0\0\0\0\0\x79\x60\x10\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xb8\x17\0\0\
+\x7b\x01\0\0\0\0\0\0\x61\xa0\x78\xff\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\xe0\
+\x17\0\0\x63\x01\0\0\0\0\0\0\x18\x61\0\0\0\0\0\0\0\0\0\0\x28\x18\0\0\xb7\x02\0\
+\0\x12\0\0\0\xb7\x03\0\0\x0c\0\0\0\xb7\x04\0\0\0\0\0\0\x85\0\0\0\xa7\0\0\0\xbf\
+\x07\0\0\0\0\0\0\xc5\x07\xf5\xfe\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x98\x17\0\
+\0\x63\x70\x6c\0\0\0\0\0\x77\x07\0\0\x20\0\0\0\x63\x70\x70\0\0\0\0\0\xb7\x01\0\
+\0\x05\0\0\0\x18\x62\0\0\0\0\0\0\0\0\0\0\x98\x17\0\0\xb7\x03\0\0\x8c\0\0\0\x85\
+\0\0\0\xa6\0\0\0\xbf\x07\0\0\0\0\0\0\x18\x60\0\0\0\0\0\0\0\0\0\0\x08\x18\0\0\
+\x61\x01\0\0\0\0\0\0\xd5\x01\x02\0\0\0\0\0\xbf\x19\0\0\0\0\0\0\x85\0\0\0\xa8\0\
+\0\0\xc5\x07\xe3\xfe\0\0\0\0\x63\x7a\x84\xff\0\0\0\0\x61\xa1\x78\xff\0\0\0\0\
+\xd5\x01\x02\0\0\0\0\0\xbf\x19\0\0\0\0\0\0\x85\0\0\0\xa8\0\0\0\x61\xa0\x80\xff\
+\0\0\0\0\x63\x06\x28\0\0\0\0\0\x61\xa0\x84\xff\0\0\0\0\x63\x06\x2c\0\0\0\0\0\
+\x18\x61\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x61\x10\0\0\0\0\0\0\x63\x06\x18\0\0\0\0\0\
+\xb7\0\0\0\0\0\0\0\x95\0\0\0\0\0\0\0";
        err = bpf_load_and_run(&opts);
        if (err < 0)
                return err;
-       skel->rodata = skel_finalize_map_data(&skel->maps.rodata.initial_value,
-                                       4096, PROT_READ, skel->maps.rodata.map_fd);
-       if (!skel->rodata)
-               return -ENOMEM;
        return 0;
 }
 
@@ -422,4 +421,15 @@ iterators_bpf__open_and_load(void)
        return skel;
 }
 
+__attribute__((unused)) static void
+iterators_bpf__assert(struct iterators_bpf *s __attribute__((unused)))
+{
+#ifdef __cplusplus
+#define _Static_assert static_assert
+#endif
+#ifdef __cplusplus
+#undef _Static_assert
+#endif
+}
+
 #endif /* __ITERATORS_BPF_SKEL_H__ */
index 875ac9b..f045fde 100644 (file)
 
 #define RINGBUF_MAX_RECORD_SZ (UINT_MAX/4)
 
-/* Maximum size of ring buffer area is limited by 32-bit page offset within
- * record header, counted in pages. Reserve 8 bits for extensibility, and take
- * into account few extra pages for consumer/producer pages and
- * non-mmap()'able parts. This gives 64GB limit, which seems plenty for single
- * ring buffer.
- */
-#define RINGBUF_MAX_DATA_SZ \
-       (((1ULL << 24) - RINGBUF_POS_PAGES - RINGBUF_PGOFF) * PAGE_SIZE)
-
 struct bpf_ringbuf {
        wait_queue_head_t waitq;
        struct irq_work work;
@@ -161,6 +152,17 @@ static void bpf_ringbuf_notify(struct irq_work *work)
        wake_up_all(&rb->waitq);
 }
 
+/* The maximum size of the ring buffer area is limited by the 32-bit page
+ * offset within the record header, counted in pages. Reserving 8 bits for
+ * extensibility and taking into account a few extra pages for the
+ * consumer/producer pages and non-mmap()'able parts, the current maximum
+ * size would be:
+ *
+ *     (((1ULL << 24) - RINGBUF_POS_PAGES - RINGBUF_PGOFF) * PAGE_SIZE)
+ *
+ * This gives a 64GB limit, which seems plenty for a single ring buffer.
+ * Given that the maximum value of data_sz is (4GB - 1), there can be no
+ * overflow, so just note the size limit in this comment.
+ */
 static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
 {
        struct bpf_ringbuf *rb;
@@ -193,12 +195,6 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
            !PAGE_ALIGNED(attr->max_entries))
                return ERR_PTR(-EINVAL);
 
-#ifdef CONFIG_64BIT
-       /* on 32-bit arch, it's impossible to overflow record's hdr->pgoff */
-       if (attr->max_entries > RINGBUF_MAX_DATA_SZ)
-               return ERR_PTR(-E2BIG);
-#endif
-
        rb_map = bpf_map_area_alloc(sizeof(*rb_map), NUMA_NO_NODE);
        if (!rb_map)
                return ERR_PTR(-ENOMEM);
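
The ringbuf.c hunks above drop the CONFIG_64BIT max_entries check and keep the sizing rationale as a comment: with 8 of the 32 page-offset bits reserved, roughly 1 << 24 pages remain addressable, which at 4 KiB per page is just under 64 GiB, far above the 4 GiB - 1 that a u32 max_entries can express, so no overflow is possible. A small sketch of that arithmetic follows; the 4 KiB page size and the bookkeeping-page count are assumptions for illustration, not the exact kernel constants.

        /* Illustrative arithmetic only; constants mirror the comment above. */
        #include <stdio.h>

        int main(void)
        {
                unsigned long long page_size = 4096;        /* assumed 4 KiB pages */
                unsigned long long bookkeeping_pages = 3;   /* stand-in for RINGBUF_PGOFF + RINGBUF_POS_PAGES */
                unsigned long long limit = ((1ULL << 24) - bookkeeping_pages) * page_size;

                /* just under 64 GiB, versus the 4 GiB - 1 a u32 max_entries can request */
                printf("ringbuf data area ceiling: %.2f GiB\n", limit / (double)(1ULL << 30));
                return 0;
        }
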
index a2aef90..ee8cb1a 100644 (file)
@@ -3295,6 +3295,25 @@ static void bpf_raw_tp_link_show_fdinfo(const struct bpf_link *link,
                   raw_tp_link->btp->tp->name);
 }
 
+static int bpf_copy_to_user(char __user *ubuf, const char *buf, u32 ulen,
+                           u32 len)
+{
+       if (ulen >= len + 1) {
+               if (copy_to_user(ubuf, buf, len + 1))
+                       return -EFAULT;
+       } else {
+               char zero = '\0';
+
+               if (copy_to_user(ubuf, buf, ulen - 1))
+                       return -EFAULT;
+               if (put_user(zero, ubuf + ulen - 1))
+                       return -EFAULT;
+               return -ENOSPC;
+       }
+
+       return 0;
+}
+
 static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
                                          struct bpf_link_info *info)
 {
@@ -3313,20 +3332,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
        if (!ubuf)
                return 0;
 
-       if (ulen >= tp_len + 1) {
-               if (copy_to_user(ubuf, tp_name, tp_len + 1))
-                       return -EFAULT;
-       } else {
-               char zero = '\0';
-
-               if (copy_to_user(ubuf, tp_name, ulen - 1))
-                       return -EFAULT;
-               if (put_user(zero, ubuf + ulen - 1))
-                       return -EFAULT;
-               return -ENOSPC;
-       }
-
-       return 0;
+       return bpf_copy_to_user(ubuf, tp_name, ulen, tp_len);
 }
 
 static const struct bpf_link_ops bpf_raw_tp_link_lops = {
@@ -3358,9 +3364,155 @@ static void bpf_perf_link_dealloc(struct bpf_link *link)
        kfree(perf_link);
 }
 
+static int bpf_perf_link_fill_common(const struct perf_event *event,
+                                    char __user *uname, u32 ulen,
+                                    u64 *probe_offset, u64 *probe_addr,
+                                    u32 *fd_type)
+{
+       const char *buf;
+       u32 prog_id;
+       size_t len;
+       int err;
+
+       if (!ulen ^ !uname)
+               return -EINVAL;
+       if (!uname)
+               return 0;
+
+       err = bpf_get_perf_event_info(event, &prog_id, fd_type, &buf,
+                                     probe_offset, probe_addr);
+       if (err)
+               return err;
+
+       if (buf) {
+               len = strlen(buf);
+               err = bpf_copy_to_user(uname, buf, ulen, len);
+               if (err)
+                       return err;
+       } else {
+               char zero = '\0';
+
+               if (put_user(zero, uname))
+                       return -EFAULT;
+       }
+       return 0;
+}
+
+#ifdef CONFIG_KPROBE_EVENTS
+static int bpf_perf_link_fill_kprobe(const struct perf_event *event,
+                                    struct bpf_link_info *info)
+{
+       char __user *uname;
+       u64 addr, offset;
+       u32 ulen, type;
+       int err;
+
+       uname = u64_to_user_ptr(info->perf_event.kprobe.func_name);
+       ulen = info->perf_event.kprobe.name_len;
+       err = bpf_perf_link_fill_common(event, uname, ulen, &offset, &addr,
+                                       &type);
+       if (err)
+               return err;
+       if (type == BPF_FD_TYPE_KRETPROBE)
+               info->perf_event.type = BPF_PERF_EVENT_KRETPROBE;
+       else
+               info->perf_event.type = BPF_PERF_EVENT_KPROBE;
+
+       info->perf_event.kprobe.offset = offset;
+       if (!kallsyms_show_value(current_cred()))
+               addr = 0;
+       info->perf_event.kprobe.addr = addr;
+       return 0;
+}
+#endif
+
+#ifdef CONFIG_UPROBE_EVENTS
+static int bpf_perf_link_fill_uprobe(const struct perf_event *event,
+                                    struct bpf_link_info *info)
+{
+       char __user *uname;
+       u64 addr, offset;
+       u32 ulen, type;
+       int err;
+
+       uname = u64_to_user_ptr(info->perf_event.uprobe.file_name);
+       ulen = info->perf_event.uprobe.name_len;
+       err = bpf_perf_link_fill_common(event, uname, ulen, &offset, &addr,
+                                       &type);
+       if (err)
+               return err;
+
+       if (type == BPF_FD_TYPE_URETPROBE)
+               info->perf_event.type = BPF_PERF_EVENT_URETPROBE;
+       else
+               info->perf_event.type = BPF_PERF_EVENT_UPROBE;
+       info->perf_event.uprobe.offset = offset;
+       return 0;
+}
+#endif
+
+static int bpf_perf_link_fill_probe(const struct perf_event *event,
+                                   struct bpf_link_info *info)
+{
+#ifdef CONFIG_KPROBE_EVENTS
+       if (event->tp_event->flags & TRACE_EVENT_FL_KPROBE)
+               return bpf_perf_link_fill_kprobe(event, info);
+#endif
+#ifdef CONFIG_UPROBE_EVENTS
+       if (event->tp_event->flags & TRACE_EVENT_FL_UPROBE)
+               return bpf_perf_link_fill_uprobe(event, info);
+#endif
+       return -EOPNOTSUPP;
+}
+
+static int bpf_perf_link_fill_tracepoint(const struct perf_event *event,
+                                        struct bpf_link_info *info)
+{
+       char __user *uname;
+       u32 ulen;
+
+       uname = u64_to_user_ptr(info->perf_event.tracepoint.tp_name);
+       ulen = info->perf_event.tracepoint.name_len;
+       info->perf_event.type = BPF_PERF_EVENT_TRACEPOINT;
+       return bpf_perf_link_fill_common(event, uname, ulen, NULL, NULL, NULL);
+}
+
+static int bpf_perf_link_fill_perf_event(const struct perf_event *event,
+                                        struct bpf_link_info *info)
+{
+       info->perf_event.event.type = event->attr.type;
+       info->perf_event.event.config = event->attr.config;
+       info->perf_event.type = BPF_PERF_EVENT_EVENT;
+       return 0;
+}
+
+static int bpf_perf_link_fill_link_info(const struct bpf_link *link,
+                                       struct bpf_link_info *info)
+{
+       struct bpf_perf_link *perf_link;
+       const struct perf_event *event;
+
+       perf_link = container_of(link, struct bpf_perf_link, link);
+       event = perf_get_event(perf_link->perf_file);
+       if (IS_ERR(event))
+               return PTR_ERR(event);
+
+       switch (event->prog->type) {
+       case BPF_PROG_TYPE_PERF_EVENT:
+               return bpf_perf_link_fill_perf_event(event, info);
+       case BPF_PROG_TYPE_TRACEPOINT:
+               return bpf_perf_link_fill_tracepoint(event, info);
+       case BPF_PROG_TYPE_KPROBE:
+               return bpf_perf_link_fill_probe(event, info);
+       default:
+               return -EOPNOTSUPP;
+       }
+}
+
 static const struct bpf_link_ops bpf_perf_link_lops = {
        .release = bpf_perf_link_release,
        .dealloc = bpf_perf_link_dealloc,
+       .fill_link_info = bpf_perf_link_fill_link_info,
 };
 
 static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
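
With .fill_link_info now wired into bpf_perf_link_lops, userspace can ask a perf-event-backed link to describe itself through the existing BPF_OBJ_GET_INFO_BY_FD command. A hedged sketch of the consumer side is below; it assumes uapi headers that already carry the new perf_event fields of struct bpf_link_info from this series, and a real tool would more likely go through libbpf than the raw syscall.

        /* Hedged sketch: reads kprobe details of a perf_event-backed link by fd. */
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>
        #include <sys/syscall.h>
        #include <linux/bpf.h>

        static long link_get_info(int link_fd, struct bpf_link_info *info, __u32 len)
        {
                union bpf_attr attr;

                memset(&attr, 0, sizeof(attr));
                attr.info.bpf_fd = link_fd;
                attr.info.info_len = len;
                attr.info.info = (__u64)(unsigned long)info;
                return syscall(__NR_bpf, BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr));
        }

        static int show_kprobe_link(int link_fd)
        {
                char name[64] = {};
                struct bpf_link_info info = {};

                info.perf_event.kprobe.func_name = (__u64)(unsigned long)name;
                info.perf_event.kprobe.name_len = sizeof(name);
                if (link_get_info(link_fd, &info, sizeof(info)))
                        return -1;

                if (info.type == BPF_LINK_TYPE_PERF_EVENT &&
                    info.perf_event.type == BPF_PERF_EVENT_KPROBE)
                        printf("kprobe %s+%u\n", name,
                               (unsigned int)info.perf_event.kprobe.offset);
                return 0;
        }
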
index 930b555..0b9da95 100644 (file)
@@ -25,6 +25,7 @@
 #include <linux/btf_ids.h>
 #include <linux/poison.h>
 #include <linux/module.h>
+#include <linux/cpumask.h>
 
 #include "disasm.h"
 
@@ -6067,6 +6068,11 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
                                   type_is_rcu_or_null(env, reg, field_name, btf_id)) {
                                /* __rcu tagged pointers can be NULL */
                                flag |= MEM_RCU | PTR_MAYBE_NULL;
+
+                               /* We always trust them */
+                               if (type_is_rcu_or_null(env, reg, field_name, btf_id) &&
+                                   flag & PTR_UNTRUSTED)
+                                       flag &= ~PTR_UNTRUSTED;
                        } else if (flag & (MEM_PERCPU | MEM_USER)) {
                                /* keep as-is */
                        } else {
@@ -9117,19 +9123,33 @@ static void do_refine_retval_range(struct bpf_reg_state *regs, int ret_type,
 {
        struct bpf_reg_state *ret_reg = &regs[BPF_REG_0];
 
-       if (ret_type != RET_INTEGER ||
-           (func_id != BPF_FUNC_get_stack &&
-            func_id != BPF_FUNC_get_task_stack &&
-            func_id != BPF_FUNC_probe_read_str &&
-            func_id != BPF_FUNC_probe_read_kernel_str &&
-            func_id != BPF_FUNC_probe_read_user_str))
+       if (ret_type != RET_INTEGER)
                return;
 
-       ret_reg->smax_value = meta->msize_max_value;
-       ret_reg->s32_max_value = meta->msize_max_value;
-       ret_reg->smin_value = -MAX_ERRNO;
-       ret_reg->s32_min_value = -MAX_ERRNO;
-       reg_bounds_sync(ret_reg);
+       switch (func_id) {
+       case BPF_FUNC_get_stack:
+       case BPF_FUNC_get_task_stack:
+       case BPF_FUNC_probe_read_str:
+       case BPF_FUNC_probe_read_kernel_str:
+       case BPF_FUNC_probe_read_user_str:
+               ret_reg->smax_value = meta->msize_max_value;
+               ret_reg->s32_max_value = meta->msize_max_value;
+               ret_reg->smin_value = -MAX_ERRNO;
+               ret_reg->s32_min_value = -MAX_ERRNO;
+               reg_bounds_sync(ret_reg);
+               break;
+       case BPF_FUNC_get_smp_processor_id:
+               ret_reg->umax_value = nr_cpu_ids - 1;
+               ret_reg->u32_max_value = nr_cpu_ids - 1;
+               ret_reg->smax_value = nr_cpu_ids - 1;
+               ret_reg->s32_max_value = nr_cpu_ids - 1;
+               ret_reg->umin_value = 0;
+               ret_reg->u32_min_value = 0;
+               ret_reg->smin_value = 0;
+               ret_reg->s32_min_value = 0;
+               reg_bounds_sync(ret_reg);
+               break;
+       }
 }
 
 static int
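
The new switch case above teaches the verifier the actual range of bpf_get_smp_processor_id(): [0, nr_cpu_ids - 1]. A hypothetical BPF-side sketch of what that buys (NR_SLOTS, the section and names are illustrative; the direct indexing is only provable when NR_SLOTS is at least nr_cpu_ids on the verifying kernel):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define NR_SLOTS 512		/* assumed >= nr_cpu_ids on the target */

__u64 hits[NR_SLOTS];

SEC("tp_btf/sched_switch")
int count_per_cpu(void *ctx)
{
	__u32 cpu = bpf_get_smp_processor_id();

	/* with the refined bounds, r0 is known to be below nr_cpu_ids, so
	 * this variable-offset access can pass without a manual range check */
	hits[cpu]++;
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
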
index 98c1544..f95cfb5 100644 (file)
@@ -493,7 +493,6 @@ static inline void rcu_expedite_gp(void) { }
 static inline void rcu_unexpedite_gp(void) { }
 static inline void rcu_async_hurry(void) { }
 static inline void rcu_async_relax(void) { }
-static inline void rcu_request_urgent_qs_task(struct task_struct *t) { }
 #else /* #ifdef CONFIG_TINY_RCU */
 bool rcu_gp_is_normal(void);     /* Internal RCU use. */
 bool rcu_gp_is_expedited(void);  /* Internal RCU use. */
@@ -508,7 +507,6 @@ void show_rcu_tasks_gp_kthreads(void);
 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
 static inline void show_rcu_tasks_gp_kthreads(void) {}
 #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
-void rcu_request_urgent_qs_task(struct task_struct *t);
 #endif /* #else #ifdef CONFIG_TINY_RCU */
 
 #define RCU_SCHEDULER_INACTIVE 0
index 5f2dcab..c92eb8c 100644 (file)
@@ -2369,9 +2369,13 @@ int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
        if (is_tracepoint || is_syscall_tp) {
                *buf = is_tracepoint ? event->tp_event->tp->name
                                     : event->tp_event->name;
-               *fd_type = BPF_FD_TYPE_TRACEPOINT;
-               *probe_offset = 0x0;
-               *probe_addr = 0x0;
+               /* We allow NULL pointer for tracepoint */
+               if (fd_type)
+                       *fd_type = BPF_FD_TYPE_TRACEPOINT;
+               if (probe_offset)
+                       *probe_offset = 0x0;
+               if (probe_addr)
+                       *probe_addr = 0x0;
        } else {
                /* kprobe/uprobe */
                err = -EOPNOTSUPP;
@@ -2384,7 +2388,7 @@ int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
 #ifdef CONFIG_UPROBE_EVENTS
                if (flags & TRACE_EVENT_FL_UPROBE)
                        err = bpf_get_uprobe_info(event, fd_type, buf,
-                                                 probe_offset,
+                                                 probe_offset, probe_addr,
                                                  event->attr.type == PERF_TYPE_TRACEPOINT);
 #endif
        }
@@ -2469,6 +2473,7 @@ struct bpf_kprobe_multi_link {
        u32 cnt;
        u32 mods_cnt;
        struct module **mods;
+       u32 flags;
 };
 
 struct bpf_kprobe_multi_run_ctx {
@@ -2558,9 +2563,44 @@ static void bpf_kprobe_multi_link_dealloc(struct bpf_link *link)
        kfree(kmulti_link);
 }
 
+static int bpf_kprobe_multi_link_fill_link_info(const struct bpf_link *link,
+                                               struct bpf_link_info *info)
+{
+       u64 __user *uaddrs = u64_to_user_ptr(info->kprobe_multi.addrs);
+       struct bpf_kprobe_multi_link *kmulti_link;
+       u32 ucount = info->kprobe_multi.count;
+       int err = 0, i;
+
+       if (!uaddrs ^ !ucount)
+               return -EINVAL;
+
+       kmulti_link = container_of(link, struct bpf_kprobe_multi_link, link);
+       info->kprobe_multi.count = kmulti_link->cnt;
+       info->kprobe_multi.flags = kmulti_link->flags;
+
+       if (!uaddrs)
+               return 0;
+       if (ucount < kmulti_link->cnt)
+               err = -ENOSPC;
+       else
+               ucount = kmulti_link->cnt;
+
+       if (kallsyms_show_value(current_cred())) {
+               if (copy_to_user(uaddrs, kmulti_link->addrs, ucount * sizeof(u64)))
+                       return -EFAULT;
+       } else {
+               for (i = 0; i < ucount; i++) {
+                       if (put_user(0, uaddrs + i))
+                               return -EFAULT;
+               }
+       }
+       return err;
+}
+
 static const struct bpf_link_ops bpf_kprobe_multi_link_lops = {
        .release = bpf_kprobe_multi_link_release,
        .dealloc = bpf_kprobe_multi_link_dealloc,
+       .fill_link_info = bpf_kprobe_multi_link_fill_link_info,
 };
 
 static void bpf_kprobe_multi_cookie_swap(void *a, void *b, int size, const void *priv)
@@ -2874,6 +2914,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
        link->addrs = addrs;
        link->cookies = cookies;
        link->cnt = cnt;
+       link->flags = flags;
 
        if (cookies) {
                /*
index 23dba01..17c21c0 100644 (file)
@@ -1561,15 +1561,10 @@ int bpf_get_kprobe_info(const struct perf_event *event, u32 *fd_type,
 
        *fd_type = trace_kprobe_is_return(tk) ? BPF_FD_TYPE_KRETPROBE
                                              : BPF_FD_TYPE_KPROBE;
-       if (tk->symbol) {
-               *symbol = tk->symbol;
-               *probe_offset = tk->rp.kp.offset;
-               *probe_addr = 0;
-       } else {
-               *symbol = NULL;
-               *probe_offset = 0;
-               *probe_addr = (unsigned long)tk->rp.kp.addr;
-       }
+       *probe_offset = tk->rp.kp.offset;
+       *probe_addr = kallsyms_show_value(current_cred()) ?
+                     (unsigned long)tk->rp.kp.addr : 0;
+       *symbol = tk->symbol;
        return 0;
 }
 #endif /* CONFIG_PERF_EVENTS */
index fa09b33..eaa8005 100644 (file)
@@ -1417,7 +1417,7 @@ static void uretprobe_perf_func(struct trace_uprobe *tu, unsigned long func,
 
 int bpf_get_uprobe_info(const struct perf_event *event, u32 *fd_type,
                        const char **filename, u64 *probe_offset,
-                       bool perf_type_tracepoint)
+                       u64 *probe_addr, bool perf_type_tracepoint)
 {
        const char *pevent = trace_event_name(event->tp_event);
        const char *group = event->tp_event->class->system;
@@ -1434,6 +1434,7 @@ int bpf_get_uprobe_info(const struct perf_event *event, u32 *fd_type,
                                    : BPF_FD_TYPE_UPROBE;
        *filename = tu->filename;
        *probe_offset = tu->offset;
+       *probe_addr = 0;
        return 0;
 }
 #endif /* CONFIG_PERF_EVENTS */
index fa08334..913a7a0 100644 (file)
@@ -14381,25 +14381,15 @@ static void *generate_test_data(struct bpf_test *test, int sub)
                 * single fragment to the skb, filled with
                 * test->frag_data.
                 */
-               void *ptr;
-
                page = alloc_page(GFP_KERNEL);
-
                if (!page)
                        goto err_kfree_skb;
 
-               ptr = kmap(page);
-               if (!ptr)
-                       goto err_free_page;
-               memcpy(ptr, test->frag_data, MAX_DATA);
-               kunmap(page);
+               memcpy(page_address(page), test->frag_data, MAX_DATA);
                skb_add_rx_frag(skb, 0, page, 0, MAX_DATA, MAX_DATA);
        }
 
        return skb;
-
-err_free_page:
-       __free_page(page);
 err_kfree_skb:
        kfree_skb(skb);
        return NULL;
index 2321bd2..7d47f53 100644 (file)
@@ -555,12 +555,23 @@ __bpf_kfunc u32 bpf_fentry_test9(u32 *a)
        return *a;
 }
 
+void noinline bpf_fentry_test_sinfo(struct skb_shared_info *sinfo)
+{
+}
+
 __bpf_kfunc int bpf_modify_return_test(int a, int *b)
 {
        *b += 1;
        return a + *b;
 }
 
+__bpf_kfunc int bpf_modify_return_test2(int a, int *b, short c, int d,
+                                       void *e, char f, int g)
+{
+       *b += 1;
+       return a + *b + c + d + (long)e + f + g;
+}
+
 int noinline bpf_fentry_shadow_test(int a)
 {
        return a + 1;
@@ -596,6 +607,7 @@ __diag_pop();
 
 BTF_SET8_START(bpf_test_modify_return_ids)
 BTF_ID_FLAGS(func, bpf_modify_return_test)
+BTF_ID_FLAGS(func, bpf_modify_return_test2)
 BTF_ID_FLAGS(func, bpf_fentry_test1, KF_SLEEPABLE)
 BTF_SET8_END(bpf_test_modify_return_ids)
 
@@ -663,7 +675,11 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
        case BPF_MODIFY_RETURN:
                ret = bpf_modify_return_test(1, &b);
                if (b != 2)
-                       side_effect = 1;
+                       side_effect++;
+               b = 2;
+               ret += bpf_modify_return_test2(1, &b, 3, 4, (void *)5, 6, 7);
+               if (b != 2)
+                       side_effect++;
                break;
        default:
                goto out;
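
bpf_modify_return_test2() above exists to exercise the new 12-argument trampoline support from the fmod_ret side. A hedged, selftest-style sketch of a program attaching to it (program and license names are illustrative): with BPF_MODIFY_RETURN, a non-zero return value suppresses the traced function's body, which is why the test run above expects *b to stay at 2.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("fmod_ret/bpf_modify_return_test2")
int BPF_PROG(fmod_ret_test2, int a, int *b, short c, int d, void *e,
	     char f, int g, int ret)
{
	/* returning non-zero skips the original function body */
	return a + c + d + f + g;
}

char LICENSE[] SEC("license") = "GPL";
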
index 615f24e..595b98d 100644 (file)
@@ -248,7 +248,7 @@ BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
 BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
 BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
 BTF_LLVM_PROBE := $(shell echo "int main() { return 0; }" | \
-                         $(CLANG) -target bpf -O2 -g -c -x c - -o ./llvm_btf_verify.o; \
+                         $(CLANG) --target=bpf -O2 -g -c -x c - -o ./llvm_btf_verify.o; \
                          $(LLVM_READELF) -S ./llvm_btf_verify.o | grep BTF; \
                          /bin/rm -f ./llvm_btf_verify.o)
 
@@ -370,7 +370,7 @@ endif
 clean-files += vmlinux.h
 
 # Get Clang's default includes on this system, as opposed to those seen by
-# '-target bpf'. This fixes "missing" files on some architectures/distros,
+# '--target=bpf'. This fixes "missing" files on some architectures/distros,
 # such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
 #
 # Use '-idirafter': Don't interfere with include mechanics except where the
@@ -392,7 +392,7 @@ $(obj)/xdp_router_ipv4.bpf.o: $(obj)/xdp_sample.bpf.o
 
 $(obj)/%.bpf.o: $(src)/%.bpf.c $(obj)/vmlinux.h $(src)/xdp_sample.bpf.h $(src)/xdp_sample_shared.h
        @echo "  CLANG-BPF " $@
-       $(Q)$(CLANG) -g -O2 -target bpf -D__TARGET_ARCH_$(SRCARCH) \
+       $(Q)$(CLANG) -g -O2 --target=bpf -D__TARGET_ARCH_$(SRCARCH) \
                -Wno-compare-distinct-pointer-types -I$(srctree)/include \
                -I$(srctree)/samples/bpf -I$(srctree)/tools/include \
                -I$(LIBBPF_INCLUDE) $(CLANG_SYS_INCLUDES) \
index 719225b..1c638d9 100644 (file)
@@ -1 +1 @@
-/* dummy .h to trick /usr/include/features.h to work with 'clang -target bpf' */
+/* dummy .h to trick /usr/include/features.h to work with 'clang --target=bpf' */
index e7121dd..090fecf 100644 (file)
@@ -44,12 +44,14 @@ static __always_inline void count(void *map)
                bpf_map_update_elem(map, &key, &init_val, BPF_NOEXIST);
 }
 
+#if !defined(__aarch64__)
 SEC("tracepoint/syscalls/sys_enter_open")
 int trace_enter_open(struct syscalls_enter_open_args *ctx)
 {
        count(&enter_open_map);
        return 0;
 }
+#endif
 
 SEC("tracepoint/syscalls/sys_enter_openat")
 int trace_enter_open_at(struct syscalls_enter_open_args *ctx)
@@ -65,12 +67,14 @@ int trace_enter_open_at2(struct syscalls_enter_open_args *ctx)
        return 0;
 }
 
+#if !defined(__aarch64__)
 SEC("tracepoint/syscalls/sys_exit_open")
 int trace_enter_exit(struct syscalls_exit_open_args *ctx)
 {
        count(&exit_open_map);
        return 0;
 }
+#endif
 
 SEC("tracepoint/syscalls/sys_exit_openat")
 int trace_enter_exit_at(struct syscalls_exit_open_args *ctx)
index 0bf2d0f..148e2df 100755 (executable)
@@ -376,7 +376,7 @@ DST_MAC=$(lookup_mac $VETH1 $NS1)
 SRC_MAC=$(lookup_mac $VETH0)
 DST_IFINDEX=$(cat /sys/class/net/$VETH0/ifindex)
 
-CLANG_OPTS="-O2 -target bpf -I ../include/"
+CLANG_OPTS="-O2 --target=bpf -I ../include/"
 CLANG_OPTS+=" -DSRC_MAC=$SRC_MAC -DDST_MAC=$DST_MAC -DDST_IFINDEX=$DST_IFINDEX"
 clang $CLANG_OPTS -c $PROG_SRC -o $BPF_PROG
 
index 0262882..9f7fe29 100644 (file)
@@ -86,7 +86,7 @@ BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
 BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
 BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
 BTF_LLVM_PROBE := $(shell echo "int main() { return 0; }" | \
-                         $(CLANG) -target bpf -O2 -g -c -x c - -o ./llvm_btf_verify.o; \
+                         $(CLANG) --target=bpf -O2 -g -c -x c - -o ./llvm_btf_verify.o; \
                          $(LLVM_READELF) -S ./llvm_btf_verify.o | grep BTF; \
                          /bin/rm -f ./llvm_btf_verify.o)
 
@@ -181,7 +181,7 @@ endif
 clean-files += vmlinux.h
 
 # Get Clang's default includes on this system, as opposed to those seen by
-# '-target bpf'. This fixes "missing" files on some architectures/distros,
+# '--target=bpf'. This fixes "missing" files on some architectures/distros,
 # such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
 #
 # Use '-idirafter': Don't interfere with include mechanics except where the
@@ -198,7 +198,7 @@ EXTRA_BPF_HEADERS_SRC := $(addprefix $(src)/,$(EXTRA_BPF_HEADERS))
 
 $(obj)/%.bpf.o: $(src)/%.bpf.c $(EXTRA_BPF_HEADERS_SRC) $(obj)/vmlinux.h
        @echo "  CLANG-BPF " $@
-       $(Q)$(CLANG) -g -O2 -target bpf -D__TARGET_ARCH_$(SRCARCH) \
+       $(Q)$(CLANG) -g -O2 --target=bpf -D__TARGET_ARCH_$(SRCARCH) \
                -Wno-compare-distinct-pointer-types -I$(srctree)/include \
                -I$(srctree)/samples/bpf -I$(srctree)/tools/include \
                -I$(LIBBPF_INCLUDE) $(CLANG_SYS_INCLUDES) \
index 68454ef..5006e72 100644 (file)
@@ -260,9 +260,9 @@ EXAMPLES
 This is example BPF application with two BPF programs and a mix of BPF maps
 and global variables. Source code is split across two source code files.
 
-**$ clang -target bpf -g example1.bpf.c -o example1.bpf.o**
+**$ clang --target=bpf -g example1.bpf.c -o example1.bpf.o**
 
-**$ clang -target bpf -g example2.bpf.c -o example2.bpf.o**
+**$ clang --target=bpf -g example2.bpf.c -o example2.bpf.o**
 
 **$ bpftool gen object example.bpf.o example1.bpf.o example2.bpf.o**
 
index 681fbcc..e9154ac 100644 (file)
@@ -216,7 +216,7 @@ $(OUTPUT)%.bpf.o: skeleton/%.bpf.c $(OUTPUT)vmlinux.h $(LIBBPF_BOOTSTRAP)
                -I$(srctree)/tools/include/uapi/ \
                -I$(LIBBPF_BOOTSTRAP_INCLUDE) \
                -g -O2 -Wall -fno-stack-protector \
-               -target bpf -c $< -o $@
+               --target=bpf -c $< -o $@
        $(Q)$(LLVM_STRIP) -g $@
 
 $(OUTPUT)%.skel.h: $(OUTPUT)%.bpf.o $(BPFTOOL_BOOTSTRAP)
index 294de23..1b7f697 100644 (file)
@@ -835,7 +835,7 @@ static void dotlabel_puts(const char *s)
                case '|':
                case ' ':
                        putchar('\\');
-                       /* fallthrough */
+                       fallthrough;
                default:
                        putchar(*s);
                }
index 0675d6a..edda4fc 100644 (file)
@@ -757,7 +757,7 @@ probe_helpers_for_progtype(enum bpf_prog_type prog_type,
                case BPF_FUNC_probe_write_user:
                        if (!full_mode)
                                continue;
-                       /* fallthrough */
+                       fallthrough;
                default:
                        probe_res |= probe_helper_for_progtype(prog_type, supported_type,
                                                  define_prefix, id, prog_type_str,
index 2d78607..65a168d 100644 (file)
@@ -5,6 +5,7 @@
 #include <linux/err.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter_arp.h>
+#include <linux/perf_event.h>
 #include <net/if.h>
 #include <stdio.h>
 #include <unistd.h>
 
 #include "json_writer.h"
 #include "main.h"
+#include "xlated_dumper.h"
+
+#define PERF_HW_CACHE_LEN 128
 
 static struct hashmap *link_table;
+static struct dump_data dd;
+
+static const char *perf_type_name[PERF_TYPE_MAX] = {
+       [PERF_TYPE_HARDWARE]                    = "hardware",
+       [PERF_TYPE_SOFTWARE]                    = "software",
+       [PERF_TYPE_TRACEPOINT]                  = "tracepoint",
+       [PERF_TYPE_HW_CACHE]                    = "hw-cache",
+       [PERF_TYPE_RAW]                         = "raw",
+       [PERF_TYPE_BREAKPOINT]                  = "breakpoint",
+};
+
+const char *event_symbols_hw[PERF_COUNT_HW_MAX] = {
+       [PERF_COUNT_HW_CPU_CYCLES]              = "cpu-cycles",
+       [PERF_COUNT_HW_INSTRUCTIONS]            = "instructions",
+       [PERF_COUNT_HW_CACHE_REFERENCES]        = "cache-references",
+       [PERF_COUNT_HW_CACHE_MISSES]            = "cache-misses",
+       [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]     = "branch-instructions",
+       [PERF_COUNT_HW_BRANCH_MISSES]           = "branch-misses",
+       [PERF_COUNT_HW_BUS_CYCLES]              = "bus-cycles",
+       [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = "stalled-cycles-frontend",
+       [PERF_COUNT_HW_STALLED_CYCLES_BACKEND]  = "stalled-cycles-backend",
+       [PERF_COUNT_HW_REF_CPU_CYCLES]          = "ref-cycles",
+};
+
+const char *event_symbols_sw[PERF_COUNT_SW_MAX] = {
+       [PERF_COUNT_SW_CPU_CLOCK]               = "cpu-clock",
+       [PERF_COUNT_SW_TASK_CLOCK]              = "task-clock",
+       [PERF_COUNT_SW_PAGE_FAULTS]             = "page-faults",
+       [PERF_COUNT_SW_CONTEXT_SWITCHES]        = "context-switches",
+       [PERF_COUNT_SW_CPU_MIGRATIONS]          = "cpu-migrations",
+       [PERF_COUNT_SW_PAGE_FAULTS_MIN]         = "minor-faults",
+       [PERF_COUNT_SW_PAGE_FAULTS_MAJ]         = "major-faults",
+       [PERF_COUNT_SW_ALIGNMENT_FAULTS]        = "alignment-faults",
+       [PERF_COUNT_SW_EMULATION_FAULTS]        = "emulation-faults",
+       [PERF_COUNT_SW_DUMMY]                   = "dummy",
+       [PERF_COUNT_SW_BPF_OUTPUT]              = "bpf-output",
+       [PERF_COUNT_SW_CGROUP_SWITCHES]         = "cgroup-switches",
+};
+
+const char *evsel__hw_cache[PERF_COUNT_HW_CACHE_MAX] = {
+       [PERF_COUNT_HW_CACHE_L1D]               = "L1-dcache",
+       [PERF_COUNT_HW_CACHE_L1I]               = "L1-icache",
+       [PERF_COUNT_HW_CACHE_LL]                = "LLC",
+       [PERF_COUNT_HW_CACHE_DTLB]              = "dTLB",
+       [PERF_COUNT_HW_CACHE_ITLB]              = "iTLB",
+       [PERF_COUNT_HW_CACHE_BPU]               = "branch",
+       [PERF_COUNT_HW_CACHE_NODE]              = "node",
+};
+
+const char *evsel__hw_cache_op[PERF_COUNT_HW_CACHE_OP_MAX] = {
+       [PERF_COUNT_HW_CACHE_OP_READ]           = "load",
+       [PERF_COUNT_HW_CACHE_OP_WRITE]          = "store",
+       [PERF_COUNT_HW_CACHE_OP_PREFETCH]       = "prefetch",
+};
+
+const char *evsel__hw_cache_result[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
+       [PERF_COUNT_HW_CACHE_RESULT_ACCESS]     = "refs",
+       [PERF_COUNT_HW_CACHE_RESULT_MISS]       = "misses",
+};
+
+#define perf_event_name(array, id) ({                  \
+       const char *event_str = NULL;                   \
+                                                       \
+       if ((id) >= 0 && (id) < ARRAY_SIZE(array))      \
+               event_str = array[id];                  \
+       event_str;                                      \
+})
 
 static int link_parse_fd(int *argc, char ***argv)
 {
@@ -166,6 +237,154 @@ static int get_prog_info(int prog_id, struct bpf_prog_info *info)
        return err;
 }
 
+static int cmp_u64(const void *A, const void *B)
+{
+       const __u64 *a = A, *b = B;
+
+       return *a - *b;
+}
+
+static void
+show_kprobe_multi_json(struct bpf_link_info *info, json_writer_t *wtr)
+{
+       __u32 i, j = 0;
+       __u64 *addrs;
+
+       jsonw_bool_field(json_wtr, "retprobe",
+                        info->kprobe_multi.flags & BPF_F_KPROBE_MULTI_RETURN);
+       jsonw_uint_field(json_wtr, "func_cnt", info->kprobe_multi.count);
+       jsonw_name(json_wtr, "funcs");
+       jsonw_start_array(json_wtr);
+       addrs = u64_to_ptr(info->kprobe_multi.addrs);
+       qsort(addrs, info->kprobe_multi.count, sizeof(addrs[0]), cmp_u64);
+
+       /* Load it once for all. */
+       if (!dd.sym_count)
+               kernel_syms_load(&dd);
+       for (i = 0; i < dd.sym_count; i++) {
+               if (dd.sym_mapping[i].address != addrs[j])
+                       continue;
+               jsonw_start_object(json_wtr);
+               jsonw_uint_field(json_wtr, "addr", dd.sym_mapping[i].address);
+               jsonw_string_field(json_wtr, "func", dd.sym_mapping[i].name);
+               /* Print null if it is vmlinux */
+               if (dd.sym_mapping[i].module[0] == '\0') {
+                       jsonw_name(json_wtr, "module");
+                       jsonw_null(json_wtr);
+               } else {
+                       jsonw_string_field(json_wtr, "module", dd.sym_mapping[i].module);
+               }
+               jsonw_end_object(json_wtr);
+               if (j++ == info->kprobe_multi.count)
+                       break;
+       }
+       jsonw_end_array(json_wtr);
+}
+
+static void
+show_perf_event_kprobe_json(struct bpf_link_info *info, json_writer_t *wtr)
+{
+       jsonw_bool_field(wtr, "retprobe", info->perf_event.type == BPF_PERF_EVENT_KRETPROBE);
+       jsonw_uint_field(wtr, "addr", info->perf_event.kprobe.addr);
+       jsonw_string_field(wtr, "func",
+                          u64_to_ptr(info->perf_event.kprobe.func_name));
+       jsonw_uint_field(wtr, "offset", info->perf_event.kprobe.offset);
+}
+
+static void
+show_perf_event_uprobe_json(struct bpf_link_info *info, json_writer_t *wtr)
+{
+       jsonw_bool_field(wtr, "retprobe", info->perf_event.type == BPF_PERF_EVENT_URETPROBE);
+       jsonw_string_field(wtr, "file",
+                          u64_to_ptr(info->perf_event.uprobe.file_name));
+       jsonw_uint_field(wtr, "offset", info->perf_event.uprobe.offset);
+}
+
+static void
+show_perf_event_tracepoint_json(struct bpf_link_info *info, json_writer_t *wtr)
+{
+       jsonw_string_field(wtr, "tracepoint",
+                          u64_to_ptr(info->perf_event.tracepoint.tp_name));
+}
+
+static char *perf_config_hw_cache_str(__u64 config)
+{
+       const char *hw_cache, *result, *op;
+       char *str = malloc(PERF_HW_CACHE_LEN);
+
+       if (!str) {
+               p_err("mem alloc failed");
+               return NULL;
+       }
+
+       hw_cache = perf_event_name(evsel__hw_cache, config & 0xff);
+       if (hw_cache)
+               snprintf(str, PERF_HW_CACHE_LEN, "%s-", hw_cache);
+       else
+               snprintf(str, PERF_HW_CACHE_LEN, "%lld-", config & 0xff);
+
+       op = perf_event_name(evsel__hw_cache_op, (config >> 8) & 0xff);
+       if (op)
+               snprintf(str + strlen(str), PERF_HW_CACHE_LEN - strlen(str),
+                        "%s-", op);
+       else
+               snprintf(str + strlen(str), PERF_HW_CACHE_LEN - strlen(str),
+                        "%lld-", (config >> 8) & 0xff);
+
+       result = perf_event_name(evsel__hw_cache_result, config >> 16);
+       if (result)
+               snprintf(str + strlen(str), PERF_HW_CACHE_LEN - strlen(str),
+                        "%s", result);
+       else
+               snprintf(str + strlen(str), PERF_HW_CACHE_LEN - strlen(str),
+                        "%lld", config >> 16);
+       return str;
+}
+
+static const char *perf_config_str(__u32 type, __u64 config)
+{
+       const char *perf_config;
+
+       switch (type) {
+       case PERF_TYPE_HARDWARE:
+               perf_config = perf_event_name(event_symbols_hw, config);
+               break;
+       case PERF_TYPE_SOFTWARE:
+               perf_config = perf_event_name(event_symbols_sw, config);
+               break;
+       case PERF_TYPE_HW_CACHE:
+               perf_config = perf_config_hw_cache_str(config);
+               break;
+       default:
+               perf_config = NULL;
+               break;
+       }
+       return perf_config;
+}
+
+static void
+show_perf_event_event_json(struct bpf_link_info *info, json_writer_t *wtr)
+{
+       __u64 config = info->perf_event.event.config;
+       __u32 type = info->perf_event.event.type;
+       const char *perf_type, *perf_config;
+
+       perf_type = perf_event_name(perf_type_name, type);
+       if (perf_type)
+               jsonw_string_field(wtr, "event_type", perf_type);
+       else
+               jsonw_uint_field(wtr, "event_type", type);
+
+       perf_config = perf_config_str(type, config);
+       if (perf_config)
+               jsonw_string_field(wtr, "event_config", perf_config);
+       else
+               jsonw_uint_field(wtr, "event_config", config);
+
+       if (type == PERF_TYPE_HW_CACHE && perf_config)
+               free((void *)perf_config);
+}
+
 static int show_link_close_json(int fd, struct bpf_link_info *info)
 {
        struct bpf_prog_info prog_info;
@@ -218,6 +437,29 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
                jsonw_uint_field(json_wtr, "map_id",
                                 info->struct_ops.map_id);
                break;
+       case BPF_LINK_TYPE_KPROBE_MULTI:
+               show_kprobe_multi_json(info, json_wtr);
+               break;
+       case BPF_LINK_TYPE_PERF_EVENT:
+               switch (info->perf_event.type) {
+               case BPF_PERF_EVENT_EVENT:
+                       show_perf_event_event_json(info, json_wtr);
+                       break;
+               case BPF_PERF_EVENT_TRACEPOINT:
+                       show_perf_event_tracepoint_json(info, json_wtr);
+                       break;
+               case BPF_PERF_EVENT_KPROBE:
+               case BPF_PERF_EVENT_KRETPROBE:
+                       show_perf_event_kprobe_json(info, json_wtr);
+                       break;
+               case BPF_PERF_EVENT_UPROBE:
+               case BPF_PERF_EVENT_URETPROBE:
+                       show_perf_event_uprobe_json(info, json_wtr);
+                       break;
+               default:
+                       break;
+               }
+               break;
        default:
                break;
        }
@@ -351,6 +593,113 @@ void netfilter_dump_plain(const struct bpf_link_info *info)
                printf(" flags 0x%x", info->netfilter.flags);
 }
 
+static void show_kprobe_multi_plain(struct bpf_link_info *info)
+{
+       __u32 i, j = 0;
+       __u64 *addrs;
+
+       if (!info->kprobe_multi.count)
+               return;
+
+       if (info->kprobe_multi.flags & BPF_F_KPROBE_MULTI_RETURN)
+               printf("\n\tkretprobe.multi  ");
+       else
+               printf("\n\tkprobe.multi  ");
+       printf("func_cnt %u  ", info->kprobe_multi.count);
+       addrs = (__u64 *)u64_to_ptr(info->kprobe_multi.addrs);
+       qsort(addrs, info->kprobe_multi.count, sizeof(__u64), cmp_u64);
+
+       /* Load it once for all. */
+       if (!dd.sym_count)
+               kernel_syms_load(&dd);
+       if (!dd.sym_count)
+               return;
+
+       printf("\n\t%-16s %s", "addr", "func [module]");
+       for (i = 0; i < dd.sym_count; i++) {
+               if (dd.sym_mapping[i].address != addrs[j])
+                       continue;
+               printf("\n\t%016lx %s",
+                      dd.sym_mapping[i].address, dd.sym_mapping[i].name);
+               if (dd.sym_mapping[i].module[0] != '\0')
+                       printf(" [%s]  ", dd.sym_mapping[i].module);
+               else
+                       printf("  ");
+
+               if (j++ == info->kprobe_multi.count)
+                       break;
+       }
+}
+
+static void show_perf_event_kprobe_plain(struct bpf_link_info *info)
+{
+       const char *buf;
+
+       buf = u64_to_ptr(info->perf_event.kprobe.func_name);
+       if (buf[0] == '\0' && !info->perf_event.kprobe.addr)
+               return;
+
+       if (info->perf_event.type == BPF_PERF_EVENT_KRETPROBE)
+               printf("\n\tkretprobe ");
+       else
+               printf("\n\tkprobe ");
+       if (info->perf_event.kprobe.addr)
+               printf("%llx ", info->perf_event.kprobe.addr);
+       printf("%s", buf);
+       if (info->perf_event.kprobe.offset)
+               printf("+%#x", info->perf_event.kprobe.offset);
+       printf("  ");
+}
+
+static void show_perf_event_uprobe_plain(struct bpf_link_info *info)
+{
+       const char *buf;
+
+       buf = u64_to_ptr(info->perf_event.uprobe.file_name);
+       if (buf[0] == '\0')
+               return;
+
+       if (info->perf_event.type == BPF_PERF_EVENT_URETPROBE)
+               printf("\n\turetprobe ");
+       else
+               printf("\n\tuprobe ");
+       printf("%s+%#x  ", buf, info->perf_event.uprobe.offset);
+}
+
+static void show_perf_event_tracepoint_plain(struct bpf_link_info *info)
+{
+       const char *buf;
+
+       buf = u64_to_ptr(info->perf_event.tracepoint.tp_name);
+       if (buf[0] == '\0')
+               return;
+
+       printf("\n\ttracepoint %s  ", buf);
+}
+
+static void show_perf_event_event_plain(struct bpf_link_info *info)
+{
+       __u64 config = info->perf_event.event.config;
+       __u32 type = info->perf_event.event.type;
+       const char *perf_type, *perf_config;
+
+       printf("\n\tevent ");
+       perf_type = perf_event_name(perf_type_name, type);
+       if (perf_type)
+               printf("%s:", perf_type);
+       else
+               printf("%u :", type);
+
+       perf_config = perf_config_str(type, config);
+       if (perf_config)
+               printf("%s  ", perf_config);
+       else
+               printf("%llu  ", config);
+
+       if (type == PERF_TYPE_HW_CACHE && perf_config)
+               free((void *)perf_config);
+}
+
 static int show_link_close_plain(int fd, struct bpf_link_info *info)
 {
        struct bpf_prog_info prog_info;
@@ -396,6 +745,29 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
        case BPF_LINK_TYPE_NETFILTER:
                netfilter_dump_plain(info);
                break;
+       case BPF_LINK_TYPE_KPROBE_MULTI:
+               show_kprobe_multi_plain(info);
+               break;
+       case BPF_LINK_TYPE_PERF_EVENT:
+               switch (info->perf_event.type) {
+               case BPF_PERF_EVENT_EVENT:
+                       show_perf_event_event_plain(info);
+                       break;
+               case BPF_PERF_EVENT_TRACEPOINT:
+                       show_perf_event_tracepoint_plain(info);
+                       break;
+               case BPF_PERF_EVENT_KPROBE:
+               case BPF_PERF_EVENT_KRETPROBE:
+                       show_perf_event_kprobe_plain(info);
+                       break;
+               case BPF_PERF_EVENT_UPROBE:
+               case BPF_PERF_EVENT_URETPROBE:
+                       show_perf_event_uprobe_plain(info);
+                       break;
+               default:
+                       break;
+               }
+               break;
        default:
                break;
        }
@@ -417,10 +789,13 @@ static int do_show_link(int fd)
 {
        struct bpf_link_info info;
        __u32 len = sizeof(info);
-       char buf[256];
+       __u64 *addrs = NULL;
+       char buf[PATH_MAX];
+       int count;
        int err;
 
        memset(&info, 0, sizeof(info));
+       buf[0] = '\0';
 again:
        err = bpf_link_get_info_by_fd(fd, &info, &len);
        if (err) {
@@ -431,22 +806,67 @@ again:
        }
        if (info.type == BPF_LINK_TYPE_RAW_TRACEPOINT &&
            !info.raw_tracepoint.tp_name) {
-               info.raw_tracepoint.tp_name = (unsigned long)&buf;
+               info.raw_tracepoint.tp_name = ptr_to_u64(&buf);
                info.raw_tracepoint.tp_name_len = sizeof(buf);
                goto again;
        }
        if (info.type == BPF_LINK_TYPE_ITER &&
            !info.iter.target_name) {
-               info.iter.target_name = (unsigned long)&buf;
+               info.iter.target_name = ptr_to_u64(&buf);
                info.iter.target_name_len = sizeof(buf);
                goto again;
        }
+       if (info.type == BPF_LINK_TYPE_KPROBE_MULTI &&
+           !info.kprobe_multi.addrs) {
+               count = info.kprobe_multi.count;
+               if (count) {
+                       addrs = calloc(count, sizeof(__u64));
+                       if (!addrs) {
+                               p_err("mem alloc failed");
+                               close(fd);
+                               return -ENOMEM;
+                       }
+                       info.kprobe_multi.addrs = ptr_to_u64(addrs);
+                       goto again;
+               }
+       }
+       if (info.type == BPF_LINK_TYPE_PERF_EVENT) {
+               switch (info.perf_event.type) {
+               case BPF_PERF_EVENT_TRACEPOINT:
+                       if (!info.perf_event.tracepoint.tp_name) {
+                               info.perf_event.tracepoint.tp_name = ptr_to_u64(&buf);
+                               info.perf_event.tracepoint.name_len = sizeof(buf);
+                               goto again;
+                       }
+                       break;
+               case BPF_PERF_EVENT_KPROBE:
+               case BPF_PERF_EVENT_KRETPROBE:
+                       if (!info.perf_event.kprobe.func_name) {
+                               info.perf_event.kprobe.func_name = ptr_to_u64(&buf);
+                               info.perf_event.kprobe.name_len = sizeof(buf);
+                               goto again;
+                       }
+                       break;
+               case BPF_PERF_EVENT_UPROBE:
+               case BPF_PERF_EVENT_URETPROBE:
+                       if (!info.perf_event.uprobe.file_name) {
+                               info.perf_event.uprobe.file_name = ptr_to_u64(&buf);
+                               info.perf_event.uprobe.name_len = sizeof(buf);
+                               goto again;
+                       }
+                       break;
+               default:
+                       break;
+               }
+       }
 
        if (json_output)
                show_link_close_json(fd, &info);
        else
                show_link_close_plain(fd, &info);
 
+       if (addrs)
+               free(addrs);
        close(fd);
        return 0;
 }
@@ -471,7 +891,8 @@ static int do_show(int argc, char **argv)
                fd = link_parse_fd(&argc, &argv);
                if (fd < 0)
                        return fd;
-               return do_show_link(fd);
+               do_show_link(fd);
+               goto out;
        }
 
        if (argc)
@@ -510,6 +931,9 @@ static int do_show(int argc, char **argv)
        if (show_pinned)
                delete_pinned_obj_table(link_table);
 
+out:
+       if (dd.sym_count)
+               kernel_syms_destroy(&dd);
        return errno == ENOENT ? 0 : -1;
 }
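
perf_config_hw_cache_str() above decodes the standard perf hw-cache config layout: cache id in bits 0-7, operation in bits 8-15, result in bits 16 and up. A small standalone sketch of the same decoding (the event choice is illustrative; the enums come from linux/perf_event.h):

#include <stdio.h>
#include <linux/perf_event.h>

int main(void)
{
	/* "L1-dcache-load-misses" as perf encodes it */
	__u64 config = PERF_COUNT_HW_CACHE_L1D |
		       (PERF_COUNT_HW_CACHE_OP_READ << 8) |
		       (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);

	printf("cache=%llu op=%llu result=%llu\n",
	       config & 0xff, (config >> 8) & 0xff, config >> 16);
	return 0;
}
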
 
index eb05ea5..26004f0 100644 (file)
@@ -15,6 +15,19 @@ enum bpf_obj_type {
        BPF_OBJ_BTF,
 };
 
+struct bpf_perf_link___local {
+       struct bpf_link link;
+       struct file *perf_file;
+} __attribute__((preserve_access_index));
+
+struct perf_event___local {
+       u64 bpf_cookie;
+} __attribute__((preserve_access_index));
+
+enum bpf_link_type___local {
+       BPF_LINK_TYPE_PERF_EVENT___local = 7,
+};
+
 extern const void bpf_link_fops __ksym;
 extern const void bpf_map_fops __ksym;
 extern const void bpf_prog_fops __ksym;
@@ -41,10 +54,10 @@ static __always_inline __u32 get_obj_id(void *ent, enum bpf_obj_type type)
 /* could be used only with BPF_LINK_TYPE_PERF_EVENT links */
 static __u64 get_bpf_cookie(struct bpf_link *link)
 {
-       struct bpf_perf_link *perf_link;
-       struct perf_event *event;
+       struct bpf_perf_link___local *perf_link;
+       struct perf_event___local *event;
 
-       perf_link = container_of(link, struct bpf_perf_link, link);
+       perf_link = container_of(link, struct bpf_perf_link___local, link);
        event = BPF_CORE_READ(perf_link, perf_file, private_data);
        return BPF_CORE_READ(event, bpf_cookie);
 }
@@ -84,10 +97,13 @@ int iter(struct bpf_iter__task_file *ctx)
        e.pid = task->tgid;
        e.id = get_obj_id(file->private_data, obj_type);
 
-       if (obj_type == BPF_OBJ_LINK) {
+       if (obj_type == BPF_OBJ_LINK &&
+           bpf_core_enum_value_exists(enum bpf_link_type___local,
+                                      BPF_LINK_TYPE_PERF_EVENT___local)) {
                struct bpf_link *link = (struct bpf_link *) file->private_data;
 
-               if (BPF_CORE_READ(link, type) == BPF_LINK_TYPE_PERF_EVENT) {
+               if (link->type == bpf_core_enum_value(enum bpf_link_type___local,
+                                                     BPF_LINK_TYPE_PERF_EVENT___local)) {
                        e.has_bpf_cookie = true;
                        e.bpf_cookie = get_bpf_cookie(link);
                }
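
The ___local flavors above are what lets the bpftool skeleton build against any vmlinux.h: only the fields the program actually reads are declared locally, and preserve_access_index makes libbpf relocate the accesses against the running kernel's BTF. A minimal sketch of the same CO-RE pattern on a hypothetical mirror of task_struct:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

/* flavored local mirror of just the fields we touch; the "___mini"
 * suffix is stripped when CO-RE matches it against the kernel's
 * struct task_struct */
struct task_struct___mini {
	int pid;
	int tgid;
} __attribute__((preserve_access_index));

static __always_inline int get_tgid(void *task_ptr)
{
	struct task_struct___mini *task = task_ptr;

	return BPF_CORE_READ(task, tgid);
}
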
index ce5b65e..2f80edc 100644 (file)
@@ -4,6 +4,12 @@
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
 
+struct bpf_perf_event_value___local {
+       __u64 counter;
+       __u64 enabled;
+       __u64 running;
+} __attribute__((preserve_access_index));
+
 /* map of perf event fds, num_cpu * num_metric entries */
 struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
@@ -15,14 +21,14 @@ struct {
 struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(key_size, sizeof(u32));
-       __uint(value_size, sizeof(struct bpf_perf_event_value));
+       __uint(value_size, sizeof(struct bpf_perf_event_value___local));
 } fentry_readings SEC(".maps");
 
 /* accumulated readings */
 struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(key_size, sizeof(u32));
-       __uint(value_size, sizeof(struct bpf_perf_event_value));
+       __uint(value_size, sizeof(struct bpf_perf_event_value___local));
 } accum_readings SEC(".maps");
 
 /* sample counts, one per cpu */
@@ -39,7 +45,7 @@ const volatile __u32 num_metric = 1;
 SEC("fentry/XXX")
 int BPF_PROG(fentry_XXX)
 {
-       struct bpf_perf_event_value *ptrs[MAX_NUM_MATRICS];
+       struct bpf_perf_event_value___local *ptrs[MAX_NUM_MATRICS];
        u32 key = bpf_get_smp_processor_id();
        u32 i;
 
@@ -53,10 +59,10 @@ int BPF_PROG(fentry_XXX)
        }
 
        for (i = 0; i < num_metric && i < MAX_NUM_MATRICS; i++) {
-               struct bpf_perf_event_value reading;
+               struct bpf_perf_event_value___local reading;
                int err;
 
-               err = bpf_perf_event_read_value(&events, key, &reading,
+               err = bpf_perf_event_read_value(&events, key, (void *)&reading,
                                                sizeof(reading));
                if (err)
                        return 0;
@@ -68,14 +74,14 @@ int BPF_PROG(fentry_XXX)
 }
 
 static inline void
-fexit_update_maps(u32 id, struct bpf_perf_event_value *after)
+fexit_update_maps(u32 id, struct bpf_perf_event_value___local *after)
 {
-       struct bpf_perf_event_value *before, diff;
+       struct bpf_perf_event_value___local *before, diff;
 
        before = bpf_map_lookup_elem(&fentry_readings, &id);
        /* only account samples with a valid fentry_reading */
        if (before && before->counter) {
-               struct bpf_perf_event_value *accum;
+               struct bpf_perf_event_value___local *accum;
 
                diff.counter = after->counter - before->counter;
                diff.enabled = after->enabled - before->enabled;
@@ -93,7 +99,7 @@ fexit_update_maps(u32 id, struct bpf_perf_event_value *after)
 SEC("fexit/XXX")
 int BPF_PROG(fexit_XXX)
 {
-       struct bpf_perf_event_value readings[MAX_NUM_MATRICS];
+       struct bpf_perf_event_value___local readings[MAX_NUM_MATRICS];
        u32 cpu = bpf_get_smp_processor_id();
        u32 i, zero = 0;
        int err;
@@ -102,7 +108,8 @@ int BPF_PROG(fexit_XXX)
        /* read all events before updating the maps, to reduce error */
        for (i = 0; i < num_metric && i < MAX_NUM_MATRICS; i++) {
                err = bpf_perf_event_read_value(&events, cpu + i * num_cpu,
-                                               readings + i, sizeof(*readings));
+                                               (void *)(readings + i),
+                                               sizeof(*readings));
                if (err)
                        return 0;
        }
index da608e1..567f56d 100644 (file)
@@ -46,7 +46,11 @@ out:
                }
                dd->sym_mapping = tmp;
                sym = &dd->sym_mapping[dd->sym_count];
-               if (sscanf(buff, "%p %*c %s", &address, sym->name) != 2)
+
+               /* module is optional */
+               sym->module[0] = '\0';
+               /* trim the square brackets around the module name */
+               if (sscanf(buff, "%p %*c %s [%[^]]s", &address, sym->name, sym->module) < 2)
                        continue;
                sym->address = (unsigned long)address;
                if (!strcmp(sym->name, "__bpf_call_base")) {
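
For reference, the extended format string above parses /proc/kallsyms lines with an optional trailing [module] column. A tiny standalone sketch of the same parse (the sample line is made up; the %[^]] conversion stops at the closing bracket, so the module name comes out without the brackets):

#include <stdio.h>

int main(void)
{
	const char *line = "ffffffffc0a81000 t my_probe_fn\t[my_module]";
	void *address;
	char name[256], module[64] = "";
	int n;

	/* same conversion as kernel_syms_load(); n is 2 for vmlinux
	 * symbols and 3 when a module name is present */
	n = sscanf(line, "%p %*c %s [%[^]]s", &address, name, module);
	printf("n=%d addr=%p name=%s module=%s\n", n, address, name, module);
	return 0;
}
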
index 9a94637..db3ba06 100644 (file)
@@ -5,12 +5,14 @@
 #define __BPF_TOOL_XLATED_DUMPER_H
 
 #define SYM_MAX_NAME   256
+#define MODULE_MAX_NAME        64
 
 struct bpf_prog_linfo;
 
 struct kernel_sym {
        unsigned long address;
        char name[SYM_MAX_NAME];
+       char module[MODULE_MAX_NAME];
 };
 
 struct dump_data {
index 47acf69..d828893 100644 (file)
@@ -62,7 +62,7 @@ $(OUTPUT)/%.skel.h: $(OUTPUT)/%.bpf.o | $(BPFTOOL)
        $(QUIET_GEN)$(BPFTOOL) gen skeleton $< > $@
 
 $(OUTPUT)/%.bpf.o: %.bpf.c $(BPFOBJ) | $(OUTPUT)
-       $(QUIET_GEN)$(CLANG) -g -O2 -target bpf $(INCLUDES)                   \
+       $(QUIET_GEN)$(CLANG) -g -O2 --target=bpf $(INCLUDES)                  \
                 -c $(filter %.c,$^) -o $@ &&                                 \
        $(LLVM_STRIP) -g $@
 
index 0f0aa9b..6654b1a 100644 (file)
@@ -372,7 +372,7 @@ $(OUTPUT)test-libzstd.bin:
        $(BUILD) -lzstd
 
 $(OUTPUT)test-clang-bpf-co-re.bin:
-       $(CLANG) -S -g -target bpf -o - $(patsubst %.bin,%.c,$(@F)) |   \
+       $(CLANG) -S -g --target=bpf -o - $(patsubst %.bin,%.c,$(@F)) |  \
                grep BTF_KIND_VAR
 
 $(OUTPUT)test-file-handle.bin:
index 60a9d59..600d0ca 100644 (file)
@@ -1057,6 +1057,16 @@ enum bpf_link_type {
        MAX_BPF_LINK_TYPE,
 };
 
+enum bpf_perf_event_type {
+       BPF_PERF_EVENT_UNSPEC = 0,
+       BPF_PERF_EVENT_UPROBE = 1,
+       BPF_PERF_EVENT_URETPROBE = 2,
+       BPF_PERF_EVENT_KPROBE = 3,
+       BPF_PERF_EVENT_KRETPROBE = 4,
+       BPF_PERF_EVENT_TRACEPOINT = 5,
+       BPF_PERF_EVENT_EVENT = 6,
+};
+
 /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
  *
  * NONE(default): No further bpf programs allowed in the subtree.
@@ -6439,6 +6449,36 @@ struct bpf_link_info {
                        __s32 priority;
                        __u32 flags;
                } netfilter;
+               struct {
+                       __aligned_u64 addrs;
+                       __u32 count; /* in/out: kprobe_multi function count */
+                       __u32 flags;
+               } kprobe_multi;
+               struct {
+                       __u32 type; /* enum bpf_perf_event_type */
+                       __u32 :32;
+                       union {
+                               struct {
+                                       __aligned_u64 file_name; /* in/out */
+                                       __u32 name_len;
+                                       __u32 offset; /* offset from file_name */
+                               } uprobe; /* BPF_PERF_EVENT_UPROBE, BPF_PERF_EVENT_URETPROBE */
+                               struct {
+                                       __aligned_u64 func_name; /* in/out */
+                                       __u32 name_len;
+                                       __u32 offset; /* offset from func_name */
+                                       __u64 addr;
+                               } kprobe; /* BPF_PERF_EVENT_KPROBE, BPF_PERF_EVENT_KRETPROBE */
+                               struct {
+                                       __aligned_u64 tp_name;   /* in/out */
+                                       __u32 name_len;
+                               } tracepoint; /* BPF_PERF_EVENT_TRACEPOINT */
+                               struct {
+                                       __u64 config;
+                                       __u32 type;
+                               } event; /* BPF_PERF_EVENT_EVENT */
+                       };
+               } perf_event;
        };
 } __attribute__((aligned(8)));
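
These new kprobe_multi and perf_event members are in/out, like the existing raw_tracepoint and iter ones: user space passes buffer pointers and sizes, and the kernel writes counts and copies names or addresses back. A hedged sketch of reading the resolved kprobe.multi addresses with the usual two-call pattern (link_fd is assumed to be a kprobe.multi link fd; unprivileged callers get the addresses zeroed by the kallsyms_show_value() check):

#include <stdio.h>
#include <stdlib.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>

static int dump_kprobe_multi_addrs(int link_fd)
{
	struct bpf_link_info info = {};
	__u32 len = sizeof(info);
	__u64 *addrs;
	__u32 i, cnt;

	/* first call: learn how many functions the link covers */
	if (bpf_link_get_info_by_fd(link_fd, &info, &len))
		return -1;
	cnt = info.kprobe_multi.count;
	if (!cnt)
		return 0;

	addrs = calloc(cnt, sizeof(*addrs));
	if (!addrs)
		return -1;

	/* second call: same fd, now with a buffer for the addresses */
	info.kprobe_multi.addrs = (__u64)(unsigned long)addrs;
	info.kprobe_multi.count = cnt;
	if (bpf_link_get_info_by_fd(link_fd, &info, &len)) {
		free(addrs);
		return -1;
	}

	for (i = 0; i < cnt; i++)
		printf("%#llx\n", addrs[i]);
	free(addrs);
	return 0;
}
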
 
index ed86b37..3b0da19 100644 (file)
@@ -741,6 +741,14 @@ int bpf_link_create(int prog_fd, int target_fd,
                if (!OPTS_ZEROED(opts, tracing))
                        return libbpf_err(-EINVAL);
                break;
+       case BPF_NETFILTER:
+               attr.link_create.netfilter.pf = OPTS_GET(opts, netfilter.pf, 0);
+               attr.link_create.netfilter.hooknum = OPTS_GET(opts, netfilter.hooknum, 0);
+               attr.link_create.netfilter.priority = OPTS_GET(opts, netfilter.priority, 0);
+               attr.link_create.netfilter.flags = OPTS_GET(opts, netfilter.flags, 0);
+               if (!OPTS_ZEROED(opts, netfilter))
+                       return libbpf_err(-EINVAL);
+               break;
        default:
                if (!OPTS_ZEROED(opts, flags))
                        return libbpf_err(-EINVAL);
index 9aa0ee4..c676295 100644 (file)
@@ -349,6 +349,12 @@ struct bpf_link_create_opts {
                struct {
                        __u64 cookie;
                } tracing;
+               struct {
+                       __u32 pf;
+                       __u32 hooknum;
+                       __s32 priority;
+                       __u32 flags;
+               } netfilter;
        };
        size_t :0;
 };
index 0a5bf19..c12f832 100644 (file)
@@ -80,16 +80,6 @@ struct hashmap {
        size_t sz;
 };
 
-#define HASHMAP_INIT(hash_fn, equal_fn, ctx) { \
-       .hash_fn = (hash_fn),                   \
-       .equal_fn = (equal_fn),                 \
-       .ctx = (ctx),                           \
-       .buckets = NULL,                        \
-       .cap = 0,                               \
-       .cap_bits = 0,                          \
-       .sz = 0,                                \
-}
-
 void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
                   hashmap_equal_fn equal_fn, void *ctx);
 struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
index 214f828..63311a7 100644 (file)
@@ -5471,6 +5471,10 @@ static int load_module_btfs(struct bpf_object *obj)
                err = bpf_btf_get_next_id(id, &id);
                if (err && errno == ENOENT)
                        return 0;
+               if (err && errno == EPERM) {
+                       pr_debug("skipping module BTFs loading, missing privileges\n");
+                       return 0;
+               }
                if (err) {
                        err = -errno;
                        pr_warn("failed to iterate BTF objects: %d\n", err);
@@ -6157,7 +6161,11 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
        if (main_prog == subprog)
                return 0;
        relos = libbpf_reallocarray(main_prog->reloc_desc, new_cnt, sizeof(*relos));
-       if (!relos)
+       /* if new count is zero, reallocarray can return a valid NULL result;
+        * in this case the previous pointer will be freed, so we *have to*
+        * reassign old pointer to the new value (even if it's NULL)
+        */
+       if (!relos && new_cnt)
                return -ENOMEM;
        if (subprog->nr_reloc)
                memcpy(relos + main_prog->nr_reloc, subprog->reloc_desc,
@@ -8528,7 +8536,8 @@ int bpf_program__set_insns(struct bpf_program *prog,
                return -EBUSY;
 
        insns = libbpf_reallocarray(prog->insns, new_insn_cnt, sizeof(*insns));
-       if (!insns) {
+       /* NULL is a valid return from reallocarray if the new count is zero */
+       if (!insns && new_insn_cnt) {
                pr_warn("prog '%s': failed to realloc prog code\n", prog->name);
                return -ENOMEM;
        }
@@ -8558,13 +8567,31 @@ enum bpf_prog_type bpf_program__type(const struct bpf_program *prog)
        return prog->type;
 }
 
+static size_t custom_sec_def_cnt;
+static struct bpf_sec_def *custom_sec_defs;
+static struct bpf_sec_def custom_fallback_def;
+static bool has_custom_fallback_def;
+static int last_custom_sec_def_handler_id;
+
 int bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type)
 {
        if (prog->obj->loaded)
                return libbpf_err(-EBUSY);
 
+       /* if type is not changed, do nothing */
+       if (prog->type == type)
+               return 0;
+
        prog->type = type;
-       prog->sec_def = NULL;
+
+       /* If a program type was changed, we need to reset associated SEC()
+        * handler, as it will be invalid now. The only exception is a generic
+        * fallback handler, which by definition is program type-agnostic and
+        * is a catch-all custom handler, optionally set by the application,
+        * so should be able to handle any type of BPF program.
+        */
+       if (prog->sec_def != &custom_fallback_def)
+               prog->sec_def = NULL;
        return 0;
 }
 
@@ -8740,13 +8767,6 @@ static const struct bpf_sec_def section_defs[] = {
        SEC_DEF("netfilter",            NETFILTER, BPF_NETFILTER, SEC_NONE),
 };
 
-static size_t custom_sec_def_cnt;
-static struct bpf_sec_def *custom_sec_defs;
-static struct bpf_sec_def custom_fallback_def;
-static bool has_custom_fallback_def;
-
-static int last_custom_sec_def_handler_id;
-
 int libbpf_register_prog_handler(const char *sec,
                                 enum bpf_prog_type prog_type,
                                 enum bpf_attach_type exp_attach_type,
@@ -8826,7 +8846,11 @@ int libbpf_unregister_prog_handler(int handler_id)
 
        /* try to shrink the array, but it's ok if we couldn't */
        sec_defs = libbpf_reallocarray(custom_sec_defs, custom_sec_def_cnt, sizeof(*sec_defs));
-       if (sec_defs)
+       /* if new count is zero, reallocarray can return a valid NULL result;
+        * in this case the previous pointer will be freed, so we *have to*
+        * reassign old pointer to the new value (even if it's NULL)
+        */
+       if (sec_defs || custom_sec_def_cnt == 0)
                custom_sec_defs = sec_defs;
 
        return 0;
@@ -10224,6 +10248,18 @@ static const char *tracefs_uprobe_events(void)
        return use_debugfs() ? DEBUGFS"/uprobe_events" : TRACEFS"/uprobe_events";
 }
 
+static const char *tracefs_available_filter_functions(void)
+{
+       return use_debugfs() ? DEBUGFS"/available_filter_functions"
+                            : TRACEFS"/available_filter_functions";
+}
+
+static const char *tracefs_available_filter_functions_addrs(void)
+{
+       return use_debugfs() ? DEBUGFS"/available_filter_functions_addrs"
+                            : TRACEFS"/available_filter_functions_addrs";
+}
+
 static void gen_kprobe_legacy_event_name(char *buf, size_t buf_sz,
                                         const char *kfunc_name, size_t offset)
 {
@@ -10539,25 +10575,158 @@ struct kprobe_multi_resolve {
        size_t cnt;
 };
 
-static int
-resolve_kprobe_multi_cb(unsigned long long sym_addr, char sym_type,
-                       const char *sym_name, void *ctx)
+struct avail_kallsyms_data {
+       char **syms;
+       size_t cnt;
+       struct kprobe_multi_resolve *res;
+};
+
+static int avail_func_cmp(const void *a, const void *b)
 {
-       struct kprobe_multi_resolve *res = ctx;
+       return strcmp(*(const char **)a, *(const char **)b);
+}
+
+static int avail_kallsyms_cb(unsigned long long sym_addr, char sym_type,
+                            const char *sym_name, void *ctx)
+{
+       struct avail_kallsyms_data *data = ctx;
+       struct kprobe_multi_resolve *res = data->res;
        int err;
 
-       if (!glob_match(sym_name, res->pattern))
+       if (!bsearch(&sym_name, data->syms, data->cnt, sizeof(*data->syms), avail_func_cmp))
                return 0;
 
-       err = libbpf_ensure_mem((void **) &res->addrs, &res->cap, sizeof(unsigned long),
-                               res->cnt + 1);
+       err = libbpf_ensure_mem((void **)&res->addrs, &res->cap, sizeof(*res->addrs), res->cnt + 1);
        if (err)
                return err;
 
-       res->addrs[res->cnt++] = (unsigned long) sym_addr;
+       res->addrs[res->cnt++] = (unsigned long)sym_addr;
        return 0;
 }
 
+static int libbpf_available_kallsyms_parse(struct kprobe_multi_resolve *res)
+{
+       const char *available_functions_file = tracefs_available_filter_functions();
+       struct avail_kallsyms_data data;
+       char sym_name[500];
+       FILE *f;
+       int err = 0, ret, i;
+       char **syms = NULL;
+       size_t cap = 0, cnt = 0;
+
+       f = fopen(available_functions_file, "re");
+       if (!f) {
+               err = -errno;
+               pr_warn("failed to open %s: %d\n", available_functions_file, err);
+               return err;
+       }
+
+       while (true) {
+               char *name;
+
+               ret = fscanf(f, "%499s%*[^\n]\n", sym_name);
+               if (ret == EOF && feof(f))
+                       break;
+
+               if (ret != 1) {
+                       pr_warn("failed to parse available_filter_functions entry: %d\n", ret);
+                       err = -EINVAL;
+                       goto cleanup;
+               }
+
+               if (!glob_match(sym_name, res->pattern))
+                       continue;
+
+               err = libbpf_ensure_mem((void **)&syms, &cap, sizeof(*syms), cnt + 1);
+               if (err)
+                       goto cleanup;
+
+               name = strdup(sym_name);
+               if (!name) {
+                       err = -errno;
+                       goto cleanup;
+               }
+
+               syms[cnt++] = name;
+       }
+
+       /* no entries found, bail out */
+       if (cnt == 0) {
+               err = -ENOENT;
+               goto cleanup;
+       }
+
+       /* sort available functions */
+       qsort(syms, cnt, sizeof(*syms), avail_func_cmp);
+
+       data.syms = syms;
+       data.res = res;
+       data.cnt = cnt;
+       libbpf_kallsyms_parse(avail_kallsyms_cb, &data);
+
+       if (res->cnt == 0)
+               err = -ENOENT;
+
+cleanup:
+       for (i = 0; i < cnt; i++)
+               free((char *)syms[i]);
+       free(syms);
+
+       fclose(f);
+       return err;
+}
+
+static bool has_available_filter_functions_addrs(void)
+{
+       return access(tracefs_available_filter_functions_addrs(), R_OK) != -1;
+}
+
+static int libbpf_available_kprobes_parse(struct kprobe_multi_resolve *res)
+{
+       const char *available_path = tracefs_available_filter_functions_addrs();
+       char sym_name[500];
+       FILE *f;
+       int ret, err = 0;
+       unsigned long long sym_addr;
+
+       f = fopen(available_path, "re");
+       if (!f) {
+               err = -errno;
+               pr_warn("failed to open %s: %d\n", available_path, err);
+               return err;
+       }
+
+       while (true) {
+               ret = fscanf(f, "%llx %499s%*[^\n]\n", &sym_addr, sym_name);
+               if (ret == EOF && feof(f))
+                       break;
+
+               if (ret != 2) {
+                       pr_warn("failed to parse available_filter_functions_addrs entry: %d\n",
+                               ret);
+                       err = -EINVAL;
+                       goto cleanup;
+               }
+
+               if (!glob_match(sym_name, res->pattern))
+                       continue;
+
+               err = libbpf_ensure_mem((void **)&res->addrs, &res->cap,
+                                       sizeof(*res->addrs), res->cnt + 1);
+               if (err)
+                       goto cleanup;
+
+               res->addrs[res->cnt++] = (unsigned long)sym_addr;
+       }
+
+       if (res->cnt == 0)
+               err = -ENOENT;
+
+cleanup:
+       fclose(f);
+       return err;
+}
+
 struct bpf_link *
 bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
                                      const char *pattern,
@@ -10594,13 +10763,12 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
                return libbpf_err_ptr(-EINVAL);
 
        if (pattern) {
-               err = libbpf_kallsyms_parse(resolve_kprobe_multi_cb, &res);
+               if (has_available_filter_functions_addrs())
+                       err = libbpf_available_kprobes_parse(&res);
+               else
+                       err = libbpf_available_kallsyms_parse(&res);
                if (err)
                        goto error;
-               if (!res.cnt) {
-                       err = -ENOENT;
-                       goto error;
-               }
                addrs = res.addrs;
                cnt = res.cnt;
        }
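
For reference, a minimal user-space sketch of how the pattern path above is typically exercised; the program handle and the "tcp_v4_*" glob are illustrative, not part of this series. With this change libbpf resolves the glob against tracefs available_filter_functions(_addrs) instead of raw kallsyms, so symbols that cannot actually be kprobed are filtered out before the link is created.

#include <errno.h>
#include <stdio.h>
#include <bpf/libbpf.h>

/* Attach a SEC("kprobe.multi") program to every traceable function
 * matching a glob pattern (illustrative helper, not from this patch). */
static struct bpf_link *attach_by_pattern(struct bpf_program *prog)
{
	struct bpf_link *link;

	/* NULL opts: pattern-based resolution with default flags */
	link = bpf_program__attach_kprobe_multi_opts(prog, "tcp_v4_*", NULL);
	if (!link)
		fprintf(stderr, "kprobe.multi attach failed: %d\n", -errno);
	return link;
}
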
@@ -11811,6 +11979,48 @@ static int attach_iter(const struct bpf_program *prog, long cookie, struct bpf_l
        return libbpf_get_error(*link);
 }
 
+struct bpf_link *bpf_program__attach_netfilter(const struct bpf_program *prog,
+                                              const struct bpf_netfilter_opts *opts)
+{
+       LIBBPF_OPTS(bpf_link_create_opts, lopts);
+       struct bpf_link *link;
+       int prog_fd, link_fd;
+
+       if (!OPTS_VALID(opts, bpf_netfilter_opts))
+               return libbpf_err_ptr(-EINVAL);
+
+       prog_fd = bpf_program__fd(prog);
+       if (prog_fd < 0) {
+               pr_warn("prog '%s': can't attach before loaded\n", prog->name);
+               return libbpf_err_ptr(-EINVAL);
+       }
+
+       link = calloc(1, sizeof(*link));
+       if (!link)
+               return libbpf_err_ptr(-ENOMEM);
+
+       link->detach = &bpf_link__detach_fd;
+
+       lopts.netfilter.pf = OPTS_GET(opts, pf, 0);
+       lopts.netfilter.hooknum = OPTS_GET(opts, hooknum, 0);
+       lopts.netfilter.priority = OPTS_GET(opts, priority, 0);
+       lopts.netfilter.flags = OPTS_GET(opts, flags, 0);
+
+       link_fd = bpf_link_create(prog_fd, 0, BPF_NETFILTER, &lopts);
+       if (link_fd < 0) {
+               char errmsg[STRERR_BUFSIZE];
+
+               link_fd = -errno;
+               free(link);
+               pr_warn("prog '%s': failed to attach to netfilter: %s\n",
+                       prog->name, libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
+               return libbpf_err_ptr(link_fd);
+       }
+       link->fd = link_fd;
+
+       return link;
+}
+
 struct bpf_link *bpf_program__attach(const struct bpf_program *prog)
 {
        struct bpf_link *link = NULL;
index 754da73..10642ad 100644 (file)
@@ -718,6 +718,21 @@ LIBBPF_API struct bpf_link *
 bpf_program__attach_freplace(const struct bpf_program *prog,
                             int target_fd, const char *attach_func_name);
 
+struct bpf_netfilter_opts {
+       /* size of this struct, for forward/backward compatibility */
+       size_t sz;
+
+       __u32 pf;
+       __u32 hooknum;
+       __s32 priority;
+       __u32 flags;
+};
+#define bpf_netfilter_opts__last_field flags
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_netfilter(const struct bpf_program *prog,
+                             const struct bpf_netfilter_opts *opts);
+
 struct bpf_map;
 
 LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
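
A hedged usage sketch for the API declared above; the program handle and hook parameters are illustrative, while NFPROTO_IPV4 and NF_INET_LOCAL_IN come from linux/netfilter.h. Errors are reported through the usual libbpf convention (NULL return with errno set in libbpf 1.x).

#include <linux/netfilter.h>
#include <bpf/libbpf.h>

/* Attach a SEC("netfilter") program to the IPv4 LOCAL_IN hook with an
 * arbitrary priority (sketch only, not part of this patch). */
static struct bpf_link *attach_nf_local_in(struct bpf_program *prog)
{
	LIBBPF_OPTS(bpf_netfilter_opts, opts,
		.pf = NFPROTO_IPV4,
		.hooknum = NF_INET_LOCAL_IN,
		.priority = -128,
	);

	return bpf_program__attach_netfilter(prog, &opts);
}
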
index 7521a2f..d9ec440 100644 (file)
@@ -395,4 +395,5 @@ LIBBPF_1.2.0 {
 LIBBPF_1.3.0 {
        global:
                bpf_obj_pin_opts;
+               bpf_program__attach_netfilter;
 } LIBBPF_1.2.0;
index f1a1415..37455d0 100644 (file)
@@ -852,8 +852,11 @@ static int bpf_link_usdt_detach(struct bpf_link *link)
                 * system is so exhausted on memory, it's the least of user's
                 * concerns, probably.
                 * So just do our best here to return those IDs to usdt_manager.
+                * Another case where we can legitimately get NULL back is when
+                * new_cnt is zero: reallocating an array to zero elements may
+                * return NULL, and that is not an error here.
                 */
-               if (new_free_ids) {
+               if (new_free_ids || new_cnt == 0) {
                        memcpy(new_free_ids + man->free_spec_cnt, usdt_link->spec_ids,
                               usdt_link->spec_cnt * sizeof(*usdt_link->spec_ids));
                        man->free_spec_ids = new_free_ids;
index 08adc80..3b61e8b 100644 (file)
@@ -10,3 +10,5 @@ kprobe_multi_test/link_api_addrs                 # link_fd unexpected link_fd: a
 kprobe_multi_test/link_api_syms                  # link_fd unexpected link_fd: actual -95 < expected 0
 kprobe_multi_test/skel_api                       # libbpf: failed to load BPF skeleton 'kprobe_multi': -3
 module_attach                                    # prog 'kprobe_multi': failed to auto-attach: -95
+fentry_test/fentry_many_args                     # fentry_many_args:FAIL:fentry_many_args_attach unexpected error: -524
+fexit_test/fexit_many_args                       # fexit_many_args:FAIL:fexit_many_args_attach unexpected error: -524
index 538df8f..882be03 100644 (file)
@@ -12,7 +12,11 @@ BPFDIR := $(LIBDIR)/bpf
 TOOLSINCDIR := $(TOOLSDIR)/include
 BPFTOOLDIR := $(TOOLSDIR)/bpf/bpftool
 APIDIR := $(TOOLSINCDIR)/uapi
+ifneq ($(O),)
+GENDIR := $(O)/include/generated
+else
 GENDIR := $(abspath ../../../../include/generated)
+endif
 GENHDR := $(GENDIR)/autoconf.h
 HOSTPKG_CONFIG := pkg-config
 
@@ -331,7 +335,7 @@ $(RESOLVE_BTFIDS): $(HOST_BPFOBJ) | $(HOST_BUILD_DIR)/resolve_btfids        \
                OUTPUT=$(HOST_BUILD_DIR)/resolve_btfids/ BPFOBJ=$(HOST_BPFOBJ)
 
 # Get Clang's default includes on this system, as opposed to those seen by
-# '-target bpf'. This fixes "missing" files on some architectures/distros,
+# '--target=bpf'. This fixes "missing" files on some architectures/distros,
 # such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
 #
 # Use '-idirafter': Don't interfere with include mechanics except where the
@@ -372,12 +376,12 @@ $(OUTPUT)/cgroup_getset_retval_hooks.o: cgroup_getset_retval_hooks.h
 # $3 - CFLAGS
 define CLANG_BPF_BUILD_RULE
        $(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
-       $(Q)$(CLANG) $3 -O2 -target bpf -c $1 -mcpu=v3 -o $2
+       $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v3 -o $2
 endef
 # Similar to CLANG_BPF_BUILD_RULE, but with disabled alu32
 define CLANG_NOALU32_BPF_BUILD_RULE
        $(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
-       $(Q)$(CLANG) $3 -O2 -target bpf -c $1 -mcpu=v2 -o $2
+       $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v2 -o $2
 endef
 # Build BPF object using GCC
 define GCC_BPF_BUILD_RULE
@@ -644,11 +648,13 @@ $(OUTPUT)/bench_local_storage.o: $(OUTPUT)/local_storage_bench.skel.h
 $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o: $(OUTPUT)/local_storage_rcu_tasks_trace_bench.skel.h
 $(OUTPUT)/bench_local_storage_create.o: $(OUTPUT)/bench_local_storage_create.skel.h
 $(OUTPUT)/bench_bpf_hashmap_lookup.o: $(OUTPUT)/bpf_hashmap_lookup.skel.h
+$(OUTPUT)/bench_htab_mem.o: $(OUTPUT)/htab_mem_bench.skel.h
 $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
 $(OUTPUT)/bench: LDLIBS += -lm
 $(OUTPUT)/bench: $(OUTPUT)/bench.o \
                 $(TESTING_HELPERS) \
                 $(TRACE_HELPERS) \
+                $(CGROUP_HELPERS) \
                 $(OUTPUT)/bench_count.o \
                 $(OUTPUT)/bench_rename.o \
                 $(OUTPUT)/bench_trigger.o \
@@ -661,6 +667,7 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
                 $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o \
                 $(OUTPUT)/bench_bpf_hashmap_lookup.o \
                 $(OUTPUT)/bench_local_storage_create.o \
+                $(OUTPUT)/bench_htab_mem.o \
                 #
        $(call msg,BINARY,,$@)
        $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
index 41fe5a8..73ce11b 100644 (file)
@@ -279,6 +279,7 @@ extern struct argp bench_local_storage_rcu_tasks_trace_argp;
 extern struct argp bench_strncmp_argp;
 extern struct argp bench_hashmap_lookup_argp;
 extern struct argp bench_local_storage_create_argp;
+extern struct argp bench_htab_mem_argp;
 
 static const struct argp_child bench_parsers[] = {
        { &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
@@ -290,6 +291,7 @@ static const struct argp_child bench_parsers[] = {
                "local_storage RCU Tasks Trace slowdown benchmark", 0 },
        { &bench_hashmap_lookup_argp, 0, "Hashmap lookup benchmark", 0 },
        { &bench_local_storage_create_argp, 0, "local-storage-create benchmark", 0 },
+       { &bench_htab_mem_argp, 0, "hash map memory benchmark", 0 },
        {},
 };
 
@@ -520,6 +522,7 @@ extern const struct bench bench_local_storage_cache_hashmap_control;
 extern const struct bench bench_local_storage_tasks_trace;
 extern const struct bench bench_bpf_hashmap_lookup;
 extern const struct bench bench_local_storage_create;
+extern const struct bench bench_htab_mem;
 
 static const struct bench *benchs[] = {
        &bench_count_global,
@@ -561,6 +564,7 @@ static const struct bench *benchs[] = {
        &bench_local_storage_tasks_trace,
        &bench_bpf_hashmap_lookup,
        &bench_local_storage_create,
+       &bench_htab_mem,
 };
 
 static void find_benchmark(void)
diff --git a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
new file mode 100644 (file)
index 0000000..9146d3f
--- /dev/null
@@ -0,0 +1,350 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include <argp.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/param.h>
+#include <fcntl.h>
+
+#include "bench.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+#include "htab_mem_bench.skel.h"
+
+struct htab_mem_use_case {
+       const char *name;
+       const char **progs;
+       /* Do synchronization between addition thread and deletion thread */
+       bool need_sync;
+};
+
+static struct htab_mem_ctx {
+       const struct htab_mem_use_case *uc;
+       struct htab_mem_bench *skel;
+       pthread_barrier_t *notify;
+       int fd;
+} ctx;
+
+const char *ow_progs[] = {"overwrite", NULL};
+const char *batch_progs[] = {"batch_add_batch_del", NULL};
+const char *add_del_progs[] = {"add_only", "del_only", NULL};
+static const struct htab_mem_use_case use_cases[] = {
+       { .name = "overwrite", .progs = ow_progs },
+       { .name = "batch_add_batch_del", .progs = batch_progs },
+       { .name = "add_del_on_diff_cpu", .progs = add_del_progs, .need_sync = true },
+};
+
+static struct htab_mem_args {
+       u32 value_size;
+       const char *use_case;
+       bool preallocated;
+} args = {
+       .value_size = 8,
+       .use_case = "overwrite",
+       .preallocated = false,
+};
+
+enum {
+       ARG_VALUE_SIZE = 10000,
+       ARG_USE_CASE = 10001,
+       ARG_PREALLOCATED = 10002,
+};
+
+static const struct argp_option opts[] = {
+       { "value-size", ARG_VALUE_SIZE, "VALUE_SIZE", 0,
+         "Set the value size of hash map (default 8)" },
+       { "use-case", ARG_USE_CASE, "USE_CASE", 0,
+         "Set the use case of hash map: overwrite|batch_add_batch_del|add_del_on_diff_cpu" },
+       { "preallocated", ARG_PREALLOCATED, NULL, 0, "use preallocated hash map" },
+       {},
+};
+
+static error_t htab_mem_parse_arg(int key, char *arg, struct argp_state *state)
+{
+       switch (key) {
+       case ARG_VALUE_SIZE:
+               args.value_size = strtoul(arg, NULL, 10);
+               if (args.value_size > 4096) {
+                       fprintf(stderr, "too big value size %u\n", args.value_size);
+                       argp_usage(state);
+               }
+               break;
+       case ARG_USE_CASE:
+               args.use_case = strdup(arg);
+               if (!args.use_case) {
+                       fprintf(stderr, "no mem for use-case\n");
+                       argp_usage(state);
+               }
+               break;
+       case ARG_PREALLOCATED:
+               args.preallocated = true;
+               break;
+       default:
+               return ARGP_ERR_UNKNOWN;
+       }
+
+       return 0;
+}
+
+const struct argp bench_htab_mem_argp = {
+       .options = opts,
+       .parser = htab_mem_parse_arg,
+};
+
+static void htab_mem_validate(void)
+{
+       if (!strcmp(use_cases[2].name, args.use_case) && env.producer_cnt % 2) {
+               fprintf(stderr, "%s needs an even number of producers\n", args.use_case);
+               exit(1);
+       }
+}
+
+static int htab_mem_bench_init_barriers(void)
+{
+       pthread_barrier_t *barriers;
+       unsigned int i, nr;
+
+       if (!ctx.uc->need_sync)
+               return 0;
+
+       nr = (env.producer_cnt + 1) / 2;
+       barriers = calloc(nr, sizeof(*barriers));
+       if (!barriers)
+               return -1;
+
+       /* Used for synchronization between two threads */
+       for (i = 0; i < nr; i++)
+               pthread_barrier_init(&barriers[i], NULL, 2);
+
+       ctx.notify = barriers;
+       return 0;
+}
+
+static void htab_mem_bench_exit_barriers(void)
+{
+       unsigned int i, nr;
+
+       if (!ctx.notify)
+               return;
+
+       nr = (env.producer_cnt + 1) / 2;
+       for (i = 0; i < nr; i++)
+               pthread_barrier_destroy(&ctx.notify[i]);
+       free(ctx.notify);
+}
+
+static const struct htab_mem_use_case *htab_mem_find_use_case_or_exit(const char *name)
+{
+       unsigned int i;
+
+       for (i = 0; i < ARRAY_SIZE(use_cases); i++) {
+               if (!strcmp(name, use_cases[i].name))
+                       return &use_cases[i];
+       }
+
+       fprintf(stderr, "no such use-case: %s\n", name);
+       fprintf(stderr, "available use cases:");
+       for (i = 0; i < ARRAY_SIZE(use_cases); i++)
+               fprintf(stderr, " %s", use_cases[i].name);
+       fprintf(stderr, "\n");
+       exit(1);
+}
+
+static void htab_mem_setup(void)
+{
+       struct bpf_map *map;
+       const char **names;
+       int err;
+
+       setup_libbpf();
+
+       ctx.uc = htab_mem_find_use_case_or_exit(args.use_case);
+       err = htab_mem_bench_init_barriers();
+       if (err) {
+               fprintf(stderr, "failed to init barrier\n");
+               exit(1);
+       }
+
+       ctx.fd = cgroup_setup_and_join("/htab_mem");
+       if (ctx.fd < 0)
+               goto cleanup;
+
+       ctx.skel = htab_mem_bench__open();
+       if (!ctx.skel) {
+               fprintf(stderr, "failed to open skeleton\n");
+               goto cleanup;
+       }
+
+       map = ctx.skel->maps.htab;
+       bpf_map__set_value_size(map, args.value_size);
+       /* Ensure that different CPUs can operate on different subsets */
+       bpf_map__set_max_entries(map, MAX(8192, 64 * env.nr_cpus));
+       if (args.preallocated)
+               bpf_map__set_map_flags(map, bpf_map__map_flags(map) & ~BPF_F_NO_PREALLOC);
+
+       names = ctx.uc->progs;
+       while (*names) {
+               struct bpf_program *prog;
+
+               prog = bpf_object__find_program_by_name(ctx.skel->obj, *names);
+               if (!prog) {
+                       fprintf(stderr, "no such program %s\n", *names);
+                       goto cleanup;
+               }
+               bpf_program__set_autoload(prog, true);
+               names++;
+       }
+       ctx.skel->bss->nr_thread = env.producer_cnt;
+
+       err = htab_mem_bench__load(ctx.skel);
+       if (err) {
+               fprintf(stderr, "failed to load skeleton\n");
+               goto cleanup;
+       }
+       err = htab_mem_bench__attach(ctx.skel);
+       if (err) {
+               fprintf(stderr, "failed to attach skeleton\n");
+               goto cleanup;
+       }
+       return;
+
+cleanup:
+       htab_mem_bench__destroy(ctx.skel);
+       htab_mem_bench_exit_barriers();
+       if (ctx.fd >= 0) {
+               close(ctx.fd);
+               cleanup_cgroup_environment();
+       }
+       exit(1);
+}
+
+static void htab_mem_add_fn(pthread_barrier_t *notify)
+{
+       while (true) {
+               /* Do addition */
+               (void)syscall(__NR_getpgid, 0);
+               /* Notify deletion thread to do deletion */
+               pthread_barrier_wait(notify);
+               /* Wait for deletion to complete */
+               pthread_barrier_wait(notify);
+       }
+}
+
+static void htab_mem_delete_fn(pthread_barrier_t *notify)
+{
+       while (true) {
+               /* Wait for addition to complete */
+               pthread_barrier_wait(notify);
+               /* Do deletion */
+               (void)syscall(__NR_getppid);
+               /* Notify addition thread to do addition */
+               pthread_barrier_wait(notify);
+       }
+}
+
+static void *htab_mem_producer(void *arg)
+{
+       pthread_barrier_t *notify;
+       int seq;
+
+       if (!ctx.uc->need_sync) {
+               while (true)
+                       (void)syscall(__NR_getpgid, 0);
+               return NULL;
+       }
+
+       seq = (long)arg;
+       notify = &ctx.notify[seq / 2];
+       if (seq & 1)
+               htab_mem_delete_fn(notify);
+       else
+               htab_mem_add_fn(notify);
+       return NULL;
+}
+
+static void htab_mem_read_mem_cgrp_file(const char *name, unsigned long *value)
+{
+       char buf[32];
+       ssize_t got;
+       int fd;
+
+       fd = openat(ctx.fd, name, O_RDONLY);
+       if (fd < 0) {
+               /* cgroup v1 ? */
+               fprintf(stderr, "no %s\n", name);
+               *value = 0;
+               return;
+       }
+
+       got = read(fd, buf, sizeof(buf) - 1);
+       if (got <= 0) {
+               *value = 0;
+               close(fd);
+               return;
+       }
+       buf[got] = 0;
+
+       *value = strtoull(buf, NULL, 0);
+
+       close(fd);
+}
+
+static void htab_mem_measure(struct bench_res *res)
+{
+       res->hits = atomic_swap(&ctx.skel->bss->op_cnt, 0) / env.producer_cnt;
+       htab_mem_read_mem_cgrp_file("memory.current", &res->gp_ct);
+}
+
+static void htab_mem_report_progress(int iter, struct bench_res *res, long delta_ns)
+{
+       double loop, mem;
+
+       loop = res->hits / 1000.0 / (delta_ns / 1000000000.0);
+       mem = res->gp_ct / 1048576.0;
+       printf("Iter %3d (%7.3lfus): ", iter, (delta_ns - 1000000000) / 1000.0);
+       printf("per-prod-op %7.2lfk/s, memory usage %7.2lfMiB\n", loop, mem);
+}
+
+static void htab_mem_report_final(struct bench_res res[], int res_cnt)
+{
+       double mem_mean = 0.0, mem_stddev = 0.0;
+       double loop_mean = 0.0, loop_stddev = 0.0;
+       unsigned long peak_mem;
+       int i;
+
+       for (i = 0; i < res_cnt; i++) {
+               loop_mean += res[i].hits / 1000.0 / (0.0 + res_cnt);
+               mem_mean += res[i].gp_ct / 1048576.0 / (0.0 + res_cnt);
+       }
+       if (res_cnt > 1)  {
+               for (i = 0; i < res_cnt; i++) {
+                       loop_stddev += (loop_mean - res[i].hits / 1000.0) *
+                                      (loop_mean - res[i].hits / 1000.0) /
+                                      (res_cnt - 1.0);
+                       mem_stddev += (mem_mean - res[i].gp_ct / 1048576.0) *
+                                     (mem_mean - res[i].gp_ct / 1048576.0) /
+                                     (res_cnt - 1.0);
+               }
+               loop_stddev = sqrt(loop_stddev);
+               mem_stddev = sqrt(mem_stddev);
+       }
+
+       htab_mem_read_mem_cgrp_file("memory.peak", &peak_mem);
+       printf("Summary: per-prod-op %7.2lf \u00B1 %7.2lfk/s, memory usage %7.2lf \u00B1 %7.2lfMiB,"
+              " peak memory usage %7.2lfMiB\n",
+              loop_mean, loop_stddev, mem_mean, mem_stddev, peak_mem / 1048576.0);
+
+       cleanup_cgroup_environment();
+}
+
+const struct bench bench_htab_mem = {
+       .name = "htab-mem",
+       .argp = &bench_htab_mem_argp,
+       .validate = htab_mem_validate,
+       .setup = htab_mem_setup,
+       .producer_thread = htab_mem_producer,
+       .measure = htab_mem_measure,
+       .report_progress = htab_mem_report_progress,
+       .report_final = htab_mem_report_final,
+};
index 3ca14ad..e1ee979 100644 (file)
@@ -399,7 +399,7 @@ static void perfbuf_libbpf_setup(void)
        ctx->skel = perfbuf_setup_skeleton();
 
        memset(&attr, 0, sizeof(attr));
-       attr.config = PERF_COUNT_SW_BPF_OUTPUT,
+       attr.config = PERF_COUNT_SW_BPF_OUTPUT;
        attr.type = PERF_TYPE_SOFTWARE;
        attr.sample_type = PERF_SAMPLE_RAW;
        /* notify only every Nth sample */
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh b/tools/testing/selftests/bpf/benchs/run_bench_htab_mem.sh
new file mode 100755 (executable)
index 0000000..9ff5832
--- /dev/null
@@ -0,0 +1,40 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source ./benchs/run_common.sh
+
+set -eufo pipefail
+
+htab_mem()
+{
+       echo -n "per-prod-op: "
+       echo -n "$*" | sed -E "s/.* per-prod-op\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+k\/s).*/\1/"
+       echo -n -e ", avg mem: "
+       echo -n "$*" | sed -E "s/.* memory usage\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+MiB).*/\1/"
+       echo -n ", peak mem: "
+       echo "$*" | sed -E "s/.* peak memory usage\s+([0-9]+\.[0-9]+MiB).*/\1/"
+}
+
+summarize_htab_mem()
+{
+       local bench="$1"
+       local summary=$(echo $2 | tail -n1)
+
+       printf "%-20s %s\n" "$bench" "$(htab_mem $summary)"
+}
+
+htab_mem_bench()
+{
+       local name
+
+       for name in overwrite batch_add_batch_del add_del_on_diff_cpu
+       do
+               summarize_htab_mem "$name" "$($RUN_BENCH htab-mem --use-case $name -p8 "$@")"
+       done
+}
+
+header "preallocated"
+htab_mem_bench "--preallocated"
+
+header "normal bpf ma"
+htab_mem_bench
index aaf6ef1..a6f991b 100644 (file)
@@ -34,6 +34,11 @@ struct bpf_testmod_struct_arg_3 {
        int b[];
 };
 
+struct bpf_testmod_struct_arg_4 {
+       u64 a;
+       int b;
+};
+
 __diag_push();
 __diag_ignore_all("-Wmissing-prototypes",
                  "Global functions as their definitions will be in bpf_testmod.ko BTF");
@@ -75,6 +80,24 @@ bpf_testmod_test_struct_arg_6(struct bpf_testmod_struct_arg_3 *a) {
        return bpf_testmod_test_struct_arg_result;
 }
 
+noinline int
+bpf_testmod_test_struct_arg_7(u64 a, void *b, short c, int d, void *e,
+                             struct bpf_testmod_struct_arg_4 f)
+{
+       bpf_testmod_test_struct_arg_result = a + (long)b + c + d +
+               (long)e + f.a + f.b;
+       return bpf_testmod_test_struct_arg_result;
+}
+
+noinline int
+bpf_testmod_test_struct_arg_8(u64 a, void *b, short c, int d, void *e,
+                             struct bpf_testmod_struct_arg_4 f, int g)
+{
+       bpf_testmod_test_struct_arg_result = a + (long)b + c + d +
+               (long)e + f.a + f.b + g;
+       return bpf_testmod_test_struct_arg_result;
+}
+
 __bpf_kfunc void
 bpf_testmod_test_mod_kfunc(int i)
 {
@@ -191,6 +214,20 @@ noinline int bpf_testmod_fentry_test3(char a, int b, u64 c)
        return a + b + c;
 }
 
+noinline int bpf_testmod_fentry_test7(u64 a, void *b, short c, int d,
+                                     void *e, char f, int g)
+{
+       return a + (long)b + c + d + (long)e + f + g;
+}
+
+noinline int bpf_testmod_fentry_test11(u64 a, void *b, short c, int d,
+                                      void *e, char f, int g,
+                                      unsigned int h, long i, __u64 j,
+                                      unsigned long k)
+{
+       return a + (long)b + c + d + (long)e + f + g + h + i + j + k;
+}
+
 int bpf_testmod_fentry_ok;
 
 noinline ssize_t
@@ -206,6 +243,7 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
        struct bpf_testmod_struct_arg_1 struct_arg1 = {10};
        struct bpf_testmod_struct_arg_2 struct_arg2 = {2, 3};
        struct bpf_testmod_struct_arg_3 *struct_arg3;
+       struct bpf_testmod_struct_arg_4 struct_arg4 = {21, 22};
        int i = 1;
 
        while (bpf_testmod_return_ptr(i))
@@ -216,6 +254,11 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
        (void)bpf_testmod_test_struct_arg_3(1, 4, struct_arg2);
        (void)bpf_testmod_test_struct_arg_4(struct_arg1, 1, 2, 3, struct_arg2);
        (void)bpf_testmod_test_struct_arg_5();
+       (void)bpf_testmod_test_struct_arg_7(16, (void *)17, 18, 19,
+                                           (void *)20, struct_arg4);
+       (void)bpf_testmod_test_struct_arg_8(16, (void *)17, 18, 19,
+                                           (void *)20, struct_arg4, 23);
+
 
        struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) +
                                sizeof(int)), GFP_KERNEL);
@@ -243,7 +286,11 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
 
        if (bpf_testmod_fentry_test1(1) != 2 ||
            bpf_testmod_fentry_test2(2, 3) != 5 ||
-           bpf_testmod_fentry_test3(4, 5, 6) != 15)
+           bpf_testmod_fentry_test3(4, 5, 6) != 15 ||
+           bpf_testmod_fentry_test7(16, (void *)17, 18, 19, (void *)20,
+                       21, 22) != 133 ||
+           bpf_testmod_fentry_test11(16, (void *)17, 18, 19, (void *)20,
+                       21, 22, 23, 24, 25, 26) != 231)
                goto out;
 
        bpf_testmod_fentry_ok = 1;
index 9e95b37..2caee84 100644 (file)
@@ -277,6 +277,18 @@ int join_cgroup(const char *relative_path)
        return join_cgroup_from_top(cgroup_path);
 }
 
+/**
+ * join_root_cgroup() - Join the root cgroup
+ *
+ * This function joins the root cgroup.
+ *
+ * On success, it returns 0, otherwise on failure it returns 1.
+ */
+int join_root_cgroup(void)
+{
+       return join_cgroup_from_top(CGROUP_MOUNT_PATH);
+}
+
 /**
  * join_parent_cgroup() - Join a cgroup in the parent process workdir
  * @relative_path: The cgroup path, relative to parent process workdir, to join
index f099a16..5c2cb9c 100644 (file)
@@ -22,6 +22,7 @@ void remove_cgroup(const char *relative_path);
 unsigned long long get_cgroup_id(const char *relative_path);
 
 int join_cgroup(const char *relative_path);
+int join_root_cgroup(void);
 int join_parent_cgroup(const char *relative_path);
 
 int setup_cgroup_environment(void);
diff --git a/tools/testing/selftests/bpf/cgroup_tcp_skb.h b/tools/testing/selftests/bpf/cgroup_tcp_skb.h
new file mode 100644 (file)
index 0000000..7f6b24f
--- /dev/null
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+/* Define the states of a socket used to track messages sent to and from
+ * the socket.
+ *
+ * These states are based on RFC 9293, with some modifications to support
+ * tracking of messages sent out from a socket. For example, when a SYN is
+ * received, a new socket would normally transition to the SYN_RECV state
+ * defined in RFC 9293. Here it is first put in the SYN_RECV_SENDING_SYN_ACK
+ * state, and only moves to SYN_RECV once the SYN-ACK is sent out. With this
+ * modification, we can track the messages sent out from a socket.
+ */
+
+#ifndef __CGROUP_TCP_SKB_H__
+#define __CGROUP_TCP_SKB_H__
+
+enum {
+       INIT,
+       CLOSED,
+       SYN_SENT,
+       SYN_RECV_SENDING_SYN_ACK,
+       SYN_RECV,
+       ESTABLISHED,
+       FIN_WAIT1,
+       FIN_WAIT2,
+       CLOSE_WAIT_SENDING_ACK,
+       CLOSE_WAIT,
+       CLOSING,
+       LAST_ACK,
+       TIME_WAIT_SENDING_ACK,
+       TIME_WAIT,
+};
+
+#endif /* __CGROUP_TCP_SKB_H__ */
index 719225b..1c638d9 100644 (file)
@@ -1 +1 @@
-/* dummy .h to trick /usr/include/features.h to work with 'clang -target bpf' */
+/* dummy .h to trick /usr/include/features.h to work with 'clang --target=bpf' */
diff --git a/tools/testing/selftests/bpf/map_tests/map_percpu_stats.c b/tools/testing/selftests/bpf/map_tests/map_percpu_stats.c
new file mode 100644 (file)
index 0000000..1a9eeef
--- /dev/null
@@ -0,0 +1,447 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+
+#include <errno.h>
+#include <unistd.h>
+#include <pthread.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include <bpf_util.h>
+#include <test_maps.h>
+
+#include "map_percpu_stats.skel.h"
+
+#define MAX_ENTRIES                    16384
+#define MAX_ENTRIES_HASH_OF_MAPS       64
+#define N_THREADS                      8
+#define MAX_MAP_KEY_SIZE               4
+
+static void map_info(int map_fd, struct bpf_map_info *info)
+{
+       __u32 len = sizeof(*info);
+       int ret;
+
+       memset(info, 0, sizeof(*info));
+
+       ret = bpf_obj_get_info_by_fd(map_fd, info, &len);
+       CHECK(ret < 0, "bpf_obj_get_info_by_fd", "error: %s\n", strerror(errno));
+}
+
+static const char *map_type_to_s(__u32 type)
+{
+       switch (type) {
+       case BPF_MAP_TYPE_HASH:
+               return "HASH";
+       case BPF_MAP_TYPE_PERCPU_HASH:
+               return "PERCPU_HASH";
+       case BPF_MAP_TYPE_LRU_HASH:
+               return "LRU_HASH";
+       case BPF_MAP_TYPE_LRU_PERCPU_HASH:
+               return "LRU_PERCPU_HASH";
+       case BPF_MAP_TYPE_HASH_OF_MAPS:
+               return "BPF_MAP_TYPE_HASH_OF_MAPS";
+       default:
+               return "<define-me>";
+       }
+}
+
+static __u32 map_count_elements(__u32 type, int map_fd)
+{
+       __u32 key = -1;
+       int n = 0;
+
+       while (!bpf_map_get_next_key(map_fd, &key, &key))
+               n++;
+       return n;
+}
+
+#define BATCH  true
+
+static void delete_and_lookup_batch(int map_fd, void *keys, __u32 count)
+{
+       static __u8 values[(8 << 10) * MAX_ENTRIES];
+       void *in_batch = NULL, *out_batch;
+       __u32 save_count = count;
+       int ret;
+
+       ret = bpf_map_lookup_and_delete_batch(map_fd,
+                                             &in_batch, &out_batch,
+                                             keys, values, &count,
+                                             NULL);
+
+       /*
+        * Despite what uapi header says, lookup_and_delete_batch will return
+        * -ENOENT in case we successfully have deleted all elements, so check
+        * this separately
+        */
+       CHECK(ret < 0 && (errno != ENOENT || !count), "bpf_map_lookup_and_delete_batch",
+                      "error: %s\n", strerror(errno));
+
+       CHECK(count != save_count,
+                       "bpf_map_lookup_and_delete_batch",
+                       "deleted not all elements: removed=%u expected=%u\n",
+                       count, save_count);
+}
+
+static void delete_all_elements(__u32 type, int map_fd, bool batch)
+{
+       static __u8 val[8 << 10]; /* enough for 1024 CPUs */
+       __u32 key = -1;
+       void *keys;
+       __u32 i, n;
+       int ret;
+
+       keys = calloc(MAX_MAP_KEY_SIZE, MAX_ENTRIES);
+       CHECK(!keys, "calloc", "error: %s\n", strerror(errno));
+
+       for (n = 0; !bpf_map_get_next_key(map_fd, &key, &key); n++)
+               memcpy(keys + n*MAX_MAP_KEY_SIZE, &key, MAX_MAP_KEY_SIZE);
+
+       if (batch) {
+               /* Can't mix delete_batch and delete_and_lookup_batch because
+                * they have different semantics in relation to the keys
+                * argument. However, delete_batch utilizes map_delete_elem,
+                * so we actually test it in the non-batch scenario */
+               delete_and_lookup_batch(map_fd, keys, n);
+       } else {
+               /* Intentionally mix delete and lookup_and_delete so we can test both */
+               for (i = 0; i < n; i++) {
+                       void *keyp = keys + i*MAX_MAP_KEY_SIZE;
+
+                       if (i % 2 || type == BPF_MAP_TYPE_HASH_OF_MAPS) {
+                               ret = bpf_map_delete_elem(map_fd, keyp);
+                               CHECK(ret < 0, "bpf_map_delete_elem",
+                                              "error: key %u: %s\n", i, strerror(errno));
+                       } else {
+                               ret = bpf_map_lookup_and_delete_elem(map_fd, keyp, val);
+                               CHECK(ret < 0, "bpf_map_lookup_and_delete_elem",
+                                              "error: key %u: %s\n", i, strerror(errno));
+                       }
+               }
+       }
+
+       free(keys);
+}
+
+static bool is_lru(__u32 map_type)
+{
+       return map_type == BPF_MAP_TYPE_LRU_HASH ||
+              map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH;
+}
+
+struct upsert_opts {
+       __u32 map_type;
+       int map_fd;
+       __u32 n;
+};
+
+static int create_small_hash(void)
+{
+       int map_fd;
+
+       map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "small", 4, 4, 4, NULL);
+       CHECK(map_fd < 0, "bpf_map_create()", "error:%s (name=%s)\n",
+                       strerror(errno), "small");
+
+       return map_fd;
+}
+
+static void *patch_map_thread(void *arg)
+{
+       struct upsert_opts *opts = arg;
+       int val;
+       int ret;
+       int i;
+
+       for (i = 0; i < opts->n; i++) {
+               if (opts->map_type == BPF_MAP_TYPE_HASH_OF_MAPS)
+                       val = create_small_hash();
+               else
+                       val = rand();
+               ret = bpf_map_update_elem(opts->map_fd, &i, &val, 0);
+               CHECK(ret < 0, "bpf_map_update_elem", "key=%d error: %s\n", i, strerror(errno));
+
+               if (opts->map_type == BPF_MAP_TYPE_HASH_OF_MAPS)
+                       close(val);
+       }
+       return NULL;
+}
+
+static void upsert_elements(struct upsert_opts *opts)
+{
+       pthread_t threads[N_THREADS];
+       int ret;
+       int i;
+
+       for (i = 0; i < ARRAY_SIZE(threads); i++) {
+               ret = pthread_create(&threads[i], NULL, patch_map_thread, opts);
+               CHECK(ret != 0, "pthread_create", "error: %s\n", strerror(ret));
+       }
+
+       for (i = 0; i < ARRAY_SIZE(threads); i++) {
+               ret = pthread_join(threads[i], NULL);
+               CHECK(ret != 0, "pthread_join", "error: %s\n", strerror(ret));
+       }
+}
+
+static __u32 read_cur_elements(int iter_fd)
+{
+       char buf[64];
+       ssize_t n;
+       __u32 ret;
+
+       n = read(iter_fd, buf, sizeof(buf)-1);
+       CHECK(n <= 0, "read", "error: %s\n", strerror(errno));
+       buf[n] = '\0';
+
+       errno = 0;
+       ret = (__u32)strtol(buf, NULL, 10);
+       CHECK(errno != 0, "strtol", "error: %s\n", strerror(errno));
+
+       return ret;
+}
+
+static __u32 get_cur_elements(int map_id)
+{
+       struct map_percpu_stats *skel;
+       struct bpf_link *link;
+       __u32 n_elements;
+       int iter_fd;
+       int ret;
+
+       skel = map_percpu_stats__open();
+       CHECK(skel == NULL, "map_percpu_stats__open", "error: %s", strerror(errno));
+
+       skel->bss->target_id = map_id;
+
+       ret = map_percpu_stats__load(skel);
+       CHECK(ret != 0, "map_percpu_stats__load", "error: %s", strerror(errno));
+
+       link = bpf_program__attach_iter(skel->progs.dump_bpf_map, NULL);
+       CHECK(!link, "bpf_program__attach_iter", "error: %s\n", strerror(errno));
+
+       iter_fd = bpf_iter_create(bpf_link__fd(link));
+       CHECK(iter_fd < 0, "bpf_iter_create", "error: %s\n", strerror(errno));
+
+       n_elements = read_cur_elements(iter_fd);
+
+       close(iter_fd);
+       bpf_link__destroy(link);
+       map_percpu_stats__destroy(skel);
+
+       return n_elements;
+}
+
+static void check_expected_number_elements(__u32 n_inserted, int map_fd,
+                                          struct bpf_map_info *info)
+{
+       __u32 n_real;
+       __u32 n_iter;
+
+       /* Count the current number of elements in the map by iterating through
+        * all the map keys via bpf_get_next_key
+        */
+       n_real = map_count_elements(info->type, map_fd);
+
+       /* The "real" number of elements should be the same as the inserted
+        * number of elements in all cases except LRU maps, where some elements
+        * may have been evicted
+        */
+       if (n_inserted == 0 || !is_lru(info->type))
+               CHECK(n_inserted != n_real, "map_count_elements",
+                     "n_real(%u) != n_inserted(%u)\n", n_real, n_inserted);
+
+       /* Count the current number of elements in the map using an iterator */
+       n_iter = get_cur_elements(info->id);
+
+       /* Both counts should be the same, as all updates are over */
+       CHECK(n_iter != n_real, "get_cur_elements",
+             "n_iter=%u, expected %u (map_type=%s,map_flags=%08x)\n",
+             n_iter, n_real, map_type_to_s(info->type), info->map_flags);
+}
+
+static void __test(int map_fd)
+{
+       struct upsert_opts opts = {
+               .map_fd = map_fd,
+       };
+       struct bpf_map_info info;
+
+       map_info(map_fd, &info);
+       opts.map_type = info.type;
+       opts.n = info.max_entries;
+
+       /* Reduce the number of elements we are updating such that we don't
+        * bump into -E2BIG from non-preallocated hash maps, but will still
+        * have some evictions for LRU maps */
+       if (opts.map_type != BPF_MAP_TYPE_HASH_OF_MAPS)
+               opts.n -= 512;
+       else
+               opts.n /= 2;
+
+       /*
+        * Upsert keys [0, n) under some competition: with random values from
+        * N_THREADS threads. Check values, then delete all elements and check
+        * values again.
+        */
+       upsert_elements(&opts);
+       check_expected_number_elements(opts.n, map_fd, &info);
+       delete_all_elements(info.type, map_fd, !BATCH);
+       check_expected_number_elements(0, map_fd, &info);
+
+       /* Now do the same, but using batch delete operations */
+       upsert_elements(&opts);
+       check_expected_number_elements(opts.n, map_fd, &info);
+       delete_all_elements(info.type, map_fd, BATCH);
+       check_expected_number_elements(0, map_fd, &info);
+
+       close(map_fd);
+}
+
+static int map_create_opts(__u32 type, const char *name,
+                          struct bpf_map_create_opts *map_opts,
+                          __u32 key_size, __u32 val_size)
+{
+       int max_entries;
+       int map_fd;
+
+       if (type == BPF_MAP_TYPE_HASH_OF_MAPS)
+               max_entries = MAX_ENTRIES_HASH_OF_MAPS;
+       else
+               max_entries = MAX_ENTRIES;
+
+       map_fd = bpf_map_create(type, name, key_size, val_size, max_entries, map_opts);
+       CHECK(map_fd < 0, "bpf_map_create()", "error:%s (name=%s)\n",
+                       strerror(errno), name);
+
+       return map_fd;
+}
+
+static int map_create(__u32 type, const char *name, struct bpf_map_create_opts *map_opts)
+{
+       return map_create_opts(type, name, map_opts, sizeof(int), sizeof(int));
+}
+
+static int create_hash(void)
+{
+       struct bpf_map_create_opts map_opts = {
+               .sz = sizeof(map_opts),
+               .map_flags = BPF_F_NO_PREALLOC,
+       };
+
+       return map_create(BPF_MAP_TYPE_HASH, "hash", &map_opts);
+}
+
+static int create_percpu_hash(void)
+{
+       struct bpf_map_create_opts map_opts = {
+               .sz = sizeof(map_opts),
+               .map_flags = BPF_F_NO_PREALLOC,
+       };
+
+       return map_create(BPF_MAP_TYPE_PERCPU_HASH, "percpu_hash", &map_opts);
+}
+
+static int create_hash_prealloc(void)
+{
+       return map_create(BPF_MAP_TYPE_HASH, "hash", NULL);
+}
+
+static int create_percpu_hash_prealloc(void)
+{
+       return map_create(BPF_MAP_TYPE_PERCPU_HASH, "percpu_hash_prealloc", NULL);
+}
+
+static int create_lru_hash(__u32 type, __u32 map_flags)
+{
+       struct bpf_map_create_opts map_opts = {
+               .sz = sizeof(map_opts),
+               .map_flags = map_flags,
+       };
+
+       return map_create(type, "lru_hash", &map_opts);
+}
+
+static int create_hash_of_maps(void)
+{
+       struct bpf_map_create_opts map_opts = {
+               .sz = sizeof(map_opts),
+               .map_flags = BPF_F_NO_PREALLOC,
+               .inner_map_fd = create_small_hash(),
+       };
+       int ret;
+
+       ret = map_create_opts(BPF_MAP_TYPE_HASH_OF_MAPS, "hash_of_maps",
+                             &map_opts, sizeof(int), sizeof(int));
+       close(map_opts.inner_map_fd);
+       return ret;
+}
+
+static void map_percpu_stats_hash(void)
+{
+       __test(create_hash());
+       printf("test_%s:PASS\n", __func__);
+}
+
+static void map_percpu_stats_percpu_hash(void)
+{
+       __test(create_percpu_hash());
+       printf("test_%s:PASS\n", __func__);
+}
+
+static void map_percpu_stats_hash_prealloc(void)
+{
+       __test(create_hash_prealloc());
+       printf("test_%s:PASS\n", __func__);
+}
+
+static void map_percpu_stats_percpu_hash_prealloc(void)
+{
+       __test(create_percpu_hash_prealloc());
+       printf("test_%s:PASS\n", __func__);
+}
+
+static void map_percpu_stats_lru_hash(void)
+{
+       __test(create_lru_hash(BPF_MAP_TYPE_LRU_HASH, 0));
+       printf("test_%s:PASS\n", __func__);
+}
+
+static void map_percpu_stats_lru_hash_no_common(void)
+{
+       __test(create_lru_hash(BPF_MAP_TYPE_LRU_HASH, BPF_F_NO_COMMON_LRU));
+       printf("test_%s:PASS\n", __func__);
+}
+
+static void map_percpu_stats_percpu_lru_hash(void)
+{
+       __test(create_lru_hash(BPF_MAP_TYPE_LRU_PERCPU_HASH, 0));
+       printf("test_%s:PASS\n", __func__);
+}
+
+static void map_percpu_stats_percpu_lru_hash_no_common(void)
+{
+       __test(create_lru_hash(BPF_MAP_TYPE_LRU_PERCPU_HASH, BPF_F_NO_COMMON_LRU));
+       printf("test_%s:PASS\n", __func__);
+}
+
+static void map_percpu_stats_hash_of_maps(void)
+{
+       __test(create_hash_of_maps());
+       printf("test_%s:PASS\n", __func__);
+}
+
+void test_map_percpu_stats(void)
+{
+       map_percpu_stats_hash();
+       map_percpu_stats_percpu_hash();
+       map_percpu_stats_hash_prealloc();
+       map_percpu_stats_percpu_hash_prealloc();
+       map_percpu_stats_lru_hash();
+       map_percpu_stats_lru_hash_no_common();
+       map_percpu_stats_percpu_lru_hash();
+       map_percpu_stats_percpu_lru_hash_no_common();
+       map_percpu_stats_hash_of_maps();
+}
index c8ba400..b30ff6b 100644 (file)
@@ -123,12 +123,13 @@ static void test_bpf_nf_ct(int mode)
        ASSERT_EQ(skel->data->test_snat_addr, 0, "Test for source natting");
        ASSERT_EQ(skel->data->test_dnat_addr, 0, "Test for destination natting");
 end:
-       if (srv_client_fd != -1)
-               close(srv_client_fd);
        if (client_fd != -1)
                close(client_fd);
+       if (srv_client_fd != -1)
+               close(srv_client_fd);
        if (srv_fd != -1)
                close(srv_fd);
+
        snprintf(cmd, sizeof(cmd), iptables, "-D");
        system(cmd);
        test_bpf_nf__destroy(skel);
diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_tcp_skb.c b/tools/testing/selftests/bpf/prog_tests/cgroup_tcp_skb.c
new file mode 100644 (file)
index 0000000..95bab61
--- /dev/null
@@ -0,0 +1,402 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Facebook */
+#include <test_progs.h>
+#include <linux/in6.h>
+#include <sys/socket.h>
+#include <sched.h>
+#include <unistd.h>
+#include "cgroup_helpers.h"
+#include "testing_helpers.h"
+#include "cgroup_tcp_skb.skel.h"
+#include "cgroup_tcp_skb.h"
+
+#define CGROUP_TCP_SKB_PATH "/test_cgroup_tcp_skb"
+
+static int install_filters(int cgroup_fd,
+                          struct bpf_link **egress_link,
+                          struct bpf_link **ingress_link,
+                          struct bpf_program *egress_prog,
+                          struct bpf_program *ingress_prog,
+                          struct cgroup_tcp_skb *skel)
+{
+       /* Prepare filters */
+       skel->bss->g_sock_state = 0;
+       skel->bss->g_unexpected = 0;
+       *egress_link =
+               bpf_program__attach_cgroup(egress_prog,
+                                          cgroup_fd);
+       if (!ASSERT_OK_PTR(egress_link, "egress_link"))
+               return -1;
+       *ingress_link =
+               bpf_program__attach_cgroup(ingress_prog,
+                                          cgroup_fd);
+       if (!ASSERT_OK_PTR(ingress_link, "ingress_link"))
+               return -1;
+
+       return 0;
+}
+
+static void uninstall_filters(struct bpf_link **egress_link,
+                             struct bpf_link **ingress_link)
+{
+       bpf_link__destroy(*egress_link);
+       *egress_link = NULL;
+       bpf_link__destroy(*ingress_link);
+       *ingress_link = NULL;
+}
+
+static int create_client_sock_v6(void)
+{
+       int fd;
+
+       fd = socket(AF_INET6, SOCK_STREAM, 0);
+       if (fd < 0) {
+               perror("socket");
+               return -1;
+       }
+
+       return fd;
+}
+
+static int create_server_sock_v6(void)
+{
+       struct sockaddr_in6 addr = {
+               .sin6_family = AF_INET6,
+               .sin6_port = htons(0),
+               .sin6_addr = IN6ADDR_LOOPBACK_INIT,
+       };
+       int fd, err;
+
+       fd = socket(AF_INET6, SOCK_STREAM, 0);
+       if (fd < 0) {
+               perror("socket");
+               return -1;
+       }
+
+       err = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
+       if (err < 0) {
+               perror("bind");
+               return -1;
+       }
+
+       err = listen(fd, 1);
+       if (err < 0) {
+               perror("listen");
+               return -1;
+       }
+
+       return fd;
+}
+
+static int get_sock_port_v6(int fd)
+{
+       struct sockaddr_in6 addr;
+       socklen_t len;
+       int err;
+
+       len = sizeof(addr);
+       err = getsockname(fd, (struct sockaddr *)&addr, &len);
+       if (err < 0) {
+               perror("getsockname");
+               return -1;
+       }
+
+       return ntohs(addr.sin6_port);
+}
+
+static int connect_client_server_v6(int client_fd, int listen_fd)
+{
+       struct sockaddr_in6 addr = {
+               .sin6_family = AF_INET6,
+               .sin6_addr = IN6ADDR_LOOPBACK_INIT,
+       };
+       int err, port;
+
+       port = get_sock_port_v6(listen_fd);
+       if (port < 0)
+               return -1;
+       addr.sin6_port = htons(port);
+
+       err = connect(client_fd, (struct sockaddr *)&addr, sizeof(addr));
+       if (err < 0) {
+               perror("connect");
+               return -1;
+       }
+
+       return 0;
+}
+
+/* Connect to the server in a cgroup from the outside of the cgroup. */
+static int talk_to_cgroup(int *client_fd, int *listen_fd, int *service_fd,
+                         struct cgroup_tcp_skb *skel)
+{
+       int err, cp;
+       char buf[5];
+
+       /* Create client & server socket */
+       err = join_root_cgroup();
+       if (!ASSERT_OK(err, "join_root_cgroup"))
+               return -1;
+       *client_fd = create_client_sock_v6();
+       if (!ASSERT_GE(*client_fd, 0, "client_fd"))
+               return -1;
+       err = join_cgroup(CGROUP_TCP_SKB_PATH);
+       if (!ASSERT_OK(err, "join_cgroup"))
+               return -1;
+       *listen_fd = create_server_sock_v6();
+       if (!ASSERT_GE(*listen_fd, 0, "listen_fd"))
+               return -1;
+       skel->bss->g_sock_port = get_sock_port_v6(*listen_fd);
+
+       /* Connect client to server */
+       err = connect_client_server_v6(*client_fd, *listen_fd);
+       if (!ASSERT_OK(err, "connect_client_server_v6"))
+               return -1;
+       *service_fd = accept(*listen_fd, NULL, NULL);
+       if (!ASSERT_GE(*service_fd, 0, "service_fd"))
+               return -1;
+       err = join_root_cgroup();
+       if (!ASSERT_OK(err, "join_root_cgroup"))
+               return -1;
+       cp = write(*client_fd, "hello", 5);
+       if (!ASSERT_EQ(cp, 5, "write"))
+               return -1;
+       cp = read(*service_fd, buf, 5);
+       if (!ASSERT_EQ(cp, 5, "read"))
+               return -1;
+
+       return 0;
+}
+
+/* Connect to the server out of a cgroup from inside the cgroup. */
+static int talk_to_outside(int *client_fd, int *listen_fd, int *service_fd,
+                          struct cgroup_tcp_skb *skel)
+
+{
+       int err, cp;
+       char buf[5];
+
+       /* Create client & server socket */
+       err = join_root_cgroup();
+       if (!ASSERT_OK(err, "join_root_cgroup"))
+               return -1;
+       *listen_fd = create_server_sock_v6();
+       if (!ASSERT_GE(*listen_fd, 0, "listen_fd"))
+               return -1;
+       err = join_cgroup(CGROUP_TCP_SKB_PATH);
+       if (!ASSERT_OK(err, "join_cgroup"))
+               return -1;
+       *client_fd = create_client_sock_v6();
+       if (!ASSERT_GE(*client_fd, 0, "client_fd"))
+               return -1;
+       err = join_root_cgroup();
+       if (!ASSERT_OK(err, "join_root_cgroup"))
+               return -1;
+       skel->bss->g_sock_port = get_sock_port_v6(*listen_fd);
+
+       /* Connect client to server */
+       err = connect_client_server_v6(*client_fd, *listen_fd);
+       if (!ASSERT_OK(err, "connect_client_server_v6"))
+               return -1;
+       *service_fd = accept(*listen_fd, NULL, NULL);
+       if (!ASSERT_GE(*service_fd, 0, "service_fd"))
+               return -1;
+       cp = write(*client_fd, "hello", 5);
+       if (!ASSERT_EQ(cp, 5, "write"))
+               return -1;
+       cp = read(*service_fd, buf, 5);
+       if (!ASSERT_EQ(cp, 5, "read"))
+               return -1;
+
+       return 0;
+}
+
+static int close_connection(int *closing_fd, int *peer_fd, int *listen_fd,
+                           struct cgroup_tcp_skb *skel)
+{
+       __u32 saved_packet_count = 0;
+       int err;
+       int i;
+
+       /* Wait for ACKs to be sent */
+       saved_packet_count = skel->bss->g_packet_count;
+       usleep(100000);         /* 0.1s */
+       for (i = 0;
+            skel->bss->g_packet_count != saved_packet_count && i < 10;
+            i++) {
+               saved_packet_count = skel->bss->g_packet_count;
+               usleep(100000); /* 0.1s */
+       }
+       if (!ASSERT_EQ(skel->bss->g_packet_count, saved_packet_count,
+                      "packet_count"))
+               return -1;
+
+       skel->bss->g_packet_count = 0;
+       saved_packet_count = 0;
+
+       /* Do a half shutdown so the closing socket has a chance to
+        * receive a FIN from the peer.
+        */
+       err = shutdown(*closing_fd, SHUT_WR);
+       if (!ASSERT_OK(err, "shutdown closing_fd"))
+               return -1;
+
+       /* Wait for FIN and the ACK of the FIN to be observed */
+       for (i = 0;
+            skel->bss->g_packet_count < saved_packet_count + 2 && i < 10;
+            i++)
+               usleep(100000); /* 0.1s */
+       if (!ASSERT_GE(skel->bss->g_packet_count, saved_packet_count + 2,
+                      "packet_count"))
+               return -1;
+
+       saved_packet_count = skel->bss->g_packet_count;
+
+       /* Fully shutdown the connection */
+       err = close(*peer_fd);
+       if (!ASSERT_OK(err, "close peer_fd"))
+               return -1;
+       *peer_fd = -1;
+
+       /* Wait for FIN and the ACK of the FIN to be observed */
+       for (i = 0;
+            skel->bss->g_packet_count < saved_packet_count + 2 && i < 10;
+            i++)
+               usleep(100000); /* 0.1s */
+       if (!ASSERT_GE(skel->bss->g_packet_count, saved_packet_count + 2,
+                      "packet_count"))
+               return -1;
+
+       err = close(*closing_fd);
+       if (!ASSERT_OK(err, "close closing_fd"))
+               return -1;
+       *closing_fd = -1;
+
+       close(*listen_fd);
+       *listen_fd = -1;
+
+       return 0;
+}
+
+/* This test case includes four scenarios:
+ * 1. Connect to the server from outside the cgroup and close the connection
+ *    from outside the cgroup.
+ * 2. Connect to the server from outside the cgroup and close the connection
+ *    from inside the cgroup.
+ * 3. Connect to the server from inside the cgroup and close the connection
+ *    from outside the cgroup.
+ * 4. Connect to the server from inside the cgroup and close the connection
+ *    from inside the cgroup.
+ *
+ * The test case is to verify that cgroup_skb/{egress,ingress} filters
+ * receive expected packets including SYN, SYN/ACK, ACK, FIN, and FIN/ACK.
+ */
+void test_cgroup_tcp_skb(void)
+{
+       struct bpf_link *ingress_link = NULL;
+       struct bpf_link *egress_link = NULL;
+       int client_fd = -1, listen_fd = -1;
+       struct cgroup_tcp_skb *skel;
+       int service_fd = -1;
+       int cgroup_fd = -1;
+       int err;
+
+       skel = cgroup_tcp_skb__open_and_load();
+       if (!ASSERT_OK(!skel, "skel_open_load"))
+               return;
+
+       err = setup_cgroup_environment();
+       if (!ASSERT_OK(err, "setup_cgroup_environment"))
+               goto cleanup;
+
+       cgroup_fd = create_and_get_cgroup(CGROUP_TCP_SKB_PATH);
+       if (!ASSERT_GE(cgroup_fd, 0, "cgroup_fd"))
+               goto cleanup;
+
+       /* Scenario 1 */
+       err = install_filters(cgroup_fd, &egress_link, &ingress_link,
+                             skel->progs.server_egress,
+                             skel->progs.server_ingress,
+                             skel);
+       if (!ASSERT_OK(err, "install_filters"))
+               goto cleanup;
+
+       err = talk_to_cgroup(&client_fd, &listen_fd, &service_fd, skel);
+       if (!ASSERT_OK(err, "talk_to_cgroup"))
+               goto cleanup;
+
+       err = close_connection(&client_fd, &service_fd, &listen_fd, skel);
+       if (!ASSERT_OK(err, "close_connection"))
+               goto cleanup;
+
+       ASSERT_EQ(skel->bss->g_unexpected, 0, "g_unexpected");
+       ASSERT_EQ(skel->bss->g_sock_state, CLOSED, "g_sock_state");
+
+       uninstall_filters(&egress_link, &ingress_link);
+
+       /* Scenario 2 */
+       err = install_filters(cgroup_fd, &egress_link, &ingress_link,
+                             skel->progs.server_egress_srv,
+                             skel->progs.server_ingress_srv,
+                             skel);
+
+       err = talk_to_cgroup(&client_fd, &listen_fd, &service_fd, skel);
+       if (!ASSERT_OK(err, "talk_to_cgroup"))
+               goto cleanup;
+
+       err = close_connection(&service_fd, &client_fd, &listen_fd, skel);
+       if (!ASSERT_OK(err, "close_connection"))
+               goto cleanup;
+
+       ASSERT_EQ(skel->bss->g_unexpected, 0, "g_unexpected");
+       ASSERT_EQ(skel->bss->g_sock_state, TIME_WAIT, "g_sock_state");
+
+       uninstall_filters(&egress_link, &ingress_link);
+
+       /* Scenario 3 */
+       err = install_filters(cgroup_fd, &egress_link, &ingress_link,
+                             skel->progs.client_egress_srv,
+                             skel->progs.client_ingress_srv,
+                             skel);
+
+       err = talk_to_outside(&client_fd, &listen_fd, &service_fd, skel);
+       if (!ASSERT_OK(err, "talk_to_outside"))
+               goto cleanup;
+
+       err = close_connection(&service_fd, &client_fd, &listen_fd, skel);
+       if (!ASSERT_OK(err, "close_connection"))
+               goto cleanup;
+
+       ASSERT_EQ(skel->bss->g_unexpected, 0, "g_unexpected");
+       ASSERT_EQ(skel->bss->g_sock_state, CLOSED, "g_sock_state");
+
+       uninstall_filters(&egress_link, &ingress_link);
+
+       /* Scenario 4 */
+       err = install_filters(cgroup_fd, &egress_link, &ingress_link,
+                             skel->progs.client_egress,
+                             skel->progs.client_ingress,
+                             skel);
+
+       err = talk_to_outside(&client_fd, &listen_fd, &service_fd, skel);
+       if (!ASSERT_OK(err, "talk_to_outside"))
+               goto cleanup;
+
+       err = close_connection(&client_fd, &service_fd, &listen_fd, skel);
+       if (!ASSERT_OK(err, "close_connection"))
+               goto cleanup;
+
+       ASSERT_EQ(skel->bss->g_unexpected, 0, "g_unexpected");
+       ASSERT_EQ(skel->bss->g_sock_state, TIME_WAIT, "g_sock_state");
+
+       uninstall_filters(&egress_link, &ingress_link);
+
+cleanup:
+       close(client_fd);
+       close(listen_fd);
+       close(service_fd);
+       close(cgroup_fd);
+       bpf_link__destroy(egress_link);
+       bpf_link__destroy(ingress_link);
+       cleanup_cgroup_environment();
+       cgroup_tcp_skb__destroy(skel);
+}
index c0d1d61..aee1bc7 100644 (file)
@@ -2,8 +2,9 @@
 /* Copyright (c) 2019 Facebook */
 #include <test_progs.h>
 #include "fentry_test.lskel.h"
+#include "fentry_many_args.skel.h"
 
-static int fentry_test(struct fentry_test_lskel *fentry_skel)
+static int fentry_test_common(struct fentry_test_lskel *fentry_skel)
 {
        int err, prog_fd, i;
        int link_fd;
@@ -37,7 +38,7 @@ static int fentry_test(struct fentry_test_lskel *fentry_skel)
        return 0;
 }
 
-void test_fentry_test(void)
+static void fentry_test(void)
 {
        struct fentry_test_lskel *fentry_skel = NULL;
        int err;
@@ -46,13 +47,47 @@ void test_fentry_test(void)
        if (!ASSERT_OK_PTR(fentry_skel, "fentry_skel_load"))
                goto cleanup;
 
-       err = fentry_test(fentry_skel);
+       err = fentry_test_common(fentry_skel);
        if (!ASSERT_OK(err, "fentry_first_attach"))
                goto cleanup;
 
-       err = fentry_test(fentry_skel);
+       err = fentry_test_common(fentry_skel);
        ASSERT_OK(err, "fentry_second_attach");
 
 cleanup:
        fentry_test_lskel__destroy(fentry_skel);
 }
+
+static void fentry_many_args(void)
+{
+       struct fentry_many_args *fentry_skel = NULL;
+       int err;
+
+       fentry_skel = fentry_many_args__open_and_load();
+       if (!ASSERT_OK_PTR(fentry_skel, "fentry_many_args_skel_load"))
+               goto cleanup;
+
+       err = fentry_many_args__attach(fentry_skel);
+       if (!ASSERT_OK(err, "fentry_many_args_attach"))
+               goto cleanup;
+
+       ASSERT_OK(trigger_module_test_read(1), "trigger_read");
+
+       ASSERT_EQ(fentry_skel->bss->test1_result, 1,
+                 "fentry_many_args_result1");
+       ASSERT_EQ(fentry_skel->bss->test2_result, 1,
+                 "fentry_many_args_result2");
+       ASSERT_EQ(fentry_skel->bss->test3_result, 1,
+                 "fentry_many_args_result3");
+
+cleanup:
+       fentry_many_args__destroy(fentry_skel);
+}
+
+void test_fentry_test(void)
+{
+       if (test__start_subtest("fentry"))
+               fentry_test();
+       if (test__start_subtest("fentry_many_args"))
+               fentry_many_args();
+}
index 101b734..1c13007 100644 (file)
@@ -2,8 +2,9 @@
 /* Copyright (c) 2019 Facebook */
 #include <test_progs.h>
 #include "fexit_test.lskel.h"
+#include "fexit_many_args.skel.h"
 
-static int fexit_test(struct fexit_test_lskel *fexit_skel)
+static int fexit_test_common(struct fexit_test_lskel *fexit_skel)
 {
        int err, prog_fd, i;
        int link_fd;
@@ -37,7 +38,7 @@ static int fexit_test(struct fexit_test_lskel *fexit_skel)
        return 0;
 }
 
-void test_fexit_test(void)
+static void fexit_test(void)
 {
        struct fexit_test_lskel *fexit_skel = NULL;
        int err;
@@ -46,13 +47,47 @@ void test_fexit_test(void)
        if (!ASSERT_OK_PTR(fexit_skel, "fexit_skel_load"))
                goto cleanup;
 
-       err = fexit_test(fexit_skel);
+       err = fexit_test_common(fexit_skel);
        if (!ASSERT_OK(err, "fexit_first_attach"))
                goto cleanup;
 
-       err = fexit_test(fexit_skel);
+       err = fexit_test_common(fexit_skel);
        ASSERT_OK(err, "fexit_second_attach");
 
 cleanup:
        fexit_test_lskel__destroy(fexit_skel);
 }
+
+static void fexit_many_args(void)
+{
+       struct fexit_many_args *fexit_skel = NULL;
+       int err;
+
+       fexit_skel = fexit_many_args__open_and_load();
+       if (!ASSERT_OK_PTR(fexit_skel, "fexit_many_args_skel_load"))
+               goto cleanup;
+
+       err = fexit_many_args__attach(fexit_skel);
+       if (!ASSERT_OK(err, "fexit_many_args_attach"))
+               goto cleanup;
+
+       ASSERT_OK(trigger_module_test_read(1), "trigger_read");
+
+       ASSERT_EQ(fexit_skel->bss->test1_result, 1,
+                 "fexit_many_args_result1");
+       ASSERT_EQ(fexit_skel->bss->test2_result, 1,
+                 "fexit_many_args_result2");
+       ASSERT_EQ(fexit_skel->bss->test3_result, 1,
+                 "fexit_many_args_result3");
+
+cleanup:
+       fexit_many_args__destroy(fexit_skel);
+}
+
+void test_fexit_test(void)
+{
+       if (test__start_subtest("fexit"))
+               fexit_test();
+       if (test__start_subtest("fexit_many_args"))
+               fexit_many_args();
+}
index 28cf639..64a9c95 100644 (file)
@@ -30,7 +30,9 @@ void test_get_func_args_test(void)
        prog_fd = bpf_program__fd(skel->progs.fmod_ret_test);
        err = bpf_prog_test_run_opts(prog_fd, &topts);
        ASSERT_OK(err, "test_run");
-       ASSERT_EQ(topts.retval, 1234, "test_run");
+
+       ASSERT_EQ(topts.retval >> 16, 1, "test_run");
+       ASSERT_EQ(topts.retval & 0xffff, 1234 + 29, "test_run");
 
        ASSERT_EQ(skel->bss->test1_result, 1, "test1_result");
        ASSERT_EQ(skel->bss->test2_result, 1, "test2_result");
index fd41425..56b5bae 100644 (file)
@@ -22,7 +22,7 @@ static void global_map_resize_bss_subtest(void)
        struct test_global_map_resize *skel;
        struct bpf_map *map;
        const __u32 desired_sz = sizeof(skel->bss->sum) + sysconf(_SC_PAGE_SIZE) * 2;
-       size_t array_len, actual_sz;
+       size_t array_len, actual_sz, new_sz;
 
        skel = test_global_map_resize__open();
        if (!ASSERT_OK_PTR(skel, "test_global_map_resize__open"))
@@ -42,6 +42,10 @@ static void global_map_resize_bss_subtest(void)
        if (!ASSERT_EQ(bpf_map__value_size(map), desired_sz, "resize"))
                goto teardown;
 
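+       /* Also resize the .data.percpu_arr map so it holds one entry per
+        * possible CPU; the BPF programs index it by CPU id.
+        */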
+       new_sz = sizeof(skel->data_percpu_arr->percpu_arr[0]) * libbpf_num_possible_cpus();
+       err = bpf_map__set_value_size(skel->maps.data_percpu_arr, new_sz);
+       ASSERT_OK(err, "percpu_arr_resize");
+
        /* set the expected number of elements based on the resized array */
        array_len = (desired_sz - sizeof(skel->bss->sum)) / sizeof(skel->bss->array[0]);
        if (!ASSERT_GT(array_len, 1, "array_len"))
@@ -84,11 +88,11 @@ teardown:
 
 static void global_map_resize_data_subtest(void)
 {
-       int err;
        struct test_global_map_resize *skel;
        struct bpf_map *map;
        const __u32 desired_sz = sysconf(_SC_PAGE_SIZE) * 2;
-       size_t array_len, actual_sz;
+       size_t array_len, actual_sz, new_sz;
+       int err;
 
        skel = test_global_map_resize__open();
        if (!ASSERT_OK_PTR(skel, "test_global_map_resize__open"))
@@ -108,6 +112,10 @@ static void global_map_resize_data_subtest(void)
        if (!ASSERT_EQ(bpf_map__value_size(map), desired_sz, "resize"))
                goto teardown;
 
+       new_sz = sizeof(skel->data_percpu_arr->percpu_arr[0]) * libbpf_num_possible_cpus();
+       err = bpf_map__set_value_size(skel->maps.data_percpu_arr, new_sz);
+       ASSERT_OK(err, "percpu_arr_resize");
+
        /* set the expected number of elements based on the resized array */
        array_len = (desired_sz - sizeof(skel->bss->sum)) / sizeof(skel->data_custom->my_array[0]);
        if (!ASSERT_GT(array_len, 1, "array_len"))
index 5d9955a..a70c99c 100644 (file)
@@ -41,6 +41,10 @@ static void run_test(__u32 input_retval, __u16 want_side_effect, __s16 want_ret)
        ASSERT_EQ(skel->bss->fexit_result, 1, "modify_return fexit_result");
        ASSERT_EQ(skel->bss->fmod_ret_result, 1, "modify_return fmod_ret_result");
 
+       ASSERT_EQ(skel->bss->fentry_result2, 1, "modify_return fentry_result2");
+       ASSERT_EQ(skel->bss->fexit_result2, 1, "modify_return fexit_result2");
+       ASSERT_EQ(skel->bss->fmod_ret_result2, 1, "modify_return fmod_ret_result2");
+
 cleanup:
        modify_return__destroy(skel);
 }
@@ -49,9 +53,9 @@ cleanup:
 void serial_test_modify_return(void)
 {
        run_test(0 /* input_retval */,
-                1 /* want_side_effect */,
-                4 /* want_ret */);
+                2 /* want_side_effect */,
+                33 /* want_ret */);
        run_test(-EINVAL /* input_retval */,
                 0 /* want_side_effect */,
-                -EINVAL /* want_ret */);
+                -EINVAL * 2 /* want_ret */);
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/netfilter_link_attach.c b/tools/testing/selftests/bpf/prog_tests/netfilter_link_attach.c
new file mode 100644 (file)
index 0000000..4297a2a
--- /dev/null
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <netinet/in.h>
+#include <linux/netfilter.h>
+
+#include "test_progs.h"
+#include "test_netfilter_link_attach.skel.h"
+
+struct nf_link_test {
+       __u32 pf;
+       __u32 hooknum;
+       __s32 priority;
+       __u32 flags;
+
+       bool expect_success;
+       const char * const name;
+};
+
+static const struct nf_link_test nf_hook_link_tests[] = {
+       { .name = "allzero", },
+       { .pf = NFPROTO_NUMPROTO, .name = "invalid-pf", },
+       { .pf = NFPROTO_IPV4, .hooknum = 42, .name = "invalid-hooknum", },
+       { .pf = NFPROTO_IPV4, .priority = INT_MIN, .name = "invalid-priority-min", },
+       { .pf = NFPROTO_IPV4, .priority = INT_MAX, .name = "invalid-priority-max", },
+       { .pf = NFPROTO_IPV4, .flags = UINT_MAX, .name = "invalid-flags", },
+
+       { .pf = NFPROTO_INET, .priority = 1, .name = "invalid-inet-not-supported", },
+
+       { .pf = NFPROTO_IPV4, .priority = -10000, .expect_success = true, .name = "attach ipv4", },
+       { .pf = NFPROTO_IPV6, .priority =  10001, .expect_success = true, .name = "attach ipv6", },
+};
+
+void test_netfilter_link_attach(void)
+{
+       struct test_netfilter_link_attach *skel;
+       struct bpf_program *prog;
+       LIBBPF_OPTS(bpf_netfilter_opts, opts);
+       int i;
+
+       skel = test_netfilter_link_attach__open_and_load();
+       if (!ASSERT_OK_PTR(skel, "test_netfilter_link_attach__open_and_load"))
+               goto out;
+
+       prog = skel->progs.nf_link_attach_test;
+       if (!ASSERT_OK_PTR(prog, "attach program"))
+               goto out;
+
+       for (i = 0; i < ARRAY_SIZE(nf_hook_link_tests); i++) {
+               struct bpf_link *link;
+
+               if (!test__start_subtest(nf_hook_link_tests[i].name))
+                       continue;
+
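+               /* Copy each attach parameter from the current test case
+                * into the netfilter link opts before attaching.
+                */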
+#define X(opts, m, i)  opts.m = nf_hook_link_tests[(i)].m
+               X(opts, pf, i);
+               X(opts, hooknum, i);
+               X(opts, priority, i);
+               X(opts, flags, i);
+#undef X
+               link = bpf_program__attach_netfilter(prog, &opts);
+               if (nf_hook_link_tests[i].expect_success) {
+                       struct bpf_link *link2;
+
+                       if (!ASSERT_OK_PTR(link, "program attach successful"))
+                               continue;
+
+                       link2 = bpf_program__attach_netfilter(prog, &opts);
+                       ASSERT_ERR_PTR(link2, "attach program with same pf/hook/priority");
+
+                       if (!ASSERT_OK(bpf_link__destroy(link), "link destroy"))
+                               break;
+
+                       link2 = bpf_program__attach_netfilter(prog, &opts);
+                       if (!ASSERT_OK_PTR(link2, "program reattach successful"))
+                               continue;
+                       if (!ASSERT_OK(bpf_link__destroy(link2), "link destroy"))
+                               break;
+               } else {
+                       ASSERT_ERR_PTR(link, "program attach failure");
+               }
+       }
+
+out:
+       test_netfilter_link_attach__destroy(skel);
+}
+
diff --git a/tools/testing/selftests/bpf/prog_tests/ptr_untrusted.c b/tools/testing/selftests/bpf/prog_tests/ptr_untrusted.c
new file mode 100644 (file)
index 0000000..8d077d1
--- /dev/null
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Yafang Shao <laoar.shao@gmail.com> */
+
+#include <string.h>
+#include <linux/bpf.h>
+#include <test_progs.h>
+#include "test_ptr_untrusted.skel.h"
+
+#define TP_NAME "sched_switch"
+
+void serial_test_ptr_untrusted(void)
+{
+       struct test_ptr_untrusted *skel;
+       int err;
+
+       skel = test_ptr_untrusted__open_and_load();
+       if (!ASSERT_OK_PTR(skel, "skel_open"))
+               goto cleanup;
+
+       /* First, attach lsm prog */
+       skel->links.lsm_run = bpf_program__attach_lsm(skel->progs.lsm_run);
+       if (!ASSERT_OK_PTR(skel->links.lsm_run, "lsm_attach"))
+               goto cleanup;
+
+       /* Second, attach raw_tp prog. The lsm prog will be triggered. */
+       skel->links.raw_tp_run = bpf_program__attach_raw_tracepoint(skel->progs.raw_tp_run,
+                                                                   TP_NAME);
+       if (!ASSERT_OK_PTR(skel->links.raw_tp_run, "raw_tp_attach"))
+               goto cleanup;
+
+       err = strncmp(skel->bss->tp_name, TP_NAME, strlen(TP_NAME));
+       ASSERT_EQ(err, 0, "cmp_tp_name");
+
+cleanup:
+       test_ptr_untrusted__destroy(skel);
+}
index 13bcaeb..56685fc 100644 (file)
@@ -347,7 +347,7 @@ static void syncookie_estab(void)
        exp_active_estab_in.max_delack_ms = 22;
 
        exp_passive_hdr_stg.syncookie = true;
-       exp_active_hdr_stg.resend_syn = true,
+       exp_active_hdr_stg.resend_syn = true;
 
        prepare_out();
 
index 1c75a32..fe0fb0c 100644 (file)
@@ -55,6 +55,25 @@ static void test_fentry(void)
 
        ASSERT_EQ(skel->bss->t6, 1, "t6 ret");
 
+       ASSERT_EQ(skel->bss->t7_a, 16, "t7:a");
+       ASSERT_EQ(skel->bss->t7_b, 17, "t7:b");
+       ASSERT_EQ(skel->bss->t7_c, 18, "t7:c");
+       ASSERT_EQ(skel->bss->t7_d, 19, "t7:d");
+       ASSERT_EQ(skel->bss->t7_e, 20, "t7:e");
+       ASSERT_EQ(skel->bss->t7_f_a, 21, "t7:f.a");
+       ASSERT_EQ(skel->bss->t7_f_b, 22, "t7:f.b");
+       ASSERT_EQ(skel->bss->t7_ret, 133, "t7 ret");
+
+       ASSERT_EQ(skel->bss->t8_a, 16, "t8:a");
+       ASSERT_EQ(skel->bss->t8_b, 17, "t8:b");
+       ASSERT_EQ(skel->bss->t8_c, 18, "t8:c");
+       ASSERT_EQ(skel->bss->t8_d, 19, "t8:d");
+       ASSERT_EQ(skel->bss->t8_e, 20, "t8:e");
+       ASSERT_EQ(skel->bss->t8_f_a, 21, "t8:f.a");
+       ASSERT_EQ(skel->bss->t8_f_b, 22, "t8:f.b");
+       ASSERT_EQ(skel->bss->t8_g, 23, "t8:g");
+       ASSERT_EQ(skel->bss->t8_ret, 156, "t8 ret");
+
        tracing_struct__detach(skel);
 destroy_skel:
        tracing_struct__destroy(skel);
index e91d0d1..6cd7349 100644 (file)
@@ -88,8 +88,8 @@ void serial_test_trampoline_count(void)
        if (!ASSERT_OK(err, "bpf_prog_test_run_opts"))
                goto cleanup;
 
-       ASSERT_EQ(opts.retval & 0xffff, 4, "bpf_modify_return_test.result");
-       ASSERT_EQ(opts.retval >> 16, 1, "bpf_modify_return_test.side_effect");
+       ASSERT_EQ(opts.retval & 0xffff, 33, "bpf_modify_return_test.result");
+       ASSERT_EQ(opts.retval >> 16, 2, "bpf_modify_return_test.side_effect");
 
 cleanup:
        for (; i >= 0; i--) {
index 070a138..c375e59 100644 (file)
@@ -58,6 +58,7 @@
 #include "verifier_stack_ptr.skel.h"
 #include "verifier_subprog_precision.skel.h"
 #include "verifier_subreg.skel.h"
+#include "verifier_typedef.skel.h"
 #include "verifier_uninit.skel.h"
 #include "verifier_unpriv.skel.h"
 #include "verifier_unpriv_perf.skel.h"
@@ -159,6 +160,7 @@ void test_verifier_spin_lock(void)            { RUN(verifier_spin_lock); }
 void test_verifier_stack_ptr(void)            { RUN(verifier_stack_ptr); }
 void test_verifier_subprog_precision(void)    { RUN(verifier_subprog_precision); }
 void test_verifier_subreg(void)               { RUN(verifier_subreg); }
+void test_verifier_typedef(void)              { RUN(verifier_typedef); }
 void test_verifier_uninit(void)               { RUN(verifier_uninit); }
 void test_verifier_unpriv(void)               { RUN(verifier_unpriv); }
 void test_verifier_unpriv_perf(void)          { RUN(verifier_unpriv_perf); }
diff --git a/tools/testing/selftests/bpf/progs/cgroup_tcp_skb.c b/tools/testing/selftests/bpf/progs/cgroup_tcp_skb.c
new file mode 100644 (file)
index 0000000..1e2e73f
--- /dev/null
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+#include <linux/bpf.h>
+#include <bpf/bpf_endian.h>
+#include <bpf/bpf_helpers.h>
+
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/ipv6.h>
+#include <linux/tcp.h>
+
+#include <sys/types.h>
+#include <sys/socket.h>
+
+#include "cgroup_tcp_skb.h"
+
+char _license[] SEC("license") = "GPL";
+
+__u16 g_sock_port = 0;
+__u32 g_sock_state = 0;
+int g_unexpected = 0;
+__u32 g_packet_count = 0;
+
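+/* Return 1 only for IPv6/TCP packets whose source or destination port
+ * matches the socket under test (g_sock_port), copying the TCP header
+ * out for the caller.
+ */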
+int needed_tcp_pkt(struct __sk_buff *skb, struct tcphdr *tcph)
+{
+       struct ipv6hdr ip6h;
+
+       if (skb->protocol != bpf_htons(ETH_P_IPV6))
+               return 0;
+       if (bpf_skb_load_bytes(skb, 0, &ip6h, sizeof(ip6h)))
+               return 0;
+
+       if (ip6h.nexthdr != IPPROTO_TCP)
+               return 0;
+
+       if (bpf_skb_load_bytes(skb, sizeof(ip6h), tcph, sizeof(*tcph)))
+               return 0;
+
+       if (tcph->source != bpf_htons(g_sock_port) &&
+           tcph->dest != bpf_htons(g_sock_port))
+               return 0;
+
+       return 1;
+}
+
+/* Run accept() on a socket in the cgroup to receive a new connection. */
+static int egress_accept(struct tcphdr *tcph)
+{
+       if (g_sock_state ==  SYN_RECV_SENDING_SYN_ACK) {
+               if (tcph->fin || !tcph->syn || !tcph->ack)
+                       g_unexpected++;
+               else
+                       g_sock_state = SYN_RECV;
+               return 1;
+       }
+
+       return 0;
+}
+
+static int ingress_accept(struct tcphdr *tcph)
+{
+       switch (g_sock_state) {
+       case INIT:
+               if (!tcph->syn || tcph->fin || tcph->ack)
+                       g_unexpected++;
+               else
+                       g_sock_state = SYN_RECV_SENDING_SYN_ACK;
+               break;
+       case SYN_RECV:
+               if (tcph->fin || tcph->syn || !tcph->ack)
+                       g_unexpected++;
+               else
+                       g_sock_state = ESTABLISHED;
+               break;
+       default:
+               return 0;
+       }
+
+       return 1;
+}
+
+/* Run connect() on a socket in the cgroup to start a new connection. */
+static int egress_connect(struct tcphdr *tcph)
+{
+       if (g_sock_state == INIT) {
+               if (!tcph->syn || tcph->fin || tcph->ack)
+                       g_unexpected++;
+               else
+                       g_sock_state = SYN_SENT;
+               return 1;
+       }
+
+       return 0;
+}
+
+static int ingress_connect(struct tcphdr *tcph)
+{
+       if (g_sock_state == SYN_SENT) {
+               if (tcph->fin || !tcph->syn || !tcph->ack)
+                       g_unexpected++;
+               else
+                       g_sock_state = ESTABLISHED;
+               return 1;
+       }
+
+       return 0;
+}
+
+/* The connection is closed by the peer outside the cgroup. */
+static int egress_close_remote(struct tcphdr *tcph)
+{
+       switch (g_sock_state) {
+       case ESTABLISHED:
+               break;
+       case CLOSE_WAIT_SENDING_ACK:
+               if (tcph->fin || tcph->syn || !tcph->ack)
+                       g_unexpected++;
+               else
+                       g_sock_state = CLOSE_WAIT;
+               break;
+       case CLOSE_WAIT:
+               if (!tcph->fin)
+                       g_unexpected++;
+               else
+                       g_sock_state = LAST_ACK;
+               break;
+       default:
+               return 0;
+       }
+
+       return 1;
+}
+
+static int ingress_close_remote(struct tcphdr *tcph)
+{
+       switch (g_sock_state) {
+       case ESTABLISHED:
+               if (tcph->fin)
+                       g_sock_state = CLOSE_WAIT_SENDING_ACK;
+               break;
+       case LAST_ACK:
+               if (tcph->fin || tcph->syn || !tcph->ack)
+                       g_unexpected++;
+               else
+                       g_sock_state = CLOSED;
+               break;
+       default:
+               return 0;
+       }
+
+       return 1;
+}
+
+/* The connection is closed by the endpoint inside the cgroup. */
+static int egress_close_local(struct tcphdr *tcph)
+{
+       switch (g_sock_state) {
+       case ESTABLISHED:
+               if (tcph->fin)
+                       g_sock_state = FIN_WAIT1;
+               break;
+       case TIME_WAIT_SENDING_ACK:
+               if (tcph->fin || tcph->syn || !tcph->ack)
+                       g_unexpected++;
+               else
+                       g_sock_state = TIME_WAIT;
+               break;
+       default:
+               return 0;
+       }
+
+       return 1;
+}
+
+static int ingress_close_local(struct tcphdr *tcph)
+{
+       switch (g_sock_state) {
+       case ESTABLISHED:
+               break;
+       case FIN_WAIT1:
+               if (tcph->fin || tcph->syn || !tcph->ack)
+                       g_unexpected++;
+               else
+                       g_sock_state = FIN_WAIT2;
+               break;
+       case FIN_WAIT2:
+               if (!tcph->fin || tcph->syn || !tcph->ack)
+                       g_unexpected++;
+               else
+                       g_sock_state = TIME_WAIT_SENDING_ACK;
+               break;
+       default:
+               return 0;
+       }
+
+       return 1;
+}
+
+/* Check the types of outgoing packets of a server socket to make sure they
+ * are consistent with the state of the server socket.
+ *
+ * The connection is closed by the client side.
+ */
+SEC("cgroup_skb/egress")
+int server_egress(struct __sk_buff *skb)
+{
+       struct tcphdr tcph;
+
+       if (!needed_tcp_pkt(skb, &tcph))
+               return 1;
+
+       g_packet_count++;
+
+       /* Egress of the server socket. */
+       if (egress_accept(&tcph) || egress_close_remote(&tcph))
+               return 1;
+
+       g_unexpected++;
+       return 1;
+}
+
+/* Check the types of incoming packets of a server socket to make sure they
+ * are consistent with the state of the server socket.
+ *
+ * The connection is closed by the client side.
+ */
+SEC("cgroup_skb/ingress")
+int server_ingress(struct __sk_buff *skb)
+{
+       struct tcphdr tcph;
+
+       if (!needed_tcp_pkt(skb, &tcph))
+               return 1;
+
+       g_packet_count++;
+
+       /* Ingress of the server socket. */
+       if (ingress_accept(&tcph) || ingress_close_remote(&tcph))
+               return 1;
+
+       g_unexpected++;
+       return 1;
+}
+
+/* Check the types of outgoing packets of a server socket to make sure they
+ * are consistent with the state of the server socket.
+ *
+ * The connection is closed by the server side.
+ */
+SEC("cgroup_skb/egress")
+int server_egress_srv(struct __sk_buff *skb)
+{
+       struct tcphdr tcph;
+
+       if (!needed_tcp_pkt(skb, &tcph))
+               return 1;
+
+       g_packet_count++;
+
+       /* Egress of the server socket. */
+       if (egress_accept(&tcph) || egress_close_local(&tcph))
+               return 1;
+
+       g_unexpected++;
+       return 1;
+}
+
+/* Check the types of incoming packets of a server socket to make sure they
+ * are consistent with the state of the server socket.
+ *
+ * The connection is closed by the server side.
+ */
+SEC("cgroup_skb/ingress")
+int server_ingress_srv(struct __sk_buff *skb)
+{
+       struct tcphdr tcph;
+
+       if (!needed_tcp_pkt(skb, &tcph))
+               return 1;
+
+       g_packet_count++;
+
+       /* Ingress of the server socket. */
+       if (ingress_accept(&tcph) || ingress_close_local(&tcph))
+               return 1;
+
+       g_unexpected++;
+       return 1;
+}
+
+/* Check the types of outgoing packets of a client socket to make sure they
+ * are consistent with the state of the client socket.
+ *
+ * The connection is closed by the server side.
+ */
+SEC("cgroup_skb/egress")
+int client_egress_srv(struct __sk_buff *skb)
+{
+       struct tcphdr tcph;
+
+       if (!needed_tcp_pkt(skb, &tcph))
+               return 1;
+
+       g_packet_count++;
+
+       /* Egress of the client socket. */
+       if (egress_connect(&tcph) || egress_close_remote(&tcph))
+               return 1;
+
+       g_unexpected++;
+       return 1;
+}
+
+/* Check the types of incoming packets of a client socket to make sure they
+ * are consistent with the state of the client socket.
+ *
+ * The connection is closed by the server side.
+ */
+SEC("cgroup_skb/ingress")
+int client_ingress_srv(struct __sk_buff *skb)
+{
+       struct tcphdr tcph;
+
+       if (!needed_tcp_pkt(skb, &tcph))
+               return 1;
+
+       g_packet_count++;
+
+       /* Ingress of the client socket. */
+       if (ingress_connect(&tcph) || ingress_close_remote(&tcph))
+               return 1;
+
+       g_unexpected++;
+       return 1;
+}
+
+/* Check the types of outgoing packets of a client socket to make sure they
+ * are consistent with the state of the client socket.
+ *
+ * The connection is closed by the client side.
+ */
+SEC("cgroup_skb/egress")
+int client_egress(struct __sk_buff *skb)
+{
+       struct tcphdr tcph;
+
+       if (!needed_tcp_pkt(skb, &tcph))
+               return 1;
+
+       g_packet_count++;
+
+       /* Egress of the client socket. */
+       if (egress_connect(&tcph) || egress_close_local(&tcph))
+               return 1;
+
+       g_unexpected++;
+       return 1;
+}
+
+/* Check the types of incoming packets of a client socket to make sure they
+ * are consistent with the state of the client socket.
+ *
+ * The connection is closed by the client side.
+ */
+SEC("cgroup_skb/ingress")
+int client_ingress(struct __sk_buff *skb)
+{
+       struct tcphdr tcph;
+
+       if (!needed_tcp_pkt(skb, &tcph))
+               return 1;
+
+       g_packet_count++;
+
+       /* Ingress of the client socket. */
+       if (ingress_connect(&tcph) || ingress_close_local(&tcph))
+               return 1;
+
+       g_unexpected++;
+       return 1;
+}
diff --git a/tools/testing/selftests/bpf/progs/fentry_many_args.c b/tools/testing/selftests/bpf/progs/fentry_many_args.c
new file mode 100644 (file)
index 0000000..b61bb92
--- /dev/null
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Tencent */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
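+/* Each program records whether all arguments arrived with the expected
+ * values (16..22 for the 7-argument traced function, 16..26 for the
+ * 11-argument one); userspace only checks the *_result flags.
+ */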
+__u64 test1_result = 0;
+SEC("fentry/bpf_testmod_fentry_test7")
+int BPF_PROG(test1, __u64 a, void *b, short c, int d, void *e, char f,
+            int g)
+{
+       test1_result = a == 16 && b == (void *)17 && c == 18 && d == 19 &&
+               e == (void *)20 && f == 21 && g == 22;
+       return 0;
+}
+
+__u64 test2_result = 0;
+SEC("fentry/bpf_testmod_fentry_test11")
+int BPF_PROG(test2, __u64 a, void *b, short c, int d, void *e, char f,
+            int g, unsigned int h, long i, __u64 j, unsigned long k)
+{
+       test2_result = a == 16 && b == (void *)17 && c == 18 && d == 19 &&
+               e == (void *)20 && f == 21 && g == 22 && h == 23 &&
+               i == 24 && j == 25 && k == 26;
+       return 0;
+}
+
+__u64 test3_result = 0;
+SEC("fentry/bpf_testmod_fentry_test11")
+int BPF_PROG(test3, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f,
+            __u64 g, __u64 h, __u64 i, __u64 j, __u64 k)
+{
+       test3_result = a == 16 && b == 17 && c == 18 && d == 19 &&
+               e == 20 && f == 21 && g == 22 && h == 23 &&
+               i == 24 && j == 25 && k == 26;
+       return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/fexit_many_args.c b/tools/testing/selftests/bpf/progs/fexit_many_args.c
new file mode 100644 (file)
index 0000000..53b335c
--- /dev/null
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Tencent */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+__u64 test1_result = 0;
+SEC("fexit/bpf_testmod_fentry_test7")
+int BPF_PROG(test1, __u64 a, void *b, short c, int d, void *e, char f,
+            int g, int ret)
+{
+       test1_result = a == 16 && b == (void *)17 && c == 18 && d == 19 &&
+               e == (void *)20 && f == 21 && g == 22 && ret == 133;
+       return 0;
+}
+
+__u64 test2_result = 0;
+SEC("fexit/bpf_testmod_fentry_test11")
+int BPF_PROG(test2, __u64 a, void *b, short c, int d, void *e, char f,
+            int g, unsigned int h, long i, __u64 j, unsigned long k,
+            int ret)
+{
+       test2_result = a == 16 && b == (void *)17 && c == 18 && d == 19 &&
+               e == (void *)20 && f == 21 && g == 22 && h == 23 &&
+               i == 24 && j == 25 && k == 26 && ret == 231;
+       return 0;
+}
+
+__u64 test3_result = 0;
+SEC("fexit/bpf_testmod_fentry_test11")
+int BPF_PROG(test3, __u64 a, __u64 b, __u64 c, __u64 d, __u64 e, __u64 f,
+            __u64 g, __u64 h, __u64 i, __u64 j, __u64 k, __u64 ret)
+{
+       test3_result = a == 16 && b == 17 && c == 18 && d == 19 &&
+               e == 20 && f == 21 && g == 22 && h == 23 &&
+               i == 24 && j == 25 && k == 26 && ret == 231;
+       return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/htab_mem_bench.c b/tools/testing/selftests/bpf/progs/htab_mem_bench.c
new file mode 100644 (file)
index 0000000..b1b721b
--- /dev/null
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
+#include <stdbool.h>
+#include <errno.h>
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#define OP_BATCH 64
+
+struct update_ctx {
+       unsigned int from;
+       unsigned int step;
+};
+
+struct {
+       __uint(type, BPF_MAP_TYPE_HASH);
+       __uint(key_size, 4);
+       __uint(map_flags, BPF_F_NO_PREALLOC);
+} htab SEC(".maps");
+
+char _license[] SEC("license") = "GPL";
+
+unsigned char zeroed_value[4096];
+unsigned int nr_thread = 0;
+long op_cnt = 0;
+
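+/* bpf_loop() callback: update one element, then advance the key by the
+ * per-producer stride so concurrent producers work on disjoint keys.
+ */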
+static int write_htab(unsigned int i, struct update_ctx *ctx, unsigned int flags)
+{
+       bpf_map_update_elem(&htab, &ctx->from, zeroed_value, flags);
+       ctx->from += ctx->step;
+
+       return 0;
+}
+
+static int overwrite_htab(unsigned int i, struct update_ctx *ctx)
+{
+       return write_htab(i, ctx, 0);
+}
+
+static int newwrite_htab(unsigned int i, struct update_ctx *ctx)
+{
+       return write_htab(i, ctx, BPF_NOEXIST);
+}
+
+static int del_htab(unsigned int i, struct update_ctx *ctx)
+{
+       bpf_map_delete_elem(&htab, &ctx->from);
+       ctx->from += ctx->step;
+
+       return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int overwrite(void *ctx)
+{
+       struct update_ctx update;
+
+       update.from = bpf_get_smp_processor_id();
+       update.step = nr_thread;
+       bpf_loop(OP_BATCH, overwrite_htab, &update, 0);
+       __sync_fetch_and_add(&op_cnt, 1);
+       return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int batch_add_batch_del(void *ctx)
+{
+       struct update_ctx update;
+
+       update.from = bpf_get_smp_processor_id();
+       update.step = nr_thread;
+       bpf_loop(OP_BATCH, overwrite_htab, &update, 0);
+
+       update.from = bpf_get_smp_processor_id();
+       bpf_loop(OP_BATCH, del_htab, &update, 0);
+
+       __sync_fetch_and_add(&op_cnt, 2);
+       return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getpgid")
+int add_only(void *ctx)
+{
+       struct update_ctx update;
+
+       update.from = bpf_get_smp_processor_id() / 2;
+       update.step = nr_thread / 2;
+       bpf_loop(OP_BATCH, newwrite_htab, &update, 0);
+       __sync_fetch_and_add(&op_cnt, 1);
+       return 0;
+}
+
+SEC("?tp/syscalls/sys_enter_getppid")
+int del_only(void *ctx)
+{
+       struct update_ctx update;
+
+       update.from = bpf_get_smp_processor_id() / 2;
+       update.step = nr_thread / 2;
+       bpf_loop(OP_BATCH, del_htab, &update, 0);
+       __sync_fetch_and_add(&op_cnt, 1);
+       return 0;
+}
index 57440a5..84d1777 100644 (file)
@@ -96,7 +96,7 @@ static __always_inline
 int list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
 {
        struct bpf_list_node *n;
-       struct foo *f[8], *pf;
+       struct foo *f[200], *pf;
        int i;
 
        /* Loop following this check adds nodes 2-at-a-time in order to
diff --git a/tools/testing/selftests/bpf/progs/map_percpu_stats.c b/tools/testing/selftests/bpf/progs/map_percpu_stats.c
new file mode 100644 (file)
index 0000000..10b2325
--- /dev/null
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+__u32 target_id;
+
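+/* Kernel-provided kfunc that reports the number of elements in a map. */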
+__s64 bpf_map_sum_elem_count(struct bpf_map *map) __ksym;
+
+SEC("iter/bpf_map")
+int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
+{
+       struct seq_file *seq = ctx->meta->seq;
+       struct bpf_map *map = ctx->map;
+
+       if (map && map->id == target_id)
+               BPF_SEQ_PRINTF(seq, "%lld", bpf_map_sum_elem_count(map));
+
+       return 0;
+}
+
+char _license[] SEC("license") = "GPL";
index 8b7466a..3376d48 100644 (file)
@@ -47,3 +47,43 @@ int BPF_PROG(fexit_test, int a, __u64 b, int ret)
 
        return 0;
 }
+
+static int sequence2;
+
+__u64 fentry_result2 = 0;
+SEC("fentry/bpf_modify_return_test2")
+int BPF_PROG(fentry_test2, int a, int *b, short c, int d, void *e, char f,
+            int g)
+{
+       sequence2++;
+       fentry_result2 = (sequence2 == 1);
+       return 0;
+}
+
+__u64 fmod_ret_result2 = 0;
+SEC("fmod_ret/bpf_modify_return_test2")
+int BPF_PROG(fmod_ret_test2, int a, int *b, short c, int d, void *e, char f,
+            int g, int ret)
+{
+       sequence2++;
+       /* This is the first fmod_ret program, the ret passed should be 0 */
+       fmod_ret_result2 = (sequence2 == 2 && ret == 0);
+       return input_retval;
+}
+
+__u64 fexit_result2 = 0;
+SEC("fexit/bpf_modify_return_test2")
+int BPF_PROG(fexit_test2, int a, int *b, short c, int d, void *e, char f,
+            int g, int ret)
+{
+       sequence2++;
+       /* If input_retval is non-zero, a successful modification should have
+        * occurred.
+        */
+       if (input_retval)
+               fexit_result2 = (sequence2 == 3 && ret == input_retval);
+       else
+               fexit_result2 = (sequence2 == 3 && ret == 29);
+
+       return 0;
+}
index 0d1aa6b..ea39497 100644 (file)
 
 char _license[] SEC("license") = "GPL";
 
+struct {
+       __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+       __uint(map_flags, BPF_F_NO_PREALLOC);
+       __type(key, int);
+       __type(value, u64);
+} sk_storage_map SEC(".maps");
+
 /* Prototype for all of the program trace events below:
  *
  * TRACE_EVENT(task_newtask,
@@ -31,3 +38,12 @@ int BPF_PROG(test_invalid_nested_offset, struct task_struct *task, u64 clone_fla
        bpf_cpumask_first_zero(&task->cpus_mask);
        return 0;
 }
+
+/* Although R2 is a sk_buff where a sock_common is expected, the verifier hits the untrusted ptr first. */
+SEC("tp_btf/tcp_probe")
+__failure __msg("R2 type=untrusted_ptr_ expected=ptr_, trusted_ptr_, rcu_ptr_")
+int BPF_PROG(test_invalid_skb_field, struct sock *sk, struct sk_buff *skb)
+{
+       bpf_sk_storage_get(&sk_storage_map, skb->next, 0, 0);
+       return 0;
+}
index 886ade4..833840b 100644 (file)
 
 char _license[] SEC("license") = "GPL";
 
+struct {
+       __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+       __uint(map_flags, BPF_F_NO_PREALLOC);
+       __type(key, int);
+       __type(value, u64);
+} sk_storage_map SEC(".maps");
+
 SEC("tp_btf/task_newtask")
 __success
 int BPF_PROG(test_read_cpumask, struct task_struct *task, u64 clone_flags)
@@ -17,3 +24,11 @@ int BPF_PROG(test_read_cpumask, struct task_struct *task, u64 clone_flags)
        bpf_cpumask_test_cpu(0, task->cpus_ptr);
        return 0;
 }
+
+SEC("tp_btf/tcp_probe")
+__success
+int BPF_PROG(test_skb_field, struct sock *sk, struct sk_buff *skb)
+{
+       bpf_sk_storage_get(&sk_storage_map, skb->sk, 0, 0);
+       return 0;
+}
index 2588f23..1fbb73d 100644 (file)
@@ -29,13 +29,16 @@ int my_int SEC(".data.non_array");
 int my_array_first[1] SEC(".data.array_not_last");
 int my_int_last SEC(".data.array_not_last");
 
+int percpu_arr[1] SEC(".data.percpu_arr");
+
 SEC("tp/syscalls/sys_enter_getpid")
 int bss_array_sum(void *ctx)
 {
        if (pid != (bpf_get_current_pid_tgid() >> 32))
                return 0;
 
-       sum = 0;
+       /* this will be zero; we just rely on the verifier not rejecting it */
+       sum = percpu_arr[bpf_get_smp_processor_id()];
 
        for (size_t i = 0; i < bss_array_len; ++i)
                sum += array[i];
@@ -49,7 +52,8 @@ int data_array_sum(void *ctx)
        if (pid != (bpf_get_current_pid_tgid() >> 32))
                return 0;
 
-       sum = 0;
+       /* this will be zero; we just rely on the verifier not rejecting it */
+       sum = percpu_arr[bpf_get_smp_processor_id()];
 
        for (size_t i = 0; i < data_array_len; ++i)
                sum += my_array[i];
diff --git a/tools/testing/selftests/bpf/progs/test_netfilter_link_attach.c b/tools/testing/selftests/bpf/progs/test_netfilter_link_attach.c
new file mode 100644 (file)
index 0000000..03a4751
--- /dev/null
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+#define NF_ACCEPT 1
+
+SEC("netfilter")
+int nf_link_attach_test(struct bpf_nf_ctx *ctx)
+{
+       return NF_ACCEPT;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_ptr_untrusted.c b/tools/testing/selftests/bpf/progs/test_ptr_untrusted.c
new file mode 100644 (file)
index 0000000..4bdd65b
--- /dev/null
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Yafang Shao <laoar.shao@gmail.com> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+
+char tp_name[128];
+
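+/* On BPF_RAW_TRACEPOINT_OPEN, copy the user-supplied tracepoint name into
+ * tp_name so the userspace test can compare it against the name it used.
+ */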
+SEC("lsm/bpf")
+int BPF_PROG(lsm_run, int cmd, union bpf_attr *attr, unsigned int size)
+{
+       switch (cmd) {
+       case BPF_RAW_TRACEPOINT_OPEN:
+               bpf_probe_read_user_str(tp_name, sizeof(tp_name) - 1,
+                                       (void *)attr->raw_tracepoint.name);
+               break;
+       default:
+               break;
+       }
+       return 0;
+}
+
+SEC("raw_tracepoint")
+int BPF_PROG(raw_tp_run)
+{
+       return 0;
+}
+
+char _license[] SEC("license") = "GPL";
index c435a3a..515daef 100644 (file)
@@ -18,6 +18,11 @@ struct bpf_testmod_struct_arg_3 {
        int b[];
 };
 
+struct bpf_testmod_struct_arg_4 {
+       u64 a;
+       int b;
+};
+
 long t1_a_a, t1_a_b, t1_b, t1_c, t1_ret, t1_nregs;
 __u64 t1_reg0, t1_reg1, t1_reg2, t1_reg3;
 long t2_a, t2_b_a, t2_b_b, t2_c, t2_ret;
@@ -25,6 +30,9 @@ long t3_a, t3_b, t3_c_a, t3_c_b, t3_ret;
 long t4_a_a, t4_b, t4_c, t4_d, t4_e_a, t4_e_b, t4_ret;
 long t5_ret;
 int t6;
+long t7_a, t7_b, t7_c, t7_d, t7_e, t7_f_a, t7_f_b, t7_ret;
+long t8_a, t8_b, t8_c, t8_d, t8_e, t8_f_a, t8_f_b, t8_g, t8_ret;
+
 
 SEC("fentry/bpf_testmod_test_struct_arg_1")
 int BPF_PROG2(test_struct_arg_1, struct bpf_testmod_struct_arg_2, a, int, b, int, c)
@@ -130,4 +138,50 @@ int BPF_PROG2(test_struct_arg_11, struct bpf_testmod_struct_arg_3 *, a)
        return 0;
 }
 
+SEC("fentry/bpf_testmod_test_struct_arg_7")
+int BPF_PROG2(test_struct_arg_12, __u64, a, void *, b, short, c, int, d,
+             void *, e, struct bpf_testmod_struct_arg_4, f)
+{
+       t7_a = a;
+       t7_b = (long)b;
+       t7_c = c;
+       t7_d = d;
+       t7_e = (long)e;
+       t7_f_a = f.a;
+       t7_f_b = f.b;
+       return 0;
+}
+
+SEC("fexit/bpf_testmod_test_struct_arg_7")
+int BPF_PROG2(test_struct_arg_13, __u64, a, void *, b, short, c, int, d,
+             void *, e, struct bpf_testmod_struct_arg_4, f, int, ret)
+{
+       t7_ret = ret;
+       return 0;
+}
+
+SEC("fentry/bpf_testmod_test_struct_arg_8")
+int BPF_PROG2(test_struct_arg_14, __u64, a, void *, b, short, c, int, d,
+             void *, e, struct bpf_testmod_struct_arg_4, f, int, g)
+{
+       t8_a = a;
+       t8_b = (long)b;
+       t8_c = c;
+       t8_d = d;
+       t8_e = (long)e;
+       t8_f_a = f.a;
+       t8_f_b = f.b;
+       t8_g = g;
+       return 0;
+}
+
+SEC("fexit/bpf_testmod_test_struct_arg_8")
+int BPF_PROG2(test_struct_arg_15, __u64, a, void *, b, short, c, int, d,
+             void *, e, struct bpf_testmod_struct_arg_4, f, int, g,
+             int, ret)
+{
+       t8_ret = ret;
+       return 0;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/verifier_typedef.c b/tools/testing/selftests/bpf/progs/verifier_typedef.c
new file mode 100644 (file)
index 0000000..08481cf
--- /dev/null
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+SEC("fentry/bpf_fentry_test_sinfo")
+__description("typedef: resolve")
+__success __retval(0)
+__naked void resolve_typedef(void)
+{
+       asm volatile ("                                 \
+       r1 = *(u64 *)(r1 +0);                           \
+       r2 = *(u64 *)(r1 +%[frags_offs]);               \
+       r0 = 0;                                         \
+       exit;                                           \
+"      :
+       : __imm_const(frags_offs,
+                     offsetof(struct skb_shared_info, frags))
+       : __clobber_all);
+}
+
+char _license[] SEC("license") = "GPL";
index 9b070cd..f83d9f6 100644 (file)
@@ -18,7 +18,7 @@
 #define TRACEFS_PIPE   "/sys/kernel/tracing/trace_pipe"
 #define DEBUGFS_PIPE   "/sys/kernel/debug/tracing/trace_pipe"
 
-#define MAX_SYMS 300000
+#define MAX_SYMS 400000
 static struct ksym syms[MAX_SYMS];
 static int sym_cnt;
 
@@ -46,6 +46,9 @@ int load_kallsyms_refresh(void)
                        break;
                if (!addr)
                        continue;
+               if (i >= MAX_SYMS)
+                       return -EFBIG;
+
                syms[i].addr = (long) addr;
                syms[i].name = strdup(func);
                i++;
index b39665f..319337b 100644 (file)
        .result = REJECT,
        .errstr = "R0 invalid mem access",
        .errstr_unpriv = "R10 partial copy of pointer",
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
index 83cecfb..0b394a7 100644 (file)
        },
        .result = ACCEPT,
        .prog_type = BPF_PROG_TYPE_SK_SKB,
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "pkt_end < pkt taken check",
        },
        .result = ACCEPT,
        .prog_type = BPF_PROG_TYPE_SK_SKB,
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
index 1a27a62..43776f6 100644 (file)
        .result_unpriv = REJECT,
        .result = ACCEPT,
        .retval = 2,
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "jgt32: BPF_K",
        .result_unpriv = REJECT,
        .result = ACCEPT,
        .retval = 2,
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "jle32: BPF_K",
        .result_unpriv = REJECT,
        .result = ACCEPT,
        .retval = 2,
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "jlt32: BPF_K",
        .result_unpriv = REJECT,
        .result = ACCEPT,
        .retval = 2,
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "jsge32: BPF_K",
        .result_unpriv = REJECT,
        .result = ACCEPT,
        .retval = 2,
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "jsgt32: BPF_K",
        .result_unpriv = REJECT,
        .result = ACCEPT,
        .retval = 2,
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "jsle32: BPF_K",
        .result_unpriv = REJECT,
        .result = ACCEPT,
        .retval = 2,
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "jslt32: BPF_K",
        .result_unpriv = REJECT,
        .result = ACCEPT,
        .retval = 2,
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "jgt32: range bound deduction, reg op imm",
index a0cfc06..d25c3e9 100644 (file)
@@ -68,6 +68,7 @@
        .fixup_map_kptr = { 1 },
        .result = REJECT,
        .errstr = "kptr access cannot have variable offset",
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "map_kptr: bpf_kptr_xchg non-const var_off",
        .fixup_map_kptr = { 1 },
        .result = REJECT,
        .errstr = "kptr access misaligned expected=0 off=7",
+       .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
 {
        "map_kptr: reject var_off != 0",
index 99272bb..0d84dd1 100644 (file)
        },
        .fixup_map_ringbuf = { 1 },
        .prog_type = BPF_PROG_TYPE_XDP,
-       .flags = BPF_F_TEST_STATE_FREQ,
+       .flags = BPF_F_TEST_STATE_FREQ | F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
        .errstr = "invalid access to memory, mem_size=1 off=42 size=8",
        .result = REJECT,
 },
index 01c0491..2e986cb 100644 (file)
@@ -167,7 +167,7 @@ $(RESOLVE_BTFIDS): $(HOST_BPFOBJ) | $(HOST_BUILD_DIR)/resolve_btfids        \
                OUTPUT=$(HOST_BUILD_DIR)/resolve_btfids/ BPFOBJ=$(HOST_BPFOBJ)
 
 # Get Clang's default includes on this system, as opposed to those seen by
-# '-target bpf'. This fixes "missing" files on some architectures/distros,
+# '--target=bpf'. This fixes "missing" files on some architectures/distros,
 # such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
 #
 # Use '-idirafter': Don't interfere with include mechanics except where the
@@ -196,12 +196,12 @@ CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
 # $3 - CFLAGS
 define CLANG_BPF_BUILD_RULE
        $(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
-       $(Q)$(CLANG) $3 -O2 -target bpf -c $1 -mcpu=v3 -o $2
+       $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v3 -o $2
 endef
 # Similar to CLANG_BPF_BUILD_RULE, but with disabled alu32
 define CLANG_NOALU32_BPF_BUILD_RULE
        $(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
-       $(Q)$(CLANG) $3 -O2 -target bpf -c $1 -mcpu=v2 -o $2
+       $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v2 -o $2
 endef
 # Build BPF object using GCC
 define GCC_BPF_BUILD_RULE
index 7f3ab2a..2f69f72 100644 (file)
@@ -113,7 +113,7 @@ $(MAKE_DIRS):
        mkdir -p $@
 
 # Get Clang's default includes on this system, as opposed to those seen by
-# '-target bpf'. This fixes "missing" files on some architectures/distros,
+# '--target=bpf'. This fixes "missing" files on some architectures/distros,
 # such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
 #
 # Use '-idirafter': Don't interfere with include mechanics except where the
@@ -131,7 +131,7 @@ endif
 CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG),$(CLANG_TARGET_ARCH))
 
 $(OUTPUT)/nat6to4.o: nat6to4.c $(BPFOBJ) | $(MAKE_DIRS)
-       $(CLANG) -O2 -target bpf -c $< $(CCINCLUDE) $(CLANG_SYS_INCLUDES) -o $@
+       $(CLANG) -O2 --target=bpf -c $< $(CCINCLUDE) $(CLANG_SYS_INCLUDES) -o $@
 
 $(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile)                    \
           $(APIDIR)/linux/bpf.h                                               \
index cb553ea..3c4b7fa 100644 (file)
@@ -24,7 +24,7 @@ CLANG_FLAGS = -I. -I$(APIDIR) \
 
 $(OUTPUT)/%.o: %.c
        $(CLANG) $(CLANG_FLAGS) \
-                -O2 -target bpf -emit-llvm -c $< -o - |      \
+                -O2 --target=bpf -emit-llvm -c $< -o - |      \
        $(LLC) -march=bpf -mcpu=$(CPU) $(LLC_FLAGS) -filetype=obj -o $@
 
 TEST_PROGS += ./tdc.sh