Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
authorJakub Kicinski <kuba@kernel.org>
Fri, 22 Jul 2022 23:55:43 +0000 (16:55 -0700)
committerJakub Kicinski <kuba@kernel.org>
Fri, 22 Jul 2022 23:55:44 +0000 (16:55 -0700)
Daniel Borkmann says:

====================
bpf-next 2022-07-22

We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).

The main changes are:

1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.

2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
   syscalls through the kprobe mechanism, from Andrii Nakryiko.

3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
   function, from Song Liu & Jiri Olsa.

4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
   entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.

5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
   with kernel symbols, from Alan Maguire.

6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.

7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.

8) libbpf support for writing custom perf event readers, from Jon Doron.

9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.

10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.

11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.

12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.

13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.

14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.

15) Make non-prealloced BPF map allocations low priority to play better with
    memcg limits, from Yafang Shao.

16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.

17) Various smaller cleanups and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
  bpf: Simplify bpf_prog_pack_[size|mask]
  bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
  bpf, x64: Allow to use caller address from stack
  ftrace: Allow IPMODIFY and DIRECT ops on the same function
  ftrace: Add modify_ftrace_direct_multi_nolock
  bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
  bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Add tests for new nf_conntrack kfuncs
  selftests/bpf: Add verifier tests for trusted kfunc args
  net: netfilter: Add kfuncs to set and change CT status
  net: netfilter: Add kfuncs to set and change CT timeout
  net: netfilter: Add kfuncs to allocate and insert CT
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  bpf: Add documentation for kfuncs
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Switch to new kfunc flags infrastructure
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Introduce 8-byte BTF set
  ...
====================

Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
88 files changed:
Documentation/bpf/btf.rst
Documentation/bpf/index.rst
Documentation/bpf/kfuncs.rst [new file with mode: 0644]
Documentation/bpf/map_hash.rst [new file with mode: 0644]
arch/arm64/include/asm/insn.h
arch/arm64/lib/insn.c
arch/arm64/net/bpf_jit.h
arch/arm64/net/bpf_jit_comp.c
arch/x86/net/bpf_jit_comp.c
include/linux/bpf.h
include/linux/bpf_verifier.h
include/linux/btf.h
include/linux/btf_ids.h
include/linux/filter.h
include/linux/ftrace.h
include/linux/skbuff.h
include/net/netfilter/nf_conntrack_core.h
include/net/xdp_sock_drv.h
include/uapi/linux/bpf.h
kernel/bpf/arraymap.c
kernel/bpf/bpf_lsm.c
kernel/bpf/bpf_struct_ops.c
kernel/bpf/btf.c
kernel/bpf/core.c
kernel/bpf/devmap.c
kernel/bpf/hashtab.c
kernel/bpf/local_storage.c
kernel/bpf/lpm_trie.c
kernel/bpf/preload/iterators/Makefile
kernel/bpf/syscall.c
kernel/bpf/trampoline.c
kernel/bpf/verifier.c
kernel/kallsyms.c
kernel/trace/ftrace.c
net/bpf/test_run.c
net/core/dev.c
net/core/filter.c
net/core/skmsg.c
net/ipv4/bpf_tcp_ca.c
net/ipv4/tcp_bbr.c
net/ipv4/tcp_cubic.c
net/ipv4/tcp_dctcp.c
net/netfilter/nf_conntrack_bpf.c
net/netfilter/nf_conntrack_core.c
net/netfilter/nf_conntrack_netlink.c
net/xdp/xsk.c
samples/bpf/Makefile
samples/bpf/fds_example.c
samples/bpf/sock_example.c
samples/bpf/test_cgrp2_attach.c
samples/bpf/test_lru_dist.c
samples/bpf/test_map_in_map_user.c
samples/bpf/tracex5_user.c
samples/bpf/xdp_redirect_map.bpf.c
samples/bpf/xdp_redirect_map_user.c
scripts/bpf_doc.py
tools/bpf/resolve_btfids/main.c
tools/bpf/runqslower/Makefile
tools/include/uapi/linux/bpf.h
tools/lib/bpf/bpf_tracing.h
tools/lib/bpf/btf_dump.c
tools/lib/bpf/gen_loader.c
tools/lib/bpf/libbpf.c
tools/lib/bpf/libbpf.h
tools/lib/bpf/libbpf.map
tools/lib/bpf/libbpf_internal.h
tools/lib/bpf/usdt.bpf.h
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
tools/testing/selftests/bpf/prog_tests/bpf_iter.c
tools/testing/selftests/bpf/prog_tests/bpf_nf.c
tools/testing/selftests/bpf/prog_tests/btf.c
tools/testing/selftests/bpf/prog_tests/core_extern.c
tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c
tools/testing/selftests/bpf/prog_tests/skeleton.c
tools/testing/selftests/bpf/progs/bpf_iter.h
tools/testing/selftests/bpf/progs/bpf_iter_ksym.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
tools/testing/selftests/bpf/progs/test_attach_probe.c
tools/testing/selftests/bpf/progs/test_bpf_nf.c
tools/testing/selftests/bpf/progs/test_bpf_nf_fail.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/test_core_extern.c
tools/testing/selftests/bpf/progs/test_probe_user.c
tools/testing/selftests/bpf/progs/test_skeleton.c
tools/testing/selftests/bpf/progs/test_xdp_noinline.c
tools/testing/selftests/bpf/test_xdp_veth.sh
tools/testing/selftests/bpf/verifier/bpf_loop_inline.c
tools/testing/selftests/bpf/verifier/calls.c

index f49aeef..cf8722f 100644 (file)
@@ -369,7 +369,8 @@ No additional type data follow ``btf_type``.
   * ``name_off``: offset to a valid C identifier
   * ``info.kind_flag``: 0
   * ``info.kind``: BTF_KIND_FUNC
-  * ``info.vlen``: 0
+  * ``info.vlen``: linkage information (BTF_FUNC_STATIC, BTF_FUNC_GLOBAL
+                   or BTF_FUNC_EXTERN)
   * ``type``: a BTF_KIND_FUNC_PROTO type
 
 No additional type data follow ``btf_type``.
@@ -380,6 +381,9 @@ type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the
 :ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load`
 (ABI).
 
+Currently, only linkage values of BTF_FUNC_STATIC and BTF_FUNC_GLOBAL are
+supported in the kernel.
+
 2.2.13 BTF_KIND_FUNC_PROTO
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
index 96056a7..1bc2c5c 100644 (file)
@@ -19,6 +19,7 @@ that goes into great technical depth about the BPF Architecture.
    faq
    syscall_api
    helpers
+   kfuncs
    programs
    maps
    bpf_prog_run
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
new file mode 100644 (file)
index 0000000..c0b7dae
--- /dev/null
@@ -0,0 +1,170 @@
+=============================
+BPF Kernel Functions (kfuncs)
+=============================
+
+1. Introduction
+===============
+
+BPF Kernel Functions, more commonly known as kfuncs, are functions in the
+Linux kernel which are exposed for use by BPF programs. Unlike normal BPF
+helpers, kfuncs do not have a stable interface and can change from one kernel
+release to another. Hence, BPF programs need to be updated in response to
+changes in the kernel.
+
+2. Defining a kfunc
+===================
+
+There are two ways to expose a kernel function to BPF programs: either make an
+existing function in the kernel visible, or add a new wrapper for BPF. In both
+cases, care must be taken that the BPF program can only call such functions in
+a valid context. To enforce this, the visibility of a kfunc can be restricted
+per program type.
+
+If you are not creating a BPF wrapper for an existing kernel function, skip
+ahead to :ref:`BPF_kfunc_nodef`.
+
+2.1 Creating a wrapper kfunc
+----------------------------
+
+When defining a wrapper kfunc, the wrapper function should have extern linkage.
+This prevents the compiler from optimizing away dead code, as this wrapper kfunc
+is not invoked anywhere in the kernel itself. It is not necessary to provide a
+prototype in a header for the wrapper kfunc.
+
+An example is given below::
+
+        /* Disables missing prototype warnings */
+        __diag_push();
+        __diag_ignore_all("-Wmissing-prototypes",
+                          "Global kfuncs as their definitions will be in BTF");
+
+        struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
+        {
+                return find_get_task_by_vpid(nr);
+        }
+
+        __diag_pop();
+
+A wrapper kfunc is often needed when we need to annotate parameters of the
+kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
+registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.
+
+2.2 Annotating kfunc parameters
+-------------------------------
+
+Similar to BPF helpers, the verifier sometimes needs additional context to make
+the usage of kernel functions safer and more useful. Hence, a parameter can be
+annotated by suffixing the name of the kfunc argument with a __tag, where tag
+may be one of the supported annotations.
+
+2.2.1 __sz Annotation
+---------------------
+
+This annotation is used to indicate a memory and size pair in the argument list.
+An example is given below::
+
+        void bpf_memzero(void *mem, int mem__sz)
+        {
+        ...
+        }
+
+Here, the verifier will treat the first argument as a PTR_TO_MEM, and the
+second argument as its size. By default, without the __sz annotation, the size
+of the type of the pointer is used. Without the __sz annotation, a kfunc cannot
+accept a void pointer.
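+
+From the BPF program side, the memory region and its size are simply passed as
+two arguments. A minimal sketch of calling the illustrative ``bpf_memzero``
+kfunc from above (both the kfunc and the buffer are assumptions made for this
+example)::
+
+        char buf[64] = {};
+
+        bpf_memzero(buf, sizeof(buf));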
+
+.. _BPF_kfunc_nodef:
+
+2.3 Using an existing kernel function
+-------------------------------------
+
+When an existing function in the kernel is fit for consumption by BPF programs,
+it can be directly registered with the BPF subsystem. However, care must still
+be taken to review the context in which it will be invoked by the BPF program
+and whether it is safe to do so.
+
+2.4 Annotating kfuncs
+---------------------
+
+In addition to kfunc arguments, the verifier may need more information about
+the type of kfunc(s) being registered with the BPF subsystem. To do so, we
+define flags on a set of kfuncs as follows::
+
+        BTF_SET8_START(bpf_task_set)
+        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
+        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
+        BTF_SET8_END(bpf_task_set)
+
+This set encodes the BTF ID of each kfunc listed above, and encodes the flags
+along with it. Of course, it is also allowed to specify no flags.
+
+2.4.1 KF_ACQUIRE flag
+---------------------
+
+The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
+refcounted object. The verifier will then ensure that the pointer to the object
+is eventually released using a release kfunc, or transferred to a map using a
+referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier rejects the
+BPF program as long as lingering references remain in any possible explored
+state of the program.
+
+2.4.2 KF_RET_NULL flag
+----------------------
+
+The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
+may be NULL. Hence, it forces the user to do a NULL check on the pointer
+returned from the kfunc before making use of it (dereferencing or passing to
+another helper). This flag is often paired with the KF_ACQUIRE flag, but the
+two are orthogonal to each other.
+
+2.4.3 KF_RELEASE flag
+---------------------
+
+The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
+passed in to it. Only one referenced pointer can be passed in. All copies of
+the pointer being released are invalidated as a result of invoking a kfunc
+with this flag.
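+
+Taken together, KF_ACQUIRE, KF_RET_NULL and KF_RELEASE describe the usual
+acquire, NULL-check and release pattern from the BPF program's point of view.
+A minimal sketch using the example kfuncs above (their exact prototypes are
+assumed here purely for illustration)::
+
+        struct pid *pid;
+
+        pid = bpf_get_task_pid(task);  /* KF_ACQUIRE | KF_RET_NULL */
+        if (!pid)                      /* NULL check is mandatory */
+                return 0;
+
+        /* ... use pid ... */
+
+        bpf_put_pid(pid);              /* KF_RELEASE drops the reference */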
+
+2.4.4 KF_KPTR_GET flag
+----------------------
+
+The KF_KPTR_GET flag is used to indicate that the kfunc takes its first argument
+as a pointer to a kptr, safely increments the refcount of the object it points to,
+and returns a reference to the user. The rest of the arguments may be normal
+arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
+KF_ACQUIRE and KF_RET_NULL flags.
+
+2.4.5 KF_TRUSTED_ARGS flag
+--------------------------
+
+The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
+indicates that all pointer arguments will always be refcounted, and have
+their offset set to 0. It can be used to enforce that a pointer to a refcounted
+object acquired from a kfunc or BPF helper is passed as an argument to this
+kfunc without any modifications (e.g. pointer arithmetic) such that it is
+trusted and points to the original object. This flag is often used for kfuncs
+that operate (change some property, perform some operation) on an object that
+was obtained using an acquire kfunc. Such kfuncs need an unchanged pointer to
+ensure the integrity of the operation being performed on the expected object.
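+
+For example, assuming a hypothetical kfunc ``bpf_pid_set_prop`` registered
+with KF_TRUSTED_ARGS, only the unmodified pointer obtained from an acquire
+kfunc may be passed to it::
+
+        struct pid *pid;
+
+        pid = bpf_get_task_pid(task);         /* task comes from program context */
+        if (!pid)
+                return 0;
+
+        bpf_pid_set_prop(pid, 1);             /* accepted: unmodified pointer */
+        /* bpf_pid_set_prop(pid + 1, 1);         rejected: non-zero offset */
+
+        bpf_put_pid(pid);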
+
+2.5 Registering the kfuncs
+--------------------------
+
+Once the kfunc is prepared for use, the final step in making it visible is
+registering it with the BPF subsystem. Registration is done per BPF program
+type. An example is shown below::
+
+        BTF_SET8_START(bpf_task_set)
+        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
+        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
+        BTF_SET8_END(bpf_task_set)
+
+        static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
+                .owner = THIS_MODULE,
+                .set   = &bpf_task_set,
+        };
+
+        static int init_subsystem(void)
+        {
+                return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
+        }
+        late_initcall(init_subsystem);
diff --git a/Documentation/bpf/map_hash.rst b/Documentation/bpf/map_hash.rst
new file mode 100644 (file)
index 0000000..e851208
--- /dev/null
@@ -0,0 +1,185 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+.. Copyright (C) 2022 Red Hat, Inc.
+
+===============================================
+BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants
+===============================================
+
+.. note::
+   - ``BPF_MAP_TYPE_HASH`` was introduced in kernel version 3.19
+   - ``BPF_MAP_TYPE_PERCPU_HASH`` was introduced in version 4.6
+   - Both ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
+     were introduced in version 4.10
+
+``BPF_MAP_TYPE_HASH`` and ``BPF_MAP_TYPE_PERCPU_HASH`` provide general
+purpose hash map storage. Both the key and the value can be structs,
+allowing for composite keys and values.
+
+The kernel is responsible for allocating and freeing key/value pairs, up
+to the max_entries limit that you specify. Hash maps use pre-allocation
+of hash table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be
+used to disable pre-allocation when it is too memory expensive.
+
+``BPF_MAP_TYPE_PERCPU_HASH`` provides a separate value slot per
+CPU. The per-cpu values are stored internally in an array.
+
+The ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
+variants add LRU semantics to their respective hash tables. An LRU hash
+will automatically evict the least recently used entries when the hash
+table reaches capacity. An LRU hash maintains an internal LRU list that
+is used to select elements for eviction. This internal LRU list is
+shared across CPUs, but it is possible to request a per CPU LRU list with
+the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``.
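+
+For example, a per CPU LRU list can be requested by passing
+``BPF_F_NO_COMMON_LRU`` in the map creation flags. A minimal userspace sketch
+using libbpf's ``bpf_map_create()`` (the map name, key/value sizes and entry
+count are arbitrary):
+
+.. code-block:: c
+
+    LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_COMMON_LRU);
+
+    int fd = bpf_map_create(BPF_MAP_TYPE_LRU_HASH, "lru_map",
+                            sizeof(__u32), sizeof(__u64), 1024, &opts);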
+
+Usage
+=====
+
+.. c:function::
+   long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
+
+Hash entries can be added or updated using the ``bpf_map_update_elem()``
+helper. This helper replaces existing elements atomically. The ``flags``
+parameter can be used to control the update behaviour:
+
+- ``BPF_ANY`` will create a new element or update an existing element
+- ``BPF_NOEXIST`` will create a new element only if one did not already
+  exist
+- ``BPF_EXIST`` will update an existing element
+
+``bpf_map_update_elem()`` returns 0 on success, or negative error in
+case of failure.
+
+.. c:function::
+   void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
+
+Hash entries can be retrieved using the ``bpf_map_lookup_elem()``
+helper. This helper returns a pointer to the value associated with
+``key``, or ``NULL`` if no entry was found.
+
+.. c:function::
+   long bpf_map_delete_elem(struct bpf_map *map, const void *key)
+
+Hash entries can be deleted using the ``bpf_map_delete_elem()``
+helper. This helper will return 0 on success, or negative error in case
+of failure.
+
+Per CPU Hashes
+--------------
+
+For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
+the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers
+automatically access the hash slot for the current CPU.
+
+.. c:function::
+   void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
+
+The ``bpf_map_lookup_percpu_elem()`` helper can be used to look up the
+value in the hash slot for a specific CPU. It returns the value associated
+with ``key`` on ``cpu``, or ``NULL`` if no entry was found or ``cpu`` is
+invalid.
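+
+For example, a BPF program can aggregate a per-CPU value across CPUs. This is
+a minimal sketch; ``percpu_stats`` is an assumed ``BPF_MAP_TYPE_PERCPU_HASH``
+map using the key/value structs from the Examples section, and ``MAX_CPUS`` is
+an assumed compile-time bound:
+
+.. code-block:: c
+
+    struct key key = {};
+    __u64 total = 0;
+    __u32 cpu;
+
+    for (cpu = 0; cpu < MAX_CPUS; cpu++) {
+            struct value *v;
+
+            v = bpf_map_lookup_percpu_elem(&percpu_stats, &key, cpu);
+            if (v)
+                    total += v->packets;
+    }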
+
+Concurrency
+-----------
+
+Values stored in ``BPF_MAP_TYPE_HASH`` can be accessed concurrently by
+programs running on different CPUs. Since kernel version 5.1, the BPF
+infrastructure provides ``struct bpf_spin_lock`` to synchronise access.
+See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
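+
+A minimal sketch of spin lock usage inside a BPF program (the value layout is
+an assumption made for this example; see the selftest above for a complete
+program):
+
+.. code-block:: c
+
+    struct locked_value {
+            struct bpf_spin_lock lock;
+            __u64 counter;
+    };
+
+    /* val points to a map value of type struct locked_value */
+    bpf_spin_lock(&val->lock);
+    val->counter++;
+    bpf_spin_unlock(&val->lock);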
+
+Userspace
+---------
+
+.. c:function::
+   int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)
+
+In userspace, it is possible to iterate through the keys of a hash using
+libbpf's ``bpf_map_get_next_key()`` function. The first key can be fetched by
+calling ``bpf_map_get_next_key()`` with ``cur_key`` set to
+``NULL``. Subsequent calls will fetch the next key that follows the
+current key. ``bpf_map_get_next_key()`` returns 0 on success, -ENOENT if
+cur_key is the last key in the hash, or negative error in case of
+failure.
+
+Note that if ``cur_key`` gets deleted then ``bpf_map_get_next_key()``
+will instead return the *first* key in the hash table, which is
+undesirable. It is recommended to use batched lookup if key deletions
+are going to be intermixed with ``bpf_map_get_next_key()`` calls.
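+
+A batched lookup sketch using libbpf's ``bpf_map_lookup_batch()`` (buffer
+sizes and error handling are simplified here for illustration):
+
+.. code-block:: c
+
+    struct key keys[64];
+    struct value values[64];
+    __u32 out_batch, count = 64;
+    int err;
+
+    err = bpf_map_lookup_batch(map_fd, NULL, &out_batch, keys, values,
+                               &count, NULL);
+    /* on return, count holds the number of entries copied out; an
+     * ENOENT error indicates the whole map has been traversed
+     */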
+
+Examples
+========
+
+Please see the ``tools/testing/selftests/bpf`` directory for functional
+examples. The code snippets below demonstrate API usage.
+
+This example shows how to declare an LRU hash with a struct key and a
+struct value.
+
+.. code-block:: c
+
+    #include <linux/bpf.h>
+    #include <bpf/bpf_helpers.h>
+
+    struct key {
+        __u32 srcip;
+    };
+
+    struct value {
+        __u64 packets;
+        __u64 bytes;
+    };
+
+    struct {
+            __uint(type, BPF_MAP_TYPE_LRU_HASH);
+            __uint(max_entries, 32);
+            __type(key, struct key);
+            __type(value, struct value);
+    } packet_stats SEC(".maps");
+
+This example shows how to create or update hash values using atomic
+instructions:
+
+.. code-block:: c
+
+    static void update_stats(__u32 srcip, int bytes)
+    {
+            struct key key = {
+                    .srcip = srcip,
+            };
+            struct value *value = bpf_map_lookup_elem(&packet_stats, &key);
+
+            if (value) {
+                    __sync_fetch_and_add(&value->packets, 1);
+                    __sync_fetch_and_add(&value->bytes, bytes);
+            } else {
+                    struct value newval = { 1, bytes };
+
+                    bpf_map_update_elem(&packet_stats, &key, &newval, BPF_NOEXIST);
+            }
+    }
+
+This example shows how userspace can walk the elements of the map declared
+above:
+
+.. code-block:: c
+
+    #include <bpf/libbpf.h>
+    #include <bpf/bpf.h>
+
+    static void walk_hash_elements(int map_fd)
+    {
+            struct key *cur_key = NULL;
+            struct key next_key;
+            struct value value;
+            int err;
+
+            for (;;) {
+                    err = bpf_map_get_next_key(map_fd, cur_key, &next_key);
+                    if (err)
+                            break;
+
+                    bpf_map_lookup_elem(map_fd, &next_key, &value);
+
+                    // Use key and value here
+
+                    cur_key = &next_key;
+            }
+    }
index 6aa2dc8..834bff7 100644 (file)
@@ -510,6 +510,9 @@ u32 aarch64_insn_gen_load_store_imm(enum aarch64_insn_register reg,
                                    unsigned int imm,
                                    enum aarch64_insn_size_type size,
                                    enum aarch64_insn_ldst_type type);
+u32 aarch64_insn_gen_load_literal(unsigned long pc, unsigned long addr,
+                                 enum aarch64_insn_register reg,
+                                 bool is64bit);
 u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
                                     enum aarch64_insn_register reg2,
                                     enum aarch64_insn_register base,
index 695d736..49e972b 100644 (file)
@@ -323,7 +323,7 @@ static u32 aarch64_insn_encode_ldst_size(enum aarch64_insn_size_type type,
        return insn;
 }
 
-static inline long branch_imm_common(unsigned long pc, unsigned long addr,
+static inline long label_imm_common(unsigned long pc, unsigned long addr,
                                     long range)
 {
        long offset;
@@ -354,7 +354,7 @@ u32 __kprobes aarch64_insn_gen_branch_imm(unsigned long pc, unsigned long addr,
         * ARM64 virtual address arrangement guarantees all kernel and module
         * texts are within +/-128M.
         */
-       offset = branch_imm_common(pc, addr, SZ_128M);
+       offset = label_imm_common(pc, addr, SZ_128M);
        if (offset >= SZ_128M)
                return AARCH64_BREAK_FAULT;
 
@@ -382,7 +382,7 @@ u32 aarch64_insn_gen_comp_branch_imm(unsigned long pc, unsigned long addr,
        u32 insn;
        long offset;
 
-       offset = branch_imm_common(pc, addr, SZ_1M);
+       offset = label_imm_common(pc, addr, SZ_1M);
        if (offset >= SZ_1M)
                return AARCH64_BREAK_FAULT;
 
@@ -421,7 +421,7 @@ u32 aarch64_insn_gen_cond_branch_imm(unsigned long pc, unsigned long addr,
        u32 insn;
        long offset;
 
-       offset = branch_imm_common(pc, addr, SZ_1M);
+       offset = label_imm_common(pc, addr, SZ_1M);
 
        insn = aarch64_insn_get_bcond_value();
 
@@ -543,6 +543,28 @@ u32 aarch64_insn_gen_load_store_imm(enum aarch64_insn_register reg,
        return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_12, insn, imm);
 }
 
+u32 aarch64_insn_gen_load_literal(unsigned long pc, unsigned long addr,
+                                 enum aarch64_insn_register reg,
+                                 bool is64bit)
+{
+       u32 insn;
+       long offset;
+
+       offset = label_imm_common(pc, addr, SZ_1M);
+       if (offset >= SZ_1M)
+               return AARCH64_BREAK_FAULT;
+
+       insn = aarch64_insn_get_ldr_lit_value();
+
+       if (is64bit)
+               insn |= BIT(30);
+
+       insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn, reg);
+
+       return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_19, insn,
+                                            offset >> 2);
+}
+
 u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
                                     enum aarch64_insn_register reg2,
                                     enum aarch64_insn_register base,
index 194c95c..a6acb94 100644 (file)
 #define A64_STR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, STORE)
 #define A64_LDR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, LOAD)
 
+/* LDR (literal) */
+#define A64_LDR32LIT(Wt, offset) \
+       aarch64_insn_gen_load_literal(0, offset, Wt, false)
+#define A64_LDR64LIT(Xt, offset) \
+       aarch64_insn_gen_load_literal(0, offset, Xt, true)
+
 /* Load/store register pair */
 #define A64_LS_PAIR(Rt, Rt2, Rn, offset, ls, type) \
        aarch64_insn_gen_load_store_pair(Rt, Rt2, Rn, offset, \
 #define A64_BTI_C  A64_HINT(AARCH64_INSN_HINT_BTIC)
 #define A64_BTI_J  A64_HINT(AARCH64_INSN_HINT_BTIJ)
 #define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC)
+#define A64_NOP    A64_HINT(AARCH64_INSN_HINT_NOP)
 
 /* DMB */
 #define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)
index f08a444..7ca8779 100644 (file)
@@ -10,6 +10,7 @@
 #include <linux/bitfield.h>
 #include <linux/bpf.h>
 #include <linux/filter.h>
+#include <linux/memory.h>
 #include <linux/printk.h>
 #include <linux/slab.h>
 
@@ -18,6 +19,7 @@
 #include <asm/cacheflush.h>
 #include <asm/debug-monitors.h>
 #include <asm/insn.h>
+#include <asm/patching.h>
 #include <asm/set_memory.h>
 
 #include "bpf_jit.h"
@@ -78,6 +80,15 @@ struct jit_ctx {
        int fpb_offset;
 };
 
+struct bpf_plt {
+       u32 insn_ldr; /* load target */
+       u32 insn_br;  /* branch to target */
+       u64 target;   /* target value */
+};
+
+#define PLT_TARGET_SIZE   sizeof_field(struct bpf_plt, target)
+#define PLT_TARGET_OFFSET offsetof(struct bpf_plt, target)
+
 static inline void emit(const u32 insn, struct jit_ctx *ctx)
 {
        if (ctx->image != NULL)
@@ -140,6 +151,12 @@ static inline void emit_a64_mov_i64(const int reg, const u64 val,
        }
 }
 
+static inline void emit_bti(u32 insn, struct jit_ctx *ctx)
+{
+       if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
+               emit(insn, ctx);
+}
+
 /*
  * Kernel addresses in the vmalloc space use at most 48 bits, and the
  * remaining bits are guaranteed to be 0x1. So we can compose the address
@@ -159,6 +176,14 @@ static inline void emit_addr_mov_i64(const int reg, const u64 val,
        }
 }
 
+static inline void emit_call(u64 target, struct jit_ctx *ctx)
+{
+       u8 tmp = bpf2a64[TMP_REG_1];
+
+       emit_addr_mov_i64(tmp, target, ctx);
+       emit(A64_BLR(tmp), ctx);
+}
+
 static inline int bpf2a64_offset(int bpf_insn, int off,
                                 const struct jit_ctx *ctx)
 {
@@ -235,13 +260,30 @@ static bool is_lsi_offset(int offset, int scale)
        return true;
 }
 
+/* generated prologue:
+ *      bti c // if CONFIG_ARM64_BTI_KERNEL
+ *      mov x9, lr
+ *      nop  // POKE_OFFSET
+ *      paciasp // if CONFIG_ARM64_PTR_AUTH_KERNEL
+ *      stp x29, lr, [sp, #-16]!
+ *      mov x29, sp
+ *      stp x19, x20, [sp, #-16]!
+ *      stp x21, x22, [sp, #-16]!
+ *      stp x25, x26, [sp, #-16]!
+ *      stp x27, x28, [sp, #-16]!
+ *      mov x25, sp
+ *      mov tcc, #0
+ *      // PROLOGUE_OFFSET
+ */
+
+#define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0)
+#define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0)
+
+/* Offset of nop instruction in bpf prog entry to be poked */
+#define POKE_OFFSET (BTI_INSNS + 1)
+
 /* Tail call offset to jump into */
-#if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) || \
-       IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
-#define PROLOGUE_OFFSET 9
-#else
-#define PROLOGUE_OFFSET 8
-#endif
+#define PROLOGUE_OFFSET (BTI_INSNS + 2 + PAC_INSNS + 8)
 
 static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 {
@@ -280,12 +322,14 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
         *
         */
 
+       emit_bti(A64_BTI_C, ctx);
+
+       emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
+       emit(A64_NOP, ctx);
+
        /* Sign lr */
        if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL))
                emit(A64_PACIASP, ctx);
-       /* BTI landing pad */
-       else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
-               emit(A64_BTI_C, ctx);
 
        /* Save FP and LR registers to stay align with ARM64 AAPCS */
        emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
@@ -312,8 +356,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
                }
 
                /* BTI landing pad for the tail call, done with a BR */
-               if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
-                       emit(A64_BTI_J, ctx);
+               emit_bti(A64_BTI_J, ctx);
        }
 
        emit(A64_SUB_I(1, fpb, fp, ctx->fpb_offset), ctx);
@@ -557,6 +600,53 @@ static int emit_ll_sc_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
        return 0;
 }
 
+void dummy_tramp(void);
+
+asm (
+"      .pushsection .text, \"ax\", @progbits\n"
+"      .global dummy_tramp\n"
+"      .type dummy_tramp, %function\n"
+"dummy_tramp:"
+#if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)
+"      bti j\n" /* dummy_tramp is called via "br x10" */
+#endif
+"      mov x10, x30\n"
+"      mov x30, x9\n"
+"      ret x10\n"
+"      .size dummy_tramp, .-dummy_tramp\n"
+"      .popsection\n"
+);
+
+/* build a plt initialized like this:
+ *
+ * plt:
+ *      ldr tmp, target
+ *      br tmp
+ * target:
+ *      .quad dummy_tramp
+ *
+ * when a long jump trampoline is attached, target is filled with the
+ * trampoline address, and when the trampoline is removed, target is
+ * restored to the dummy_tramp address.
+ */
+static void build_plt(struct jit_ctx *ctx)
+{
+       const u8 tmp = bpf2a64[TMP_REG_1];
+       struct bpf_plt *plt = NULL;
+
+       /* make sure target is 64-bit aligned */
+       if ((ctx->idx + PLT_TARGET_OFFSET / AARCH64_INSN_SIZE) % 2)
+               emit(A64_NOP, ctx);
+
+       plt = (struct bpf_plt *)(ctx->image + ctx->idx);
+       /* plt is called via bl, no BTI needed here */
+       emit(A64_LDR64LIT(tmp, 2 * AARCH64_INSN_SIZE), ctx);
+       emit(A64_BR(tmp), ctx);
+
+       if (ctx->image)
+               plt->target = (u64)&dummy_tramp;
+}
+
 static void build_epilogue(struct jit_ctx *ctx)
 {
        const u8 r0 = bpf2a64[BPF_REG_0];
@@ -991,8 +1081,7 @@ emit_cond_jmp:
                                            &func_addr, &func_addr_fixed);
                if (ret < 0)
                        return ret;
-               emit_addr_mov_i64(tmp, func_addr, ctx);
-               emit(A64_BLR(tmp), ctx);
+               emit_call(func_addr, ctx);
                emit(A64_MOV(1, r0, A64_R(0)), ctx);
                break;
        }
@@ -1336,6 +1425,13 @@ static int validate_code(struct jit_ctx *ctx)
                if (a64_insn == AARCH64_BREAK_FAULT)
                        return -1;
        }
+       return 0;
+}
+
+static int validate_ctx(struct jit_ctx *ctx)
+{
+       if (validate_code(ctx))
+               return -1;
 
        if (WARN_ON_ONCE(ctx->exentry_idx != ctx->prog->aux->num_exentries))
                return -1;
@@ -1356,7 +1452,7 @@ struct arm64_jit_data {
 
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
-       int image_size, prog_size, extable_size;
+       int image_size, prog_size, extable_size, extable_align, extable_offset;
        struct bpf_prog *tmp, *orig_prog = prog;
        struct bpf_binary_header *header;
        struct arm64_jit_data *jit_data;
@@ -1426,13 +1522,17 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
        ctx.epilogue_offset = ctx.idx;
        build_epilogue(&ctx);
+       build_plt(&ctx);
 
+       extable_align = __alignof__(struct exception_table_entry);
        extable_size = prog->aux->num_exentries *
                sizeof(struct exception_table_entry);
 
        /* Now we know the actual image size. */
        prog_size = sizeof(u32) * ctx.idx;
-       image_size = prog_size + extable_size;
+       /* also allocate space for plt target */
+       extable_offset = round_up(prog_size + PLT_TARGET_SIZE, extable_align);
+       image_size = extable_offset + extable_size;
        header = bpf_jit_binary_alloc(image_size, &image_ptr,
                                      sizeof(u32), jit_fill_hole);
        if (header == NULL) {
@@ -1444,7 +1544,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
        ctx.image = (__le32 *)image_ptr;
        if (extable_size)
-               prog->aux->extable = (void *)image_ptr + prog_size;
+               prog->aux->extable = (void *)image_ptr + extable_offset;
 skip_init_ctx:
        ctx.idx = 0;
        ctx.exentry_idx = 0;
@@ -1458,9 +1558,10 @@ skip_init_ctx:
        }
 
        build_epilogue(&ctx);
+       build_plt(&ctx);
 
        /* 3. Extra pass to validate JITed code. */
-       if (validate_code(&ctx)) {
+       if (validate_ctx(&ctx)) {
                bpf_jit_binary_free(header);
                prog = orig_prog;
                goto out_off;
@@ -1537,3 +1638,583 @@ bool bpf_jit_supports_subprog_tailcalls(void)
 {
        return true;
 }
+
+static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
+                           int args_off, int retval_off, int run_ctx_off,
+                           bool save_ret)
+{
+       u32 *branch;
+       u64 enter_prog;
+       u64 exit_prog;
+       struct bpf_prog *p = l->link.prog;
+       int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
+
+       if (p->aux->sleepable) {
+               enter_prog = (u64)__bpf_prog_enter_sleepable;
+               exit_prog = (u64)__bpf_prog_exit_sleepable;
+       } else {
+               enter_prog = (u64)__bpf_prog_enter;
+               exit_prog = (u64)__bpf_prog_exit;
+       }
+
+       if (l->cookie == 0) {
+               /* if cookie is zero, one instruction is enough to store it */
+               emit(A64_STR64I(A64_ZR, A64_SP, run_ctx_off + cookie_off), ctx);
+       } else {
+               emit_a64_mov_i64(A64_R(10), l->cookie, ctx);
+               emit(A64_STR64I(A64_R(10), A64_SP, run_ctx_off + cookie_off),
+                    ctx);
+       }
+
+       /* save p to callee saved register x19 to avoid loading p with mov_i64
+        * each time.
+        */
+       emit_addr_mov_i64(A64_R(19), (const u64)p, ctx);
+
+       /* arg1: prog */
+       emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
+       /* arg2: &run_ctx */
+       emit(A64_ADD_I(1, A64_R(1), A64_SP, run_ctx_off), ctx);
+
+       emit_call(enter_prog, ctx);
+
+       /* if (__bpf_prog_enter(prog) == 0)
+        *         goto skip_exec_of_prog;
+        */
+       branch = ctx->image + ctx->idx;
+       emit(A64_NOP, ctx);
+
+       /* save return value to callee saved register x20 */
+       emit(A64_MOV(1, A64_R(20), A64_R(0)), ctx);
+
+       emit(A64_ADD_I(1, A64_R(0), A64_SP, args_off), ctx);
+       if (!p->jited)
+               emit_addr_mov_i64(A64_R(1), (const u64)p->insnsi, ctx);
+
+       emit_call((const u64)p->bpf_func, ctx);
+
+       if (save_ret)
+               emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
+
+       if (ctx->image) {
+               int offset = &ctx->image[ctx->idx] - branch;
+               *branch = A64_CBZ(1, A64_R(0), offset);
+       }
+
+       /* arg1: prog */
+       emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
+       /* arg2: start time */
+       emit(A64_MOV(1, A64_R(1), A64_R(20)), ctx);
+       /* arg3: &run_ctx */
+       emit(A64_ADD_I(1, A64_R(2), A64_SP, run_ctx_off), ctx);
+
+       emit_call(exit_prog, ctx);
+}
+
+static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
+                              int args_off, int retval_off, int run_ctx_off,
+                              u32 **branches)
+{
+       int i;
+
+       /* The first fmod_ret program will receive a garbage return value.
+        * Set this to 0 to avoid confusing the program.
+        */
+       emit(A64_STR64I(A64_ZR, A64_SP, retval_off), ctx);
+       for (i = 0; i < tl->nr_links; i++) {
+               invoke_bpf_prog(ctx, tl->links[i], args_off, retval_off,
+                               run_ctx_off, true);
+               /* if (*(u64 *)(sp + retval_off) !=  0)
+                *      goto do_fexit;
+                */
+               emit(A64_LDR64I(A64_R(10), A64_SP, retval_off), ctx);
+               /* Save the location of branch, and generate a nop.
+                * This nop will be replaced with a cbnz later.
+                */
+               branches[i] = ctx->image + ctx->idx;
+               emit(A64_NOP, ctx);
+       }
+}
+
+static void save_args(struct jit_ctx *ctx, int args_off, int nargs)
+{
+       int i;
+
+       for (i = 0; i < nargs; i++) {
+               emit(A64_STR64I(i, A64_SP, args_off), ctx);
+               args_off += 8;
+       }
+}
+
+static void restore_args(struct jit_ctx *ctx, int args_off, int nargs)
+{
+       int i;
+
+       for (i = 0; i < nargs; i++) {
+               emit(A64_LDR64I(i, A64_SP, args_off), ctx);
+               args_off += 8;
+       }
+}
+
+/* Based on the x86's implementation of arch_prepare_bpf_trampoline().
+ *
+ * bpf prog and function entry before bpf trampoline hooked:
+ *   mov x9, lr
+ *   nop
+ *
+ * bpf prog and function entry after bpf trampoline hooked:
+ *   mov x9, lr
+ *   bl  <bpf_trampoline or plt>
+ *
+ */
+static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
+                             struct bpf_tramp_links *tlinks, void *orig_call,
+                             int nargs, u32 flags)
+{
+       int i;
+       int stack_size;
+       int retaddr_off;
+       int regs_off;
+       int retval_off;
+       int args_off;
+       int nargs_off;
+       int ip_off;
+       int run_ctx_off;
+       struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
+       struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
+       struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+       bool save_ret;
+       u32 **branches = NULL;
+
+       /* trampoline stack layout:
+        *                  [ parent ip         ]
+        *                  [ FP                ]
+        * SP + retaddr_off [ self ip           ]
+        *                  [ FP                ]
+        *
+        *                  [ padding           ] align SP to multiples of 16
+        *
+        *                  [ x20               ] callee saved reg x20
+        * SP + regs_off    [ x19               ] callee saved reg x19
+        *
+        * SP + retval_off  [ return value      ] BPF_TRAMP_F_CALL_ORIG or
+        *                                        BPF_TRAMP_F_RET_FENTRY_RET
+        *
+        *                  [ argN              ]
+        *                  [ ...               ]
+        * SP + args_off    [ arg1              ]
+        *
+        * SP + nargs_off   [ args count        ]
+        *
+        * SP + ip_off      [ traced function   ] BPF_TRAMP_F_IP_ARG flag
+        *
+        * SP + run_ctx_off [ bpf_tramp_run_ctx ]
+        */
+
+       stack_size = 0;
+       run_ctx_off = stack_size;
+       /* room for bpf_tramp_run_ctx */
+       stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), 8);
+
+       ip_off = stack_size;
+       /* room for IP address argument */
+       if (flags & BPF_TRAMP_F_IP_ARG)
+               stack_size += 8;
+
+       nargs_off = stack_size;
+       /* room for args count */
+       stack_size += 8;
+
+       args_off = stack_size;
+       /* room for args */
+       stack_size += nargs * 8;
+
+       /* room for return value */
+       retval_off = stack_size;
+       save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
+       if (save_ret)
+               stack_size += 8;
+
+       /* room for callee saved registers, currently x19 and x20 are used */
+       regs_off = stack_size;
+       stack_size += 16;
+
+       /* round up to multiples of 16 to avoid SPAlignmentFault */
+       stack_size = round_up(stack_size, 16);
+
+       /* the return address is located above FP */
+       retaddr_off = stack_size + 8;
+
+       /* bpf trampoline may be invoked by 3 instruction types:
+        * 1. bl, attached to bpf prog or kernel function via short jump
+        * 2. br, attached to bpf prog or kernel function via long jump
+        * 3. blr, working as a function pointer, used by struct_ops.
+        * So BTI_JC should be used here to support both br and blr.
+        */
+       emit_bti(A64_BTI_JC, ctx);
+
+       /* frame for parent function */
+       emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
+       emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
+       /* frame for patched function */
+       emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
+       emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
+       /* allocate stack space */
+       emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+
+       if (flags & BPF_TRAMP_F_IP_ARG) {
+               /* save ip address of the traced function */
+               emit_addr_mov_i64(A64_R(10), (const u64)orig_call, ctx);
+               emit(A64_STR64I(A64_R(10), A64_SP, ip_off), ctx);
+       }
+
+       /* save args count */
+       emit(A64_MOVZ(1, A64_R(10), nargs, 0), ctx);
+       emit(A64_STR64I(A64_R(10), A64_SP, nargs_off), ctx);
+
+       /* save args */
+       save_args(ctx, args_off, nargs);
+
+       /* save callee saved registers */
+       emit(A64_STR64I(A64_R(19), A64_SP, regs_off), ctx);
+       emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
+
+       if (flags & BPF_TRAMP_F_CALL_ORIG) {
+               emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+               emit_call((const u64)__bpf_tramp_enter, ctx);
+       }
+
+       for (i = 0; i < fentry->nr_links; i++)
+               invoke_bpf_prog(ctx, fentry->links[i], args_off,
+                               retval_off, run_ctx_off,
+                               flags & BPF_TRAMP_F_RET_FENTRY_RET);
+
+       if (fmod_ret->nr_links) {
+               branches = kcalloc(fmod_ret->nr_links, sizeof(u32 *),
+                                  GFP_KERNEL);
+               if (!branches)
+                       return -ENOMEM;
+
+               invoke_bpf_mod_ret(ctx, fmod_ret, args_off, retval_off,
+                                  run_ctx_off, branches);
+       }
+
+       if (flags & BPF_TRAMP_F_CALL_ORIG) {
+               restore_args(ctx, args_off, nargs);
+               /* call original func */
+               emit(A64_LDR64I(A64_R(10), A64_SP, retaddr_off), ctx);
+               emit(A64_BLR(A64_R(10)), ctx);
+               /* store return value */
+               emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
+               /* reserve a nop for bpf_tramp_image_put */
+               im->ip_after_call = ctx->image + ctx->idx;
+               emit(A64_NOP, ctx);
+       }
+
+       /* update the branches saved in invoke_bpf_mod_ret with cbnz */
+       for (i = 0; i < fmod_ret->nr_links && ctx->image != NULL; i++) {
+               int offset = &ctx->image[ctx->idx] - branches[i];
+               *branches[i] = A64_CBNZ(1, A64_R(10), offset);
+       }
+
+       for (i = 0; i < fexit->nr_links; i++)
+               invoke_bpf_prog(ctx, fexit->links[i], args_off, retval_off,
+                               run_ctx_off, false);
+
+       if (flags & BPF_TRAMP_F_CALL_ORIG) {
+               im->ip_epilogue = ctx->image + ctx->idx;
+               emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+               emit_call((const u64)__bpf_tramp_exit, ctx);
+       }
+
+       if (flags & BPF_TRAMP_F_RESTORE_REGS)
+               restore_args(ctx, args_off, nargs);
+
+       /* restore callee saved register x19 and x20 */
+       emit(A64_LDR64I(A64_R(19), A64_SP, regs_off), ctx);
+       emit(A64_LDR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
+
+       if (save_ret)
+               emit(A64_LDR64I(A64_R(0), A64_SP, retval_off), ctx);
+
+       /* reset SP  */
+       emit(A64_MOV(1, A64_SP, A64_FP), ctx);
+
+       /* pop frames  */
+       emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
+       emit(A64_POP(A64_FP, A64_R(9), A64_SP), ctx);
+
+       if (flags & BPF_TRAMP_F_SKIP_FRAME) {
+               /* skip patched function, return to parent */
+               emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
+               emit(A64_RET(A64_R(9)), ctx);
+       } else {
+               /* return to patched function */
+               emit(A64_MOV(1, A64_R(10), A64_LR), ctx);
+               emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
+               emit(A64_RET(A64_R(10)), ctx);
+       }
+
+       if (ctx->image)
+               bpf_flush_icache(ctx->image, ctx->image + ctx->idx);
+
+       kfree(branches);
+
+       return ctx->idx;
+}
+
+int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
+                               void *image_end, const struct btf_func_model *m,
+                               u32 flags, struct bpf_tramp_links *tlinks,
+                               void *orig_call)
+{
+       int ret;
+       int nargs = m->nr_args;
+       int max_insns = ((long)image_end - (long)image) / AARCH64_INSN_SIZE;
+       struct jit_ctx ctx = {
+               .image = NULL,
+               .idx = 0,
+       };
+
+       /* the first 8 arguments are passed by registers */
+       if (nargs > 8)
+               return -ENOTSUPP;
+
+       ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
+       if (ret < 0)
+               return ret;
+
+       if (ret > max_insns)
+               return -EFBIG;
+
+       ctx.image = image;
+       ctx.idx = 0;
+
+       jit_fill_hole(image, (unsigned int)(image_end - image));
+       ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
+
+       if (ret > 0 && validate_code(&ctx) < 0)
+               ret = -EINVAL;
+
+       if (ret > 0)
+               ret *= AARCH64_INSN_SIZE;
+
+       return ret;
+}
+
+static bool is_long_jump(void *ip, void *target)
+{
+       long offset;
+
+       /* NULL target means this is a NOP */
+       if (!target)
+               return false;
+
+       offset = (long)target - (long)ip;
+       return offset < -SZ_128M || offset >= SZ_128M;
+}
+
+static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
+                            void *addr, void *plt, u32 *insn)
+{
+       void *target;
+
+       if (!addr) {
+               *insn = aarch64_insn_gen_nop();
+               return 0;
+       }
+
+       if (is_long_jump(ip, addr))
+               target = plt;
+       else
+               target = addr;
+
+       *insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
+                                           (unsigned long)target,
+                                           type);
+
+       return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
+}
+
+/* Replace the branch instruction from @ip to @old_addr in a bpf prog or a bpf
+ * trampoline with the branch instruction from @ip to @new_addr. If @old_addr
+ * or @new_addr is NULL, the old or new instruction is NOP.
+ *
+ * When @ip is the bpf prog entry, a bpf trampoline is being attached or
+ * detached. Since bpf trampoline and bpf prog are allocated separately with
+ * vmalloc, the address distance may exceed 128MB, the maximum branch range.
+ * So long jump should be handled.
+ *
+ * When a bpf prog is constructed, a plt pointing to empty trampoline
+ * dummy_tramp is placed at the end:
+ *
+ *      bpf_prog:
+ *              mov x9, lr
+ *              nop // patchsite
+ *              ...
+ *              ret
+ *
+ *      plt:
+ *              ldr x10, target
+ *              br x10
+ *      target:
+ *              .quad dummy_tramp // plt target
+ *
+ * This is also the state when no trampoline is attached.
+ *
+ * When a short-jump bpf trampoline is attached, the patchsite is patched
+ * to a bl instruction to the trampoline directly:
+ *
+ *      bpf_prog:
+ *              mov x9, lr
+ *              bl <short-jump bpf trampoline address> // patchsite
+ *              ...
+ *              ret
+ *
+ *      plt:
+ *              ldr x10, target
+ *              br x10
+ *      target:
+ *              .quad dummy_tramp // plt target
+ *
+ * When a long-jump bpf trampoline is attached, the plt target is filled with
+ * the trampoline address and the patchsite is patched to a bl instruction to
+ * the plt:
+ *
+ *      bpf_prog:
+ *              mov x9, lr
+ *              bl plt // patchsite
+ *              ...
+ *              ret
+ *
+ *      plt:
+ *              ldr x10, target
+ *              br x10
+ *      target:
+ *              .quad <long-jump bpf trampoline address> // plt target
+ *
+ * The dummy_tramp is used to prevent another CPU from jumping to an unknown
+ * location during the patching process, which makes the patching easier.
+ */
+int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
+                      void *old_addr, void *new_addr)
+{
+       int ret;
+       u32 old_insn;
+       u32 new_insn;
+       u32 replaced;
+       struct bpf_plt *plt = NULL;
+       unsigned long size = 0UL;
+       unsigned long offset = ~0UL;
+       enum aarch64_insn_branch_type branch_type;
+       char namebuf[KSYM_NAME_LEN];
+       void *image = NULL;
+       u64 plt_target = 0ULL;
+       bool poking_bpf_entry;
+
+       if (!__bpf_address_lookup((unsigned long)ip, &size, &offset, namebuf))
+               /* Only poking bpf text is supported. Since kernel function
+                * entry is set up by ftrace, we rely on ftrace to poke kernel
+                * functions.
+                */
+               return -ENOTSUPP;
+
+       image = ip - offset;
+       /* zero offset means we're poking bpf prog entry */
+       poking_bpf_entry = (offset == 0UL);
+
+       /* bpf prog entry, find plt and the real patchsite */
+       if (poking_bpf_entry) {
+               /* the plt is located at the end of the bpf prog */
+               plt = image + size - PLT_TARGET_OFFSET;
+
+               /* skip to the nop instruction in bpf prog entry:
+                * bti c // if BTI enabled
+                * mov x9, x30
+                * nop
+                */
+               ip = image + POKE_OFFSET * AARCH64_INSN_SIZE;
+       }
+
+       /* long jump is only possible at bpf prog entry */
+       if (WARN_ON((is_long_jump(ip, new_addr) || is_long_jump(ip, old_addr)) &&
+                   !poking_bpf_entry))
+               return -EINVAL;
+
+       if (poke_type == BPF_MOD_CALL)
+               branch_type = AARCH64_INSN_BRANCH_LINK;
+       else
+               branch_type = AARCH64_INSN_BRANCH_NOLINK;
+
+       if (gen_branch_or_nop(branch_type, ip, old_addr, plt, &old_insn) < 0)
+               return -EFAULT;
+
+       if (gen_branch_or_nop(branch_type, ip, new_addr, plt, &new_insn) < 0)
+               return -EFAULT;
+
+       if (is_long_jump(ip, new_addr))
+               plt_target = (u64)new_addr;
+       else if (is_long_jump(ip, old_addr))
+               /* if the old target is a long jump and the new target is not,
+                * restore the plt target to dummy_tramp, so there is always a
+                * legal and harmless address stored in plt target, and we'll
+                * never jump from plt to an unknown place.
+                */
+               plt_target = (u64)&dummy_tramp;
+
+       if (plt_target) {
+               /* non-zero plt_target indicates we're patching a bpf prog,
+                * which is read only.
+                */
+               if (set_memory_rw(PAGE_MASK & ((uintptr_t)&plt->target), 1))
+                       return -EFAULT;
+               WRITE_ONCE(plt->target, plt_target);
+               set_memory_ro(PAGE_MASK & ((uintptr_t)&plt->target), 1);
+               /* since plt target points to either the new trampoline
+                * or dummy_tramp, even if another CPU reads the old plt
+                * target value before fetching the bl instruction to plt,
+                * it will be brought back by dummy_tramp, so no barrier is
+                * required here.
+                */
+       }
+
+       /* if the old target and the new target are both long jumps, no
+        * patching is required
+        */
+       if (old_insn == new_insn)
+               return 0;
+
+       mutex_lock(&text_mutex);
+       if (aarch64_insn_read(ip, &replaced)) {
+               ret = -EFAULT;
+               goto out;
+       }
+
+       if (replaced != old_insn) {
+               ret = -EFAULT;
+               goto out;
+       }
+
+       /* We call aarch64_insn_patch_text_nosync() to replace instruction
+        * atomically, so no other CPUs will fetch a half-new and half-old
+        * instruction. But there is chance that another CPU executes the
+        * old instruction after the patching operation finishes (e.g.,
+        * pipeline not flushed, or icache not synchronized yet).
+        *
+        * 1. when a new trampoline is attached, it is not a problem for
+        *    different CPUs to jump to different trampolines temporarily.
+        *
+        * 2. when an old trampoline is freed, we should wait for all other
+        *    CPUs to exit the trampoline and make sure the trampoline is no
+        *    longer reachable, since bpf_tramp_image_put() function already
+        *    uses percpu_ref and task-based rcu to do the sync, no need to call
+        *    the sync version here, see bpf_tramp_image_put() for details.
+        */
+       ret = aarch64_insn_patch_text_nosync(ip, new_insn);
+out:
+       mutex_unlock(&text_mutex);
+
+       return ret;
+}
index 7e95697..c1f6c1c 100644 (file)
@@ -1950,23 +1950,6 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
        return 0;
 }
 
-static bool is_valid_bpf_tramp_flags(unsigned int flags)
-{
-       if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
-           (flags & BPF_TRAMP_F_SKIP_FRAME))
-               return false;
-
-       /*
-        * BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
-        * and it must be used alone.
-        */
-       if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
-           (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
-               return false;
-
-       return true;
-}
-
 /* Example:
  * __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
  * its 'struct btf_func_model' will be nr_args=2
@@ -2045,9 +2028,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
        if (nr_args > 6)
                return -ENOTSUPP;
 
-       if (!is_valid_bpf_tramp_flags(flags))
-               return -EINVAL;
-
        /* Generated trampoline stack layout:
         *
         * RBP + 8         [ return address  ]
@@ -2153,10 +2133,15 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
        if (flags & BPF_TRAMP_F_CALL_ORIG) {
                restore_regs(m, &prog, nr_args, regs_off);
 
-               /* call original function */
-               if (emit_call(&prog, orig_call, prog)) {
-                       ret = -EINVAL;
-                       goto cleanup;
+               if (flags & BPF_TRAMP_F_ORIG_STACK) {
+                       emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
+                       EMIT2(0xff, 0xd0); /* call *rax */
+               } else {
+                       /* call original function */
+                       if (emit_call(&prog, orig_call, prog)) {
+                               ret = -EINVAL;
+                               goto cleanup;
+                       }
                }
                /* remember return value in a stack for bpf prog to access */
                emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
@@ -2520,3 +2505,28 @@ bool bpf_jit_supports_subprog_tailcalls(void)
 {
        return true;
 }
+
+void bpf_jit_free(struct bpf_prog *prog)
+{
+       if (prog->jited) {
+               struct x64_jit_data *jit_data = prog->aux->jit_data;
+               struct bpf_binary_header *hdr;
+
+               /*
+                * If we fail the final pass of JIT (from jit_subprogs),
+                * the program may not be finalized yet. Call finalize here
+                * before freeing it.
+                */
+               if (jit_data) {
+                       bpf_jit_binary_pack_finalize(prog, jit_data->header,
+                                                    jit_data->rw_header);
+                       kvfree(jit_data->addrs);
+                       kfree(jit_data);
+               }
+               hdr = bpf_jit_binary_pack_hdr(prog);
+               bpf_jit_binary_pack_free(hdr, NULL);
+               WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
+       }
+
+       bpf_prog_unlock_free(prog);
+}
index 2b21f2a..20c26ae 100644 (file)
@@ -47,6 +47,7 @@ struct kobject;
 struct mem_cgroup;
 struct module;
 struct bpf_func_state;
+struct ftrace_ops;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -221,7 +222,7 @@ struct bpf_map {
        u32 btf_vmlinux_value_type_id;
        struct btf *btf;
 #ifdef CONFIG_MEMCG_KMEM
-       struct mem_cgroup *memcg;
+       struct obj_cgroup *objcg;
 #endif
        char name[BPF_OBJ_NAME_LEN];
        struct bpf_map_off_arr *off_arr;
@@ -751,6 +752,16 @@ struct btf_func_model {
 /* Return the return value of fentry prog. Only used by bpf_struct_ops. */
 #define BPF_TRAMP_F_RET_FENTRY_RET     BIT(4)
 
+/* Get original function from stack instead of from provided direct address.
+ * Makes sense for trampolines with fexit or fmod_ret programs.
+ */
+#define BPF_TRAMP_F_ORIG_STACK         BIT(5)
+
+/* This trampoline is on a function with another ftrace_ops with IPMODIFY,
+ * e.g., a live patch. This flag is set and cleared by ftrace callbacks.
+ */
+#define BPF_TRAMP_F_SHARE_IPMODIFY     BIT(6)
+
 /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
  * bytes on x86.
  */
@@ -833,9 +844,11 @@ struct bpf_tramp_image {
 struct bpf_trampoline {
        /* hlist for trampoline_table */
        struct hlist_node hlist;
+       struct ftrace_ops *fops;
        /* serializes access to fields of this trampoline */
        struct mutex mutex;
        refcount_t refcnt;
+       u32 flags;
        u64 key;
        struct {
                struct btf_func_model model;
@@ -1044,7 +1057,6 @@ struct bpf_prog_aux {
        bool sleepable;
        bool tail_call_reachable;
        bool xdp_has_frags;
-       bool use_bpf_prog_pack;
        /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
        const struct btf_type *attach_func_proto;
        /* function name for valid attach_btf_id */
@@ -1255,9 +1267,6 @@ struct bpf_dummy_ops {
 int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
                            union bpf_attr __user *uattr);
 #endif
-int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
-                                   int cgroup_atype);
-void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog);
 #else
 static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id)
 {
@@ -1281,6 +1290,13 @@ static inline int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map,
 {
        return -EINVAL;
 }
+#endif
+
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM)
+int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
+                                   int cgroup_atype);
+void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog);
+#else
 static inline int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
                                                  int cgroup_atype)
 {
@@ -1921,7 +1937,8 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
                                struct bpf_reg_state *regs);
 int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
                              const struct btf *btf, u32 func_id,
-                             struct bpf_reg_state *regs);
+                             struct bpf_reg_state *regs,
+                             u32 kfunc_flags);
 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
                          struct bpf_reg_state *reg);
 int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
index 81b1966..2e3bad8 100644 (file)
@@ -345,10 +345,10 @@ struct bpf_verifier_state_list {
 };
 
 struct bpf_loop_inline_state {
-       int initialized:1; /* set to true upon first entry */
-       int fit_for_inline:1; /* true if callback function is the same
-                              * at each call and flags are always zero
-                              */
+       unsigned int initialized:1; /* set to true upon first entry */
+       unsigned int fit_for_inline:1; /* true if callback function is the same
+                                       * at each call and flags are always zero
+                                       */
        u32 callback_subprogno; /* valid when fit_for_inline is true */
 };
 
index 1bfed7f..cdb376d 100644 (file)
 #define BTF_TYPE_EMIT(type) ((void)(type *)0)
 #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)
 
-enum btf_kfunc_type {
-       BTF_KFUNC_TYPE_CHECK,
-       BTF_KFUNC_TYPE_ACQUIRE,
-       BTF_KFUNC_TYPE_RELEASE,
-       BTF_KFUNC_TYPE_RET_NULL,
-       BTF_KFUNC_TYPE_KPTR_ACQUIRE,
-       BTF_KFUNC_TYPE_MAX,
-};
+/* These need to be macros, as the expressions are used in assembler input */
+#define KF_ACQUIRE     (1 << 0) /* kfunc is an acquire function */
+#define KF_RELEASE     (1 << 1) /* kfunc is a release function */
+#define KF_RET_NULL    (1 << 2) /* kfunc returns a pointer that may be NULL */
+#define KF_KPTR_GET    (1 << 3) /* kfunc returns reference to a kptr */
+/* Trusted arguments are those which are meant to be referenced arguments with
+ * unchanged offset. It is used to enforce that pointers obtained from acquire
+ * kfuncs remain unmodified when being passed to helpers taking trusted args.
+ *
+ * Consider
+ *     struct foo {
+ *             int data;
+ *             struct foo *next;
+ *     };
+ *
+ *     struct bar {
+ *             int data;
+ *             struct foo f;
+ *     };
+ *
+ *     struct foo *f = alloc_foo(); // Acquire kfunc
+ *     struct bar *b = alloc_bar(); // Acquire kfunc
+ *
+ * If a kfunc set_foo_data() wants to operate only on the allocated object, it
+ * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
+ *
+ *     set_foo_data(f, 42);       // Allowed
+ *     set_foo_data(f->next, 42); // Rejected, non-referenced pointer
+ *     set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type
+ *     set_foo_data(&b->f, 42);   // Rejected, referenced, but bad offset
+ *
+ * In the final case, the member type at the given offset would normally be
+ * used to deduce the argument type for matching purposes. Because a trusted
+ * argument is required here, that deduction is skipped and the check stays
+ * strict.
+ */
+#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
 
 struct btf;
 struct btf_member;
@@ -30,16 +59,7 @@ struct btf_id_set;
 
 struct btf_kfunc_id_set {
        struct module *owner;
-       union {
-               struct {
-                       struct btf_id_set *check_set;
-                       struct btf_id_set *acquire_set;
-                       struct btf_id_set *release_set;
-                       struct btf_id_set *ret_null_set;
-                       struct btf_id_set *kptr_acquire_set;
-               };
-               struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX];
-       };
+       struct btf_id_set8 *set;
 };
 
 struct btf_id_dtor_kfunc {
@@ -378,9 +398,9 @@ const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
 struct btf *btf_parse_vmlinux(void);
 struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
-bool btf_kfunc_id_set_contains(const struct btf *btf,
+u32 *btf_kfunc_id_set_contains(const struct btf *btf,
                               enum bpf_prog_type prog_type,
-                              enum btf_kfunc_type type, u32 kfunc_btf_id);
+                              u32 kfunc_btf_id);
 int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
                              const struct btf_kfunc_id_set *s);
 s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
@@ -397,12 +417,11 @@ static inline const char *btf_name_by_offset(const struct btf *btf,
 {
        return NULL;
 }
-static inline bool btf_kfunc_id_set_contains(const struct btf *btf,
+static inline u32 *btf_kfunc_id_set_contains(const struct btf *btf,
                                             enum bpf_prog_type prog_type,
-                                            enum btf_kfunc_type type,
                                             u32 kfunc_btf_id)
 {
-       return false;
+       return NULL;
 }
 static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
                                            const struct btf_kfunc_id_set *s)
index 252a4be..2aea877 100644 (file)
@@ -8,6 +8,15 @@ struct btf_id_set {
        u32 ids[];
 };
 
+struct btf_id_set8 {
+       u32 cnt;
+       u32 flags;
+       struct {
+               u32 id;
+               u32 flags;
+       } pairs[];
+};
+
 #ifdef CONFIG_DEBUG_INFO_BTF
 
 #include <linux/compiler.h> /* for __PASTE */
@@ -25,7 +34,7 @@ struct btf_id_set {
 
 #define BTF_IDS_SECTION ".BTF_ids"
 
-#define ____BTF_ID(symbol)                             \
+#define ____BTF_ID(symbol, word)                       \
 asm(                                                   \
 ".pushsection " BTF_IDS_SECTION ",\"a\";       \n"     \
 ".local " #symbol " ;                          \n"     \
@@ -33,10 +42,11 @@ asm(                                                        \
 ".size  " #symbol ", 4;                        \n"     \
 #symbol ":                                     \n"     \
 ".zero 4                                       \n"     \
+word                                                   \
 ".popsection;                                  \n");
 
-#define __BTF_ID(symbol) \
-       ____BTF_ID(symbol)
+#define __BTF_ID(symbol, word) \
+       ____BTF_ID(symbol, word)
 
 #define __ID(prefix) \
        __PASTE(prefix, __COUNTER__)
@@ -46,7 +56,14 @@ asm(                                                 \
  * to 4 zero bytes.
  */
 #define BTF_ID(prefix, name) \
-       __BTF_ID(__ID(__BTF_ID__##prefix##__##name##__))
+       __BTF_ID(__ID(__BTF_ID__##prefix##__##name##__), "")
+
+#define ____BTF_ID_FLAGS(prefix, name, flags) \
+       __BTF_ID(__ID(__BTF_ID__##prefix##__##name##__), ".long " #flags "\n")
+#define __BTF_ID_FLAGS(prefix, name, flags, ...) \
+       ____BTF_ID_FLAGS(prefix, name, flags)
+#define BTF_ID_FLAGS(prefix, name, ...) \
+       __BTF_ID_FLAGS(prefix, name, ##__VA_ARGS__, 0)
 
 /*
  * The BTF_ID_LIST macro defines pure (unsorted) list
@@ -145,10 +162,51 @@ asm(                                                      \
 ".popsection;                                 \n");    \
 extern struct btf_id_set name;
 
+/*
+ * The BTF_SET8_START/END macro pair defines a sorted list of
+ * BTF IDs and their flags, plus its member count, with the
+ * following layout:
+ *
+ * BTF_SET8_START(list)
+ * BTF_ID_FLAGS(type1, name1, flags)
+ * BTF_ID_FLAGS(type2, name2, flags)
+ * BTF_SET8_END(list)
+ *
+ * __BTF_ID__set8__list:
+ * .zero 8
+ * list:
+ * __BTF_ID__type1__name1__3:
+ * .zero 4
+ * .word (1 << 0) | (1 << 2)
+ * __BTF_ID__type2__name2__5:
+ * .zero 4
+ * .word (1 << 3) | (1 << 1) | (1 << 2)
+ *
+ */
+#define __BTF_SET8_START(name, scope)                  \
+asm(                                                   \
+".pushsection " BTF_IDS_SECTION ",\"a\";       \n"     \
+"." #scope " __BTF_ID__set8__" #name ";        \n"     \
+"__BTF_ID__set8__" #name ":;                   \n"     \
+".zero 8                                       \n"     \
+".popsection;                                  \n");
+
+#define BTF_SET8_START(name)                           \
+__BTF_ID_LIST(name, local)                             \
+__BTF_SET8_START(name, local)
+
+#define BTF_SET8_END(name)                             \
+asm(                                                   \
+".pushsection " BTF_IDS_SECTION ",\"a\";      \n"      \
+".size __BTF_ID__set8__" #name ", .-" #name "  \n"     \
+".popsection;                                 \n");    \
+extern struct btf_id_set8 name;
+
 #else
 
 #define BTF_ID_LIST(name) static u32 __maybe_unused name[5];
 #define BTF_ID(prefix, name)
+#define BTF_ID_FLAGS(prefix, name, ...)
 #define BTF_ID_UNUSED
 #define BTF_ID_LIST_GLOBAL(name, n) u32 __maybe_unused name[n];
 #define BTF_ID_LIST_SINGLE(name, prefix, typename) static u32 __maybe_unused name[1];
@@ -156,6 +214,8 @@ extern struct btf_id_set name;
 #define BTF_SET_START(name) static struct btf_id_set __maybe_unused name = { 0 };
 #define BTF_SET_START_GLOBAL(name) static struct btf_id_set __maybe_unused name = { 0 };
 #define BTF_SET_END(name)
+#define BTF_SET8_START(name) static struct btf_id_set8 __maybe_unused name = { 0 };
+#define BTF_SET8_END(name)
 
 #endif /* CONFIG_DEBUG_INFO_BTF */
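Taken together, BTF_SET8_START/BTF_ID_FLAGS/BTF_SET8_END and the KF_* flags replace the
old per-type btf_id_set registration. A minimal sketch of declaring and registering a
flagged kfunc set under the new interface; the function names and the XDP hook are
illustrative placeholders, not mandated by this series:

    /* Sketch only: any in-kernel functions exposed as kfuncs could be listed here. */
    BTF_SET8_START(example_kfunc_ids)
    BTF_ID_FLAGS(func, example_obj_alloc, KF_ACQUIRE | KF_RET_NULL)
    BTF_ID_FLAGS(func, example_obj_release, KF_RELEASE | KF_TRUSTED_ARGS)
    BTF_SET8_END(example_kfunc_ids)

    static const struct btf_kfunc_id_set example_kfunc_set = {
            .owner = THIS_MODULE,
            .set   = &example_kfunc_ids,
    };

    /* register_btf_kfunc_id_set() must be called from an initcall or module init. */
    static int __init example_kfunc_init(void)
    {
            return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &example_kfunc_set);
    }
    late_initcall(example_kfunc_init);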
 
index 4c1a8b2..a5f21dc 100644 (file)
@@ -1027,6 +1027,14 @@ u64 bpf_jit_alloc_exec_limit(void);
 void *bpf_jit_alloc_exec(unsigned long size);
 void bpf_jit_free_exec(void *addr);
 void bpf_jit_free(struct bpf_prog *fp);
+struct bpf_binary_header *
+bpf_jit_binary_pack_hdr(const struct bpf_prog *fp);
+
+static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
+{
+       return list_empty(&fp->aux->ksym.lnode) ||
+              fp->aux->ksym.lnode.prev == LIST_POISON2;
+}
 
 struct bpf_binary_header *
 bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **ro_image,
index 979f6bf..0b61371 100644 (file)
@@ -208,6 +208,43 @@ enum {
        FTRACE_OPS_FL_DIRECT                    = BIT(17),
 };
 
+/*
+ * FTRACE_OPS_CMD_* commands allow the ftrace core logic to request changes
+ * to a ftrace_ops. Note, the requests may fail.
+ *
+ * ENABLE_SHARE_IPMODIFY_SELF - enable a DIRECT ops to work on the same
+ *                              function as an ops with IPMODIFY. Called
+ *                              when the DIRECT ops is being registered.
+ *                              This is called with both direct_mutex and
+ *                              ftrace_lock locked.
+ *
+ * ENABLE_SHARE_IPMODIFY_PEER - enable a DIRECT ops to work on the same
+ *                              function as an ops with IPMODIFY. Called
+ *                              when the other ops (the one with IPMODIFY)
+ *                              is being registered.
+ *                              This is called with direct_mutex locked.
+ *
+ * DISABLE_SHARE_IPMODIFY_PEER - stop a DIRECT ops from working on the same
+ *                               function as an ops with IPMODIFY. Called
+ *                               when the other ops (the one with IPMODIFY)
+ *                               is being unregistered.
+ *                               This is called with direct_mutex locked.
+ */
+enum ftrace_ops_cmd {
+       FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF,
+       FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER,
+       FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER,
+};
+
+/*
+ * For most ftrace_ops_cmd values:
+ * Returns:
+ *        0 - Success.
+ *        Negative on failure. The exact return value depends on the
+ *        callback.
+ */
+typedef int (*ftrace_ops_func_t)(struct ftrace_ops *op, enum ftrace_ops_cmd cmd);
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 /* The hash used to know what functions callbacks trace */
 struct ftrace_ops_hash {
@@ -250,6 +287,7 @@ struct ftrace_ops {
        unsigned long                   trampoline;
        unsigned long                   trampoline_size;
        struct list_head                list;
+       ftrace_ops_func_t               ops_func;
 #endif
 };
 
@@ -340,6 +378,7 @@ unsigned long ftrace_find_rec_direct(unsigned long ip);
 int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
 int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
 int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
+int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr);
 
 #else
 struct ftrace_ops;
@@ -384,6 +423,10 @@ static inline int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned lo
 {
        return -ENODEV;
 }
+static inline int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr)
+{
+       return -ENODEV;
+}
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 
 #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
index 5d7ff88..ca8afa3 100644 (file)
@@ -2487,6 +2487,14 @@ static inline void skb_set_tail_pointer(struct sk_buff *skb, const int offset)
 
 #endif /* NET_SKBUFF_DATA_USES_OFFSET */
 
+static inline void skb_assert_len(struct sk_buff *skb)
+{
+#ifdef CONFIG_DEBUG_NET
+       if (WARN_ONCE(!skb->len, "%s\n", __func__))
+               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
+#endif /* CONFIG_DEBUG_NET */
+}
+
 /*
  *     Add data to an sk_buff
  */
index 37866c8..3cd3a6e 100644 (file)
@@ -84,4 +84,23 @@ void nf_conntrack_lock(spinlock_t *lock);
 
 extern spinlock_t nf_conntrack_expect_lock;
 
+/* ctnetlink code shared by both ctnetlink and nf_conntrack_bpf */
+
+#if (IS_BUILTIN(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
+    (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES) || \
+    IS_ENABLED(CONFIG_NF_CT_NETLINK))
+
+static inline void __nf_ct_set_timeout(struct nf_conn *ct, u64 timeout)
+{
+       if (timeout > INT_MAX)
+               timeout = INT_MAX;
+       WRITE_ONCE(ct->timeout, nfct_time_stamp + (u32)timeout);
+}
+
+int __nf_ct_change_timeout(struct nf_conn *ct, u64 cta_timeout);
+void __nf_ct_change_status(struct nf_conn *ct, unsigned long on, unsigned long off);
+int nf_ct_change_status_common(struct nf_conn *ct, unsigned int status);
+
+#endif
+
 #endif /* _NF_CONNTRACK_CORE_H */
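These helpers are shared between ctnetlink and the CT kfuncs added later in this series.
A hedged sketch of how a timeout-setting kfunc might build on __nf_ct_set_timeout; the
confirmed-entry guard and the millisecond unit are assumptions for illustration, not
copied from the series:

    /* Sketch: refuse to touch already-confirmed entries, then delegate. */
    int example_ct_set_timeout(struct nf_conn *ct, u32 timeout_ms)
    {
            if (nf_ct_is_confirmed(ct))
                    return -EPERM;

            __nf_ct_set_timeout(ct, msecs_to_jiffies(timeout_ms));
            return 0;
    }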
index 4aa0318..4277b0d 100644 (file)
@@ -44,6 +44,15 @@ static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
        xp_set_rxq_info(pool, rxq);
 }
 
+static inline unsigned int xsk_pool_get_napi_id(struct xsk_buff_pool *pool)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+       return pool->heads[0].xdp.rxq->napi_id;
+#else
+       return 0;
+#endif
+}
+
 static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
                                      unsigned long attrs)
 {
@@ -198,6 +207,11 @@ static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
 {
 }
 
+static inline unsigned int xsk_pool_get_napi_id(struct xsk_buff_pool *pool)
+{
+       return 0;
+}
+
 static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
                                      unsigned long attrs)
 {
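xsk_pool_get_napi_id() exists so the XSK core can associate a NAPI ID with send-only
sockets in busy-poll mode. A rough sketch of the intended use on the Tx path; the
function name and placement are assumptions, the actual call site lives in a separate
xsk patch of this series:

    /* Sketch: remember the pool's NAPI ID on the socket so busy polling
     * spins on the right NAPI context even if the socket never receives.
     */
    static void example_mark_xsk_napi_id(struct sock *sk, struct xdp_sock *xs)
    {
            __sk_mark_napi_id_once(sk, xsk_pool_get_napi_id(xs->pool));
    }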
index 3dd13fe..59a217c 100644 (file)
@@ -2361,7 +2361,8 @@ union bpf_attr {
  *             Pull in non-linear data in case the *skb* is non-linear and not
  *             all of *len* are part of the linear section. Make *len* bytes
  *             from *skb* readable and writable. If a zero value is passed for
- *             *len*, then the whole length of the *skb* is pulled.
+ *             *len*, then all bytes in the linear part of *skb* will be made
+ *             readable and writable.
  *
  *             This helper is only needed for reading and writing with direct
  *             packet access.
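The clarified semantics matter for the common pattern of pulling headers into the linear
area before direct packet access. A short sketch for a TC classifier, with illustrative
bounds values:

    /* Sketch: ensure 64 bytes are linear and writable; len == 0 would
     * instead make the whole current linear part readable and writable.
     */
    if (bpf_skb_pull_data(skb, 64) < 0)
            return TC_ACT_OK;

    void *data     = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    if (data + 64 > data_end)
            return TC_ACT_OK;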
index fe40d3b..d3e734b 100644 (file)
@@ -70,10 +70,8 @@ int array_map_alloc_check(union bpf_attr *attr)
            attr->map_flags & BPF_F_PRESERVE_ELEMS)
                return -EINVAL;
 
-       if (attr->value_size > KMALLOC_MAX_SIZE)
-               /* if value_size is bigger, the user space won't be able to
-                * access the elements.
-                */
+       /* avoid overflow on round_up(map->value_size) */
+       if (attr->value_size > INT_MAX)
                return -E2BIG;
 
        return 0;
@@ -156,6 +154,11 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
        return &array->map;
 }
 
+static void *array_map_elem_ptr(struct bpf_array* array, u32 index)
+{
+       return array->value + (u64)array->elem_size * index;
+}
+
 /* Called from syscall or from eBPF program */
 static void *array_map_lookup_elem(struct bpf_map *map, void *key)
 {
@@ -165,7 +168,7 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
        if (unlikely(index >= array->map.max_entries))
                return NULL;
 
-       return array->value + array->elem_size * (index & array->index_mask);
+       return array->value + (u64)array->elem_size * (index & array->index_mask);
 }
 
 static int array_map_direct_value_addr(const struct bpf_map *map, u64 *imm,
@@ -203,7 +206,7 @@ static int array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 {
        struct bpf_array *array = container_of(map, struct bpf_array, map);
        struct bpf_insn *insn = insn_buf;
-       u32 elem_size = round_up(map->value_size, 8);
+       u32 elem_size = array->elem_size;
        const int ret = BPF_REG_0;
        const int map_ptr = BPF_REG_1;
        const int index = BPF_REG_2;
@@ -272,7 +275,7 @@ int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
         * access 'value_size' of them, so copying rounded areas
         * will not leak any kernel data
         */
-       size = round_up(map->value_size, 8);
+       size = array->elem_size;
        rcu_read_lock();
        pptr = array->pptrs[index & array->index_mask];
        for_each_possible_cpu(cpu) {
@@ -339,7 +342,7 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
                       value, map->value_size);
        } else {
                val = array->value +
-                       array->elem_size * (index & array->index_mask);
+                       (u64)array->elem_size * (index & array->index_mask);
                if (map_flags & BPF_F_LOCK)
                        copy_map_value_locked(map, val, value, false);
                else
@@ -376,7 +379,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
         * returned or zeros which were zero-filled by percpu_alloc,
         * so no kernel data leaks possible
         */
-       size = round_up(map->value_size, 8);
+       size = array->elem_size;
        rcu_read_lock();
        pptr = array->pptrs[index & array->index_mask];
        for_each_possible_cpu(cpu) {
@@ -408,8 +411,7 @@ static void array_map_free_timers(struct bpf_map *map)
                return;
 
        for (i = 0; i < array->map.max_entries; i++)
-               bpf_timer_cancel_and_free(array->value + array->elem_size * i +
-                                         map->timer_off);
+               bpf_timer_cancel_and_free(array_map_elem_ptr(array, i) + map->timer_off);
 }
 
 /* Called when map->refcnt goes to zero, either from workqueue or from syscall */
@@ -420,7 +422,7 @@ static void array_map_free(struct bpf_map *map)
 
        if (map_value_has_kptrs(map)) {
                for (i = 0; i < array->map.max_entries; i++)
-                       bpf_map_free_kptrs(map, array->value + array->elem_size * i);
+                       bpf_map_free_kptrs(map, array_map_elem_ptr(array, i));
                bpf_map_free_kptr_off_tab(map);
        }
 
@@ -556,7 +558,7 @@ static void *bpf_array_map_seq_start(struct seq_file *seq, loff_t *pos)
        index = info->index & array->index_mask;
        if (info->percpu_value_buf)
               return array->pptrs[index];
-       return array->value + array->elem_size * index;
+       return array_map_elem_ptr(array, index);
 }
 
 static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
@@ -575,7 +577,7 @@ static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
        index = info->index & array->index_mask;
        if (info->percpu_value_buf)
               return array->pptrs[index];
-       return array->value + array->elem_size * index;
+       return array_map_elem_ptr(array, index);
 }
 
 static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
@@ -583,6 +585,7 @@ static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
        struct bpf_iter_seq_array_map_info *info = seq->private;
        struct bpf_iter__bpf_map_elem ctx = {};
        struct bpf_map *map = info->map;
+       struct bpf_array *array = container_of(map, struct bpf_array, map);
        struct bpf_iter_meta meta;
        struct bpf_prog *prog;
        int off = 0, cpu = 0;
@@ -603,7 +606,7 @@ static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
                        ctx.value = v;
                } else {
                        pptr = v;
-                       size = round_up(map->value_size, 8);
+                       size = array->elem_size;
                        for_each_possible_cpu(cpu) {
                                bpf_long_memcpy(info->percpu_value_buf + off,
                                                per_cpu_ptr(pptr, cpu),
@@ -633,11 +636,12 @@ static int bpf_iter_init_array_map(void *priv_data,
 {
        struct bpf_iter_seq_array_map_info *seq_info = priv_data;
        struct bpf_map *map = aux->map;
+       struct bpf_array *array = container_of(map, struct bpf_array, map);
        void *value_buf;
        u32 buf_size;
 
        if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
-               buf_size = round_up(map->value_size, 8) * num_possible_cpus();
+               buf_size = array->elem_size * num_possible_cpus();
                value_buf = kmalloc(buf_size, GFP_USER | __GFP_NOWARN);
                if (!value_buf)
                        return -ENOMEM;
@@ -690,7 +694,7 @@ static int bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_
                if (is_percpu)
                        val = this_cpu_ptr(array->pptrs[i]);
                else
-                       val = array->value + array->elem_size * i;
+                       val = array_map_elem_ptr(array, i);
                num_elems++;
                key = i;
                ret = callback_fn((u64)(long)map, (u64)(long)&key,
@@ -1322,7 +1326,7 @@ static int array_of_map_gen_lookup(struct bpf_map *map,
                                   struct bpf_insn *insn_buf)
 {
        struct bpf_array *array = container_of(map, struct bpf_array, map);
-       u32 elem_size = round_up(map->value_size, 8);
+       u32 elem_size = array->elem_size;
        struct bpf_insn *insn = insn_buf;
        const int ret = BPF_REG_0;
        const int map_ptr = BPF_REG_1;
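The (u64) casts and the cached array->elem_size guard against 32-bit overflow now that
value_size may be as large as INT_MAX: elem_size and index are both u32, so their product
must be widened before it is added to the base pointer. An illustration of the failure
mode being avoided, with values chosen only so the product hits 2^32:

    u32 elem_size = 1U << 20;   /* ~1 MiB element, now permitted */
    u32 index     = 1U << 12;   /* 4096th element                */

    void *bad  = array->value + elem_size * index;       /* 32-bit product wraps to 0 */
    void *good = array->value + (u64)elem_size * index;  /* full 64-bit offset        */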
index d469b7f..fa71d58 100644 (file)
@@ -63,10 +63,11 @@ BTF_ID(func, bpf_lsm_socket_post_create)
 BTF_ID(func, bpf_lsm_socket_socketpair)
 BTF_SET_END(bpf_lsm_unlocked_sockopt_hooks)
 
+#ifdef CONFIG_CGROUP_BPF
 void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
                             bpf_func_t *bpf_func)
 {
-       const struct btf_param *args;
+       const struct btf_param *args __maybe_unused;
 
        if (btf_type_vlen(prog->aux->attach_func_proto) < 1 ||
            btf_id_set_contains(&bpf_lsm_current_hooks,
@@ -75,9 +76,9 @@ void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
                return;
        }
 
+#ifdef CONFIG_NET
        args = btf_params(prog->aux->attach_func_proto);
 
-#ifdef CONFIG_NET
        if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCKET])
                *bpf_func = __cgroup_bpf_run_lsm_socket;
        else if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCK])
@@ -86,6 +87,7 @@ void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
 #endif
                *bpf_func = __cgroup_bpf_run_lsm_current;
 }
+#endif
 
 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
                        const struct bpf_prog *prog)
@@ -219,6 +221,7 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
        case BPF_FUNC_get_retval:
                return prog->expected_attach_type == BPF_LSM_CGROUP ?
                        &bpf_get_retval_proto : NULL;
+#ifdef CONFIG_NET
        case BPF_FUNC_setsockopt:
                if (prog->expected_attach_type != BPF_LSM_CGROUP)
                        return NULL;
@@ -239,6 +242,7 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
                                        prog->aux->attach_btf_id))
                        return &bpf_unlocked_sk_getsockopt_proto;
                return NULL;
+#endif
        default:
                return tracing_prog_func_proto(func_id, prog);
        }
index 7e0068c..84b2d9d 100644 (file)
@@ -341,6 +341,9 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
 
        tlinks[BPF_TRAMP_FENTRY].links[0] = link;
        tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
+       /* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
+        * and it must be used alone.
+        */
        flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0;
        return arch_prepare_bpf_trampoline(NULL, image, image_end,
                                           model, flags, tlinks, NULL);
index 4423045..7ac971e 100644 (file)
@@ -213,7 +213,7 @@ enum {
 };
 
 struct btf_kfunc_set_tab {
-       struct btf_id_set *sets[BTF_KFUNC_HOOK_MAX][BTF_KFUNC_TYPE_MAX];
+       struct btf_id_set8 *sets[BTF_KFUNC_HOOK_MAX];
 };
 
 struct btf_id_dtor_kfunc_tab {
@@ -1116,7 +1116,8 @@ __printf(2, 3) static void btf_show(struct btf_show *show, const char *fmt, ...)
  */
 #define btf_show_type_value(show, fmt, value)                                 \
        do {                                                                   \
-               if ((value) != 0 || (show->flags & BTF_SHOW_ZERO) ||           \
+               if ((value) != (__typeof__(value))0 ||                         \
+                   (show->flags & BTF_SHOW_ZERO) ||                           \
                    show->state.depth == 0) {                                  \
                        btf_show(show, "%s%s" fmt "%s%s",                      \
                                 btf_show_indent(show),                        \
@@ -1615,7 +1616,7 @@ static void btf_free_id(struct btf *btf)
 static void btf_free_kfunc_set_tab(struct btf *btf)
 {
        struct btf_kfunc_set_tab *tab = btf->kfunc_set_tab;
-       int hook, type;
+       int hook;
 
        if (!tab)
                return;
@@ -1624,10 +1625,8 @@ static void btf_free_kfunc_set_tab(struct btf *btf)
         */
        if (btf_is_module(btf))
                goto free_tab;
-       for (hook = 0; hook < ARRAY_SIZE(tab->sets); hook++) {
-               for (type = 0; type < ARRAY_SIZE(tab->sets[0]); type++)
-                       kfree(tab->sets[hook][type]);
-       }
+       for (hook = 0; hook < ARRAY_SIZE(tab->sets); hook++)
+               kfree(tab->sets[hook]);
 free_tab:
        kfree(tab);
        btf->kfunc_set_tab = NULL;
@@ -6171,13 +6170,14 @@ static bool is_kfunc_arg_mem_size(const struct btf *btf,
 static int btf_check_func_arg_match(struct bpf_verifier_env *env,
                                    const struct btf *btf, u32 func_id,
                                    struct bpf_reg_state *regs,
-                                   bool ptr_to_mem_ok)
+                                   bool ptr_to_mem_ok,
+                                   u32 kfunc_flags)
 {
        enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
+       bool rel = false, kptr_get = false, trusted_arg = false;
        struct bpf_verifier_log *log = &env->log;
        u32 i, nargs, ref_id, ref_obj_id = 0;
        bool is_kfunc = btf_is_kernel(btf);
-       bool rel = false, kptr_get = false;
        const char *func_name, *ref_tname;
        const struct btf_type *t, *ref_t;
        const struct btf_param *args;
@@ -6209,10 +6209,9 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 
        if (is_kfunc) {
                /* Only kfunc can be release func */
-               rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
-                                               BTF_KFUNC_TYPE_RELEASE, func_id);
-               kptr_get = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
-                                                    BTF_KFUNC_TYPE_KPTR_ACQUIRE, func_id);
+               rel = kfunc_flags & KF_RELEASE;
+               kptr_get = kfunc_flags & KF_KPTR_GET;
+               trusted_arg = kfunc_flags & KF_TRUSTED_ARGS;
        }
 
        /* check that BTF function arguments match actual types that the
@@ -6237,10 +6236,19 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
                        return -EINVAL;
                }
 
+               /* Check if argument must be a referenced pointer; args + i has
+                * been verified to be a pointer (after skipping modifiers).
+                */
+               if (is_kfunc && trusted_arg && !reg->ref_obj_id) {
+                       bpf_log(log, "R%d must be referenced\n", regno);
+                       return -EINVAL;
+               }
+
                ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
                ref_tname = btf_name_by_offset(btf, ref_t->name_off);
 
-               if (rel && reg->ref_obj_id)
+               /* Trusted args have the same offset checks as release arguments */
+               if (trusted_arg || (rel && reg->ref_obj_id))
                        arg_type |= OBJ_RELEASE;
                ret = check_func_arg_reg_off(env, reg, regno, arg_type);
                if (ret < 0)
@@ -6338,7 +6346,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
                        reg_ref_tname = btf_name_by_offset(reg_btf,
                                                           reg_ref_t->name_off);
                        if (!btf_struct_ids_match(log, reg_btf, reg_ref_id,
-                                                 reg->off, btf, ref_id, rel && reg->ref_obj_id)) {
+                                                 reg->off, btf, ref_id,
+                                                 trusted_arg || (rel && reg->ref_obj_id))) {
                                bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
                                        func_name, i,
                                        btf_type_str(ref_t), ref_tname,
@@ -6441,7 +6450,7 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
                return -EINVAL;
 
        is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
-       err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global);
+       err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, 0);
 
        /* Compiler optimizations can remove arguments from static functions
         * or mismatched type can be passed into a global function.
@@ -6454,9 +6463,10 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
 
 int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
                              const struct btf *btf, u32 func_id,
-                             struct bpf_reg_state *regs)
+                             struct bpf_reg_state *regs,
+                             u32 kfunc_flags)
 {
-       return btf_check_func_arg_match(env, btf, func_id, regs, true);
+       return btf_check_func_arg_match(env, btf, func_id, regs, true, kfunc_flags);
 }
 
 /* Convert BTF of a function into bpf_reg_state if possible
@@ -6853,6 +6863,11 @@ bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
        return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
 }
 
+static void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
+{
+       return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
+}
+
 enum {
        BTF_MODULE_F_LIVE = (1 << 0),
 };
@@ -7101,16 +7116,16 @@ BTF_TRACING_TYPE_xxx
 
 /* Kernel Function (kfunc) BTF ID set registration API */
 
-static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
-                                   enum btf_kfunc_type type,
-                                   struct btf_id_set *add_set, bool vmlinux_set)
+static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
+                                 struct btf_id_set8 *add_set)
 {
+       bool vmlinux_set = !btf_is_module(btf);
        struct btf_kfunc_set_tab *tab;
-       struct btf_id_set *set;
+       struct btf_id_set8 *set;
        u32 set_cnt;
        int ret;
 
-       if (hook >= BTF_KFUNC_HOOK_MAX || type >= BTF_KFUNC_TYPE_MAX) {
+       if (hook >= BTF_KFUNC_HOOK_MAX) {
                ret = -EINVAL;
                goto end;
        }
@@ -7126,7 +7141,7 @@ static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
                btf->kfunc_set_tab = tab;
        }
 
-       set = tab->sets[hook][type];
+       set = tab->sets[hook];
        /* Warn when register_btf_kfunc_id_set is called twice for the same hook
         * for module sets.
         */
@@ -7140,7 +7155,7 @@ static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
         * pointer and return.
         */
        if (!vmlinux_set) {
-               tab->sets[hook][type] = add_set;
+               tab->sets[hook] = add_set;
                return 0;
        }
 
@@ -7149,7 +7164,7 @@ static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
         * and concatenate all individual sets being registered. While each set
         * is individually sorted, they may become unsorted when concatenated,
         * hence re-sorting the final set again is required to make binary
-        * searching the set using btf_id_set_contains function work.
+        * searching the set using btf_id_set8_contains function work.
         */
        set_cnt = set ? set->cnt : 0;
 
@@ -7164,8 +7179,8 @@ static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
        }
 
        /* Grow set */
-       set = krealloc(tab->sets[hook][type],
-                      offsetof(struct btf_id_set, ids[set_cnt + add_set->cnt]),
+       set = krealloc(tab->sets[hook],
+                      offsetof(struct btf_id_set8, pairs[set_cnt + add_set->cnt]),
                       GFP_KERNEL | __GFP_NOWARN);
        if (!set) {
                ret = -ENOMEM;
@@ -7173,15 +7188,15 @@ static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
        }
 
        /* For newly allocated set, initialize set->cnt to 0 */
-       if (!tab->sets[hook][type])
+       if (!tab->sets[hook])
                set->cnt = 0;
-       tab->sets[hook][type] = set;
+       tab->sets[hook] = set;
 
        /* Concatenate the two sets */
-       memcpy(set->ids + set->cnt, add_set->ids, add_set->cnt * sizeof(set->ids[0]));
+       memcpy(set->pairs + set->cnt, add_set->pairs, add_set->cnt * sizeof(set->pairs[0]));
        set->cnt += add_set->cnt;
 
-       sort(set->ids, set->cnt, sizeof(set->ids[0]), btf_id_cmp_func, NULL);
+       sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL);
 
        return 0;
 end:
@@ -7189,38 +7204,25 @@ end:
        return ret;
 }
 
-static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
-                                 const struct btf_kfunc_id_set *kset)
-{
-       bool vmlinux_set = !btf_is_module(btf);
-       int type, ret = 0;
-
-       for (type = 0; type < ARRAY_SIZE(kset->sets); type++) {
-               if (!kset->sets[type])
-                       continue;
-
-               ret = __btf_populate_kfunc_set(btf, hook, type, kset->sets[type], vmlinux_set);
-               if (ret)
-                       break;
-       }
-       return ret;
-}
-
-static bool __btf_kfunc_id_set_contains(const struct btf *btf,
+static u32 *__btf_kfunc_id_set_contains(const struct btf *btf,
                                        enum btf_kfunc_hook hook,
-                                       enum btf_kfunc_type type,
                                        u32 kfunc_btf_id)
 {
-       struct btf_id_set *set;
+       struct btf_id_set8 *set;
+       u32 *id;
 
-       if (hook >= BTF_KFUNC_HOOK_MAX || type >= BTF_KFUNC_TYPE_MAX)
-               return false;
+       if (hook >= BTF_KFUNC_HOOK_MAX)
+               return NULL;
        if (!btf->kfunc_set_tab)
-               return false;
-       set = btf->kfunc_set_tab->sets[hook][type];
+               return NULL;
+       set = btf->kfunc_set_tab->sets[hook];
        if (!set)
-               return false;
-       return btf_id_set_contains(set, kfunc_btf_id);
+               return NULL;
+       id = btf_id_set8_contains(set, kfunc_btf_id);
+       if (!id)
+               return NULL;
+       /* The flags for BTF ID are located next to it */
+       return id + 1;
 }
 
 static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
@@ -7248,14 +7250,14 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
  * keeping the reference for the duration of the call provides the necessary
  * protection for looking up a well-formed btf->kfunc_set_tab.
  */
-bool btf_kfunc_id_set_contains(const struct btf *btf,
+u32 *btf_kfunc_id_set_contains(const struct btf *btf,
                               enum bpf_prog_type prog_type,
-                              enum btf_kfunc_type type, u32 kfunc_btf_id)
+                              u32 kfunc_btf_id)
 {
        enum btf_kfunc_hook hook;
 
        hook = bpf_prog_type_to_kfunc_hook(prog_type);
-       return __btf_kfunc_id_set_contains(btf, hook, type, kfunc_btf_id);
+       return __btf_kfunc_id_set_contains(btf, hook, kfunc_btf_id);
 }
 
 /* This function must be invoked only from initcalls/module init functions */
@@ -7282,7 +7284,7 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
                return PTR_ERR(btf);
 
        hook = bpf_prog_type_to_kfunc_hook(prog_type);
-       ret = btf_populate_kfunc_set(btf, hook, kset);
+       ret = btf_populate_kfunc_set(btf, hook, kset->set);
        btf_put(btf);
        return ret;
 }
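On the caller side, the boolean-per-type queries collapse into a single flags lookup.
A hedged sketch of how a verifier-side caller might consume the new return value; the
surrounding verifier changes are in a separate hunk of this series and the variable
names are illustrative:

    /* Sketch: one lookup yields permission plus all kfunc properties. */
    u32 *kfunc_flags;
    bool acq;
    int err;

    kfunc_flags = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
                                            func_id);
    if (!kfunc_flags)
            return -EACCES;     /* kfunc not allowed for this program type */

    acq = *kfunc_flags & KF_ACQUIRE;
    err = btf_check_kfunc_arg_match(env, btf, func_id, regs, *kfunc_flags);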
index bfeb9b9..c1e10d0 100644 (file)
@@ -652,12 +652,6 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
        return fp->jited && !bpf_prog_was_classic(fp);
 }
 
-static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
-{
-       return list_empty(&fp->aux->ksym.lnode) ||
-              fp->aux->ksym.lnode.prev == LIST_POISON2;
-}
-
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
        if (!bpf_prog_kallsyms_candidate(fp) ||
@@ -833,15 +827,6 @@ struct bpf_prog_pack {
 
 #define BPF_PROG_SIZE_TO_NBITS(size)   (round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE)
 
-static size_t bpf_prog_pack_size = -1;
-static size_t bpf_prog_pack_mask = -1;
-
-static int bpf_prog_chunk_count(void)
-{
-       WARN_ON_ONCE(bpf_prog_pack_size == -1);
-       return bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE;
-}
-
 static DEFINE_MUTEX(pack_mutex);
 static LIST_HEAD(pack_list);
 
@@ -849,55 +834,33 @@ static LIST_HEAD(pack_list);
  * CONFIG_MMU=n. Use PAGE_SIZE in these cases.
  */
 #ifdef PMD_SIZE
-#define BPF_HPAGE_SIZE PMD_SIZE
-#define BPF_HPAGE_MASK PMD_MASK
+#define BPF_PROG_PACK_SIZE (PMD_SIZE * num_possible_nodes())
 #else
-#define BPF_HPAGE_SIZE PAGE_SIZE
-#define BPF_HPAGE_MASK PAGE_MASK
+#define BPF_PROG_PACK_SIZE PAGE_SIZE
 #endif
 
-static size_t select_bpf_prog_pack_size(void)
-{
-       size_t size;
-       void *ptr;
-
-       size = BPF_HPAGE_SIZE * num_online_nodes();
-       ptr = module_alloc(size);
-
-       /* Test whether we can get huge pages. If not just use PAGE_SIZE
-        * packs.
-        */
-       if (!ptr || !is_vm_area_hugepages(ptr)) {
-               size = PAGE_SIZE;
-               bpf_prog_pack_mask = PAGE_MASK;
-       } else {
-               bpf_prog_pack_mask = BPF_HPAGE_MASK;
-       }
-
-       vfree(ptr);
-       return size;
-}
+#define BPF_PROG_CHUNK_COUNT (BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE)
 
 static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns)
 {
        struct bpf_prog_pack *pack;
 
-       pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(bpf_prog_chunk_count())),
+       pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(BPF_PROG_CHUNK_COUNT)),
                       GFP_KERNEL);
        if (!pack)
                return NULL;
-       pack->ptr = module_alloc(bpf_prog_pack_size);
+       pack->ptr = module_alloc(BPF_PROG_PACK_SIZE);
        if (!pack->ptr) {
                kfree(pack);
                return NULL;
        }
-       bpf_fill_ill_insns(pack->ptr, bpf_prog_pack_size);
-       bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
+       bpf_fill_ill_insns(pack->ptr, BPF_PROG_PACK_SIZE);
+       bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE);
        list_add_tail(&pack->list, &pack_list);
 
        set_vm_flush_reset_perms(pack->ptr);
-       set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
-       set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+       set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
+       set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
        return pack;
 }
 
@@ -909,10 +872,7 @@ static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insn
        void *ptr = NULL;
 
        mutex_lock(&pack_mutex);
-       if (bpf_prog_pack_size == -1)
-               bpf_prog_pack_size = select_bpf_prog_pack_size();
-
-       if (size > bpf_prog_pack_size) {
+       if (size > BPF_PROG_PACK_SIZE) {
                size = round_up(size, PAGE_SIZE);
                ptr = module_alloc(size);
                if (ptr) {
@@ -924,9 +884,9 @@ static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insn
                goto out;
        }
        list_for_each_entry(pack, &pack_list, list) {
-               pos = bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
+               pos = bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0,
                                                 nbits, 0);
-               if (pos < bpf_prog_chunk_count())
+               if (pos < BPF_PROG_CHUNK_COUNT)
                        goto found_free_area;
        }
 
@@ -950,18 +910,15 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
        struct bpf_prog_pack *pack = NULL, *tmp;
        unsigned int nbits;
        unsigned long pos;
-       void *pack_ptr;
 
        mutex_lock(&pack_mutex);
-       if (hdr->size > bpf_prog_pack_size) {
+       if (hdr->size > BPF_PROG_PACK_SIZE) {
                module_memfree(hdr);
                goto out;
        }
 
-       pack_ptr = (void *)((unsigned long)hdr & bpf_prog_pack_mask);
-
        list_for_each_entry(tmp, &pack_list, list) {
-               if (tmp->ptr == pack_ptr) {
+               if ((void *)hdr >= tmp->ptr && (tmp->ptr + BPF_PROG_PACK_SIZE) > (void *)hdr) {
                        pack = tmp;
                        break;
                }
@@ -971,14 +928,14 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
                goto out;
 
        nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size);
-       pos = ((unsigned long)hdr - (unsigned long)pack_ptr) >> BPF_PROG_CHUNK_SHIFT;
+       pos = ((unsigned long)hdr - (unsigned long)pack->ptr) >> BPF_PROG_CHUNK_SHIFT;
 
        WARN_ONCE(bpf_arch_text_invalidate(hdr, hdr->size),
                  "bpf_prog_pack bug: missing bpf_arch_text_invalidate?\n");
 
        bitmap_clear(pack->bitmap, pos, nbits);
-       if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
-                                      bpf_prog_chunk_count(), 0) == 0) {
+       if (bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0,
+                                      BPF_PROG_CHUNK_COUNT, 0) == 0) {
                list_del(&pack->list);
                module_memfree(pack->ptr);
                kfree(pack);
@@ -1155,7 +1112,6 @@ int bpf_jit_binary_pack_finalize(struct bpf_prog *prog,
                bpf_prog_pack_free(ro_header);
                return PTR_ERR(ptr);
        }
-       prog->aux->use_bpf_prog_pack = true;
        return 0;
 }
 
@@ -1179,17 +1135,23 @@ void bpf_jit_binary_pack_free(struct bpf_binary_header *ro_header,
        bpf_jit_uncharge_modmem(size);
 }
 
+struct bpf_binary_header *
+bpf_jit_binary_pack_hdr(const struct bpf_prog *fp)
+{
+       unsigned long real_start = (unsigned long)fp->bpf_func;
+       unsigned long addr;
+
+       addr = real_start & BPF_PROG_CHUNK_MASK;
+       return (void *)addr;
+}
+
 static inline struct bpf_binary_header *
 bpf_jit_binary_hdr(const struct bpf_prog *fp)
 {
        unsigned long real_start = (unsigned long)fp->bpf_func;
        unsigned long addr;
 
-       if (fp->aux->use_bpf_prog_pack)
-               addr = real_start & BPF_PROG_CHUNK_MASK;
-       else
-               addr = real_start & PAGE_MASK;
-
+       addr = real_start & PAGE_MASK;
        return (void *)addr;
 }
 
@@ -1202,11 +1164,7 @@ void __weak bpf_jit_free(struct bpf_prog *fp)
        if (fp->jited) {
                struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp);
 
-               if (fp->aux->use_bpf_prog_pack)
-                       bpf_jit_binary_pack_free(hdr, NULL /* rw_buffer */);
-               else
-                       bpf_jit_binary_free(hdr);
-
+               bpf_jit_binary_free(hdr);
                WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
        }
 
index c286706..1400561 100644 (file)
@@ -845,7 +845,7 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
        struct bpf_dtab_netdev *dev;
 
        dev = bpf_map_kmalloc_node(&dtab->map, sizeof(*dev),
-                                  GFP_ATOMIC | __GFP_NOWARN,
+                                  GFP_NOWAIT | __GFP_NOWARN,
                                   dtab->map.numa_node);
        if (!dev)
                return ERR_PTR(-ENOMEM);
index 17fb69c..da75784 100644 (file)
@@ -61,7 +61,7 @@
  *
  * As regular device interrupt handlers and soft interrupts are forced into
  * thread context, the existing code which does
- *   spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*();
+ *   spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
  * just works.
  *
  * In theory the BPF locks could be converted to regular spinlocks as well,
@@ -978,7 +978,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
                                goto dec_count;
                        }
                l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
-                                            GFP_ATOMIC | __GFP_NOWARN,
+                                            GFP_NOWAIT | __GFP_NOWARN,
                                             htab->map.numa_node);
                if (!l_new) {
                        l_new = ERR_PTR(-ENOMEM);
@@ -996,7 +996,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
                } else {
                        /* alloc_percpu zero-fills */
                        pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-                                                   GFP_ATOMIC | __GFP_NOWARN);
+                                                   GFP_NOWAIT | __GFP_NOWARN);
                        if (!pptr) {
                                kfree(l_new);
                                l_new = ERR_PTR(-ENOMEM);
index 8654fc9..49ef0ce 100644 (file)
@@ -165,7 +165,7 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *key,
        }
 
        new = bpf_map_kmalloc_node(map, struct_size(new, data, map->value_size),
-                                  __GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN,
+                                  __GFP_ZERO | GFP_NOWAIT | __GFP_NOWARN,
                                   map->numa_node);
        if (!new)
                return -ENOMEM;
index f0d05a3..d789e3b 100644 (file)
@@ -285,7 +285,7 @@ static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
        if (value)
                size += trie->map.value_size;
 
-       node = bpf_map_kmalloc_node(&trie->map, size, GFP_ATOMIC | __GFP_NOWARN,
+       node = bpf_map_kmalloc_node(&trie->map, size, GFP_NOWAIT | __GFP_NOWARN,
                                    trie->map.numa_node);
        if (!node)
                return NULL;
index bfe24f8..6762b12 100644 (file)
@@ -9,7 +9,7 @@ LLVM_STRIP ?= llvm-strip
 TOOLS_PATH := $(abspath ../../../../tools)
 BPFTOOL_SRC := $(TOOLS_PATH)/bpf/bpftool
 BPFTOOL_OUTPUT := $(abs_out)/bpftool
-DEFAULT_BPFTOOL := $(OUTPUT)/sbin/bpftool
+DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)/bootstrap/bpftool
 BPFTOOL ?= $(DEFAULT_BPFTOOL)
 
 LIBBPF_SRC := $(TOOLS_PATH)/lib/bpf
@@ -61,9 +61,5 @@ $(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(LIBBPF_OU
                    OUTPUT=$(abspath $(dir $@))/ prefix=                       \
                    DESTDIR=$(LIBBPF_DESTDIR) $(abspath $@) install_headers
 
-$(DEFAULT_BPFTOOL): $(BPFOBJ) | $(BPFTOOL_OUTPUT)
-       $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOL_SRC)                        \
-                   OUTPUT=$(BPFTOOL_OUTPUT)/                                  \
-                   LIBBPF_OUTPUT=$(LIBBPF_OUTPUT)/                            \
-                   LIBBPF_DESTDIR=$(LIBBPF_DESTDIR)/                          \
-                   prefix= DESTDIR=$(abs_out)/ install-bin
+$(DEFAULT_BPFTOOL): | $(BPFTOOL_OUTPUT)
+       $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOL_SRC) OUTPUT=$(BPFTOOL_OUTPUT)/ bootstrap
index ab688d8..83c7136 100644 (file)
@@ -419,35 +419,53 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 #ifdef CONFIG_MEMCG_KMEM
 static void bpf_map_save_memcg(struct bpf_map *map)
 {
-       map->memcg = get_mem_cgroup_from_mm(current->mm);
+       /* Currently if a map is created by a process belonging to the root
+        * memory cgroup, get_obj_cgroup_from_current() will return NULL.
+        * So map->objcg has to be checked for NULL each time it is
+        * used.
+        */
+       map->objcg = get_obj_cgroup_from_current();
 }
 
 static void bpf_map_release_memcg(struct bpf_map *map)
 {
-       mem_cgroup_put(map->memcg);
+       if (map->objcg)
+               obj_cgroup_put(map->objcg);
+}
+
+static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
+{
+       if (map->objcg)
+               return get_mem_cgroup_from_objcg(map->objcg);
+
+       return root_mem_cgroup;
 }
 
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
                           int node)
 {
-       struct mem_cgroup *old_memcg;
+       struct mem_cgroup *memcg, *old_memcg;
        void *ptr;
 
-       old_memcg = set_active_memcg(map->memcg);
+       memcg = bpf_map_get_memcg(map);
+       old_memcg = set_active_memcg(memcg);
        ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
        set_active_memcg(old_memcg);
+       mem_cgroup_put(memcg);
 
        return ptr;
 }
 
 void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 {
-       struct mem_cgroup *old_memcg;
+       struct mem_cgroup *memcg, *old_memcg;
        void *ptr;
 
-       old_memcg = set_active_memcg(map->memcg);
+       memcg = bpf_map_get_memcg(map);
+       old_memcg = set_active_memcg(memcg);
        ptr = kzalloc(size, flags | __GFP_ACCOUNT);
        set_active_memcg(old_memcg);
+       mem_cgroup_put(memcg);
 
        return ptr;
 }
@@ -455,12 +473,14 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
                                    size_t align, gfp_t flags)
 {
-       struct mem_cgroup *old_memcg;
+       struct mem_cgroup *memcg, *old_memcg;
        void __percpu *ptr;
 
-       old_memcg = set_active_memcg(map->memcg);
+       memcg = bpf_map_get_memcg(map);
+       old_memcg = set_active_memcg(memcg);
        ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT);
        set_active_memcg(old_memcg);
+       mem_cgroup_put(memcg);
 
        return ptr;
 }
index 6cd2265..42e387a 100644 (file)
@@ -13,6 +13,7 @@
 #include <linux/static_call.h>
 #include <linux/bpf_verifier.h>
 #include <linux/bpf_lsm.h>
+#include <linux/delay.h>
 
 /* dummy _ops. The verifier will operate on target program's ops. */
 const struct bpf_verifier_ops bpf_extension_verifier_ops = {
@@ -29,6 +30,81 @@ static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
 /* serializes access to trampoline_table */
 static DEFINE_MUTEX(trampoline_mutex);
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex);
+
+static int bpf_tramp_ftrace_ops_func(struct ftrace_ops *ops, enum ftrace_ops_cmd cmd)
+{
+       struct bpf_trampoline *tr = ops->private;
+       int ret = 0;
+
+       if (cmd == FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF) {
+               /* This is called inside register_ftrace_direct_multi(), so
+                * tr->mutex is already locked.
+                */
+               lockdep_assert_held_once(&tr->mutex);
+
+               /* Instead of updating the trampoline here, we propagate
+                * -EAGAIN to register_ftrace_direct_multi(). Then we can
+                * retry register_ftrace_direct_multi() after updating the
+                * trampoline.
+                */
+               if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
+                   !(tr->flags & BPF_TRAMP_F_ORIG_STACK)) {
+                       if (WARN_ON_ONCE(tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY))
+                               return -EBUSY;
+
+                       tr->flags |= BPF_TRAMP_F_SHARE_IPMODIFY;
+                       return -EAGAIN;
+               }
+
+               return 0;
+       }
+
+       /* The normal locking order is
+        *    tr->mutex => direct_mutex (ftrace.c) => ftrace_lock (ftrace.c)
+        *
+        * The following two commands are called from
+        *
+        *   prepare_direct_functions_for_ipmodify
+        *   cleanup_direct_functions_after_ipmodify
+        *
+        * In both cases, direct_mutex is already locked. Use
+        * mutex_trylock(&tr->mutex) to avoid a deadlock if something
+        * else is racing to change this same trampoline.
+        */
+       if (!mutex_trylock(&tr->mutex)) {
+               /* sleep 1 ms to make sure whatever is holding tr->mutex
+                * makes some progress.
+                */
+               msleep(1);
+               return -EAGAIN;
+       }
+
+       switch (cmd) {
+       case FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER:
+               tr->flags |= BPF_TRAMP_F_SHARE_IPMODIFY;
+
+               if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
+                   !(tr->flags & BPF_TRAMP_F_ORIG_STACK))
+                       ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
+               break;
+       case FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER:
+               tr->flags &= ~BPF_TRAMP_F_SHARE_IPMODIFY;
+
+               if (tr->flags & BPF_TRAMP_F_ORIG_STACK)
+                       ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
+               break;
+       default:
+               ret = -EINVAL;
+               break;
+       }
+
+       mutex_unlock(&tr->mutex);
+       return ret;
+}
+#endif
+
 bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
 {
        enum bpf_attach_type eatype = prog->expected_attach_type;
@@ -89,6 +165,16 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
        tr = kzalloc(sizeof(*tr), GFP_KERNEL);
        if (!tr)
                goto out;
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+       tr->fops = kzalloc(sizeof(struct ftrace_ops), GFP_KERNEL);
+       if (!tr->fops) {
+               kfree(tr);
+               tr = NULL;
+               goto out;
+       }
+       tr->fops->private = tr;
+       tr->fops->ops_func = bpf_tramp_ftrace_ops_func;
+#endif
 
        tr->key = key;
        INIT_HLIST_NODE(&tr->hlist);
@@ -128,7 +214,7 @@ static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
        int ret;
 
        if (tr->func.ftrace_managed)
-               ret = unregister_ftrace_direct((long)ip, (long)old_addr);
+               ret = unregister_ftrace_direct_multi(tr->fops, (long)old_addr);
        else
                ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, NULL);
 
@@ -137,15 +223,20 @@ static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
        return ret;
 }
 
-static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_addr)
+static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_addr,
+                        bool lock_direct_mutex)
 {
        void *ip = tr->func.addr;
        int ret;
 
-       if (tr->func.ftrace_managed)
-               ret = modify_ftrace_direct((long)ip, (long)old_addr, (long)new_addr);
-       else
+       if (tr->func.ftrace_managed) {
+               if (lock_direct_mutex)
+                       ret = modify_ftrace_direct_multi(tr->fops, (long)new_addr);
+               else
+                       ret = modify_ftrace_direct_multi_nolock(tr->fops, (long)new_addr);
+       } else {
                ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, new_addr);
+       }
        return ret;
 }
 
@@ -163,10 +254,12 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
        if (bpf_trampoline_module_get(tr))
                return -ENOENT;
 
-       if (tr->func.ftrace_managed)
-               ret = register_ftrace_direct((long)ip, (long)new_addr);
-       else
+       if (tr->func.ftrace_managed) {
+               ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 0);
+               ret = register_ftrace_direct_multi(tr->fops, (long)new_addr);
+       } else {
                ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
+       }
 
        if (ret)
                bpf_trampoline_module_put(tr);
@@ -332,11 +425,11 @@ out:
        return ERR_PTR(err);
 }
 
-static int bpf_trampoline_update(struct bpf_trampoline *tr)
+static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
 {
        struct bpf_tramp_image *im;
        struct bpf_tramp_links *tlinks;
-       u32 flags = BPF_TRAMP_F_RESTORE_REGS;
+       u32 orig_flags = tr->flags;
        bool ip_arg = false;
        int err, total;
 
@@ -358,15 +451,31 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
                goto out;
        }
 
+       /* clear all bits except SHARE_IPMODIFY */
+       tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
+
        if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
-           tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links)
-               flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
+           tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
+               /* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
+                * should not be set together.
+                */
+               tr->flags |= BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
+       } else {
+               tr->flags |= BPF_TRAMP_F_RESTORE_REGS;
+       }
 
        if (ip_arg)
-               flags |= BPF_TRAMP_F_IP_ARG;
+               tr->flags |= BPF_TRAMP_F_IP_ARG;
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+again:
+       if ((tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) &&
+           (tr->flags & BPF_TRAMP_F_CALL_ORIG))
+               tr->flags |= BPF_TRAMP_F_ORIG_STACK;
+#endif
 
        err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
-                                         &tr->func.model, flags, tlinks,
+                                         &tr->func.model, tr->flags, tlinks,
                                          tr->func.addr);
        if (err < 0)
                goto out;
@@ -375,17 +484,34 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
        WARN_ON(!tr->cur_image && tr->selector);
        if (tr->cur_image)
                /* progs already running at this address */
-               err = modify_fentry(tr, tr->cur_image->image, im->image);
+               err = modify_fentry(tr, tr->cur_image->image, im->image, lock_direct_mutex);
        else
                /* first time registering */
                err = register_fentry(tr, im->image);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+       if (err == -EAGAIN) {
+               /* -EAGAIN from bpf_tramp_ftrace_ops_func. Now that
+                * BPF_TRAMP_F_SHARE_IPMODIFY is set, we can generate the
+                * trampoline again and retry the registration.
+                */
+               /* reset fops->func and fops->trampoline for re-registration */
+               tr->fops->func = NULL;
+               tr->fops->trampoline = 0;
+               goto again;
+       }
+#endif
        if (err)
                goto out;
+
        if (tr->cur_image)
                bpf_tramp_image_put(tr->cur_image);
        tr->cur_image = im;
        tr->selector++;
 out:
+       /* If any error happens, restore previous flags */
+       if (err)
+               tr->flags = orig_flags;
        kfree(tlinks);
        return err;
 }
@@ -451,7 +577,7 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_tr
 
        hlist_add_head(&link->tramp_hlist, &tr->progs_hlist[kind]);
        tr->progs_cnt[kind]++;
-       err = bpf_trampoline_update(tr);
+       err = bpf_trampoline_update(tr, true /* lock_direct_mutex */);
        if (err) {
                hlist_del_init(&link->tramp_hlist);
                tr->progs_cnt[kind]--;
@@ -484,7 +610,7 @@ static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_
        }
        hlist_del_init(&link->tramp_hlist);
        tr->progs_cnt[kind]--;
-       return bpf_trampoline_update(tr);
+       return bpf_trampoline_update(tr, true /* lock_direct_mutex */);
 }
 
 /* bpf_trampoline_unlink_prog() should never fail. */
@@ -498,7 +624,7 @@ int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampolin
        return err;
 }
 
-#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM)
 static void bpf_shim_tramp_link_release(struct bpf_link *link)
 {
        struct bpf_shim_tramp_link *shim_link =
@@ -712,6 +838,7 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
         * multiple rcu callbacks.
         */
        hlist_del(&tr->hlist);
+       kfree(tr->fops);
        kfree(tr);
 out:
        mutex_unlock(&trampoline_mutex);
index 328cfab..096fdac 100644 (file)
@@ -5533,17 +5533,6 @@ static bool arg_type_is_mem_size(enum bpf_arg_type type)
               type == ARG_CONST_SIZE_OR_ZERO;
 }
 
-static bool arg_type_is_alloc_size(enum bpf_arg_type type)
-{
-       return type == ARG_CONST_ALLOC_SIZE_OR_ZERO;
-}
-
-static bool arg_type_is_int_ptr(enum bpf_arg_type type)
-{
-       return type == ARG_PTR_TO_INT ||
-              type == ARG_PTR_TO_LONG;
-}
-
 static bool arg_type_is_release(enum bpf_arg_type type)
 {
        return type & OBJ_RELEASE;
@@ -5929,7 +5918,8 @@ skip_type_check:
                meta->ref_obj_id = reg->ref_obj_id;
        }
 
-       if (arg_type == ARG_CONST_MAP_PTR) {
+       switch (base_type(arg_type)) {
+       case ARG_CONST_MAP_PTR:
                /* bpf_map_xxx(map_ptr) call: remember that map_ptr */
                if (meta->map_ptr) {
                        /* Use map_uid (which is unique id of inner map) to reject:
@@ -5954,7 +5944,8 @@ skip_type_check:
                }
                meta->map_ptr = reg->map_ptr;
                meta->map_uid = reg->map_uid;
-       } else if (arg_type == ARG_PTR_TO_MAP_KEY) {
+               break;
+       case ARG_PTR_TO_MAP_KEY:
                /* bpf_map_xxx(..., map_ptr, ..., key) call:
                 * check that [key, key + map->key_size) are within
                 * stack limits and initialized
@@ -5971,7 +5962,8 @@ skip_type_check:
                err = check_helper_mem_access(env, regno,
                                              meta->map_ptr->key_size, false,
                                              NULL);
-       } else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
+               break;
+       case ARG_PTR_TO_MAP_VALUE:
                if (type_may_be_null(arg_type) && register_is_null(reg))
                        return 0;
 
@@ -5987,14 +5979,16 @@ skip_type_check:
                err = check_helper_mem_access(env, regno,
                                              meta->map_ptr->value_size, false,
                                              meta);
-       } else if (arg_type == ARG_PTR_TO_PERCPU_BTF_ID) {
+               break;
+       case ARG_PTR_TO_PERCPU_BTF_ID:
                if (!reg->btf_id) {
                        verbose(env, "Helper has invalid btf_id in R%d\n", regno);
                        return -EACCES;
                }
                meta->ret_btf = reg->btf;
                meta->ret_btf_id = reg->btf_id;
-       } else if (arg_type == ARG_PTR_TO_SPIN_LOCK) {
+               break;
+       case ARG_PTR_TO_SPIN_LOCK:
                if (meta->func_id == BPF_FUNC_spin_lock) {
                        if (process_spin_lock(env, regno, true))
                                return -EACCES;
@@ -6005,12 +5999,15 @@ skip_type_check:
                        verbose(env, "verifier internal error\n");
                        return -EFAULT;
                }
-       } else if (arg_type == ARG_PTR_TO_TIMER) {
+               break;
+       case ARG_PTR_TO_TIMER:
                if (process_timer_func(env, regno, meta))
                        return -EACCES;
-       } else if (arg_type == ARG_PTR_TO_FUNC) {
+               break;
+       case ARG_PTR_TO_FUNC:
                meta->subprogno = reg->subprogno;
-       } else if (base_type(arg_type) == ARG_PTR_TO_MEM) {
+               break;
+       case ARG_PTR_TO_MEM:
                /* The access to this pointer is only checked when we hit the
                 * next is_mem_size argument below.
                 */
@@ -6020,11 +6017,14 @@ skip_type_check:
                                                      fn->arg_size[arg], false,
                                                      meta);
                }
-       } else if (arg_type_is_mem_size(arg_type)) {
-               bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
-
-               err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
-       } else if (arg_type_is_dynptr(arg_type)) {
+               break;
+       case ARG_CONST_SIZE:
+               err = check_mem_size_reg(env, reg, regno, false, meta);
+               break;
+       case ARG_CONST_SIZE_OR_ZERO:
+               err = check_mem_size_reg(env, reg, regno, true, meta);
+               break;
+       case ARG_PTR_TO_DYNPTR:
                if (arg_type & MEM_UNINIT) {
                        if (!is_dynptr_reg_valid_uninit(env, reg)) {
                                verbose(env, "Dynptr has to be an uninitialized dynptr\n");
@@ -6058,21 +6058,28 @@ skip_type_check:
                                err_extra, arg + 1);
                        return -EINVAL;
                }
-       } else if (arg_type_is_alloc_size(arg_type)) {
+               break;
+       case ARG_CONST_ALLOC_SIZE_OR_ZERO:
                if (!tnum_is_const(reg->var_off)) {
                        verbose(env, "R%d is not a known constant'\n",
                                regno);
                        return -EACCES;
                }
                meta->mem_size = reg->var_off.value;
-       } else if (arg_type_is_int_ptr(arg_type)) {
+               break;
+       case ARG_PTR_TO_INT:
+       case ARG_PTR_TO_LONG:
+       {
                int size = int_ptr_type_to_size(arg_type);
 
                err = check_helper_mem_access(env, regno, size, false, meta);
                if (err)
                        return err;
                err = check_ptr_alignment(env, reg, 0, size, true);
-       } else if (arg_type == ARG_PTR_TO_CONST_STR) {
+               break;
+       }
+       case ARG_PTR_TO_CONST_STR:
+       {
                struct bpf_map *map = reg->map_ptr;
                int map_off;
                u64 map_addr;
@@ -6111,9 +6118,12 @@ skip_type_check:
                        verbose(env, "string is not zero-terminated\n");
                        return -EINVAL;
                }
-       } else if (arg_type == ARG_PTR_TO_KPTR) {
+               break;
+       }
+       case ARG_PTR_TO_KPTR:
                if (process_kptr_func(env, regno, meta))
                        return -EACCES;
+               break;
        }
 
        return err;
@@ -7160,6 +7170,7 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
 static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
                             int *insn_idx_p)
 {
+       enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
        const struct bpf_func_proto *fn = NULL;
        enum bpf_return_type ret_type;
        enum bpf_type_flag ret_flag;
@@ -7321,7 +7332,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
                }
                break;
        case BPF_FUNC_set_retval:
-               if (env->prog->expected_attach_type == BPF_LSM_CGROUP) {
+               if (prog_type == BPF_PROG_TYPE_LSM &&
+                   env->prog->expected_attach_type == BPF_LSM_CGROUP) {
                        if (!env->prog->aux->attach_func_proto->type) {
                                /* Make sure programs that attach to void
                                 * hooks don't try to modify return value.
@@ -7550,6 +7562,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
        int err, insn_idx = *insn_idx_p;
        const struct btf_param *args;
        struct btf *desc_btf;
+       u32 *kfunc_flags;
        bool acq;
 
        /* skip for now, but return error when we find this in fixup_kfunc_call */
@@ -7565,18 +7578,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
        func_name = btf_name_by_offset(desc_btf, func->name_off);
        func_proto = btf_type_by_id(desc_btf, func->type);
 
-       if (!btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog),
-                                     BTF_KFUNC_TYPE_CHECK, func_id)) {
+       kfunc_flags = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog), func_id);
+       if (!kfunc_flags) {
                verbose(env, "calling kernel function %s is not allowed\n",
                        func_name);
                return -EACCES;
        }
-
-       acq = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog),
-                                       BTF_KFUNC_TYPE_ACQUIRE, func_id);
+       acq = *kfunc_flags & KF_ACQUIRE;
 
        /* Check the arguments */
-       err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs);
+       err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, *kfunc_flags);
        if (err < 0)
                return err;
        /* In case of release function, we get register number of refcounted
@@ -7620,8 +7631,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
                regs[BPF_REG_0].btf = desc_btf;
                regs[BPF_REG_0].type = PTR_TO_BTF_ID;
                regs[BPF_REG_0].btf_id = ptr_type_id;
-               if (btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog),
-                                             BTF_KFUNC_TYPE_RET_NULL, func_id)) {
+               if (*kfunc_flags & KF_RET_NULL) {
                        regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
                        /* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
                        regs[BPF_REG_0].id = ++env->id_gen;
@@ -12562,6 +12572,7 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
        case BPF_PROG_TYPE_TRACEPOINT:
        case BPF_PROG_TYPE_PERF_EVENT:
        case BPF_PROG_TYPE_RAW_TRACEPOINT:
+       case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
                return true;
        default:
                return false;
@@ -13620,6 +13631,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
                /* Below members will be freed only at prog->aux */
                func[i]->aux->btf = prog->aux->btf;
                func[i]->aux->func_info = prog->aux->func_info;
+               func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
                func[i]->aux->poke_tab = prog->aux->poke_tab;
                func[i]->aux->size_poke_tab = prog->aux->size_poke_tab;
 
@@ -13632,9 +13644,6 @@ static int jit_subprogs(struct bpf_verifier_env *env)
                                poke->aux = func[i]->aux;
                }
 
-               /* Use bpf_prog_F_tag to indicate functions in stack traces.
-                * Long term would need debug info to populate names
-                */
                func[i]->aux->name[0] = 'F';
                func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
                func[i]->jit_requested = 1;
index fbdf8d3..79a8583 100644 (file)
@@ -30,6 +30,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/bsearch.h>
+#include <linux/btf_ids.h>
 
 /*
  * These will be re-linked against their real values
@@ -799,6 +800,96 @@ static const struct seq_operations kallsyms_op = {
        .show = s_show
 };
 
+#ifdef CONFIG_BPF_SYSCALL
+
+struct bpf_iter__ksym {
+       __bpf_md_ptr(struct bpf_iter_meta *, meta);
+       __bpf_md_ptr(struct kallsym_iter *, ksym);
+};
+
+static int ksym_prog_seq_show(struct seq_file *m, bool in_stop)
+{
+       struct bpf_iter__ksym ctx;
+       struct bpf_iter_meta meta;
+       struct bpf_prog *prog;
+
+       meta.seq = m;
+       prog = bpf_iter_get_info(&meta, in_stop);
+       if (!prog)
+               return 0;
+
+       ctx.meta = &meta;
+       ctx.ksym = m ? m->private : NULL;
+       return bpf_iter_run_prog(prog, &ctx);
+}
+
+static int bpf_iter_ksym_seq_show(struct seq_file *m, void *p)
+{
+       return ksym_prog_seq_show(m, false);
+}
+
+static void bpf_iter_ksym_seq_stop(struct seq_file *m, void *p)
+{
+       if (!p)
+               (void) ksym_prog_seq_show(m, true);
+       else
+               s_stop(m, p);
+}
+
+static const struct seq_operations bpf_iter_ksym_ops = {
+       .start = s_start,
+       .next = s_next,
+       .stop = bpf_iter_ksym_seq_stop,
+       .show = bpf_iter_ksym_seq_show,
+};
+
+static int bpf_iter_ksym_init(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+       struct kallsym_iter *iter = priv_data;
+
+       reset_iter(iter, 0);
+
+       /* cache here as in kallsyms_open() case; use current process
+        * credentials to tell BPF iterators if values should be shown.
+        */
+       iter->show_value = kallsyms_show_value(current_cred());
+
+       return 0;
+}
+
+DEFINE_BPF_ITER_FUNC(ksym, struct bpf_iter_meta *meta, struct kallsym_iter *ksym)
+
+static const struct bpf_iter_seq_info ksym_iter_seq_info = {
+       .seq_ops                = &bpf_iter_ksym_ops,
+       .init_seq_private       = bpf_iter_ksym_init,
+       .fini_seq_private       = NULL,
+       .seq_priv_size          = sizeof(struct kallsym_iter),
+};
+
+static struct bpf_iter_reg ksym_iter_reg_info = {
+       .target                 = "ksym",
+       .feature                = BPF_ITER_RESCHED,
+       .ctx_arg_info_size      = 1,
+       .ctx_arg_info           = {
+               { offsetof(struct bpf_iter__ksym, ksym),
+                 PTR_TO_BTF_ID_OR_NULL },
+       },
+       .seq_info               = &ksym_iter_seq_info,
+};
+
+BTF_ID_LIST(btf_ksym_iter_id)
+BTF_ID(struct, kallsym_iter)
+
+static int __init bpf_ksym_iter_register(void)
+{
+       ksym_iter_reg_info.ctx_arg_info[0].btf_id = *btf_ksym_iter_id;
+       return bpf_iter_reg_target(&ksym_iter_reg_info);
+}
+
+late_initcall(bpf_ksym_iter_register);
+
+#endif /* CONFIG_BPF_SYSCALL */
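
With the ksym target registered above, a BPF iterator program can walk the kernel symbol table. A minimal BPF-side sketch, assuming a vmlinux.h generated from a kernel carrying this patch (so struct bpf_iter__ksym and struct kallsym_iter are available); the program name is illustrative:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

SEC("iter/ksym")
int dump_ksym(struct bpf_iter__ksym *ctx)
{
	struct seq_file *seq = ctx->meta->seq;
	struct kallsym_iter *iter = ctx->ksym;

	if (!iter)
		return 0;

	/* show_value was filled in from the opener's credentials above */
	if (!iter->show_value)
		return 0;

	BPF_SEQ_PRINTF(seq, "0x%llx %c %s\n",
		       (__u64)iter->value, iter->type, iter->name);
	return 0;
}

On the user-space side such a program would typically be attached with bpf_program__attach_iter() and read through a descriptor from bpf_iter_create(), or pinned in bpffs and read like a regular file.
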
+
 static inline int kallsyms_for_perf(void)
 {
 #ifdef CONFIG_PERF_EVENTS
index 601ccf1..bc921a3 100644 (file)
@@ -1861,6 +1861,8 @@ static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops,
        ftrace_hash_rec_update_modify(ops, filter_hash, 1);
 }
 
+static bool ops_references_ip(struct ftrace_ops *ops, unsigned long ip);
+
 /*
  * Try to update IPMODIFY flag on each ftrace_rec. Return 0 if it is OK
  * or no-needed to update, -EBUSY if it detects a conflict of the flag
@@ -1869,6 +1871,13 @@ static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops,
  *  - If the hash is NULL, it hits all recs (if IPMODIFY is set, this is rejected)
  *  - If the hash is EMPTY_HASH, it hits nothing
  *  - Anything else hits the recs which match the hash entries.
+ *
+ * A DIRECT ops does not have the IPMODIFY flag, but we still need to check
+ * it against functions with FTRACE_FL_IPMODIFY set. If there is any overlap,
+ * call ops_func(SHARE_IPMODIFY_SELF) to make sure the current ops can share
+ * with IPMODIFY. If ops_func(SHARE_IPMODIFY_SELF) returns non-zero, propagate
+ * the return value to the caller and eventually to the owner of the DIRECT
+ * ops.
  */
 static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops,
                                         struct ftrace_hash *old_hash,
@@ -1877,17 +1886,26 @@ static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops,
        struct ftrace_page *pg;
        struct dyn_ftrace *rec, *end = NULL;
        int in_old, in_new;
+       bool is_ipmodify, is_direct;
 
        /* Only update if the ops has been registered */
        if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
                return 0;
 
-       if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY))
+       is_ipmodify = ops->flags & FTRACE_OPS_FL_IPMODIFY;
+       is_direct = ops->flags & FTRACE_OPS_FL_DIRECT;
+
+       /* neither IPMODIFY nor DIRECT, skip */
+       if (!is_ipmodify && !is_direct)
+               return 0;
+
+       if (WARN_ON_ONCE(is_ipmodify && is_direct))
                return 0;
 
        /*
-        * Since the IPMODIFY is a very address sensitive action, we do not
-        * allow ftrace_ops to set all functions to new hash.
+        * Since IPMODIFY and DIRECT are very address-sensitive actions, we
+        * do not allow ftrace_ops to set all functions to a new hash.
         */
        if (!new_hash || !old_hash)
                return -EINVAL;
@@ -1905,12 +1923,32 @@ static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops,
                        continue;
 
                if (in_new) {
-                       /* New entries must ensure no others are using it */
-                       if (rec->flags & FTRACE_FL_IPMODIFY)
-                               goto rollback;
-                       rec->flags |= FTRACE_FL_IPMODIFY;
-               } else /* Removed entry */
+                       if (rec->flags & FTRACE_FL_IPMODIFY) {
+                               int ret;
+
+                               /* Cannot have two ipmodify on same rec */
+                               if (is_ipmodify)
+                                       goto rollback;
+
+                               FTRACE_WARN_ON(rec->flags & FTRACE_FL_DIRECT);
+
+                               /*
+                                * Another ops with IPMODIFY is already
+                                * attached. We are now attaching a direct
+                                * ops. Run SHARE_IPMODIFY_SELF, to check
+                                * whether sharing is supported.
+                                */
+                               if (!ops->ops_func)
+                                       return -EBUSY;
+                               ret = ops->ops_func(ops, FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF);
+                               if (ret)
+                                       return ret;
+                       } else if (is_ipmodify) {
+                               rec->flags |= FTRACE_FL_IPMODIFY;
+                       }
+               } else if (is_ipmodify) {
                        rec->flags &= ~FTRACE_FL_IPMODIFY;
+               }
        } while_for_each_ftrace_rec();
 
        return 0;
@@ -2454,8 +2492,7 @@ static void call_direct_funcs(unsigned long ip, unsigned long pip,
 
 struct ftrace_ops direct_ops = {
        .func           = call_direct_funcs,
-       .flags          = FTRACE_OPS_FL_IPMODIFY
-                         | FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS
+       .flags          = FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS
                          | FTRACE_OPS_FL_PERMANENT,
        /*
         * By declaring the main trampoline as this trampoline
@@ -3072,14 +3109,14 @@ static inline int ops_traces_mod(struct ftrace_ops *ops)
 }
 
 /*
- * Check if the current ops references the record.
+ * Check if the current ops references the given ip.
  *
  * If the ops traces all functions, then it was already accounted for.
  * If the ops does not trace the current record function, skip it.
  * If the ops ignores the function via notrace filter, skip it.
  */
-static inline bool
-ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec)
+static bool
+ops_references_ip(struct ftrace_ops *ops, unsigned long ip)
 {
        /* If ops isn't enabled, ignore it */
        if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
@@ -3091,16 +3128,29 @@ ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec)
 
        /* The function must be in the filter */
        if (!ftrace_hash_empty(ops->func_hash->filter_hash) &&
-           !__ftrace_lookup_ip(ops->func_hash->filter_hash, rec->ip))
+           !__ftrace_lookup_ip(ops->func_hash->filter_hash, ip))
                return false;
 
        /* If in notrace hash, we ignore it too */
-       if (ftrace_lookup_ip(ops->func_hash->notrace_hash, rec->ip))
+       if (ftrace_lookup_ip(ops->func_hash->notrace_hash, ip))
                return false;
 
        return true;
 }
 
+/*
+ * Check if the current ops references the record.
+ *
+ * If the ops traces all functions, then it was already accounted for.
+ * If the ops does not trace the current record function, skip it.
+ * If the ops ignores the function via notrace filter, skip it.
+ */
+static bool
+ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec)
+{
+       return ops_references_ip(ops, rec->ip);
+}
+
 static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
 {
        bool init_nop = ftrace_need_init_nop();
@@ -5215,6 +5265,8 @@ static struct ftrace_direct_func *ftrace_alloc_direct_func(unsigned long addr)
        return direct;
 }
 
+static int register_ftrace_function_nolock(struct ftrace_ops *ops);
+
 /**
  * register_ftrace_direct - Call a custom trampoline directly
  * @ip: The address of the nop at the beginning of a function
@@ -5286,7 +5338,7 @@ int register_ftrace_direct(unsigned long ip, unsigned long addr)
        ret = ftrace_set_filter_ip(&direct_ops, ip, 0, 0);
 
        if (!ret && !(direct_ops.flags & FTRACE_OPS_FL_ENABLED)) {
-               ret = register_ftrace_function(&direct_ops);
+               ret = register_ftrace_function_nolock(&direct_ops);
                if (ret)
                        ftrace_set_filter_ip(&direct_ops, ip, 1, 0);
        }
@@ -5545,8 +5597,7 @@ int modify_ftrace_direct(unsigned long ip,
 }
 EXPORT_SYMBOL_GPL(modify_ftrace_direct);
 
-#define MULTI_FLAGS (FTRACE_OPS_FL_IPMODIFY | FTRACE_OPS_FL_DIRECT | \
-                    FTRACE_OPS_FL_SAVE_REGS)
+#define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS)
 
 static int check_direct_multi(struct ftrace_ops *ops)
 {
@@ -5639,7 +5690,7 @@ int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
        ops->flags = MULTI_FLAGS;
        ops->trampoline = FTRACE_REGS_ADDR;
 
-       err = register_ftrace_function(ops);
+       err = register_ftrace_function_nolock(ops);
 
  out_remove:
        if (err)
@@ -5691,22 +5742,8 @@ int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
 }
 EXPORT_SYMBOL_GPL(unregister_ftrace_direct_multi);
 
-/**
- * modify_ftrace_direct_multi - Modify an existing direct 'multi' call
- * to call something else
- * @ops: The address of the struct ftrace_ops object
- * @addr: The address of the new trampoline to call at @ops functions
- *
- * This is used to unregister currently registered direct caller and
- * register new one @addr on functions registered in @ops object.
- *
- * Note there's window between ftrace_shutdown and ftrace_startup calls
- * where there will be no callbacks called.
- *
- * Returns: zero on success. Non zero on error, which includes:
- *  -EINVAL - The @ops object was not properly registered.
- */
-int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
+static int
+__modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
 {
        struct ftrace_hash *hash;
        struct ftrace_func_entry *entry, *iter;
@@ -5717,20 +5754,15 @@ int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
        int i, size;
        int err;
 
-       if (check_direct_multi(ops))
-               return -EINVAL;
-       if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
-               return -EINVAL;
-
-       mutex_lock(&direct_mutex);
+       lockdep_assert_held_once(&direct_mutex);
 
        /* Enable the tmp_ops to have the same functions as the direct ops */
        ftrace_ops_init(&tmp_ops);
        tmp_ops.func_hash = ops->func_hash;
 
-       err = register_ftrace_function(&tmp_ops);
+       err = register_ftrace_function_nolock(&tmp_ops);
        if (err)
-               goto out_direct;
+               return err;
 
        /*
         * Now the ftrace_ops_list_func() is called to do the direct callers.
@@ -5754,7 +5786,64 @@ int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
        /* Removing the tmp_ops will add the updated direct callers to the functions */
        unregister_ftrace_function(&tmp_ops);
 
- out_direct:
+       return err;
+}
+
+/**
+ * modify_ftrace_direct_multi_nolock - Modify an existing direct 'multi' call
+ * to call something else
+ * @ops: The address of the struct ftrace_ops object
+ * @addr: The address of the new trampoline to call at @ops functions
+ *
+ * This is used to unregister the currently registered direct caller and
+ * register a new one, @addr, on the functions registered in the @ops object.
+ *
+ * Note there's a window between the ftrace_shutdown and ftrace_startup calls
+ * where no callbacks will be called.
+ *
+ * The caller must already hold direct_mutex, so we don't take
+ * direct_mutex here.
+ *
+ * Returns: zero on success. Non zero on error, which includes:
+ *  -EINVAL - The @ops object was not properly registered.
+ */
+int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr)
+{
+       if (check_direct_multi(ops))
+               return -EINVAL;
+       if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
+               return -EINVAL;
+
+       return __modify_ftrace_direct_multi(ops, addr);
+}
+EXPORT_SYMBOL_GPL(modify_ftrace_direct_multi_nolock);
+
+/**
+ * modify_ftrace_direct_multi - Modify an existing direct 'multi' call
+ * to call something else
+ * @ops: The address of the struct ftrace_ops object
+ * @addr: The address of the new trampoline to call at @ops functions
+ *
+ * This is used to unregister the currently registered direct caller and
+ * register a new one, @addr, on the functions registered in the @ops object.
+ *
+ * Note there's a window between the ftrace_shutdown and ftrace_startup calls
+ * where no callbacks will be called.
+ *
+ * Returns: zero on success. Non zero on error, which includes:
+ *  -EINVAL - The @ops object was not properly registered.
+ */
+int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
+{
+       int err;
+
+       if (check_direct_multi(ops))
+               return -EINVAL;
+       if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
+               return -EINVAL;
+
+       mutex_lock(&direct_mutex);
+       err = __modify_ftrace_direct_multi(ops, addr);
        mutex_unlock(&direct_mutex);
        return err;
 }
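
Taken together with register_ftrace_direct_multi() and unregister_ftrace_direct_multi(), this is the interface the BPF trampoline code earlier in this series moves to. A minimal sketch of the expected calling sequence for a generic direct-call user (names are illustrative; error handling trimmed):

#include <linux/ftrace.h>

static struct ftrace_ops example_direct_ops;

static int example_attach(unsigned long ip, unsigned long tramp)
{
	int err;

	/* select which function(s) this ops covers */
	err = ftrace_set_filter_ip(&example_direct_ops, ip, 0, 0);
	if (err)
		return err;

	/* install the direct trampoline; takes direct_mutex internally */
	return register_ftrace_direct_multi(&example_direct_ops, tramp);
}

static int example_swap(unsigned long new_tramp)
{
	/* redirect all covered functions to a new trampoline */
	return modify_ftrace_direct_multi(&example_direct_ops, new_tramp);
}

static void example_detach(unsigned long tramp)
{
	unregister_ftrace_direct_multi(&example_direct_ops, tramp);
}
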
@@ -7965,6 +8054,143 @@ int ftrace_is_dead(void)
        return ftrace_disabled;
 }
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+/*
+ * When registering an ftrace_ops with IPMODIFY, it is necessary to make sure
+ * it doesn't conflict with any direct ftrace_ops. If there is an existing
+ * direct ftrace_ops on a kernel function being patched, call
+ * FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER on it to enable sharing.
+ *
+ * @ops:     ftrace_ops being registered.
+ *
+ * Returns:
+ *         0 on success;
+ *         Negative on failure.
+ */
+static int prepare_direct_functions_for_ipmodify(struct ftrace_ops *ops)
+{
+       struct ftrace_func_entry *entry;
+       struct ftrace_hash *hash;
+       struct ftrace_ops *op;
+       int size, i, ret;
+
+       lockdep_assert_held_once(&direct_mutex);
+
+       if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY))
+               return 0;
+
+       hash = ops->func_hash->filter_hash;
+       size = 1 << hash->size_bits;
+       for (i = 0; i < size; i++) {
+               hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
+                       unsigned long ip = entry->ip;
+                       bool found_op = false;
+
+                       mutex_lock(&ftrace_lock);
+                       do_for_each_ftrace_op(op, ftrace_ops_list) {
+                               if (!(op->flags & FTRACE_OPS_FL_DIRECT))
+                                       continue;
+                               if (ops_references_ip(op, ip)) {
+                                       found_op = true;
+                                       break;
+                               }
+                       } while_for_each_ftrace_op(op);
+                       mutex_unlock(&ftrace_lock);
+
+                       if (found_op) {
+                               if (!op->ops_func)
+                                       return -EBUSY;
+
+                               ret = op->ops_func(op, FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER);
+                               if (ret)
+                                       return ret;
+                       }
+               }
+       }
+
+       return 0;
+}
+
+/*
+ * Similar to prepare_direct_functions_for_ipmodify, clean up after an ops
+ * with IPMODIFY is unregistered. The cleanup is optional for most DIRECT
+ * ops.
+ */
+static void cleanup_direct_functions_after_ipmodify(struct ftrace_ops *ops)
+{
+       struct ftrace_func_entry *entry;
+       struct ftrace_hash *hash;
+       struct ftrace_ops *op;
+       int size, i;
+
+       if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY))
+               return;
+
+       mutex_lock(&direct_mutex);
+
+       hash = ops->func_hash->filter_hash;
+       size = 1 << hash->size_bits;
+       for (i = 0; i < size; i++) {
+               hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
+                       unsigned long ip = entry->ip;
+                       bool found_op = false;
+
+                       mutex_lock(&ftrace_lock);
+                       do_for_each_ftrace_op(op, ftrace_ops_list) {
+                               if (!(op->flags & FTRACE_OPS_FL_DIRECT))
+                                       continue;
+                               if (ops_references_ip(op, ip)) {
+                                       found_op = true;
+                                       break;
+                               }
+                       } while_for_each_ftrace_op(op);
+                       mutex_unlock(&ftrace_lock);
+
+                       /* The cleanup is optional, ignore any errors */
+                       if (found_op && op->ops_func)
+                               op->ops_func(op, FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER);
+               }
+       }
+       mutex_unlock(&direct_mutex);
+}
+
+#define lock_direct_mutex()    mutex_lock(&direct_mutex)
+#define unlock_direct_mutex()  mutex_unlock(&direct_mutex)
+
+#else  /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+
+static int prepare_direct_functions_for_ipmodify(struct ftrace_ops *ops)
+{
+       return 0;
+}
+
+static void cleanup_direct_functions_after_ipmodify(struct ftrace_ops *ops)
+{
+}
+
+#define lock_direct_mutex()    do { } while (0)
+#define unlock_direct_mutex()  do { } while (0)
+
+#endif  /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+
+/*
+ * Similar to register_ftrace_function, except we don't lock direct_mutex.
+ */
+static int register_ftrace_function_nolock(struct ftrace_ops *ops)
+{
+       int ret;
+
+       ftrace_ops_init(ops);
+
+       mutex_lock(&ftrace_lock);
+
+       ret = ftrace_startup(ops, 0);
+
+       mutex_unlock(&ftrace_lock);
+
+       return ret;
+}
+
 /**
  * register_ftrace_function - register a function for profiling
  * @ops:       ops structure that holds the function for profiling.
@@ -7980,14 +8206,15 @@ int register_ftrace_function(struct ftrace_ops *ops)
 {
        int ret;
 
-       ftrace_ops_init(ops);
-
-       mutex_lock(&ftrace_lock);
-
-       ret = ftrace_startup(ops, 0);
+       lock_direct_mutex();
+       ret = prepare_direct_functions_for_ipmodify(ops);
+       if (ret < 0)
+               goto out_unlock;
 
-       mutex_unlock(&ftrace_lock);
+       ret = register_ftrace_function_nolock(ops);
 
+out_unlock:
+       unlock_direct_mutex();
        return ret;
 }
 EXPORT_SYMBOL_GPL(register_ftrace_function);
@@ -8006,6 +8233,7 @@ int unregister_ftrace_function(struct ftrace_ops *ops)
        ret = ftrace_shutdown(ops, 0);
        mutex_unlock(&ftrace_lock);
 
+       cleanup_direct_functions_after_ipmodify(ops);
        return ret;
 }
 EXPORT_SYMBOL_GPL(unregister_ftrace_function);
index 2ca96ac..cbc9cd5 100644 (file)
@@ -691,52 +691,35 @@ noinline void bpf_kfunc_call_test_mem_len_fail2(u64 *mem, int len)
 {
 }
 
+noinline void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p)
+{
+}
+
 __diag_pop();
 
 ALLOW_ERROR_INJECTION(bpf_modify_return_test, ERRNO);
 
-BTF_SET_START(test_sk_check_kfunc_ids)
-BTF_ID(func, bpf_kfunc_call_test1)
-BTF_ID(func, bpf_kfunc_call_test2)
-BTF_ID(func, bpf_kfunc_call_test3)
-BTF_ID(func, bpf_kfunc_call_test_acquire)
-BTF_ID(func, bpf_kfunc_call_memb_acquire)
-BTF_ID(func, bpf_kfunc_call_test_release)
-BTF_ID(func, bpf_kfunc_call_memb_release)
-BTF_ID(func, bpf_kfunc_call_memb1_release)
-BTF_ID(func, bpf_kfunc_call_test_kptr_get)
-BTF_ID(func, bpf_kfunc_call_test_pass_ctx)
-BTF_ID(func, bpf_kfunc_call_test_pass1)
-BTF_ID(func, bpf_kfunc_call_test_pass2)
-BTF_ID(func, bpf_kfunc_call_test_fail1)
-BTF_ID(func, bpf_kfunc_call_test_fail2)
-BTF_ID(func, bpf_kfunc_call_test_fail3)
-BTF_ID(func, bpf_kfunc_call_test_mem_len_pass1)
-BTF_ID(func, bpf_kfunc_call_test_mem_len_fail1)
-BTF_ID(func, bpf_kfunc_call_test_mem_len_fail2)
-BTF_SET_END(test_sk_check_kfunc_ids)
-
-BTF_SET_START(test_sk_acquire_kfunc_ids)
-BTF_ID(func, bpf_kfunc_call_test_acquire)
-BTF_ID(func, bpf_kfunc_call_memb_acquire)
-BTF_ID(func, bpf_kfunc_call_test_kptr_get)
-BTF_SET_END(test_sk_acquire_kfunc_ids)
-
-BTF_SET_START(test_sk_release_kfunc_ids)
-BTF_ID(func, bpf_kfunc_call_test_release)
-BTF_ID(func, bpf_kfunc_call_memb_release)
-BTF_ID(func, bpf_kfunc_call_memb1_release)
-BTF_SET_END(test_sk_release_kfunc_ids)
-
-BTF_SET_START(test_sk_ret_null_kfunc_ids)
-BTF_ID(func, bpf_kfunc_call_test_acquire)
-BTF_ID(func, bpf_kfunc_call_memb_acquire)
-BTF_ID(func, bpf_kfunc_call_test_kptr_get)
-BTF_SET_END(test_sk_ret_null_kfunc_ids)
-
-BTF_SET_START(test_sk_kptr_acquire_kfunc_ids)
-BTF_ID(func, bpf_kfunc_call_test_kptr_get)
-BTF_SET_END(test_sk_kptr_acquire_kfunc_ids)
+BTF_SET8_START(test_sk_check_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test1)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test2)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test3)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_acquire, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_kfunc_call_memb_acquire, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_release, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_kfunc_call_memb_release, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_kfunc_call_memb1_release, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_kptr_get, KF_ACQUIRE | KF_RET_NULL | KF_KPTR_GET)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass_ctx)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass1)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass2)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail1)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail2)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail3)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS)
+BTF_SET8_END(test_sk_check_kfunc_ids)
 
 static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
                           u32 size, u32 headroom, u32 tailroom)
@@ -955,6 +938,9 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
 {
        struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
 
+       if (!skb->len)
+               return -EINVAL;
+
        if (!__skb)
                return 0;
 
@@ -1617,12 +1603,8 @@ out:
 }
 
 static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = {
-       .owner        = THIS_MODULE,
-       .check_set        = &test_sk_check_kfunc_ids,
-       .acquire_set      = &test_sk_acquire_kfunc_ids,
-       .release_set      = &test_sk_release_kfunc_ids,
-       .ret_null_set     = &test_sk_ret_null_kfunc_ids,
-       .kptr_acquire_set = &test_sk_kptr_acquire_kfunc_ids
+       .owner = THIS_MODULE,
+       .set   = &test_sk_check_kfunc_ids,
 };
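
The set above still has to be registered for one or more program types before the verifier will consult it; that call sits outside this hunk. A minimal sketch of the registration step (the program type and init hook chosen here are assumptions, not taken from this diff):

#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/init.h>

static int __init example_register_test_kfuncs(void)
{
	/* make the kfuncs in test_sk_check_kfunc_ids callable from tc programs */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS,
					 &bpf_prog_test_kfunc_set);
}
late_initcall(example_register_test_kfuncs);
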
 
 BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids)
index d588fd0..716df64 100644 (file)
@@ -4168,6 +4168,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
        bool again = false;
 
        skb_reset_mac_header(skb);
+       skb_assert_len(skb);
 
        if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
                __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
index a0c6109..57c5e4c 100644 (file)
@@ -237,7 +237,7 @@ BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
 BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *,
           data, int, headlen, int, offset)
 {
-       u16 tmp, *ptr;
+       __be16 tmp, *ptr;
        const int len = sizeof(tmp);
 
        if (offset >= 0) {
@@ -264,7 +264,7 @@ BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
 BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *,
           data, int, headlen, int, offset)
 {
-       u32 tmp, *ptr;
+       __be32 tmp, *ptr;
        const int len = sizeof(tmp);
 
        if (likely(offset >= 0)) {
index 266d3b7..8162789 100644 (file)
@@ -462,7 +462,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 
                        if (copied == len)
                                break;
-               } while (i != msg_rx->sg.end);
+               } while (!sg_is_last(sge));
 
                if (unlikely(peek)) {
                        msg_rx = sk_psock_next_msg(psock, msg_rx);
@@ -472,7 +472,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
                }
 
                msg_rx->sg.start = i;
-               if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) {
+               if (!sge->length && sg_is_last(sge)) {
                        msg_rx = sk_psock_dequeue_msg(psock);
                        kfree_sk_msg(msg_rx);
                }
index 7a18163..85a9e50 100644 (file)
@@ -197,17 +197,17 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
        }
 }
 
-BTF_SET_START(bpf_tcp_ca_check_kfunc_ids)
-BTF_ID(func, tcp_reno_ssthresh)
-BTF_ID(func, tcp_reno_cong_avoid)
-BTF_ID(func, tcp_reno_undo_cwnd)
-BTF_ID(func, tcp_slow_start)
-BTF_ID(func, tcp_cong_avoid_ai)
-BTF_SET_END(bpf_tcp_ca_check_kfunc_ids)
+BTF_SET8_START(bpf_tcp_ca_check_kfunc_ids)
+BTF_ID_FLAGS(func, tcp_reno_ssthresh)
+BTF_ID_FLAGS(func, tcp_reno_cong_avoid)
+BTF_ID_FLAGS(func, tcp_reno_undo_cwnd)
+BTF_ID_FLAGS(func, tcp_slow_start)
+BTF_ID_FLAGS(func, tcp_cong_avoid_ai)
+BTF_SET8_END(bpf_tcp_ca_check_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_tcp_ca_kfunc_set = {
-       .owner     = THIS_MODULE,
-       .check_set = &bpf_tcp_ca_check_kfunc_ids,
+       .owner = THIS_MODULE,
+       .set   = &bpf_tcp_ca_check_kfunc_ids,
 };
 
 static const struct bpf_verifier_ops bpf_tcp_ca_verifier_ops = {
index 075e744..54eec33 100644 (file)
@@ -1154,24 +1154,24 @@ static struct tcp_congestion_ops tcp_bbr_cong_ops __read_mostly = {
        .set_state      = bbr_set_state,
 };
 
-BTF_SET_START(tcp_bbr_check_kfunc_ids)
+BTF_SET8_START(tcp_bbr_check_kfunc_ids)
 #ifdef CONFIG_X86
 #ifdef CONFIG_DYNAMIC_FTRACE
-BTF_ID(func, bbr_init)
-BTF_ID(func, bbr_main)
-BTF_ID(func, bbr_sndbuf_expand)
-BTF_ID(func, bbr_undo_cwnd)
-BTF_ID(func, bbr_cwnd_event)
-BTF_ID(func, bbr_ssthresh)
-BTF_ID(func, bbr_min_tso_segs)
-BTF_ID(func, bbr_set_state)
+BTF_ID_FLAGS(func, bbr_init)
+BTF_ID_FLAGS(func, bbr_main)
+BTF_ID_FLAGS(func, bbr_sndbuf_expand)
+BTF_ID_FLAGS(func, bbr_undo_cwnd)
+BTF_ID_FLAGS(func, bbr_cwnd_event)
+BTF_ID_FLAGS(func, bbr_ssthresh)
+BTF_ID_FLAGS(func, bbr_min_tso_segs)
+BTF_ID_FLAGS(func, bbr_set_state)
 #endif
 #endif
-BTF_SET_END(tcp_bbr_check_kfunc_ids)
+BTF_SET8_END(tcp_bbr_check_kfunc_ids)
 
 static const struct btf_kfunc_id_set tcp_bbr_kfunc_set = {
-       .owner     = THIS_MODULE,
-       .check_set = &tcp_bbr_check_kfunc_ids,
+       .owner = THIS_MODULE,
+       .set   = &tcp_bbr_check_kfunc_ids,
 };
 
 static int __init bbr_register(void)
index 68178e7..768c10c 100644 (file)
@@ -485,22 +485,22 @@ static struct tcp_congestion_ops cubictcp __read_mostly = {
        .name           = "cubic",
 };
 
-BTF_SET_START(tcp_cubic_check_kfunc_ids)
+BTF_SET8_START(tcp_cubic_check_kfunc_ids)
 #ifdef CONFIG_X86
 #ifdef CONFIG_DYNAMIC_FTRACE
-BTF_ID(func, cubictcp_init)
-BTF_ID(func, cubictcp_recalc_ssthresh)
-BTF_ID(func, cubictcp_cong_avoid)
-BTF_ID(func, cubictcp_state)
-BTF_ID(func, cubictcp_cwnd_event)
-BTF_ID(func, cubictcp_acked)
+BTF_ID_FLAGS(func, cubictcp_init)
+BTF_ID_FLAGS(func, cubictcp_recalc_ssthresh)
+BTF_ID_FLAGS(func, cubictcp_cong_avoid)
+BTF_ID_FLAGS(func, cubictcp_state)
+BTF_ID_FLAGS(func, cubictcp_cwnd_event)
+BTF_ID_FLAGS(func, cubictcp_acked)
 #endif
 #endif
-BTF_SET_END(tcp_cubic_check_kfunc_ids)
+BTF_SET8_END(tcp_cubic_check_kfunc_ids)
 
 static const struct btf_kfunc_id_set tcp_cubic_kfunc_set = {
-       .owner     = THIS_MODULE,
-       .check_set = &tcp_cubic_check_kfunc_ids,
+       .owner = THIS_MODULE,
+       .set   = &tcp_cubic_check_kfunc_ids,
 };
 
 static int __init cubictcp_register(void)
index ab034a4..2a6c0dd 100644 (file)
@@ -239,22 +239,22 @@ static struct tcp_congestion_ops dctcp_reno __read_mostly = {
        .name           = "dctcp-reno",
 };
 
-BTF_SET_START(tcp_dctcp_check_kfunc_ids)
+BTF_SET8_START(tcp_dctcp_check_kfunc_ids)
 #ifdef CONFIG_X86
 #ifdef CONFIG_DYNAMIC_FTRACE
-BTF_ID(func, dctcp_init)
-BTF_ID(func, dctcp_update_alpha)
-BTF_ID(func, dctcp_cwnd_event)
-BTF_ID(func, dctcp_ssthresh)
-BTF_ID(func, dctcp_cwnd_undo)
-BTF_ID(func, dctcp_state)
+BTF_ID_FLAGS(func, dctcp_init)
+BTF_ID_FLAGS(func, dctcp_update_alpha)
+BTF_ID_FLAGS(func, dctcp_cwnd_event)
+BTF_ID_FLAGS(func, dctcp_ssthresh)
+BTF_ID_FLAGS(func, dctcp_cwnd_undo)
+BTF_ID_FLAGS(func, dctcp_state)
 #endif
 #endif
-BTF_SET_END(tcp_dctcp_check_kfunc_ids)
+BTF_SET8_END(tcp_dctcp_check_kfunc_ids)
 
 static const struct btf_kfunc_id_set tcp_dctcp_kfunc_set = {
-       .owner     = THIS_MODULE,
-       .check_set = &tcp_dctcp_check_kfunc_ids,
+       .owner = THIS_MODULE,
+       .set   = &tcp_dctcp_check_kfunc_ids,
 };
 
 static int __init dctcp_register(void)
index bc4d5cd..1cd87b2 100644 (file)
@@ -55,57 +55,131 @@ enum {
        NF_BPF_CT_OPTS_SZ = 12,
 };
 
-static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
-                                         struct bpf_sock_tuple *bpf_tuple,
-                                         u32 tuple_len, u8 protonum,
-                                         s32 netns_id, u8 *dir)
+static int bpf_nf_ct_tuple_parse(struct bpf_sock_tuple *bpf_tuple,
+                                u32 tuple_len, u8 protonum, u8 dir,
+                                struct nf_conntrack_tuple *tuple)
 {
-       struct nf_conntrack_tuple_hash *hash;
-       struct nf_conntrack_tuple tuple;
-       struct nf_conn *ct;
+       union nf_inet_addr *src = dir ? &tuple->dst.u3 : &tuple->src.u3;
+       union nf_inet_addr *dst = dir ? &tuple->src.u3 : &tuple->dst.u3;
+       union nf_conntrack_man_proto *sport = dir ? (void *)&tuple->dst.u
+                                                 : &tuple->src.u;
+       union nf_conntrack_man_proto *dport = dir ? &tuple->src.u
+                                                 : (void *)&tuple->dst.u;
 
        if (unlikely(protonum != IPPROTO_TCP && protonum != IPPROTO_UDP))
-               return ERR_PTR(-EPROTO);
-       if (unlikely(netns_id < BPF_F_CURRENT_NETNS))
-               return ERR_PTR(-EINVAL);
+               return -EPROTO;
+
+       memset(tuple, 0, sizeof(*tuple));
 
-       memset(&tuple, 0, sizeof(tuple));
        switch (tuple_len) {
        case sizeof(bpf_tuple->ipv4):
-               tuple.src.l3num = AF_INET;
-               tuple.src.u3.ip = bpf_tuple->ipv4.saddr;
-               tuple.src.u.tcp.port = bpf_tuple->ipv4.sport;
-               tuple.dst.u3.ip = bpf_tuple->ipv4.daddr;
-               tuple.dst.u.tcp.port = bpf_tuple->ipv4.dport;
+               tuple->src.l3num = AF_INET;
+               src->ip = bpf_tuple->ipv4.saddr;
+               sport->tcp.port = bpf_tuple->ipv4.sport;
+               dst->ip = bpf_tuple->ipv4.daddr;
+               dport->tcp.port = bpf_tuple->ipv4.dport;
                break;
        case sizeof(bpf_tuple->ipv6):
-               tuple.src.l3num = AF_INET6;
-               memcpy(tuple.src.u3.ip6, bpf_tuple->ipv6.saddr, sizeof(bpf_tuple->ipv6.saddr));
-               tuple.src.u.tcp.port = bpf_tuple->ipv6.sport;
-               memcpy(tuple.dst.u3.ip6, bpf_tuple->ipv6.daddr, sizeof(bpf_tuple->ipv6.daddr));
-               tuple.dst.u.tcp.port = bpf_tuple->ipv6.dport;
+               tuple->src.l3num = AF_INET6;
+               memcpy(src->ip6, bpf_tuple->ipv6.saddr, sizeof(bpf_tuple->ipv6.saddr));
+               sport->tcp.port = bpf_tuple->ipv6.sport;
+               memcpy(dst->ip6, bpf_tuple->ipv6.daddr, sizeof(bpf_tuple->ipv6.daddr));
+               dport->tcp.port = bpf_tuple->ipv6.dport;
                break;
        default:
-               return ERR_PTR(-EAFNOSUPPORT);
+               return -EAFNOSUPPORT;
+       }
+       tuple->dst.protonum = protonum;
+       tuple->dst.dir = dir;
+
+       return 0;
+}
+
+static struct nf_conn *
+__bpf_nf_ct_alloc_entry(struct net *net, struct bpf_sock_tuple *bpf_tuple,
+                       u32 tuple_len, struct bpf_ct_opts *opts, u32 opts_len,
+                       u32 timeout)
+{
+       struct nf_conntrack_tuple otuple, rtuple;
+       struct nf_conn *ct;
+       int err;
+
+       if (!opts || !bpf_tuple || opts->reserved[0] || opts->reserved[1] ||
+           opts_len != NF_BPF_CT_OPTS_SZ)
+               return ERR_PTR(-EINVAL);
+
+       if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS))
+               return ERR_PTR(-EINVAL);
+
+       err = bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto,
+                                   IP_CT_DIR_ORIGINAL, &otuple);
+       if (err < 0)
+               return ERR_PTR(err);
+
+       err = bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto,
+                                   IP_CT_DIR_REPLY, &rtuple);
+       if (err < 0)
+               return ERR_PTR(err);
+
+       if (opts->netns_id >= 0) {
+               net = get_net_ns_by_id(net, opts->netns_id);
+               if (unlikely(!net))
+                       return ERR_PTR(-ENONET);
        }
 
-       tuple.dst.protonum = protonum;
+       ct = nf_conntrack_alloc(net, &nf_ct_zone_dflt, &otuple, &rtuple,
+                               GFP_ATOMIC);
+       if (IS_ERR(ct))
+               goto out;
+
+       memset(&ct->proto, 0, sizeof(ct->proto));
+       __nf_ct_set_timeout(ct, timeout * HZ);
+       ct->status |= IPS_CONFIRMED;
+
+out:
+       if (opts->netns_id >= 0)
+               put_net(net);
+
+       return ct;
+}
+
+static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
+                                         struct bpf_sock_tuple *bpf_tuple,
+                                         u32 tuple_len, struct bpf_ct_opts *opts,
+                                         u32 opts_len)
+{
+       struct nf_conntrack_tuple_hash *hash;
+       struct nf_conntrack_tuple tuple;
+       struct nf_conn *ct;
+       int err;
+
+       if (!opts || !bpf_tuple || opts->reserved[0] || opts->reserved[1] ||
+           opts_len != NF_BPF_CT_OPTS_SZ)
+               return ERR_PTR(-EINVAL);
+       if (unlikely(opts->l4proto != IPPROTO_TCP && opts->l4proto != IPPROTO_UDP))
+               return ERR_PTR(-EPROTO);
+       if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS))
+               return ERR_PTR(-EINVAL);
+
+       err = bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto,
+                                   IP_CT_DIR_ORIGINAL, &tuple);
+       if (err < 0)
+               return ERR_PTR(err);
 
-       if (netns_id >= 0) {
-               net = get_net_ns_by_id(net, netns_id);
+       if (opts->netns_id >= 0) {
+               net = get_net_ns_by_id(net, opts->netns_id);
                if (unlikely(!net))
                        return ERR_PTR(-ENONET);
        }
 
        hash = nf_conntrack_find_get(net, &nf_ct_zone_dflt, &tuple);
-       if (netns_id >= 0)
+       if (opts->netns_id >= 0)
                put_net(net);
        if (!hash)
                return ERR_PTR(-ENOENT);
 
        ct = nf_ct_tuplehash_to_ctrack(hash);
-       if (dir)
-               *dir = NF_CT_DIRECTION(hash);
+       opts->dir = NF_CT_DIRECTION(hash);
 
        return ct;
 }
@@ -114,6 +188,43 @@ __diag_push();
 __diag_ignore_all("-Wmissing-prototypes",
                  "Global functions as their definitions will be in nf_conntrack BTF");
 
+struct nf_conn___init {
+       struct nf_conn ct;
+};
+
+/* bpf_xdp_ct_alloc - Allocate a new CT entry
+ *
+ * Parameters:
+ * @xdp_ctx    - Pointer to ctx (xdp_md) in XDP program
+ *                 Cannot be NULL
+ * @bpf_tuple  - Pointer to memory representing the tuple to look up
+ *                 Cannot be NULL
+ * @tuple__sz  - Length of the tuple structure
+ *                 Must be one of sizeof(bpf_tuple->ipv4) or
+ *                 sizeof(bpf_tuple->ipv6)
+ * @opts       - Additional options for allocation (documented above)
+ *                 Cannot be NULL
+ * @opts__sz   - Length of the bpf_ct_opts structure
+ *                 Must be NF_BPF_CT_OPTS_SZ (12)
+ */
+struct nf_conn___init *
+bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
+                u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz)
+{
+       struct xdp_buff *ctx = (struct xdp_buff *)xdp_ctx;
+       struct nf_conn *nfct;
+
+       nfct = __bpf_nf_ct_alloc_entry(dev_net(ctx->rxq->dev), bpf_tuple, tuple__sz,
+                                      opts, opts__sz, 10);
+       if (IS_ERR(nfct)) {
+               if (opts)
+                       opts->error = PTR_ERR(nfct);
+               return NULL;
+       }
+
+       return (struct nf_conn___init *)nfct;
+}
+
 /* bpf_xdp_ct_lookup - Lookup CT entry for the given tuple, and acquire a
  *                    reference to it
  *
@@ -138,25 +249,50 @@ bpf_xdp_ct_lookup(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
        struct net *caller_net;
        struct nf_conn *nfct;
 
-       BUILD_BUG_ON(sizeof(struct bpf_ct_opts) != NF_BPF_CT_OPTS_SZ);
-
-       if (!opts)
-               return NULL;
-       if (!bpf_tuple || opts->reserved[0] || opts->reserved[1] ||
-           opts__sz != NF_BPF_CT_OPTS_SZ) {
-               opts->error = -EINVAL;
-               return NULL;
-       }
        caller_net = dev_net(ctx->rxq->dev);
-       nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts->l4proto,
-                                 opts->netns_id, &opts->dir);
+       nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz);
        if (IS_ERR(nfct)) {
-               opts->error = PTR_ERR(nfct);
+               if (opts)
+                       opts->error = PTR_ERR(nfct);
                return NULL;
        }
        return nfct;
 }
 
+/* bpf_skb_ct_alloc - Allocate a new CT entry
+ *
+ * Parameters:
+ * @skb_ctx    - Pointer to ctx (__sk_buff) in TC program
+ *                 Cannot be NULL
+ * @bpf_tuple  - Pointer to memory representing the tuple to look up
+ *                 Cannot be NULL
+ * @tuple__sz  - Length of the tuple structure
+ *                 Must be one of sizeof(bpf_tuple->ipv4) or
+ *                 sizeof(bpf_tuple->ipv6)
+ * @opts       - Additional options for allocation (documented above)
+ *                 Cannot be NULL
+ * @opts__sz   - Length of the bpf_ct_opts structure
+ *                 Must be NF_BPF_CT_OPTS_SZ (12)
+ */
+struct nf_conn___init *
+bpf_skb_ct_alloc(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
+                u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz)
+{
+       struct sk_buff *skb = (struct sk_buff *)skb_ctx;
+       struct nf_conn *nfct;
+       struct net *net;
+
+       net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
+       nfct = __bpf_nf_ct_alloc_entry(net, bpf_tuple, tuple__sz, opts, opts__sz, 10);
+       if (IS_ERR(nfct)) {
+               if (opts)
+                       opts->error = PTR_ERR(nfct);
+               return NULL;
+       }
+
+       return (struct nf_conn___init *)nfct;
+}
+
 /* bpf_skb_ct_lookup - Lookup CT entry for the given tuple, and acquire a
  *                    reference to it
  *
@@ -181,20 +317,31 @@ bpf_skb_ct_lookup(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
        struct net *caller_net;
        struct nf_conn *nfct;
 
-       BUILD_BUG_ON(sizeof(struct bpf_ct_opts) != NF_BPF_CT_OPTS_SZ);
-
-       if (!opts)
-               return NULL;
-       if (!bpf_tuple || opts->reserved[0] || opts->reserved[1] ||
-           opts__sz != NF_BPF_CT_OPTS_SZ) {
-               opts->error = -EINVAL;
-               return NULL;
-       }
        caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
-       nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts->l4proto,
-                                 opts->netns_id, &opts->dir);
+       nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz);
        if (IS_ERR(nfct)) {
-               opts->error = PTR_ERR(nfct);
+               if (opts)
+                       opts->error = PTR_ERR(nfct);
+               return NULL;
+       }
+       return nfct;
+}
+
+/* bpf_ct_insert_entry - Add the provided entry into a CT map
+ *
+ * This must be invoked for referenced PTR_TO_BTF_ID.
+ *
+ * @nfct        - Pointer to referenced nf_conn___init object, obtained
+ *                using bpf_xdp_ct_alloc or bpf_skb_ct_alloc.
+ */
+struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i)
+{
+       struct nf_conn *nfct = (struct nf_conn *)nfct_i;
+       int err;
+
+       err = nf_conntrack_hash_check_insert(nfct);
+       if (err < 0) {
+               nf_conntrack_free(nfct);
                return NULL;
        }
        return nfct;
@@ -217,50 +364,90 @@ void bpf_ct_release(struct nf_conn *nfct)
        nf_ct_put(nfct);
 }
 
+/* bpf_ct_set_timeout - Set timeout of allocated nf_conn
+ *
+ * Sets the default timeout of a newly allocated nf_conn before insertion.
+ * This helper must be invoked for refcounted pointer to nf_conn___init.
+ *
+ * Parameters:
+ * @nfct        - Pointer to referenced nf_conn object, obtained using
+ *                 bpf_xdp_ct_alloc or bpf_skb_ct_alloc.
+ * @timeout      - Timeout in msecs.
+ */
+void bpf_ct_set_timeout(struct nf_conn___init *nfct, u32 timeout)
+{
+       __nf_ct_set_timeout((struct nf_conn *)nfct, msecs_to_jiffies(timeout));
+}
+
+/* bpf_ct_change_timeout - Change timeout of inserted nf_conn
+ *
+ * Change the timeout associated with the inserted or looked up nf_conn.
+ * This helper must be invoked for refcounted pointer to nf_conn.
+ *
+ * Parameters:
+ * @nfct        - Pointer to referenced nf_conn object, obtained using
+ *                bpf_ct_insert_entry, bpf_xdp_ct_lookup, or bpf_skb_ct_lookup.
+ * @timeout      - New timeout in msecs.
+ */
+int bpf_ct_change_timeout(struct nf_conn *nfct, u32 timeout)
+{
+       return __nf_ct_change_timeout(nfct, msecs_to_jiffies(timeout));
+}
+
+/* bpf_ct_set_status - Set status field of allocated nf_conn
+ *
+ * Set the status field of the newly allocated nf_conn before insertion.
+ * This must be invoked for referenced PTR_TO_BTF_ID to nf_conn___init.
+ *
+ * Parameters:
+ * @nfct        - Pointer to referenced nf_conn object, obtained using
+ *                bpf_xdp_ct_alloc or bpf_skb_ct_alloc.
+ * @status       - New status value.
+ */
+int bpf_ct_set_status(const struct nf_conn___init *nfct, u32 status)
+{
+       return nf_ct_change_status_common((struct nf_conn *)nfct, status);
+}
+
+/* bpf_ct_change_status - Change status of inserted nf_conn
+ *
+ * Change the status field of the provided connection tracking entry.
+ * This must be invoked for referenced PTR_TO_BTF_ID to nf_conn.
+ *
+ * Parameters:
+ * @nfct        - Pointer to referenced nf_conn object, obtained using
+ *                bpf_ct_insert_entry, bpf_xdp_ct_lookup or bpf_skb_ct_lookup.
+ * @status       - New status value.
+ */
+int bpf_ct_change_status(struct nf_conn *nfct, u32 status)
+{
+       return nf_ct_change_status_common(nfct, status);
+}
+
 __diag_pop()
 
-BTF_SET_START(nf_ct_xdp_check_kfunc_ids)
-BTF_ID(func, bpf_xdp_ct_lookup)
-BTF_ID(func, bpf_ct_release)
-BTF_SET_END(nf_ct_xdp_check_kfunc_ids)
-
-BTF_SET_START(nf_ct_tc_check_kfunc_ids)
-BTF_ID(func, bpf_skb_ct_lookup)
-BTF_ID(func, bpf_ct_release)
-BTF_SET_END(nf_ct_tc_check_kfunc_ids)
-
-BTF_SET_START(nf_ct_acquire_kfunc_ids)
-BTF_ID(func, bpf_xdp_ct_lookup)
-BTF_ID(func, bpf_skb_ct_lookup)
-BTF_SET_END(nf_ct_acquire_kfunc_ids)
-
-BTF_SET_START(nf_ct_release_kfunc_ids)
-BTF_ID(func, bpf_ct_release)
-BTF_SET_END(nf_ct_release_kfunc_ids)
-
-/* Both sets are identical */
-#define nf_ct_ret_null_kfunc_ids nf_ct_acquire_kfunc_ids
-
-static const struct btf_kfunc_id_set nf_conntrack_xdp_kfunc_set = {
-       .owner        = THIS_MODULE,
-       .check_set    = &nf_ct_xdp_check_kfunc_ids,
-       .acquire_set  = &nf_ct_acquire_kfunc_ids,
-       .release_set  = &nf_ct_release_kfunc_ids,
-       .ret_null_set = &nf_ct_ret_null_kfunc_ids,
-};
+BTF_SET8_START(nf_ct_kfunc_set)
+BTF_ID_FLAGS(func, bpf_xdp_ct_alloc, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_xdp_ct_lookup, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_skb_ct_alloc, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_skb_ct_lookup, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_ct_insert_entry, KF_ACQUIRE | KF_RET_NULL | KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_ct_release, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_ct_set_timeout, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_change_timeout, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_set_status, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_change_status, KF_TRUSTED_ARGS)
+BTF_SET8_END(nf_ct_kfunc_set)
 
-static const struct btf_kfunc_id_set nf_conntrack_tc_kfunc_set = {
-       .owner        = THIS_MODULE,
-       .check_set    = &nf_ct_tc_check_kfunc_ids,
-       .acquire_set  = &nf_ct_acquire_kfunc_ids,
-       .release_set  = &nf_ct_release_kfunc_ids,
-       .ret_null_set = &nf_ct_ret_null_kfunc_ids,
+static const struct btf_kfunc_id_set nf_conntrack_kfunc_set = {
+       .owner = THIS_MODULE,
+       .set   = &nf_ct_kfunc_set,
 };
 
 int register_nf_conntrack_bpf(void)
 {
        int ret;
 
-       ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_xdp_kfunc_set);
-       return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_tc_kfunc_set);
+       ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_kfunc_set);
+       return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set);
 }
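
A minimal sketch of how an XDP program could consume these kfuncs end to end; this is not part of the diff. The __ksym extern declarations simply mirror the signatures above, the tuple/address values are placeholders, and struct bpf_ct_opts / struct nf_conn___init are assumed to be available via vmlinux.h or a local CO-RE definition (as the selftests added in this series do).

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

char LICENSE[] SEC("license") = "GPL";

/* kfunc prototypes, assumed declarations mirroring the definitions above */
struct nf_conn___init *
bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
		 __u32 tuple__sz, struct bpf_ct_opts *opts, __u32 opts__sz) __ksym;
struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i) __ksym;
void bpf_ct_set_timeout(struct nf_conn___init *nfct, __u32 timeout) __ksym;
int bpf_ct_set_status(const struct nf_conn___init *nfct, __u32 status) __ksym;
void bpf_ct_release(struct nf_conn *nfct) __ksym;

SEC("xdp")
int xdp_add_ct_entry(struct xdp_md *ctx)
{
	struct bpf_ct_opts opts = { .netns_id = -1, .l4proto = IPPROTO_TCP };
	struct bpf_sock_tuple tup = {
		.ipv4 = {
			.saddr = bpf_htonl(0x01010101),	/* 1.1.1.1, placeholder */
			.daddr = bpf_htonl(0x02020202),	/* 2.2.2.2, placeholder */
			.sport = bpf_htons(12345),
			.dport = bpf_htons(80),
		},
	};
	struct nf_conn___init *ct_i;
	struct nf_conn *ct;

	ct_i = bpf_xdp_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct_i)
		return XDP_PASS;	/* opts.error carries the reason */

	bpf_ct_set_timeout(ct_i, 30000);	/* 30s, argument is in msecs */
	/* CONFIRMED is already set by the alloc path and cannot be cleared;
	 * SEEN_REPLY may be added per nf_ct_change_status_common() rules.
	 */
	bpf_ct_set_status(ct_i, IPS_CONFIRMED | IPS_SEEN_REPLY);

	ct = bpf_ct_insert_entry(ct_i);	/* consumes the ct_i reference */
	if (ct)
		bpf_ct_release(ct);	/* drop the reference returned on success */
	return XDP_PASS;
}

The TC flavor is identical apart from using bpf_skb_ct_alloc()/bpf_skb_ct_lookup() with a __sk_buff context.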
index 8c97d06..71c2f4f 100644 (file)
@@ -2806,3 +2806,65 @@ err_expect:
        free_percpu(net->ct.stat);
        return ret;
 }
+
+#if (IS_BUILTIN(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
+    (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES) || \
+    IS_ENABLED(CONFIG_NF_CT_NETLINK))
+
+/* ctnetlink code shared by both ctnetlink and nf_conntrack_bpf */
+
+int __nf_ct_change_timeout(struct nf_conn *ct, u64 timeout)
+{
+       if (test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status))
+               return -EPERM;
+
+       __nf_ct_set_timeout(ct, timeout);
+
+       if (test_bit(IPS_DYING_BIT, &ct->status))
+               return -ETIME;
+
+       return 0;
+}
+EXPORT_SYMBOL_GPL(__nf_ct_change_timeout);
+
+void __nf_ct_change_status(struct nf_conn *ct, unsigned long on, unsigned long off)
+{
+       unsigned int bit;
+
+       /* Ignore these unchangeable bits */
+       on &= ~IPS_UNCHANGEABLE_MASK;
+       off &= ~IPS_UNCHANGEABLE_MASK;
+
+       for (bit = 0; bit < __IPS_MAX_BIT; bit++) {
+               if (on & (1 << bit))
+                       set_bit(bit, &ct->status);
+               else if (off & (1 << bit))
+                       clear_bit(bit, &ct->status);
+       }
+}
+EXPORT_SYMBOL_GPL(__nf_ct_change_status);
+
+int nf_ct_change_status_common(struct nf_conn *ct, unsigned int status)
+{
+       unsigned long d;
+
+       d = ct->status ^ status;
+
+       if (d & (IPS_EXPECTED|IPS_CONFIRMED|IPS_DYING))
+               /* unchangeable */
+               return -EBUSY;
+
+       if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY))
+               /* SEEN_REPLY bit can only be set */
+               return -EBUSY;
+
+       if (d & IPS_ASSURED && !(status & IPS_ASSURED))
+               /* ASSURED bit can only be set */
+               return -EBUSY;
+
+       __nf_ct_change_status(ct, status, 0);
+       return 0;
+}
+EXPORT_SYMBOL_GPL(nf_ct_change_status_common);
+
+#endif
index f8dd4ed..04169b5 100644 (file)
@@ -1891,45 +1891,10 @@ ctnetlink_parse_nat_setup(struct nf_conn *ct,
 }
 #endif
 
-static void
-__ctnetlink_change_status(struct nf_conn *ct, unsigned long on,
-                         unsigned long off)
-{
-       unsigned int bit;
-
-       /* Ignore these unchangable bits */
-       on &= ~IPS_UNCHANGEABLE_MASK;
-       off &= ~IPS_UNCHANGEABLE_MASK;
-
-       for (bit = 0; bit < __IPS_MAX_BIT; bit++) {
-               if (on & (1 << bit))
-                       set_bit(bit, &ct->status);
-               else if (off & (1 << bit))
-                       clear_bit(bit, &ct->status);
-       }
-}
-
 static int
 ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[])
 {
-       unsigned long d;
-       unsigned int status = ntohl(nla_get_be32(cda[CTA_STATUS]));
-       d = ct->status ^ status;
-
-       if (d & (IPS_EXPECTED|IPS_CONFIRMED|IPS_DYING))
-               /* unchangeable */
-               return -EBUSY;
-
-       if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY))
-               /* SEEN_REPLY bit can only be set */
-               return -EBUSY;
-
-       if (d & IPS_ASSURED && !(status & IPS_ASSURED))
-               /* ASSURED bit can only be set */
-               return -EBUSY;
-
-       __ctnetlink_change_status(ct, status, 0);
-       return 0;
+       return nf_ct_change_status_common(ct, ntohl(nla_get_be32(cda[CTA_STATUS])));
 }
 
 static int
@@ -2024,16 +1989,7 @@ static int ctnetlink_change_helper(struct nf_conn *ct,
 static int ctnetlink_change_timeout(struct nf_conn *ct,
                                    const struct nlattr * const cda[])
 {
-       u64 timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
-
-       if (timeout > INT_MAX)
-               timeout = INT_MAX;
-       WRITE_ONCE(ct->timeout, nfct_time_stamp + (u32)timeout);
-
-       if (test_bit(IPS_DYING_BIT, &ct->status))
-               return -ETIME;
-
-       return 0;
+       return __nf_ct_change_timeout(ct, (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ);
 }
 
 #if defined(CONFIG_NF_CONNTRACK_MARK)
@@ -2293,9 +2249,7 @@ ctnetlink_create_conntrack(struct net *net,
                goto err1;
 
        timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
-       if (timeout > INT_MAX)
-               timeout = INT_MAX;
-       ct->timeout = (u32)timeout + nfct_time_stamp;
+       __nf_ct_set_timeout(ct, timeout);
 
        rcu_read_lock();
        if (cda[CTA_HELP]) {
@@ -2837,7 +2791,7 @@ ctnetlink_update_status(struct nf_conn *ct, const struct nlattr * const cda[])
         * unchangeable bits but do not error out. Also user programs
         * are allowed to clear the bits that they are allowed to change.
         */
-       __ctnetlink_change_status(ct, status, ~status);
+       __nf_ct_change_status(ct, status, ~status);
        return 0;
 }
 
index 0900238..5b4ce6b 100644 (file)
@@ -639,8 +639,11 @@ static int __xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len
        if (unlikely(need_wait))
                return -EOPNOTSUPP;
 
-       if (sk_can_busy_loop(sk))
+       if (sk_can_busy_loop(sk)) {
+               if (xs->zc)
+                       __sk_mark_napi_id_once(sk, xsk_pool_get_napi_id(xs->pool));
                sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+       }
 
        if (xs->zc && xsk_no_wakeup(sk))
                return 0;
index 5002a5b..727da3c 100644 (file)
@@ -282,12 +282,10 @@ $(LIBBPF): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(LIBBPF_OU
 
 BPFTOOLDIR := $(TOOLS_PATH)/bpf/bpftool
 BPFTOOL_OUTPUT := $(abspath $(BPF_SAMPLES_PATH))/bpftool
-BPFTOOL := $(BPFTOOL_OUTPUT)/bpftool
-$(BPFTOOL): $(LIBBPF) $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) | $(BPFTOOL_OUTPUT)
-           $(MAKE) -C $(BPFTOOLDIR) srctree=$(BPF_SAMPLES_PATH)/../../ \
-               OUTPUT=$(BPFTOOL_OUTPUT)/ \
-               LIBBPF_OUTPUT=$(LIBBPF_OUTPUT)/ \
-               LIBBPF_DESTDIR=$(LIBBPF_DESTDIR)/
+BPFTOOL := $(BPFTOOL_OUTPUT)/bootstrap/bpftool
+$(BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) | $(BPFTOOL_OUTPUT)
+       $(MAKE) -C $(BPFTOOLDIR) srctree=$(BPF_SAMPLES_PATH)/../../             \
+               OUTPUT=$(BPFTOOL_OUTPUT)/ bootstrap
 
 $(LIBBPF_OUTPUT) $(BPFTOOL_OUTPUT):
        $(call msg,MKDIR,$@)
index 16dbf49..88a26f3 100644 (file)
@@ -17,6 +17,7 @@
 #include <bpf/libbpf.h>
 #include "bpf_insn.h"
 #include "sock_example.h"
+#include "bpf_util.h"
 
 #define BPF_F_PIN      (1 << 0)
 #define BPF_F_GET      (1 << 1)
@@ -52,7 +53,7 @@ static int bpf_prog_create(const char *object)
                BPF_MOV64_IMM(BPF_REG_0, 1),
                BPF_EXIT_INSN(),
        };
-       size_t insns_cnt = sizeof(insns) / sizeof(struct bpf_insn);
+       size_t insns_cnt = ARRAY_SIZE(insns);
        struct bpf_object *obj;
        int err;
 
index a88f695..5b66f24 100644 (file)
@@ -29,6 +29,7 @@
 #include <bpf/bpf.h>
 #include "bpf_insn.h"
 #include "sock_example.h"
+#include "bpf_util.h"
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
 
@@ -58,7 +59,7 @@ static int test_sock(void)
                BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
                BPF_EXIT_INSN(),
        };
-       size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+       size_t insns_cnt = ARRAY_SIZE(prog);
        LIBBPF_OPTS(bpf_prog_load_opts, opts,
                .log_buf = bpf_log_buf,
                .log_size = BPF_LOG_BUF_SIZE,
index 6d90874..68ce694 100644 (file)
@@ -31,6 +31,7 @@
 #include <bpf/bpf.h>
 
 #include "bpf_insn.h"
+#include "bpf_util.h"
 
 enum {
        MAP_KEY_PACKETS,
@@ -70,7 +71,7 @@ static int prog_load(int map_fd, int verdict)
                BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
                BPF_EXIT_INSN(),
        };
-       size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+       size_t insns_cnt = ARRAY_SIZE(prog);
        LIBBPF_OPTS(bpf_prog_load_opts, opts,
                .log_buf = bpf_log_buf,
                .log_size = BPF_LOG_BUF_SIZE,
index be98ccb..5efb917 100644 (file)
@@ -523,7 +523,7 @@ int main(int argc, char **argv)
                return -1;
        }
 
-       for (f = 0; f < sizeof(map_flags) / sizeof(*map_flags); f++) {
+       for (f = 0; f < ARRAY_SIZE(map_flags); f++) {
                test_lru_loss0(BPF_MAP_TYPE_LRU_HASH, map_flags[f]);
                test_lru_loss1(BPF_MAP_TYPE_LRU_HASH, map_flags[f]);
                test_parallel_lru_loss(BPF_MAP_TYPE_LRU_HASH, map_flags[f],
index e8b4cc1..652ec72 100644 (file)
@@ -12,6 +12,8 @@
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
 
+#include "bpf_util.h"
+
 static int map_fd[7];
 
 #define PORT_A         (map_fd[0])
@@ -28,7 +30,7 @@ static const char * const test_names[] = {
        "Hash of Hash",
 };
 
-#define NR_TESTS (sizeof(test_names) / sizeof(*test_names))
+#define NR_TESTS ARRAY_SIZE(test_names)
 
 static void check_map_id(int inner_map_fd, int map_in_map_fd, uint32_t key)
 {
index e910dc2..9d7d79f 100644 (file)
@@ -8,6 +8,7 @@
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
 #include "trace_helpers.h"
+#include "bpf_util.h"
 
 #ifdef __mips__
 #define        MAX_ENTRIES  6000 /* MIPS n64 syscalls start at 5000 */
@@ -24,7 +25,7 @@ static void install_accept_all_seccomp(void)
                BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
-               .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+               .len = (unsigned short)ARRAY_SIZE(filter),
                .filter = filter,
        };
        if (prctl(PR_SET_SECCOMP, 2, &prog))
index 415bac1..8557c27 100644 (file)
@@ -33,7 +33,7 @@ struct {
 } tx_port_native SEC(".maps");
 
 /* store egress interface mac address */
-const volatile char tx_mac_addr[ETH_ALEN];
+const volatile __u8 tx_mac_addr[ETH_ALEN];
 
 static __always_inline int xdp_redirect_map(struct xdp_md *ctx, void *redirect_map)
 {
@@ -73,6 +73,7 @@ int xdp_redirect_map_egress(struct xdp_md *ctx)
 {
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
+       u8 *mac_addr = (u8 *) tx_mac_addr;
        struct ethhdr *eth = data;
        u64 nh_off;
 
@@ -80,7 +81,8 @@ int xdp_redirect_map_egress(struct xdp_md *ctx)
        if (data + nh_off > data_end)
                return XDP_DROP;
 
-       __builtin_memcpy(eth->h_source, (const char *)tx_mac_addr, ETH_ALEN);
+       barrier_var(mac_addr); /* prevent optimizing out memcpy */
+       __builtin_memcpy(eth->h_source, mac_addr, ETH_ALEN);
 
        return XDP_PASS;
 }
index b6e4fc8..c889a13 100644 (file)
@@ -40,6 +40,8 @@ static const struct option long_options[] = {
        {}
 };
 
+static int verbose = 0;
+
 int main(int argc, char **argv)
 {
        struct bpf_devmap_val devmap_val = {};
@@ -79,6 +81,7 @@ int main(int argc, char **argv)
                        break;
                case 'v':
                        sample_switch_mode();
+                       verbose = 1;
                        break;
                case 's':
                        mask |= SAMPLE_REDIRECT_MAP_CNT;
@@ -134,6 +137,12 @@ int main(int argc, char **argv)
                        ret = EXIT_FAIL;
                        goto end_destroy;
                }
+               if (verbose)
+                       printf("Egress ifindex:%d using src MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
+                              ifindex_out,
+                              skel->rodata->tx_mac_addr[0], skel->rodata->tx_mac_addr[1],
+                              skel->rodata->tx_mac_addr[2], skel->rodata->tx_mac_addr[3],
+                              skel->rodata->tx_mac_addr[4], skel->rodata->tx_mac_addr[5]);
        }
 
        skel->rodata->from_match[0] = ifindex_in;
index a0ec321..dfb260d 100755 (executable)
@@ -333,27 +333,7 @@ class PrinterRST(Printer):
 .. Copyright (C) All BPF authors and contributors from 2014 to present.
 .. See git log include/uapi/linux/bpf.h in kernel tree for details.
 .. 
-.. %%%LICENSE_START(VERBATIM)
-.. Permission is granted to make and distribute verbatim copies of this
-.. manual provided the copyright notice and this permission notice are
-.. preserved on all copies.
-.. 
-.. Permission is granted to copy and distribute modified versions of this
-.. manual under the conditions for verbatim copying, provided that the
-.. entire resulting derived work is distributed under the terms of a
-.. permission notice identical to this one.
-.. 
-.. Since the Linux kernel and libraries are constantly changing, this
-.. manual page may be incorrect or out-of-date.  The author(s) assume no
-.. responsibility for errors or omissions, or for damages resulting from
-.. the use of the information contained herein.  The author(s) may not
-.. have taken the same level of care in the production of this manual,
-.. which is licensed free of charge, as they might when working
-.. professionally.
-.. 
-.. Formatted or processed versions of this manual, if unaccompanied by
-.. the source, must acknowledge the copyright and authors of this work.
-.. %%%LICENSE_END
+.. SPDX-License-Identifier:  Linux-man-pages-copyleft
 .. 
 .. Please do not edit this file. It was generated from the documentation
 .. located in file include/uapi/linux/bpf.h of the Linux kernel sources
index 5d26f3c..80cd784 100644 (file)
  *             .zero 4
  *             __BTF_ID__func__vfs_fallocate__4:
  *             .zero 4
+ *
+ *   set8    - store symbol size into first 4 bytes and sort following
+ *             ID list
+ *
+ *             __BTF_ID__set8__list:
+ *             .zero 8
+ *             list:
+ *             __BTF_ID__func__vfs_getattr__3:
+ *             .zero 4
+ *            .word (1 << 0) | (1 << 2)
+ *             __BTF_ID__func__vfs_fallocate__5:
+ *             .zero 4
+ *            .word (1 << 3) | (1 << 1) | (1 << 2)
  */
 
 #define  _GNU_SOURCE
@@ -72,6 +85,7 @@
 #define BTF_TYPEDEF    "typedef"
 #define BTF_FUNC       "func"
 #define BTF_SET                "set"
+#define BTF_SET8       "set8"
 
 #define ADDR_CNT       100
 
@@ -84,6 +98,7 @@ struct btf_id {
        };
        int              addr_cnt;
        bool             is_set;
+       bool             is_set8;
        Elf64_Addr       addr[ADDR_CNT];
 };
 
@@ -231,14 +246,14 @@ static char *get_id(const char *prefix_end)
        return id;
 }
 
-static struct btf_id *add_set(struct object *obj, char *name)
+static struct btf_id *add_set(struct object *obj, char *name, bool is_set8)
 {
        /*
         * __BTF_ID__set__name
         * name =    ^
         * id   =         ^
         */
-       char *id = name + sizeof(BTF_SET "__") - 1;
+       char *id = name + (is_set8 ? sizeof(BTF_SET8 "__") : sizeof(BTF_SET "__")) - 1;
        int len = strlen(name);
 
        if (id >= name + len) {
@@ -444,9 +459,21 @@ static int symbols_collect(struct object *obj)
                } else if (!strncmp(prefix, BTF_FUNC, sizeof(BTF_FUNC) - 1)) {
                        obj->nr_funcs++;
                        id = add_symbol(&obj->funcs, prefix, sizeof(BTF_FUNC) - 1);
+               /* set8 */
+               } else if (!strncmp(prefix, BTF_SET8, sizeof(BTF_SET8) - 1)) {
+                       id = add_set(obj, prefix, true);
+                       /*
+                        * SET8 objects store list's count, which is encoded
+                        * in symbol's size, together with 'cnt' field hence
+                        * that - 1.
+                        */
+                       if (id) {
+                               id->cnt = sym.st_size / sizeof(uint64_t) - 1;
+                               id->is_set8 = true;
+                       }
                /* set */
                } else if (!strncmp(prefix, BTF_SET, sizeof(BTF_SET) - 1)) {
-                       id = add_set(obj, prefix);
+                       id = add_set(obj, prefix, false);
                        /*
                         * SET objects store list's count, which is encoded
                         * in symbol's size, together with 'cnt' field hence
@@ -571,7 +598,8 @@ static int id_patch(struct object *obj, struct btf_id *id)
        int *ptr = data->d_buf;
        int i;
 
-       if (!id->id && !id->is_set)
+       /* For set, set8, id->id may be 0 */
+       if (!id->id && !id->is_set && !id->is_set8)
                pr_err("WARN: resolve_btfids: unresolved symbol %s\n", id->name);
 
        for (i = 0; i < id->addr_cnt; i++) {
@@ -643,13 +671,13 @@ static int sets_patch(struct object *obj)
                }
 
                idx = idx / sizeof(int);
-               base = &ptr[idx] + 1;
+               base = &ptr[idx] + (id->is_set8 ? 2 : 1);
                cnt = ptr[idx];
 
                pr_debug("sorting  addr %5lu: cnt %6d [%s]\n",
                         (idx + 1) * sizeof(int), cnt, id->name);
 
-               qsort(base, cnt, sizeof(int), cmp_id);
+               qsort(base, cnt, id->is_set8 ? sizeof(uint64_t) : sizeof(int), cmp_id);
 
                next = rb_next(next);
        }
index da6de16..8b3d87b 100644 (file)
@@ -4,7 +4,7 @@ include ../../scripts/Makefile.include
 OUTPUT ?= $(abspath .output)/
 
 BPFTOOL_OUTPUT := $(OUTPUT)bpftool/
-DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bpftool
+DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bootstrap/bpftool
 BPFTOOL ?= $(DEFAULT_BPFTOOL)
 LIBBPF_SRC := $(abspath ../../lib/bpf)
 BPFOBJ_OUTPUT := $(OUTPUT)libbpf/
@@ -86,6 +86,5 @@ $(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(BPFOBJ_OU
        $(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) OUTPUT=$(BPFOBJ_OUTPUT) \
                    DESTDIR=$(BPFOBJ_OUTPUT) prefix= $(abspath $@) install_headers
 
-$(DEFAULT_BPFTOOL): $(BPFOBJ) | $(BPFTOOL_OUTPUT)
-       $(Q)$(MAKE) $(submake_extras) -C ../bpftool OUTPUT=$(BPFTOOL_OUTPUT)   \
-                   ARCH= CROSS_COMPILE= CC=$(HOSTCC) LD=$(HOSTLD)
+$(DEFAULT_BPFTOOL): | $(BPFTOOL_OUTPUT)
+       $(Q)$(MAKE) $(submake_extras) -C ../bpftool OUTPUT=$(BPFTOOL_OUTPUT) bootstrap
index 3dd13fe..59a217c 100644 (file)
@@ -2361,7 +2361,8 @@ union bpf_attr {
  *             Pull in non-linear data in case the *skb* is non-linear and not
  *             all of *len* are part of the linear section. Make *len* bytes
  *             from *skb* readable and writable. If a zero value is passed for
- *             *len*, then the whole length of the *skb* is pulled.
+ *             *len*, then all bytes in the linear part of *skb* will be made
+ *             readable and writable.
  *
  *             This helper is only needed for reading and writing with direct
  *             packet access.
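
To make the clarified semantics concrete, a minimal TC sketch (not from this diff) that pulls the whole linear part before doing direct packet access:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

char LICENSE[] SEC("license") = "GPL";

SEC("tc")
int pull_linear(struct __sk_buff *skb)
{
	/* len == 0: all bytes of the linear part become readable/writable */
	if (bpf_skb_pull_data(skb, 0))
		return 0;	/* TC_ACT_OK */

	/* data/data_end must be re-read after the pull */
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	struct ethhdr *eth = data;

	if ((void *)(eth + 1) > data_end)
		return 0;	/* TC_ACT_OK */

	bpf_printk("proto 0x%x", bpf_ntohs(eth->h_proto));
	return 0;	/* TC_ACT_OK */
}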
index 11f9096..f4d3e1e 100644 (file)
@@ -2,6 +2,8 @@
 #ifndef __BPF_TRACING_H__
 #define __BPF_TRACING_H__
 
+#include <bpf/bpf_helpers.h>
+
 /* Scan the ARCH passed in from ARCH env variable (see Makefile) */
 #if defined(__TARGET_ARCH_x86)
        #define bpf_target_x86
@@ -140,7 +142,7 @@ struct pt_regs___s390 {
 #define __PT_RC_REG gprs[2]
 #define __PT_SP_REG gprs[15]
 #define __PT_IP_REG psw.addr
-#define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error \"use PT_REGS_PARM1_CORE_SYSCALL() instead\""); 0l; })
+#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
 #define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___s390 *)(x), orig_gpr2)
 
 #elif defined(bpf_target_arm)
@@ -174,7 +176,7 @@ struct pt_regs___arm64 {
 #define __PT_RC_REG regs[0]
 #define __PT_SP_REG sp
 #define __PT_IP_REG pc
-#define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error \"use PT_REGS_PARM1_CORE_SYSCALL() instead\""); 0l; })
+#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
 #define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___arm64 *)(x), orig_x0)
 
 #elif defined(bpf_target_mips)
@@ -493,39 +495,62 @@ typeof(name(0)) name(struct pt_regs *ctx)                             \
 }                                                                          \
 static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args)
 
+/* If kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER, read pt_regs directly */
 #define ___bpf_syscall_args0()           ctx
-#define ___bpf_syscall_args1(x)          ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args5(x, args...) ___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs)
+#define ___bpf_syscall_args1(x)          ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_SYSCALL(regs)
+#define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_SYSCALL(regs)
+#define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_SYSCALL(regs)
+#define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_SYSCALL(regs)
+#define ___bpf_syscall_args5(x, args...) ___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_SYSCALL(regs)
 #define ___bpf_syscall_args(args...)     ___bpf_apply(___bpf_syscall_args, ___bpf_narg(args))(args)
 
+/* If kernel doesn't have CONFIG_ARCH_HAS_SYSCALL_WRAPPER, we have to BPF_CORE_READ from pt_regs */
+#define ___bpf_syswrap_args0()           ctx
+#define ___bpf_syswrap_args1(x)          ___bpf_syswrap_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args2(x, args...) ___bpf_syswrap_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args3(x, args...) ___bpf_syswrap_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args4(x, args...) ___bpf_syswrap_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args5(x, args...) ___bpf_syswrap_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args(args...)     ___bpf_apply(___bpf_syswrap_args, ___bpf_narg(args))(args)
+
 /*
- * BPF_KPROBE_SYSCALL is a variant of BPF_KPROBE, which is intended for
+ * BPF_KSYSCALL is a variant of BPF_KPROBE, which is intended for
  * tracing syscall functions, like __x64_sys_close. It hides the underlying
  * platform-specific low-level way of getting syscall input arguments from
  * struct pt_regs, and provides a familiar typed and named function arguments
  * syntax and semantics of accessing syscall input parameters.
  *
- * Original struct pt_regs* context is preserved as 'ctx' argument. This might
+ * Original struct pt_regs * context is preserved as 'ctx' argument. This might
  * be necessary when using BPF helpers like bpf_perf_event_output().
  *
- * This macro relies on BPF CO-RE support.
+ * At the moment BPF_KSYSCALL does not handle all the calling convention
+ * quirks for mmap(), clone() and compat syscalls transparently. This may or
+ * may not change in the future. User needs to take extra measures to handle
+ * such quirks explicitly, if necessary.
+ *
+ * This macro relies on BPF CO-RE support and virtual __kconfig externs.
  */
-#define BPF_KPROBE_SYSCALL(name, args...)                                  \
+#define BPF_KSYSCALL(name, args...)                                        \
 name(struct pt_regs *ctx);                                                 \
+extern _Bool LINUX_HAS_SYSCALL_WRAPPER __kconfig;                          \
 static __attribute__((always_inline)) typeof(name(0))                      \
 ____##name(struct pt_regs *ctx, ##args);                                   \
 typeof(name(0)) name(struct pt_regs *ctx)                                  \
 {                                                                          \
-       struct pt_regs *regs = PT_REGS_SYSCALL_REGS(ctx);                   \
+       struct pt_regs *regs = LINUX_HAS_SYSCALL_WRAPPER                    \
+                              ? (struct pt_regs *)PT_REGS_PARM1(ctx)       \
+                              : ctx;                                       \
        _Pragma("GCC diagnostic push")                                      \
        _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")              \
-       return ____##name(___bpf_syscall_args(args));                       \
+       if (LINUX_HAS_SYSCALL_WRAPPER)                                      \
+               return ____##name(___bpf_syswrap_args(args));               \
+       else                                                                \
+               return ____##name(___bpf_syscall_args(args));               \
        _Pragma("GCC diagnostic pop")                                       \
 }                                                                          \
 static __attribute__((always_inline)) typeof(name(0))                      \
 ____##name(struct pt_regs *ctx, ##args)
 
+#define BPF_KPROBE_SYSCALL BPF_KSYSCALL
+
 #endif
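
For reference, a minimal sketch of how the reworked macro is meant to be used together with the new "ksyscall" section type added on the libbpf side (program and argument names are illustrative, the pattern follows the selftests in this series):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* auto-attachable via the new "ksyscall/<syscall>" section handler */
SEC("ksyscall/tgkill")
int BPF_KSYSCALL(tgkill_entry, pid_t tgid, pid_t pid, int sig)
{
	/* arguments are already pulled out of pt_regs, wrapper or not */
	bpf_printk("tgkill(%d, %d, %d)", tgid, pid, sig);
	return 0;
}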
index 400e84f..627edb5 100644 (file)
@@ -2045,7 +2045,7 @@ static int btf_dump_get_enum_value(struct btf_dump *d,
                *value = *(__s64 *)data;
                return 0;
        case 4:
-               *value = is_signed ? *(__s32 *)data : *(__u32 *)data;
+               *value = is_signed ? (__s64)*(__s32 *)data : *(__u32 *)data;
                return 0;
        case 2:
                *value = is_signed ? *(__s16 *)data : *(__u16 *)data;
index 927745b..23f5c46 100644 (file)
@@ -533,7 +533,7 @@ void bpf_gen__record_attach_target(struct bpf_gen *gen, const char *attach_name,
        gen->attach_kind = kind;
        ret = snprintf(gen->attach_target, sizeof(gen->attach_target), "%s%s",
                       prefix, attach_name);
-       if (ret == sizeof(gen->attach_target))
+       if (ret >= sizeof(gen->attach_target))
                gen->error = -ENOSPC;
 }
 
index cb49408..b01fe01 100644 (file)
@@ -1694,7 +1694,7 @@ static int set_kcfg_value_tri(struct extern_desc *ext, void *ext_val,
        switch (ext->kcfg.type) {
        case KCFG_BOOL:
                if (value == 'm') {
-                       pr_warn("extern (kcfg) %s=%c should be tristate or char\n",
+                       pr_warn("extern (kcfg) '%s': value '%c' implies tristate or char type\n",
                                ext->name, value);
                        return -EINVAL;
                }
@@ -1715,7 +1715,7 @@ static int set_kcfg_value_tri(struct extern_desc *ext, void *ext_val,
        case KCFG_INT:
        case KCFG_CHAR_ARR:
        default:
-               pr_warn("extern (kcfg) %s=%c should be bool, tristate, or char\n",
+               pr_warn("extern (kcfg) '%s': value '%c' implies bool, tristate, or char type\n",
                        ext->name, value);
                return -EINVAL;
        }
@@ -1729,7 +1729,8 @@ static int set_kcfg_value_str(struct extern_desc *ext, char *ext_val,
        size_t len;
 
        if (ext->kcfg.type != KCFG_CHAR_ARR) {
-               pr_warn("extern (kcfg) %s=%s should be char array\n", ext->name, value);
+               pr_warn("extern (kcfg) '%s': value '%s' implies char array type\n",
+                       ext->name, value);
                return -EINVAL;
        }
 
@@ -1743,7 +1744,7 @@ static int set_kcfg_value_str(struct extern_desc *ext, char *ext_val,
        /* strip quotes */
        len -= 2;
        if (len >= ext->kcfg.sz) {
-               pr_warn("extern (kcfg) '%s': long string config %s of (%zu bytes) truncated to %d bytes\n",
+               pr_warn("extern (kcfg) '%s': long string '%s' of (%zu bytes) truncated to %d bytes\n",
                        ext->name, value, len, ext->kcfg.sz - 1);
                len = ext->kcfg.sz - 1;
        }
@@ -1800,13 +1801,20 @@ static bool is_kcfg_value_in_range(const struct extern_desc *ext, __u64 v)
 static int set_kcfg_value_num(struct extern_desc *ext, void *ext_val,
                              __u64 value)
 {
-       if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR) {
-               pr_warn("extern (kcfg) %s=%llu should be integer\n",
+       if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR &&
+           ext->kcfg.type != KCFG_BOOL) {
+               pr_warn("extern (kcfg) '%s': value '%llu' implies integer, char, or boolean type\n",
                        ext->name, (unsigned long long)value);
                return -EINVAL;
        }
+       if (ext->kcfg.type == KCFG_BOOL && value > 1) {
+               pr_warn("extern (kcfg) '%s': value '%llu' isn't boolean compatible\n",
+                       ext->name, (unsigned long long)value);
+               return -EINVAL;
+
+       }
        if (!is_kcfg_value_in_range(ext, value)) {
-               pr_warn("extern (kcfg) %s=%llu value doesn't fit in %d bytes\n",
+               pr_warn("extern (kcfg) '%s': value '%llu' doesn't fit in %d bytes\n",
                        ext->name, (unsigned long long)value, ext->kcfg.sz);
                return -ERANGE;
        }
@@ -1870,16 +1878,19 @@ static int bpf_object__process_kconfig_line(struct bpf_object *obj,
                /* assume integer */
                err = parse_u64(value, &num);
                if (err) {
-                       pr_warn("extern (kcfg) %s=%s should be integer\n",
-                               ext->name, value);
+                       pr_warn("extern (kcfg) '%s': value '%s' isn't a valid integer\n", ext->name, value);
                        return err;
                }
+               if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR) {
+                       pr_warn("extern (kcfg) '%s': value '%s' implies integer type\n", ext->name, value);
+                       return -EINVAL;
+               }
                err = set_kcfg_value_num(ext, ext_val, num);
                break;
        }
        if (err)
                return err;
-       pr_debug("extern (kcfg) %s=%s\n", ext->name, value);
+       pr_debug("extern (kcfg) '%s': set to %s\n", ext->name, value);
        return 0;
 }
 
@@ -2320,6 +2331,37 @@ int parse_btf_map_def(const char *map_name, struct btf *btf,
        return 0;
 }
 
+static size_t adjust_ringbuf_sz(size_t sz)
+{
+       __u32 page_sz = sysconf(_SC_PAGE_SIZE);
+       __u32 mul;
+
+       /* if user forgot to set any size, make sure they see error */
+       if (sz == 0)
+               return 0;
+       /* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be
+        * a power-of-2 multiple of kernel's page size. If user diligently
+        * satisified these conditions, pass the size through.
+        */
+       if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz))
+               return sz;
+
+       /* Otherwise find closest (page_sz * power_of_2) product bigger than
+        * user-set size to satisfy both user size request and kernel
+        * requirements and substitute correct max_entries for map creation.
+        */
+       for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) {
+               if (mul * page_sz > sz)
+                       return mul * page_sz;
+       }
+
+       /* if it's impossible to satisfy the conditions (i.e., user size is
+        * very close to UINT_MAX but is not a power-of-2 multiple of
+        * page_size) then just return original size and let kernel reject it
+        */
+       return sz;
+}
+
 static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def)
 {
        map->def.type = def->map_type;
@@ -2333,6 +2375,10 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
        map->btf_key_type_id = def->key_type_id;
        map->btf_value_type_id = def->value_type_id;
 
+       /* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
+       if (map->def.type == BPF_MAP_TYPE_RINGBUF)
+               map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
+
        if (def->parts & MAP_DEF_MAP_TYPE)
                pr_debug("map '%s': found type = %u.\n", map->name, def->map_type);
 
@@ -3687,7 +3733,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
                        ext->kcfg.type = find_kcfg_type(obj->btf, t->type,
                                                        &ext->kcfg.is_signed);
                        if (ext->kcfg.type == KCFG_UNKNOWN) {
-                               pr_warn("extern (kcfg) '%s' type is unsupported\n", ext_name);
+                               pr_warn("extern (kcfg) '%s': type is unsupported\n", ext_name);
                                return -ENOTSUP;
                        }
                } else if (strcmp(sec_name, KSYMS_SEC) == 0) {
@@ -4232,7 +4278,7 @@ int bpf_map__set_autocreate(struct bpf_map *map, bool autocreate)
 int bpf_map__reuse_fd(struct bpf_map *map, int fd)
 {
        struct bpf_map_info info = {};
-       __u32 len = sizeof(info);
+       __u32 len = sizeof(info), name_len;
        int new_fd, err;
        char *new_name;
 
@@ -4242,7 +4288,12 @@ int bpf_map__reuse_fd(struct bpf_map *map, int fd)
        if (err)
                return libbpf_err(err);
 
-       new_name = strdup(info.name);
+       name_len = strlen(info.name);
+       if (name_len == BPF_OBJ_NAME_LEN - 1 && strncmp(map->name, info.name, name_len) == 0)
+               new_name = strdup(map->name);
+       else
+               new_name = strdup(info.name);
+
        if (!new_name)
                return libbpf_err(-errno);
 
@@ -4301,9 +4352,15 @@ struct bpf_map *bpf_map__inner_map(struct bpf_map *map)
 
 int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries)
 {
-       if (map->fd >= 0)
+       if (map->obj->loaded)
                return libbpf_err(-EBUSY);
+
        map->def.max_entries = max_entries;
+
+       /* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
+       if (map->def.type == BPF_MAP_TYPE_RINGBUF)
+               map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
+
        return 0;
 }
 
@@ -4654,6 +4711,8 @@ static int probe_kern_btf_enum64(void)
                                             strs, sizeof(strs)));
 }
 
+static int probe_kern_syscall_wrapper(void);
+
 enum kern_feature_result {
        FEAT_UNKNOWN = 0,
        FEAT_SUPPORTED = 1,
@@ -4722,6 +4781,9 @@ static struct kern_feature_desc {
        [FEAT_BTF_ENUM64] = {
                "BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
        },
+       [FEAT_SYSCALL_WRAPPER] = {
+               "Kernel using syscall wrapper", probe_kern_syscall_wrapper,
+       },
 };
 
 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
@@ -4854,37 +4916,6 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
 
 static void bpf_map__destroy(struct bpf_map *map);
 
-static size_t adjust_ringbuf_sz(size_t sz)
-{
-       __u32 page_sz = sysconf(_SC_PAGE_SIZE);
-       __u32 mul;
-
-       /* if user forgot to set any size, make sure they see error */
-       if (sz == 0)
-               return 0;
-       /* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be
-        * a power-of-2 multiple of kernel's page size. If user diligently
-        * satisified these conditions, pass the size through.
-        */
-       if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz))
-               return sz;
-
-       /* Otherwise find closest (page_sz * power_of_2) product bigger than
-        * user-set size to satisfy both user size request and kernel
-        * requirements and substitute correct max_entries for map creation.
-        */
-       for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) {
-               if (mul * page_sz > sz)
-                       return mul * page_sz;
-       }
-
-       /* if it's impossible to satisfy the conditions (i.e., user size is
-        * very close to UINT_MAX but is not a power-of-2 multiple of
-        * page_size) then just return original size and let kernel reject it
-        */
-       return sz;
-}
-
 static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, bool is_inner)
 {
        LIBBPF_OPTS(bpf_map_create_opts, create_attr);
@@ -4923,9 +4954,6 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
        }
 
        switch (def->type) {
-       case BPF_MAP_TYPE_RINGBUF:
-               map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
-               /* fallthrough */
        case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
        case BPF_MAP_TYPE_CGROUP_ARRAY:
        case BPF_MAP_TYPE_STACK_TRACE:
@@ -7282,14 +7310,14 @@ static int kallsyms_cb(unsigned long long sym_addr, char sym_type,
                return 0;
 
        if (ext->is_set && ext->ksym.addr != sym_addr) {
-               pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n",
+               pr_warn("extern (ksym) '%s': resolution is ambiguous: 0x%llx or 0x%llx\n",
                        sym_name, ext->ksym.addr, sym_addr);
                return -EINVAL;
        }
        if (!ext->is_set) {
                ext->is_set = true;
                ext->ksym.addr = sym_addr;
-               pr_debug("extern (ksym) %s=0x%llx\n", sym_name, sym_addr);
+               pr_debug("extern (ksym) '%s': set to 0x%llx\n", sym_name, sym_addr);
        }
        return 0;
 }
@@ -7493,28 +7521,52 @@ static int bpf_object__resolve_externs(struct bpf_object *obj,
        for (i = 0; i < obj->nr_extern; i++) {
                ext = &obj->externs[i];
 
-               if (ext->type == EXT_KCFG &&
-                   strcmp(ext->name, "LINUX_KERNEL_VERSION") == 0) {
-                       void *ext_val = kcfg_data + ext->kcfg.data_off;
-                       __u32 kver = get_kernel_version();
+               if (ext->type == EXT_KSYM) {
+                       if (ext->ksym.type_id)
+                               need_vmlinux_btf = true;
+                       else
+                               need_kallsyms = true;
+                       continue;
+               } else if (ext->type == EXT_KCFG) {
+                       void *ext_ptr = kcfg_data + ext->kcfg.data_off;
+                       __u64 value = 0;
+
+                       /* Kconfig externs need actual /proc/config.gz */
+                       if (str_has_pfx(ext->name, "CONFIG_")) {
+                               need_config = true;
+                               continue;
+                       }
 
-                       if (!kver) {
-                               pr_warn("failed to get kernel version\n");
+                       /* Virtual kcfg externs are handled specially by libbpf */
+                       if (strcmp(ext->name, "LINUX_KERNEL_VERSION") == 0) {
+                               value = get_kernel_version();
+                               if (!value) {
+                                       pr_warn("extern (kcfg) '%s': failed to get kernel version\n", ext->name);
+                                       return -EINVAL;
+                               }
+                       } else if (strcmp(ext->name, "LINUX_HAS_BPF_COOKIE") == 0) {
+                               value = kernel_supports(obj, FEAT_BPF_COOKIE);
+                       } else if (strcmp(ext->name, "LINUX_HAS_SYSCALL_WRAPPER") == 0) {
+                               value = kernel_supports(obj, FEAT_SYSCALL_WRAPPER);
+                       } else if (!str_has_pfx(ext->name, "LINUX_") || !ext->is_weak) {
+                               /* Currently libbpf supports only CONFIG_ and LINUX_ prefixed
+                                * __kconfig externs, where LINUX_ ones are virtual and filled out
+                                * specially by libbpf (their values don't come from Kconfig).
+                                * If LINUX_xxx variable is not recognized by libbpf, but is marked
+                                * __weak, it defaults to zero value, just like for CONFIG_xxx
+                                * externs.
+                                */
+                               pr_warn("extern (kcfg) '%s': unrecognized virtual extern\n", ext->name);
                                return -EINVAL;
                        }
-                       err = set_kcfg_value_num(ext, ext_val, kver);
+
+                       err = set_kcfg_value_num(ext, ext_ptr, value);
                        if (err)
                                return err;
-                       pr_debug("extern (kcfg) %s=0x%x\n", ext->name, kver);
-               } else if (ext->type == EXT_KCFG && str_has_pfx(ext->name, "CONFIG_")) {
-                       need_config = true;
-               } else if (ext->type == EXT_KSYM) {
-                       if (ext->ksym.type_id)
-                               need_vmlinux_btf = true;
-                       else
-                               need_kallsyms = true;
+                       pr_debug("extern (kcfg) '%s': set to 0x%llx\n",
+                                ext->name, (long long)value);
                } else {
-                       pr_warn("unrecognized extern '%s'\n", ext->name);
+                       pr_warn("extern '%s': unrecognized extern kind\n", ext->name);
                        return -EINVAL;
                }
        }
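
On the BPF program side these virtual externs are declared like any other __kconfig variable; a sketch (only the LINUX_* names handled above are real, the __weak one is a hypothetical placeholder that defaults to zero):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

extern unsigned int LINUX_KERNEL_VERSION __kconfig;	/* filled by libbpf */
extern _Bool LINUX_HAS_SYSCALL_WRAPPER __kconfig;	/* probed by libbpf */
extern _Bool LINUX_HAS_BPF_COOKIE __kconfig;		/* probed by libbpf */
extern int CONFIG_HZ __kconfig;				/* read from Kconfig data */
/* unrecognized LINUX_* externs must be __weak; they default to zero */
extern _Bool LINUX_HAS_SOME_FUTURE_FEATURE __kconfig __weak;

SEC("raw_tp/sys_enter")
int report(void *ctx)
{
	if (LINUX_HAS_SYSCALL_WRAPPER)
		bpf_printk("kernel 0x%x uses syscall wrappers", LINUX_KERNEL_VERSION);
	return 0;
}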
@@ -7550,10 +7602,10 @@ static int bpf_object__resolve_externs(struct bpf_object *obj,
                ext = &obj->externs[i];
 
                if (!ext->is_set && !ext->is_weak) {
-                       pr_warn("extern %s (strong) not resolved\n", ext->name);
+                       pr_warn("extern '%s' (strong): not resolved\n", ext->name);
                        return -ESRCH;
                } else if (!ext->is_set) {
-                       pr_debug("extern %s (weak) not resolved, defaulting to zero\n",
+                       pr_debug("extern '%s' (weak): not resolved, defaulting to zero\n",
                                 ext->name);
                }
        }
@@ -8381,6 +8433,7 @@ int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log
 
 static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link);
+static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
@@ -8401,6 +8454,8 @@ static const struct bpf_sec_def section_defs[] = {
        SEC_DEF("uretprobe.s+",         KPROBE, 0, SEC_SLEEPABLE, attach_uprobe),
        SEC_DEF("kprobe.multi+",        KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
        SEC_DEF("kretprobe.multi+",     KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
+       SEC_DEF("ksyscall+",            KPROBE, 0, SEC_NONE, attach_ksyscall),
+       SEC_DEF("kretsyscall+",         KPROBE, 0, SEC_NONE, attach_ksyscall),
        SEC_DEF("usdt+",                KPROBE, 0, SEC_NONE, attach_usdt),
        SEC_DEF("tc",                   SCHED_CLS, 0, SEC_NONE),
        SEC_DEF("classifier",           SCHED_CLS, 0, SEC_NONE),
@@ -9757,7 +9812,7 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
 {
        struct perf_event_attr attr = {};
        char errmsg[STRERR_BUFSIZE];
-       int type, pfd, err;
+       int type, pfd;
 
        if (ref_ctr_off >= (1ULL << PERF_UPROBE_REF_CTR_OFFSET_BITS))
                return -EINVAL;
@@ -9793,14 +9848,7 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
                      pid < 0 ? -1 : pid /* pid */,
                      pid == -1 ? 0 : -1 /* cpu */,
                      -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
-       if (pfd < 0) {
-               err = -errno;
-               pr_warn("%s perf_event_open() failed: %s\n",
-                       uprobe ? "uprobe" : "kprobe",
-                       libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
-               return err;
-       }
-       return pfd;
+       return pfd >= 0 ? pfd : -errno;
 }
 
 static int append_to_file(const char *file, const char *fmt, ...)
@@ -9823,6 +9871,34 @@ static int append_to_file(const char *file, const char *fmt, ...)
        return err;
 }
 
+#define DEBUGFS "/sys/kernel/debug/tracing"
+#define TRACEFS "/sys/kernel/tracing"
+
+static bool use_debugfs(void)
+{
+       static int has_debugfs = -1;
+
+       if (has_debugfs < 0)
+               has_debugfs = access(DEBUGFS, F_OK) == 0;
+
+       return has_debugfs == 1;
+}
+
+static const char *tracefs_path(void)
+{
+       return use_debugfs() ? DEBUGFS : TRACEFS;
+}
+
+static const char *tracefs_kprobe_events(void)
+{
+       return use_debugfs() ? DEBUGFS"/kprobe_events" : TRACEFS"/kprobe_events";
+}
+
+static const char *tracefs_uprobe_events(void)
+{
+       return use_debugfs() ? DEBUGFS"/uprobe_events" : TRACEFS"/uprobe_events";
+}
+
 static void gen_kprobe_legacy_event_name(char *buf, size_t buf_sz,
                                         const char *kfunc_name, size_t offset)
 {
@@ -9835,9 +9911,7 @@ static void gen_kprobe_legacy_event_name(char *buf, size_t buf_sz,
 static int add_kprobe_event_legacy(const char *probe_name, bool retprobe,
                                   const char *kfunc_name, size_t offset)
 {
-       const char *file = "/sys/kernel/debug/tracing/kprobe_events";
-
-       return append_to_file(file, "%c:%s/%s %s+0x%zx",
+       return append_to_file(tracefs_kprobe_events(), "%c:%s/%s %s+0x%zx",
                              retprobe ? 'r' : 'p',
                              retprobe ? "kretprobes" : "kprobes",
                              probe_name, kfunc_name, offset);
@@ -9845,18 +9919,16 @@ static int add_kprobe_event_legacy(const char *probe_name, bool retprobe,
 
 static int remove_kprobe_event_legacy(const char *probe_name, bool retprobe)
 {
-       const char *file = "/sys/kernel/debug/tracing/kprobe_events";
-
-       return append_to_file(file, "-:%s/%s", retprobe ? "kretprobes" : "kprobes", probe_name);
+       return append_to_file(tracefs_kprobe_events(), "-:%s/%s",
+                             retprobe ? "kretprobes" : "kprobes", probe_name);
 }
 
 static int determine_kprobe_perf_type_legacy(const char *probe_name, bool retprobe)
 {
        char file[256];
 
-       snprintf(file, sizeof(file),
-                "/sys/kernel/debug/tracing/events/%s/%s/id",
-                retprobe ? "kretprobes" : "kprobes", probe_name);
+       snprintf(file, sizeof(file), "%s/events/%s/%s/id",
+                tracefs_path(), retprobe ? "kretprobes" : "kprobes", probe_name);
 
        return parse_uint_from_file(file, "%d\n");
 }
@@ -9905,6 +9977,60 @@ err_clean_legacy:
        return err;
 }
 
+static const char *arch_specific_syscall_pfx(void)
+{
+#if defined(__x86_64__)
+       return "x64";
+#elif defined(__i386__)
+       return "ia32";
+#elif defined(__s390x__)
+       return "s390x";
+#elif defined(__s390__)
+       return "s390";
+#elif defined(__arm__)
+       return "arm";
+#elif defined(__aarch64__)
+       return "arm64";
+#elif defined(__mips__)
+       return "mips";
+#elif defined(__riscv)
+       return "riscv";
+#else
+       return NULL;
+#endif
+}
+
+static int probe_kern_syscall_wrapper(void)
+{
+       char syscall_name[64];
+       const char *ksys_pfx;
+
+       ksys_pfx = arch_specific_syscall_pfx();
+       if (!ksys_pfx)
+               return 0;
+
+       snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx);
+
+       if (determine_kprobe_perf_type() >= 0) {
+               int pfd;
+
+               pfd = perf_event_open_probe(false, false, syscall_name, 0, getpid(), 0);
+               if (pfd >= 0)
+                       close(pfd);
+
+               return pfd >= 0 ? 1 : 0;
+       } else { /* legacy mode */
+               char probe_name[128];
+
+               gen_kprobe_legacy_event_name(probe_name, sizeof(probe_name), syscall_name, 0);
+               if (add_kprobe_event_legacy(probe_name, false, syscall_name, 0) < 0)
+                       return 0;
+
+               (void)remove_kprobe_event_legacy(probe_name, false);
+               return 1;
+       }
+}
+
 struct bpf_link *
 bpf_program__attach_kprobe_opts(const struct bpf_program *prog,
                                const char *func_name,
@@ -9990,6 +10116,29 @@ struct bpf_link *bpf_program__attach_kprobe(const struct bpf_program *prog,
        return bpf_program__attach_kprobe_opts(prog, func_name, &opts);
 }
 
+struct bpf_link *bpf_program__attach_ksyscall(const struct bpf_program *prog,
+                                             const char *syscall_name,
+                                             const struct bpf_ksyscall_opts *opts)
+{
+       LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts);
+       char func_name[128];
+
+       if (!OPTS_VALID(opts, bpf_ksyscall_opts))
+               return libbpf_err_ptr(-EINVAL);
+
+       if (kernel_supports(prog->obj, FEAT_SYSCALL_WRAPPER)) {
+               snprintf(func_name, sizeof(func_name), "__%s_sys_%s",
+                        arch_specific_syscall_pfx(), syscall_name);
+       } else {
+               snprintf(func_name, sizeof(func_name), "__se_sys_%s", syscall_name);
+       }
+
+       kprobe_opts.retprobe = OPTS_GET(opts, retprobe, false);
+       kprobe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
+
+       return bpf_program__attach_kprobe_opts(prog, func_name, &kprobe_opts);
+}
+
 /* Adapted from perf/util/string.c */
 static bool glob_match(const char *str, const char *pat)
 {
@@ -10160,6 +10309,27 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf
        return libbpf_get_error(*link);
 }
 
+static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link)
+{
+       LIBBPF_OPTS(bpf_ksyscall_opts, opts);
+       const char *syscall_name;
+
+       *link = NULL;
+
+       /* no auto-attach for SEC("ksyscall") and SEC("kretsyscall") */
+       if (strcmp(prog->sec_name, "ksyscall") == 0 || strcmp(prog->sec_name, "kretsyscall") == 0)
+               return 0;
+
+       opts.retprobe = str_has_pfx(prog->sec_name, "kretsyscall/");
+       if (opts.retprobe)
+               syscall_name = prog->sec_name + sizeof("kretsyscall/") - 1;
+       else
+               syscall_name = prog->sec_name + sizeof("ksyscall/") - 1;
+
+       *link = bpf_program__attach_ksyscall(prog, syscall_name, &opts);
+       return *link ? 0 : -errno;
+}
+
 static int attach_kprobe_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link)
 {
        LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
@@ -10208,9 +10378,7 @@ static void gen_uprobe_legacy_event_name(char *buf, size_t buf_sz,
 static inline int add_uprobe_event_legacy(const char *probe_name, bool retprobe,
                                          const char *binary_path, size_t offset)
 {
-       const char *file = "/sys/kernel/debug/tracing/uprobe_events";
-
-       return append_to_file(file, "%c:%s/%s %s:0x%zx",
+       return append_to_file(tracefs_uprobe_events(), "%c:%s/%s %s:0x%zx",
                              retprobe ? 'r' : 'p',
                              retprobe ? "uretprobes" : "uprobes",
                              probe_name, binary_path, offset);
@@ -10218,18 +10386,16 @@ static inline int add_uprobe_event_legacy(const char *probe_name, bool retprobe,
 
 static inline int remove_uprobe_event_legacy(const char *probe_name, bool retprobe)
 {
-       const char *file = "/sys/kernel/debug/tracing/uprobe_events";
-
-       return append_to_file(file, "-:%s/%s", retprobe ? "uretprobes" : "uprobes", probe_name);
+       return append_to_file(tracefs_uprobe_events(), "-:%s/%s",
+                             retprobe ? "uretprobes" : "uprobes", probe_name);
 }
 
 static int determine_uprobe_perf_type_legacy(const char *probe_name, bool retprobe)
 {
        char file[512];
 
-       snprintf(file, sizeof(file),
-                "/sys/kernel/debug/tracing/events/%s/%s/id",
-                retprobe ? "uretprobes" : "uprobes", probe_name);
+       snprintf(file, sizeof(file), "%s/events/%s/%s/id",
+                tracefs_path(), retprobe ? "uretprobes" : "uprobes", probe_name);
 
        return parse_uint_from_file(file, "%d\n");
 }
@@ -10545,7 +10711,10 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
        ref_ctr_off = OPTS_GET(opts, ref_ctr_offset, 0);
        pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
 
-       if (binary_path && !strchr(binary_path, '/')) {
+       if (!binary_path)
+               return libbpf_err_ptr(-EINVAL);
+
+       if (!strchr(binary_path, '/')) {
                err = resolve_full_path(binary_path, full_binary_path,
                                        sizeof(full_binary_path));
                if (err) {
@@ -10559,11 +10728,6 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
        if (func_name) {
                long sym_off;
 
-               if (!binary_path) {
-                       pr_warn("prog '%s': name-based attach requires binary_path\n",
-                               prog->name);
-                       return libbpf_err_ptr(-EINVAL);
-               }
                sym_off = elf_find_func_offset(binary_path, func_name);
                if (sym_off < 0)
                        return libbpf_err_ptr(sym_off);
@@ -10711,6 +10875,9 @@ struct bpf_link *bpf_program__attach_usdt(const struct bpf_program *prog,
                return libbpf_err_ptr(-EINVAL);
        }
 
+       if (!binary_path)
+               return libbpf_err_ptr(-EINVAL);
+
        if (!strchr(binary_path, '/')) {
                err = resolve_full_path(binary_path, resolved_path, sizeof(resolved_path));
                if (err) {
@@ -10776,9 +10943,8 @@ static int determine_tracepoint_id(const char *tp_category,
        char file[PATH_MAX];
        int ret;
 
-       ret = snprintf(file, sizeof(file),
-                      "/sys/kernel/debug/tracing/events/%s/%s/id",
-                      tp_category, tp_name);
+       ret = snprintf(file, sizeof(file), "%s/events/%s/%s/id",
+                      tracefs_path(), tp_category, tp_name);
        if (ret < 0)
                return -errno;
        if (ret >= sizeof(file)) {
@@ -11728,6 +11894,22 @@ int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
        return cpu_buf->fd;
 }
 
+int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf, size_t *buf_size)
+{
+       struct perf_cpu_buf *cpu_buf;
+
+       if (buf_idx >= pb->cpu_cnt)
+               return libbpf_err(-EINVAL);
+
+       cpu_buf = pb->cpu_bufs[buf_idx];
+       if (!cpu_buf)
+               return libbpf_err(-ENOENT);
+
+       *buf = cpu_buf->base;
+       *buf_size = pb->mmap_size;
+       return 0;
+}
+
 /*
  * Consume data from perf ring buffer corresponding to slot *buf_idx* in
  * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
index e4d5353..61493c4 100644 (file)
@@ -457,6 +457,52 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
                                      const char *pattern,
                                      const struct bpf_kprobe_multi_opts *opts);
 
+struct bpf_ksyscall_opts {
+	/* size of this struct, for forward/backward compatibility */
+       size_t sz;
+       /* custom user-provided value fetchable through bpf_get_attach_cookie() */
+       __u64 bpf_cookie;
+       /* attach as return probe? */
+       bool retprobe;
+       size_t :0;
+};
+#define bpf_ksyscall_opts__last_field retprobe
+
+/**
+ * @brief **bpf_program__attach_ksyscall()** attaches a BPF program
+ * to the kernel syscall handler of a specified syscall. Optionally, it's
+ * possible to request a retprobe that is triggered at syscall exit instead.
+ * It's also possible to associate a BPF cookie (through options).
+ *
+ * Libbpf will automatically determine the correct full kernel function name,
+ * which, depending on system architecture and kernel version/configuration,
+ * could be of the form __<arch>_sys_<syscall> or __se_sys_<syscall>, and will
+ * attach the specified program using the kprobe/kretprobe mechanism.
+ *
+ * **bpf_program__attach_ksyscall()** is an API counterpart of declarative
+ * **SEC("ksyscall/<syscall>")** annotation of BPF programs.
+ *
+ * At the moment **SEC("ksyscall")** and **bpf_program__attach_ksyscall()** do
+ * not handle all the calling convention quirks for mmap(), clone() and compat
+ * syscalls. They also only attach to "native" syscall interfaces. If the host
+ * system supports compat syscalls or defines 32-bit syscalls in a 64-bit
+ * kernel, libbpf won't attach to such syscall interfaces.
+ *
+ * These limitations may or may not change in the future. Therefore it is
+ * recommended to use SEC("kprobe") for these syscalls, or when working with
+ * compat and 32-bit interfaces is required.
+ *
+ * @param prog BPF program to attach
+ * @param syscall_name Symbolic name of the syscall (e.g., "bpf")
+ * @param opts Additional options (see **struct bpf_ksyscall_opts**)
+ * @return Reference to the newly created BPF link; NULL is returned on
+ * error, and the error code is stored in errno
+ */
+LIBBPF_API struct bpf_link *
+bpf_program__attach_ksyscall(const struct bpf_program *prog,
+                            const char *syscall_name,
+                            const struct bpf_ksyscall_opts *opts);
+
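To make the declarative and programmatic flavors concrete, below is a minimal
sketch (not part of this patch set; the program name, syscall choice, helper
name and cookie value are illustrative), mirroring how the selftests in this
series use the new section type.

BPF side, auto-attachable through SEC("ksyscall/<syscall>"); BPF_KSYSCALL from
bpf_tracing.h hides the syscall-wrapper argument-fetching details:

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	char _license[] SEC("license") = "GPL";

	SEC("ksyscall/prctl")
	int BPF_KSYSCALL(ksys_prctl, int option, unsigned long arg2)
	{
		/* option and arg2 are fetched according to the running
		 * kernel's syscall calling convention
		 */
		bpf_printk("prctl(%d) by pid %d", option,
			   (int)(bpf_get_current_pid_tgid() >> 32));
		return 0;
	}

User-space side, attaching the same program explicitly with options instead of
relying on skeleton auto-attach:

	#include <errno.h>
	#include <stdio.h>
	#include <bpf/libbpf.h>

	static int attach_prctl_probe(struct bpf_object *obj)
	{
		LIBBPF_OPTS(bpf_ksyscall_opts, opts,
			.retprobe = false,
			.bpf_cookie = 0x1234, /* readable via bpf_get_attach_cookie() */
		);
		struct bpf_program *prog;
		struct bpf_link *link;

		prog = bpf_object__find_program_by_name(obj, "ksys_prctl");
		if (!prog)
			return -ENOENT;

		link = bpf_program__attach_ksyscall(prog, "prctl", &opts);
		if (!link) {
			fprintf(stderr, "ksyscall attach failed: %d\n", -errno);
			return -errno;
		}
		return 0;
	}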
 struct bpf_uprobe_opts {
 	/* size of this struct, for forward/backward compatibility */
        size_t sz;
@@ -1053,6 +1099,22 @@ LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
 LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
 LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
 LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
+/**
+ * @brief **perf_buffer__buffer()** returns the per-cpu raw mmap()'ed underlying
+ * memory region of the ring buffer.
+ * This ring buffer can be used to implement a custom events consumer.
+ * The ring buffer starts with the *struct perf_event_mmap_page*, which
+ * holds the ring buffer management fields; when accessing the header
+ * structure it's important to be SMP aware.
+ * You can refer to *perf_event_read_simple* for a simple example.
+ * @param pb the perf buffer structure
+ * @param buf_idx the buffer index to retrieve
+ * @param buf (out) gets the base pointer of the mmap()'ed memory
+ * @param buf_size (out) gets the size of the mmap()'ed region
+ * @return 0 on success, negative error code for failure
+ */
+LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
+                                  size_t *buf_size);
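As an illustrative sketch of the custom-consumer use case mentioned above
(assuming the standard perf ring-buffer layout; record parsing and wrap-around
handling are elided, see perf_event_read_simple() for the complete logic):

	#include <linux/perf_event.h>
	#include <bpf/libbpf.h>

	static int drain_cpu_buf(struct perf_buffer *pb, int buf_idx)
	{
		struct perf_event_mmap_page *header;
		void *base, *data;
		size_t mmap_size;
		__u64 head, tail;
		int err;

		err = perf_buffer__buffer(pb, buf_idx, &base, &mmap_size);
		if (err)
			return err;

		header = base;
		data = (char *)base + header->data_offset;

		/* acquire pairs with the kernel's release of data_head */
		head = __atomic_load_n(&header->data_head, __ATOMIC_ACQUIRE);
		tail = header->data_tail;

		while (tail != head) {
			struct perf_event_header *ev;

			/* records may wrap around the end of the data area;
			 * a real consumer copies wrapped records out first
			 */
			ev = (void *)((char *)data + (tail & (header->data_size - 1)));
			tail += ev->size; /* record parsing elided */
		}

		/* tell the kernel the consumed space can be reused */
		__atomic_store_n(&header->data_tail, tail, __ATOMIC_RELEASE);
		return 0;
	}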
 
 struct bpf_prog_linfo;
 struct bpf_prog_info;
index 94b589e..0625adb 100644 (file)
@@ -356,10 +356,12 @@ LIBBPF_0.8.0 {
 LIBBPF_1.0.0 {
        global:
                bpf_prog_query_opts;
+               bpf_program__attach_ksyscall;
                btf__add_enum64;
                btf__add_enum64_value;
                libbpf_bpf_attach_type_str;
                libbpf_bpf_link_type_str;
                libbpf_bpf_map_type_str;
                libbpf_bpf_prog_type_str;
+               perf_buffer__buffer;
 };
index 9cd7829..4135ae0 100644 (file)
@@ -108,9 +108,9 @@ static inline bool str_has_sfx(const char *str, const char *sfx)
        size_t str_len = strlen(str);
        size_t sfx_len = strlen(sfx);
 
-       if (sfx_len <= str_len)
-               return strcmp(str + str_len - sfx_len, sfx);
-       return false;
+       if (sfx_len > str_len)
+               return false;
+       return strcmp(str + str_len - sfx_len, sfx) == 0;
 }
 
 /* Symbol versioning is different between static and shared library.
@@ -352,6 +352,8 @@ enum kern_feature_id {
        FEAT_BPF_COOKIE,
        /* BTF_KIND_ENUM64 support and BTF_KIND_ENUM kflag support */
        FEAT_BTF_ENUM64,
+       /* Kernel uses syscall wrapper (CONFIG_ARCH_HAS_SYSCALL_WRAPPER) */
+       FEAT_SYSCALL_WRAPPER,
        __FEAT_CNT,
 };
 
index 4181fdd..4f2adc0 100644 (file)
@@ -6,7 +6,6 @@
 #include <linux/errno.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
-#include <bpf/bpf_core_read.h>
 
 /* Below types and maps are internal implementation details of libbpf's USDT
  * support and are subject to change. Also, bpf_usdt_xxx() API helpers should
 #ifndef BPF_USDT_MAX_IP_CNT
 #define BPF_USDT_MAX_IP_CNT (4 * BPF_USDT_MAX_SPEC_CNT)
 #endif
-/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
- * the only dependency on CO-RE, so if it's undesirable, user can override
- * BPF_USDT_HAS_BPF_COOKIE to specify whether to BPF cookie is supported or not.
- */
-#ifndef BPF_USDT_HAS_BPF_COOKIE
-#define BPF_USDT_HAS_BPF_COOKIE \
-       bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
-#endif
 
 enum __bpf_usdt_arg_type {
        BPF_USDT_ARG_CONST,
@@ -83,15 +74,12 @@ struct {
        __type(value, __u32);
 } __bpf_usdt_ip_to_spec_id SEC(".maps") __weak;
 
-/* don't rely on user's BPF code to have latest definition of bpf_func_id */
-enum bpf_func_id___usdt {
-       BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */
-};
+extern const _Bool LINUX_HAS_BPF_COOKIE __kconfig;
 
 static __always_inline
 int __bpf_usdt_spec_id(struct pt_regs *ctx)
 {
-       if (!BPF_USDT_HAS_BPF_COOKIE) {
+       if (!LINUX_HAS_BPF_COOKIE) {
                long ip = PT_REGS_IP(ctx);
                int *spec_id_ptr;
 
index e585e1c..792cb15 100644 (file)
@@ -148,13 +148,13 @@ static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = {
        .write = bpf_testmod_test_write,
 };
 
-BTF_SET_START(bpf_testmod_check_kfunc_ids)
-BTF_ID(func, bpf_testmod_test_mod_kfunc)
-BTF_SET_END(bpf_testmod_check_kfunc_ids)
+BTF_SET8_START(bpf_testmod_check_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc)
+BTF_SET8_END(bpf_testmod_check_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_testmod_kfunc_set = {
-       .owner     = THIS_MODULE,
-       .check_set = &bpf_testmod_check_kfunc_ids,
+       .owner = THIS_MODULE,
+       .set   = &bpf_testmod_check_kfunc_ids,
 };
 
 extern int bpf_fentry_test1(int a);
index 7ff5fa9..a33874b 100644 (file)
@@ -27,6 +27,7 @@
 #include "bpf_iter_test_kern5.skel.h"
 #include "bpf_iter_test_kern6.skel.h"
 #include "bpf_iter_bpf_link.skel.h"
+#include "bpf_iter_ksym.skel.h"
 
 static int duration;
 
@@ -1120,6 +1121,19 @@ static void test_link_iter(void)
        bpf_iter_bpf_link__destroy(skel);
 }
 
+static void test_ksym_iter(void)
+{
+       struct bpf_iter_ksym *skel;
+
+       skel = bpf_iter_ksym__open_and_load();
+       if (!ASSERT_OK_PTR(skel, "bpf_iter_ksym__open_and_load"))
+               return;
+
+       do_dummy_read(skel->progs.dump_ksym);
+
+       bpf_iter_ksym__destroy(skel);
+}
+
 #define CMP_BUFFER_SIZE 1024
 static char task_vma_output[CMP_BUFFER_SIZE];
 static char proc_maps_output[CMP_BUFFER_SIZE];
@@ -1267,4 +1281,6 @@ void test_bpf_iter(void)
                test_buf_neg_offset();
        if (test__start_subtest("link-iter"))
                test_link_iter();
+       if (test__start_subtest("ksym"))
+               test_ksym_iter();
 }
index dd30b1e..7a74a15 100644 (file)
@@ -2,13 +2,29 @@
 #include <test_progs.h>
 #include <network_helpers.h>
 #include "test_bpf_nf.skel.h"
+#include "test_bpf_nf_fail.skel.h"
+
+static char log_buf[1024 * 1024];
+
+struct {
+       const char *prog_name;
+       const char *err_msg;
+} test_bpf_nf_fail_tests[] = {
+       { "alloc_release", "kernel function bpf_ct_release args#0 expected pointer to STRUCT nf_conn but" },
+       { "insert_insert", "kernel function bpf_ct_insert_entry args#0 expected pointer to STRUCT nf_conn___init but" },
+       { "lookup_insert", "kernel function bpf_ct_insert_entry args#0 expected pointer to STRUCT nf_conn___init but" },
+       { "set_timeout_after_insert", "kernel function bpf_ct_set_timeout args#0 expected pointer to STRUCT nf_conn___init but" },
+       { "set_status_after_insert", "kernel function bpf_ct_set_status args#0 expected pointer to STRUCT nf_conn___init but" },
+       { "change_timeout_after_alloc", "kernel function bpf_ct_change_timeout args#0 expected pointer to STRUCT nf_conn but" },
+       { "change_status_after_alloc", "kernel function bpf_ct_change_status args#0 expected pointer to STRUCT nf_conn but" },
+};
 
 enum {
        TEST_XDP,
        TEST_TC_BPF,
 };
 
-void test_bpf_nf_ct(int mode)
+static void test_bpf_nf_ct(int mode)
 {
        struct test_bpf_nf *skel;
        int prog_fd, err;
@@ -39,14 +55,60 @@ void test_bpf_nf_ct(int mode)
        ASSERT_EQ(skel->bss->test_enonet_netns_id, -ENONET, "Test ENONET for bad but valid netns_id");
        ASSERT_EQ(skel->bss->test_enoent_lookup, -ENOENT, "Test ENOENT for failed lookup");
        ASSERT_EQ(skel->bss->test_eafnosupport, -EAFNOSUPPORT, "Test EAFNOSUPPORT for invalid len__tuple");
+       ASSERT_EQ(skel->data->test_alloc_entry, 0, "Test for alloc new entry");
+       ASSERT_EQ(skel->data->test_insert_entry, 0, "Test for insert new entry");
+       ASSERT_EQ(skel->data->test_succ_lookup, 0, "Test for successful lookup");
+       /* allow some tolerance for test_delta_timeout value to avoid races. */
+       ASSERT_GT(skel->bss->test_delta_timeout, 8, "Test for min ct timeout update");
+       ASSERT_LE(skel->bss->test_delta_timeout, 10, "Test for max ct timeout update");
+       /* expected status is IPS_SEEN_REPLY */
+       ASSERT_EQ(skel->bss->test_status, 2, "Test for ct status update ");
 end:
        test_bpf_nf__destroy(skel);
 }
 
+static void test_bpf_nf_ct_fail(const char *prog_name, const char *err_msg)
+{
+       LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_buf = log_buf,
+                                               .kernel_log_size = sizeof(log_buf),
+                                               .kernel_log_level = 1);
+       struct test_bpf_nf_fail *skel;
+       struct bpf_program *prog;
+       int ret;
+
+       skel = test_bpf_nf_fail__open_opts(&opts);
+       if (!ASSERT_OK_PTR(skel, "test_bpf_nf_fail__open"))
+               return;
+
+       prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+       if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+               goto end;
+
+       bpf_program__set_autoload(prog, true);
+
+       ret = test_bpf_nf_fail__load(skel);
+       if (!ASSERT_ERR(ret, "test_bpf_nf_fail__load must fail"))
+               goto end;
+
+       if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) {
+               fprintf(stderr, "Expected: %s\n", err_msg);
+               fprintf(stderr, "Verifier: %s\n", log_buf);
+       }
+
+end:
+       test_bpf_nf_fail__destroy(skel);
+}
+
 void test_bpf_nf(void)
 {
+       int i;
        if (test__start_subtest("xdp-ct"))
                test_bpf_nf_ct(TEST_XDP);
        if (test__start_subtest("tc-bpf-ct"))
                test_bpf_nf_ct(TEST_TC_BPF);
+       for (i = 0; i < ARRAY_SIZE(test_bpf_nf_fail_tests); i++) {
+               if (test__start_subtest(test_bpf_nf_fail_tests[i].prog_name))
+                       test_bpf_nf_ct_fail(test_bpf_nf_fail_tests[i].prog_name,
+                                           test_bpf_nf_fail_tests[i].err_msg);
+       }
 }
index 941b010..ef6528b 100644 (file)
@@ -5338,7 +5338,7 @@ static void do_test_pprint(int test_num)
        ret = snprintf(pin_path, sizeof(pin_path), "%s/%s",
                       "/sys/fs/bpf", test->map_name);
 
-       if (CHECK(ret == sizeof(pin_path), "pin_path %s/%s is too long",
+       if (CHECK(ret >= sizeof(pin_path), "pin_path %s/%s is too long",
                  "/sys/fs/bpf", test->map_name)) {
                err = -1;
                goto done;
index 1931a15..63a51e9 100644 (file)
@@ -39,6 +39,7 @@ static struct test_case {
                       "CONFIG_STR=\"abracad\"\n"
                       "CONFIG_MISSING=0",
                .data = {
+                       .unkn_virt_val = 0,
                        .bpf_syscall = false,
                        .tristate_val = TRI_MODULE,
                        .bool_val = true,
@@ -121,7 +122,7 @@ static struct test_case {
 void test_core_extern(void)
 {
        const uint32_t kern_ver = get_kernel_version();
-       int err, duration = 0, i, j;
+       int err, i, j;
        struct test_core_extern *skel = NULL;
        uint64_t *got, *exp;
        int n = sizeof(*skel->data) / sizeof(uint64_t);
@@ -136,19 +137,17 @@ void test_core_extern(void)
                        continue;
 
                skel = test_core_extern__open_opts(&opts);
-               if (CHECK(!skel, "skel_open", "skeleton open failed\n"))
+               if (!ASSERT_OK_PTR(skel, "skel_open"))
                        goto cleanup;
                err = test_core_extern__load(skel);
                if (t->fails) {
-                       CHECK(!err, "skel_load",
-                             "shouldn't succeed open/load of skeleton\n");
+                       ASSERT_ERR(err, "skel_load_should_fail");
                        goto cleanup;
-               } else if (CHECK(err, "skel_load",
-                                "failed to open/load skeleton\n")) {
+               } else if (!ASSERT_OK(err, "skel_load")) {
                        goto cleanup;
                }
                err = test_core_extern__attach(skel);
-               if (CHECK(err, "attach_raw_tp", "failed attach: %d\n", err))
+               if (!ASSERT_OK(err, "attach_raw_tp"))
                        goto cleanup;
 
                usleep(1);
@@ -158,9 +157,7 @@ void test_core_extern(void)
                got = (uint64_t *)skel->data;
                exp = (uint64_t *)&t->data;
                for (j = 0; j < n; j++) {
-                       CHECK(got[j] != exp[j], "check_res",
-                             "result #%d: expected %llx, but got %llx\n",
-                              j, (__u64)exp[j], (__u64)got[j]);
+                       ASSERT_EQ(got[j], exp[j], "result");
                }
 cleanup:
                test_core_extern__destroy(skel);
index 335917d..d457a55 100644 (file)
@@ -364,6 +364,8 @@ static int get_syms(char ***symsp, size_t *cntp)
                        continue;
                if (!strncmp(name, "rcu_", 4))
                        continue;
+               if (!strcmp(name, "bpf_dispatcher_xdp_func"))
+                       continue;
                if (!strncmp(name, "__ftrace_invalid_address__",
                             sizeof("__ftrace_invalid_address__") - 1))
                        continue;
index eb5f7f5..1455911 100644 (file)
@@ -50,6 +50,13 @@ void test_ringbuf_multi(void)
        if (CHECK(!skel, "skel_open", "skeleton open failed\n"))
                return;
 
+       /* validate ringbuf size adjustment logic */
+       ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), page_size, "rb1_size_before");
+       ASSERT_OK(bpf_map__set_max_entries(skel->maps.ringbuf1, page_size + 1), "rb1_resize");
+       ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), 2 * page_size, "rb1_size_after");
+       ASSERT_OK(bpf_map__set_max_entries(skel->maps.ringbuf1, page_size), "rb1_reset");
+       ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), page_size, "rb1_size_final");
+
        proto_fd = bpf_map_create(BPF_MAP_TYPE_RINGBUF, NULL, 0, 0, page_size, NULL);
        if (CHECK(proto_fd < 0, "bpf_map_create", "bpf_map_create failed\n"))
                goto cleanup;
@@ -65,6 +72,10 @@ void test_ringbuf_multi(void)
        close(proto_fd);
        proto_fd = -1;
 
+       /* make sure we can't resize ringbuf after object load */
+       if (!ASSERT_ERR(bpf_map__set_max_entries(skel->maps.ringbuf1, 3 * page_size), "rb1_resize_after_load"))
+               goto cleanup;
+
        /* only trigger BPF program for current process */
        skel->bss->pid = getpid();
 
index 180afd6..99dac52 100644 (file)
@@ -122,6 +122,8 @@ void test_skeleton(void)
 
        ASSERT_EQ(skel->bss->out_mostly_var, 123, "out_mostly_var");
 
+       ASSERT_EQ(bss->huge_arr[ARRAY_SIZE(bss->huge_arr) - 1], 123, "huge_arr");
+
        elf_bytes = test_skeleton__elf_bytes(&elf_bytes_sz);
        ASSERT_OK_PTR(elf_bytes, "elf_bytes");
        ASSERT_GE(elf_bytes_sz, 0, "elf_bytes_sz");
index 97ec8bc..e984660 100644 (file)
@@ -22,6 +22,7 @@
 #define BTF_F_NONAME BTF_F_NONAME___not_used
 #define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used
 #define BTF_F_ZERO BTF_F_ZERO___not_used
+#define bpf_iter__ksym bpf_iter__ksym___not_used
 #include "vmlinux.h"
 #undef bpf_iter_meta
 #undef bpf_iter__bpf_map
@@ -44,6 +45,7 @@
 #undef BTF_F_NONAME
 #undef BTF_F_PTR_RAW
 #undef BTF_F_ZERO
+#undef bpf_iter__ksym
 
 struct bpf_iter_meta {
        struct seq_file *seq;
@@ -151,3 +153,8 @@ enum {
        BTF_F_PTR_RAW   =       (1ULL << 2),
        BTF_F_ZERO      =       (1ULL << 3),
 };
+
+struct bpf_iter__ksym {
+       struct bpf_iter_meta *meta;
+       struct kallsym_iter *ksym;
+};
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_ksym.c b/tools/testing/selftests/bpf/progs/bpf_iter_ksym.c
new file mode 100644 (file)
index 0000000..285c008
--- /dev/null
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022, Oracle and/or its affiliates. */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+unsigned long last_sym_value = 0;
+
+static inline char tolower(char c)
+{
+       if (c >= 'A' && c <= 'Z')
+               c += ('a' - 'A');
+       return c;
+}
+
+static inline char toupper(char c)
+{
+       if (c >= 'a' && c <= 'z')
+               c -= ('a' - 'A');
+       return c;
+}
+
+/* Dump symbols with their max size; the latter is calculated by caching the
+ * value (address) of symbol N so that, when iterating over symbol N+1, the
+ * max size of symbol N can be printed as address of N+1 - address of N.
+ */
+SEC("iter/ksym")
+int dump_ksym(struct bpf_iter__ksym *ctx)
+{
+       struct seq_file *seq = ctx->meta->seq;
+       struct kallsym_iter *iter = ctx->ksym;
+       __u32 seq_num = ctx->meta->seq_num;
+       unsigned long value;
+       char type;
+       int ret;
+
+       if (!iter)
+               return 0;
+
+       if (seq_num == 0) {
+               BPF_SEQ_PRINTF(seq, "ADDR TYPE NAME MODULE_NAME KIND MAX_SIZE\n");
+               return 0;
+       }
+       if (last_sym_value)
+               BPF_SEQ_PRINTF(seq, "0x%x\n", iter->value - last_sym_value);
+       else
+               BPF_SEQ_PRINTF(seq, "\n");
+
+       value = iter->show_value ? iter->value : 0;
+
+       last_sym_value = value;
+
+       type = iter->type;
+
+       if (iter->module_name[0]) {
+               type = iter->exported ? toupper(type) : tolower(type);
+               BPF_SEQ_PRINTF(seq, "0x%llx %c %s [ %s ] ",
+                              value, type, iter->name, iter->module_name);
+       } else {
+               BPF_SEQ_PRINTF(seq, "0x%llx %c %s ", value, type, iter->name);
+       }
+       if (!iter->pos_arch_end || iter->pos_arch_end > iter->pos)
+               BPF_SEQ_PRINTF(seq, "CORE ");
+       else if (!iter->pos_mod_end || iter->pos_mod_end > iter->pos)
+               BPF_SEQ_PRINTF(seq, "MOD ");
+       else if (!iter->pos_ftrace_mod_end || iter->pos_ftrace_mod_end > iter->pos)
+               BPF_SEQ_PRINTF(seq, "FTRACE_MOD ");
+       else if (!iter->pos_bpf_end || iter->pos_bpf_end > iter->pos)
+               BPF_SEQ_PRINTF(seq, "BPF ");
+       else
+               BPF_SEQ_PRINTF(seq, "KPROBE ");
+       return 0;
+}
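For completeness, a rough user-space reader for the iterator program above (a
sketch, not part of this series; the dump_kernel_syms() wrapper name is
illustrative) would follow the usual bpf_iter flow with the generated skeleton:

	#include <stdio.h>
	#include <unistd.h>
	#include <bpf/bpf.h>
	#include <bpf/libbpf.h>
	#include "bpf_iter_ksym.skel.h" /* generated skeleton, as in the selftest */

	int dump_kernel_syms(void)
	{
		struct bpf_iter_ksym *skel;
		struct bpf_link *link;
		char buf[4096];
		int iter_fd, len, err = 0;

		skel = bpf_iter_ksym__open_and_load();
		if (!skel)
			return -1;

		link = bpf_program__attach_iter(skel->progs.dump_ksym, NULL);
		if (!link) {
			err = -1;
			goto out;
		}

		iter_fd = bpf_iter_create(bpf_link__fd(link));
		if (iter_fd < 0) {
			err = -1;
			goto out_link;
		}

		/* reading the iterator fd runs dump_ksym() for each symbol */
		while ((len = read(iter_fd, buf, sizeof(buf) - 1)) > 0) {
			buf[len] = '\0';
			fputs(buf, stdout);
		}

		close(iter_fd);
	out_link:
		bpf_link__destroy(link);
	out:
		bpf_iter_ksym__destroy(skel);
		return err;
	}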
index 05838ed..e1e1189 100644 (file)
@@ -64,9 +64,9 @@ int BPF_KPROBE(handle_sys_prctl)
        return 0;
 }
 
-SEC("kprobe/" SYS_PREFIX "sys_prctl")
-int BPF_KPROBE_SYSCALL(prctl_enter, int option, unsigned long arg2,
-                      unsigned long arg3, unsigned long arg4, unsigned long arg5)
+SEC("ksyscall/prctl")
+int BPF_KSYSCALL(prctl_enter, int option, unsigned long arg2,
+                unsigned long arg3, unsigned long arg4, unsigned long arg5)
 {
        pid_t pid = bpf_get_current_pid_tgid() >> 32;
 
index f1c88ad..a1e45fe 100644 (file)
@@ -1,11 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2017 Facebook
 
-#include <linux/ptrace.h>
-#include <linux/bpf.h>
+#include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
-#include <stdbool.h>
+#include <bpf/bpf_core_read.h>
 #include "bpf_misc.h"
 
 int kprobe_res = 0;
@@ -31,8 +30,8 @@ int handle_kprobe(struct pt_regs *ctx)
        return 0;
 }
 
-SEC("kprobe/" SYS_PREFIX "sys_nanosleep")
-int BPF_KPROBE(handle_kprobe_auto)
+SEC("ksyscall/nanosleep")
+int BPF_KSYSCALL(handle_kprobe_auto, struct __kernel_timespec *req, struct __kernel_timespec *rem)
 {
        kprobe2_res = 11;
        return 0;
@@ -56,11 +55,11 @@ int handle_kretprobe(struct pt_regs *ctx)
        return 0;
 }
 
-SEC("kretprobe/" SYS_PREFIX "sys_nanosleep")
-int BPF_KRETPROBE(handle_kretprobe_auto)
+SEC("kretsyscall/nanosleep")
+int BPF_KRETPROBE(handle_kretprobe_auto, int ret)
 {
        kretprobe2_res = 22;
-       return 0;
+       return ret;
 }
 
 SEC("uprobe")
index f00a973..196cd8d 100644 (file)
@@ -8,6 +8,8 @@
 #define EINVAL 22
 #define ENOENT 2
 
+extern unsigned long CONFIG_HZ __kconfig;
+
 int test_einval_bpf_tuple = 0;
 int test_einval_reserved = 0;
 int test_einval_netns_id = 0;
@@ -16,6 +18,11 @@ int test_eproto_l4proto = 0;
 int test_enonet_netns_id = 0;
 int test_enoent_lookup = 0;
 int test_eafnosupport = 0;
+int test_alloc_entry = -EINVAL;
+int test_insert_entry = -EAFNOSUPPORT;
+int test_succ_lookup = -ENOENT;
+u32 test_delta_timeout = 0;
+u32 test_status = 0;
 
 struct nf_conn;
 
@@ -26,31 +33,44 @@ struct bpf_ct_opts___local {
        u8 reserved[3];
 } __attribute__((preserve_access_index));
 
+struct nf_conn *bpf_xdp_ct_alloc(struct xdp_md *, struct bpf_sock_tuple *, u32,
+                                struct bpf_ct_opts___local *, u32) __ksym;
 struct nf_conn *bpf_xdp_ct_lookup(struct xdp_md *, struct bpf_sock_tuple *, u32,
                                  struct bpf_ct_opts___local *, u32) __ksym;
+struct nf_conn *bpf_skb_ct_alloc(struct __sk_buff *, struct bpf_sock_tuple *, u32,
+                                struct bpf_ct_opts___local *, u32) __ksym;
 struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32,
                                  struct bpf_ct_opts___local *, u32) __ksym;
+struct nf_conn *bpf_ct_insert_entry(struct nf_conn *) __ksym;
 void bpf_ct_release(struct nf_conn *) __ksym;
+void bpf_ct_set_timeout(struct nf_conn *, u32) __ksym;
+int bpf_ct_change_timeout(struct nf_conn *, u32) __ksym;
+int bpf_ct_set_status(struct nf_conn *, u32) __ksym;
+int bpf_ct_change_status(struct nf_conn *, u32) __ksym;
 
 static __always_inline void
-nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32,
-                                  struct bpf_ct_opts___local *, u32),
+nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32,
+                                       struct bpf_ct_opts___local *, u32),
+          struct nf_conn *(*alloc_fn)(void *, struct bpf_sock_tuple *, u32,
+                                      struct bpf_ct_opts___local *, u32),
           void *ctx)
 {
        struct bpf_ct_opts___local opts_def = { .l4proto = IPPROTO_TCP, .netns_id = -1 };
        struct bpf_sock_tuple bpf_tuple;
        struct nf_conn *ct;
+       int err;
 
        __builtin_memset(&bpf_tuple, 0, sizeof(bpf_tuple.ipv4));
 
-       ct = func(ctx, NULL, 0, &opts_def, sizeof(opts_def));
+       ct = lookup_fn(ctx, NULL, 0, &opts_def, sizeof(opts_def));
        if (ct)
                bpf_ct_release(ct);
        else
                test_einval_bpf_tuple = opts_def.error;
 
        opts_def.reserved[0] = 1;
-       ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+       ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+                      sizeof(opts_def));
        opts_def.reserved[0] = 0;
        opts_def.l4proto = IPPROTO_TCP;
        if (ct)
@@ -59,21 +79,24 @@ nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32,
                test_einval_reserved = opts_def.error;
 
        opts_def.netns_id = -2;
-       ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+       ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+                      sizeof(opts_def));
        opts_def.netns_id = -1;
        if (ct)
                bpf_ct_release(ct);
        else
                test_einval_netns_id = opts_def.error;
 
-       ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def) - 1);
+       ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+                      sizeof(opts_def) - 1);
        if (ct)
                bpf_ct_release(ct);
        else
                test_einval_len_opts = opts_def.error;
 
        opts_def.l4proto = IPPROTO_ICMP;
-       ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+       ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+                      sizeof(opts_def));
        opts_def.l4proto = IPPROTO_TCP;
        if (ct)
                bpf_ct_release(ct);
@@ -81,37 +104,75 @@ nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32,
                test_eproto_l4proto = opts_def.error;
 
        opts_def.netns_id = 0xf00f;
-       ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+       ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+                      sizeof(opts_def));
        opts_def.netns_id = -1;
        if (ct)
                bpf_ct_release(ct);
        else
                test_enonet_netns_id = opts_def.error;
 
-       ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+       ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+                      sizeof(opts_def));
        if (ct)
                bpf_ct_release(ct);
        else
                test_enoent_lookup = opts_def.error;
 
-       ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4) - 1, &opts_def, sizeof(opts_def));
+       ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4) - 1, &opts_def,
+                      sizeof(opts_def));
        if (ct)
                bpf_ct_release(ct);
        else
                test_eafnosupport = opts_def.error;
+
+       bpf_tuple.ipv4.saddr = bpf_get_prandom_u32(); /* src IP */
+       bpf_tuple.ipv4.daddr = bpf_get_prandom_u32(); /* dst IP */
+       bpf_tuple.ipv4.sport = bpf_get_prandom_u32(); /* src port */
+       bpf_tuple.ipv4.dport = bpf_get_prandom_u32(); /* dst port */
+
+       ct = alloc_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+                     sizeof(opts_def));
+       if (ct) {
+               struct nf_conn *ct_ins;
+
+               bpf_ct_set_timeout(ct, 10000);
+               bpf_ct_set_status(ct, IPS_CONFIRMED);
+
+               ct_ins = bpf_ct_insert_entry(ct);
+               if (ct_ins) {
+                       struct nf_conn *ct_lk;
+
+                       ct_lk = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4),
+                                         &opts_def, sizeof(opts_def));
+                       if (ct_lk) {
+                               /* update ct entry timeout */
+                               bpf_ct_change_timeout(ct_lk, 10000);
+                               test_delta_timeout = ct_lk->timeout - bpf_jiffies64();
+                               test_delta_timeout /= CONFIG_HZ;
+                               test_status = IPS_SEEN_REPLY;
+                               bpf_ct_change_status(ct_lk, IPS_SEEN_REPLY);
+                               bpf_ct_release(ct_lk);
+                               test_succ_lookup = 0;
+                       }
+                       bpf_ct_release(ct_ins);
+                       test_insert_entry = 0;
+               }
+               test_alloc_entry = 0;
+       }
 }
 
 SEC("xdp")
 int nf_xdp_ct_test(struct xdp_md *ctx)
 {
-       nf_ct_test((void *)bpf_xdp_ct_lookup, ctx);
+       nf_ct_test((void *)bpf_xdp_ct_lookup, (void *)bpf_xdp_ct_alloc, ctx);
        return 0;
 }
 
 SEC("tc")
 int nf_skb_ct_test(struct __sk_buff *ctx)
 {
-       nf_ct_test((void *)bpf_skb_ct_lookup, ctx);
+       nf_ct_test((void *)bpf_skb_ct_lookup, (void *)bpf_skb_ct_alloc, ctx);
        return 0;
 }
 
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_nf_fail.c b/tools/testing/selftests/bpf/progs/test_bpf_nf_fail.c
new file mode 100644 (file)
index 0000000..bf79af1
--- /dev/null
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+struct nf_conn;
+
+struct bpf_ct_opts___local {
+       s32 netns_id;
+       s32 error;
+       u8 l4proto;
+       u8 reserved[3];
+} __attribute__((preserve_access_index));
+
+struct nf_conn *bpf_skb_ct_alloc(struct __sk_buff *, struct bpf_sock_tuple *, u32,
+                                struct bpf_ct_opts___local *, u32) __ksym;
+struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32,
+                                 struct bpf_ct_opts___local *, u32) __ksym;
+struct nf_conn *bpf_ct_insert_entry(struct nf_conn *) __ksym;
+void bpf_ct_release(struct nf_conn *) __ksym;
+void bpf_ct_set_timeout(struct nf_conn *, u32) __ksym;
+int bpf_ct_change_timeout(struct nf_conn *, u32) __ksym;
+int bpf_ct_set_status(struct nf_conn *, u32) __ksym;
+int bpf_ct_change_status(struct nf_conn *, u32) __ksym;
+
+SEC("?tc")
+int alloc_release(struct __sk_buff *ctx)
+{
+       struct bpf_ct_opts___local opts = {};
+       struct bpf_sock_tuple tup = {};
+       struct nf_conn *ct;
+
+       ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
+       if (!ct)
+               return 0;
+       bpf_ct_release(ct);
+       return 0;
+}
+
+SEC("?tc")
+int insert_insert(struct __sk_buff *ctx)
+{
+       struct bpf_ct_opts___local opts = {};
+       struct bpf_sock_tuple tup = {};
+       struct nf_conn *ct;
+
+       ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
+       if (!ct)
+               return 0;
+       ct = bpf_ct_insert_entry(ct);
+       if (!ct)
+               return 0;
+       ct = bpf_ct_insert_entry(ct);
+       return 0;
+}
+
+SEC("?tc")
+int lookup_insert(struct __sk_buff *ctx)
+{
+       struct bpf_ct_opts___local opts = {};
+       struct bpf_sock_tuple tup = {};
+       struct nf_conn *ct;
+
+       ct = bpf_skb_ct_lookup(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
+       if (!ct)
+               return 0;
+       bpf_ct_insert_entry(ct);
+       return 0;
+}
+
+SEC("?tc")
+int set_timeout_after_insert(struct __sk_buff *ctx)
+{
+       struct bpf_ct_opts___local opts = {};
+       struct bpf_sock_tuple tup = {};
+       struct nf_conn *ct;
+
+       ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
+       if (!ct)
+               return 0;
+       ct = bpf_ct_insert_entry(ct);
+       if (!ct)
+               return 0;
+       bpf_ct_set_timeout(ct, 0);
+       return 0;
+}
+
+SEC("?tc")
+int set_status_after_insert(struct __sk_buff *ctx)
+{
+       struct bpf_ct_opts___local opts = {};
+       struct bpf_sock_tuple tup = {};
+       struct nf_conn *ct;
+
+       ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
+       if (!ct)
+               return 0;
+       ct = bpf_ct_insert_entry(ct);
+       if (!ct)
+               return 0;
+       bpf_ct_set_status(ct, 0);
+       return 0;
+}
+
+SEC("?tc")
+int change_timeout_after_alloc(struct __sk_buff *ctx)
+{
+       struct bpf_ct_opts___local opts = {};
+       struct bpf_sock_tuple tup = {};
+       struct nf_conn *ct;
+
+       ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
+       if (!ct)
+               return 0;
+       bpf_ct_change_timeout(ct, 0);
+       return 0;
+}
+
+SEC("?tc")
+int change_status_after_alloc(struct __sk_buff *ctx)
+{
+       struct bpf_ct_opts___local opts = {};
+       struct bpf_sock_tuple tup = {};
+       struct nf_conn *ct;
+
+       ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
+       if (!ct)
+               return 0;
+       bpf_ct_change_status(ct, 0);
+       return 0;
+}
+
+char _license[] SEC("license") = "GPL";
index 3ac3603..a3c7c10 100644 (file)
@@ -11,6 +11,7 @@
 static int (*bpf_missing_helper)(const void *arg1, int arg2) = (void *) 999;
 
 extern int LINUX_KERNEL_VERSION __kconfig;
+extern int LINUX_UNKNOWN_VIRTUAL_EXTERN __kconfig __weak;
 extern bool CONFIG_BPF_SYSCALL __kconfig; /* strong */
 extern enum libbpf_tristate CONFIG_TRISTATE __kconfig __weak;
 extern bool CONFIG_BOOL __kconfig __weak;
@@ -22,6 +23,7 @@ extern const char CONFIG_STR[8] __kconfig __weak;
 extern uint64_t CONFIG_MISSING __kconfig __weak;
 
 uint64_t kern_ver = -1;
+uint64_t unkn_virt_val = -1;
 uint64_t bpf_syscall = -1;
 uint64_t tristate_val = -1;
 uint64_t bool_val = -1;
@@ -38,6 +40,7 @@ int handle_sys_enter(struct pt_regs *ctx)
        int i;
 
        kern_ver = LINUX_KERNEL_VERSION;
+       unkn_virt_val = LINUX_UNKNOWN_VIRTUAL_EXTERN;
        bpf_syscall = CONFIG_BPF_SYSCALL;
        tristate_val = CONFIG_TRISTATE;
        bool_val = CONFIG_BOOL;
index 702578a..8e14950 100644 (file)
@@ -1,35 +1,20 @@
 // SPDX-License-Identifier: GPL-2.0
-
-#include <linux/ptrace.h>
-#include <linux/bpf.h>
-
-#include <netinet/in.h>
-
+#include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
 #include "bpf_misc.h"
 
 static struct sockaddr_in old;
 
-SEC("kprobe/" SYS_PREFIX "sys_connect")
-int BPF_KPROBE(handle_sys_connect)
+SEC("ksyscall/connect")
+int BPF_KSYSCALL(handle_sys_connect, int fd, struct sockaddr_in *uservaddr, int addrlen)
 {
-#if SYSCALL_WRAPPER == 1
-       struct pt_regs *real_regs;
-#endif
        struct sockaddr_in new;
-       void *ptr;
-
-#if SYSCALL_WRAPPER == 0
-       ptr = (void *)PT_REGS_PARM2(ctx);
-#else
-       real_regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
-       bpf_probe_read_kernel(&ptr, sizeof(ptr), &PT_REGS_PARM2(real_regs));
-#endif
 
-       bpf_probe_read_user(&old, sizeof(old), ptr);
+       bpf_probe_read_user(&old, sizeof(old), uservaddr);
        __builtin_memset(&new, 0xab, sizeof(new));
-       bpf_probe_write_user(ptr, &new, sizeof(new));
+       bpf_probe_write_user(uservaddr, &new, sizeof(new));
 
        return 0;
 }
index 1b1187d..1a4e93f 100644 (file)
@@ -51,6 +51,8 @@ int out_dynarr[4] SEC(".data.dyn") = { 1, 2, 3, 4 };
 int read_mostly_var __read_mostly;
 int out_mostly_var;
 
+char huge_arr[16 * 1024 * 1024];
+
 SEC("raw_tp/sys_enter")
 int handler(const void *ctx)
 {
@@ -71,6 +73,8 @@ int handler(const void *ctx)
 
        out_mostly_var = read_mostly_var;
 
+       huge_arr[sizeof(huge_arr) - 1] = 123;
+
        return 0;
 }
 
index 125d872..ba48fcb 100644 (file)
@@ -239,7 +239,7 @@ bool parse_udp(void *data, void *data_end,
        udp = data + off;
 
        if (udp + 1 > data_end)
-               return 0;
+               return false;
        if (!is_icmp) {
                pckt->flow.port16[0] = udp->source;
                pckt->flow.port16[1] = udp->dest;
@@ -247,7 +247,7 @@ bool parse_udp(void *data, void *data_end,
                pckt->flow.port16[0] = udp->dest;
                pckt->flow.port16[1] = udp->source;
        }
-       return 1;
+       return true;
 }
 
 static __attribute__ ((noinline))
@@ -261,7 +261,7 @@ bool parse_tcp(void *data, void *data_end,
 
        tcp = data + off;
        if (tcp + 1 > data_end)
-               return 0;
+               return false;
        if (tcp->syn)
                pckt->flags |= (1 << 1);
        if (!is_icmp) {
@@ -271,7 +271,7 @@ bool parse_tcp(void *data, void *data_end,
                pckt->flow.port16[0] = tcp->dest;
                pckt->flow.port16[1] = tcp->source;
        }
-       return 1;
+       return true;
 }
 
 static __attribute__ ((noinline))
@@ -287,7 +287,7 @@ bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
        void *data;
 
        if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct ipv6hdr)))
-               return 0;
+               return false;
        data = (void *)(long)xdp->data;
        data_end = (void *)(long)xdp->data_end;
        new_eth = data;
@@ -295,7 +295,7 @@ bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
        old_eth = data + sizeof(struct ipv6hdr);
        if (new_eth + 1 > data_end ||
            old_eth + 1 > data_end || ip6h + 1 > data_end)
-               return 0;
+               return false;
        memcpy(new_eth->eth_dest, cval->mac, 6);
        memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
        new_eth->eth_proto = 56710;
@@ -314,7 +314,7 @@ bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
        ip6h->saddr.in6_u.u6_addr32[2] = 3;
        ip6h->saddr.in6_u.u6_addr32[3] = ip_suffix;
        memcpy(ip6h->daddr.in6_u.u6_addr32, dst->dstv6, 16);
-       return 1;
+       return true;
 }
 
 static __attribute__ ((noinline))
@@ -335,7 +335,7 @@ bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
        ip_suffix <<= 15;
        ip_suffix ^= pckt->flow.src;
        if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct iphdr)))
-               return 0;
+               return false;
        data = (void *)(long)xdp->data;
        data_end = (void *)(long)xdp->data_end;
        new_eth = data;
@@ -343,7 +343,7 @@ bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
        old_eth = data + sizeof(struct iphdr);
        if (new_eth + 1 > data_end ||
            old_eth + 1 > data_end || iph + 1 > data_end)
-               return 0;
+               return false;
        memcpy(new_eth->eth_dest, cval->mac, 6);
        memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
        new_eth->eth_proto = 8;
@@ -367,8 +367,8 @@ bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
                csum += *next_iph_u16++;
        iph->check = ~((csum & 0xffff) + (csum >> 16));
        if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr)))
-               return 0;
-       return 1;
+               return false;
+       return true;
 }
 
 static __attribute__ ((noinline))
@@ -386,10 +386,10 @@ bool decap_v6(struct xdp_md *xdp, void **data, void **data_end, bool inner_v4)
        else
                new_eth->eth_proto = 56710;
        if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct ipv6hdr)))
-               return 0;
+               return false;
        *data = (void *)(long)xdp->data;
        *data_end = (void *)(long)xdp->data_end;
-       return 1;
+       return true;
 }
 
 static __attribute__ ((noinline))
@@ -404,10 +404,10 @@ bool decap_v4(struct xdp_md *xdp, void **data, void **data_end)
        memcpy(new_eth->eth_dest, old_eth->eth_dest, 6);
        new_eth->eth_proto = 8;
        if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr)))
-               return 0;
+               return false;
        *data = (void *)(long)xdp->data;
        *data_end = (void *)(long)xdp->data_end;
-       return 1;
+       return true;
 }
 
 static __attribute__ ((noinline))
index 392d28c..49936c4 100755 (executable)
@@ -106,9 +106,9 @@ bpftool prog loadall \
 bpftool map update pinned $BPF_DIR/maps/tx_port key 0 0 0 0 value 122 0 0 0
 bpftool map update pinned $BPF_DIR/maps/tx_port key 1 0 0 0 value 133 0 0 0
 bpftool map update pinned $BPF_DIR/maps/tx_port key 2 0 0 0 value 111 0 0 0
-ip link set dev veth1 xdp pinned $BPF_DIR/progs/redirect_map_0
-ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1
-ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2
+ip link set dev veth1 xdp pinned $BPF_DIR/progs/xdp_redirect_map_0
+ip link set dev veth2 xdp pinned $BPF_DIR/progs/xdp_redirect_map_1
+ip link set dev veth3 xdp pinned $BPF_DIR/progs/xdp_redirect_map_2
 
 ip -n ${NS1} link set dev veth11 xdp obj xdp_dummy.o sec xdp
 ip -n ${NS2} link set dev veth22 xdp obj xdp_tx.o sec xdp
index 2d00236..a535d41 100644 (file)
        .expected_insns = { PSEUDO_CALL_INSN() },
        .unexpected_insns = { HELPER_CALL_INSN() },
        .result = ACCEPT,
+       .prog_type = BPF_PROG_TYPE_TRACEPOINT,
        .func_info = { { 0, MAIN_TYPE }, { 16, CALLBACK_TYPE } },
        .func_info_cnt = 2,
        BTF_TYPES
index 743ed34..3fb4f69 100644 (file)
        .result = REJECT,
        .errstr = "variable ptr_ access var_off=(0x0; 0x7) disallowed",
 },
+{
+       "calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
+       .insns = {
+       BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+       BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+       BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+       BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+       BPF_EXIT_INSN(),
+       BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+       BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+       BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 16),
+       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+       BPF_MOV64_IMM(BPF_REG_0, 0),
+       BPF_EXIT_INSN(),
+       },
+       .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+       .fixup_kfunc_btf_id = {
+               { "bpf_kfunc_call_test_acquire", 3 },
+               { "bpf_kfunc_call_test_ref", 8 },
+               { "bpf_kfunc_call_test_ref", 10 },
+       },
+       .result_unpriv = REJECT,
+       .result = REJECT,
+       .errstr = "R1 must be referenced",
+},
+{
+       "calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
+       .insns = {
+       BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+       BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+       BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+       BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+       BPF_EXIT_INSN(),
+       BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+       BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+       BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+       BPF_MOV64_IMM(BPF_REG_0, 0),
+       BPF_EXIT_INSN(),
+       },
+       .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+       .fixup_kfunc_btf_id = {
+               { "bpf_kfunc_call_test_acquire", 3 },
+               { "bpf_kfunc_call_test_ref", 8 },
+               { "bpf_kfunc_call_test_release", 10 },
+       },
+       .result_unpriv = REJECT,
+       .result = ACCEPT,
+},
 {
        "calls: basic sanity",
        .insns = {