Andrii Nakryiko [Fri, 23 Apr 2021 18:13:46 +0000 (11:13 -0700)]
selftests/bpf: Add global variables linking selftest
Add selftest validating various aspects of statically linking global
variables:
- correct resolution of extern variables across .bss, .data, and .rodata
sections;
- correct handling of weak definitions;
- correct de-duplication of repeating special externs (.kconfig, .ksyms).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-17-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:45 +0000 (11:13 -0700)]
selftests/bpf: Add function linking selftest
Add selftest validating various aspects of statically linking functions:
- no conflicts and correct resolution for name-conflicting static funcs;
- correct resolution of extern functions;
- correct handling of weak functions, both resolution itself and libbpf's
handling of unused weak function that "lost" (it leaves gaps in code with
no ELF symbols);
- correct handling of hidden visibility to turn global function into
"static" for the purpose of BPF verification.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-16-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:44 +0000 (11:13 -0700)]
selftests/bpf: Omit skeleton generation for multi-linked BPF object files
Skip generating individual BPF skeletons for files that are supposed to be
linked together to form the final BPF object file. Very often such files are
"incomplete" BPF object files, which will fail libbpf bpf_object__open() step,
if used individually, thus failing BPF skeleton generation. This is by design,
so skip individual BPF skeletons and only validate them as part of their
linked final BPF object file and skeleton.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-15-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:43 +0000 (11:13 -0700)]
selftests/bpf: Use -O0 instead of -Og in selftests builds
While -Og is designed to work well with debugger, it's still inferior to -O0
in terms of debuggability experience. It will cause some variables to still be
inlined, it will also prevent single-stepping some statements and otherwise
interfere with debugging experience. So switch to -O0 which turns off any
optimization and provides the best debugging experience.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-14-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:42 +0000 (11:13 -0700)]
libbpf: Support extern resolution for BTF-defined maps in .maps section
Add extra logic to handle map externs (only BTF-defined maps are supported for
linking). Re-use the map parsing logic used during bpf_object__open(). Map
externs are currently restricted to always match complete map definition. So
all the specified attributes will be compared (down to pining, map_flags,
numa_node, etc). In the future this restriction might be relaxed with no
backwards compatibility issues. If any attribute is mismatched between extern
and actual map definition, linker will report an error, pointing out which one
mismatches.
The original intent was to allow for extern to specify attributes that matters
(to user) to enforce. E.g., if you specify just key information and omit
value, then any value fits. Similarly, it should have been possible to enforce
map_flags, pinning, and any other possible map attribute. Unfortunately, that
means that multiple externs can be only partially overlapping with each other,
which means linker would need to combine their type definitions to end up with
the most restrictive and fullest map definition. This requires an extra amount
of BTF manipulation which at this time was deemed unnecessary and would
require further extending generic BTF writer APIs. So that is left for future
follow ups, if there will be demand for that. But the idea seems intresting
and useful, so I want to document it here.
Weak definitions are also supported, but are pretty strict as well, just
like externs: all weak map definitions have to match exactly. In the follow up
patches this most probably will be relaxed, with __weak map definitions being
able to differ between each other (with non-weak definition always winning, of
course).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-13-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:41 +0000 (11:13 -0700)]
libbpf: Add linker extern resolution support for functions and global variables
Add BPF static linker logic to resolve extern variables and functions across
multiple linked together BPF object files.
For that, linker maintains a separate list of struct glob_sym structures,
which keeps track of few pieces of metadata (is it extern or resolved global,
is it a weak symbol, which ELF section it belongs to, etc) and ties together
BTF type info and ELF symbol information and keeps them in sync.
With adding support for extern variables/funcs, it's now possible for some
sections to contain both extern and non-extern definitions. This means that
some sections may start out as ephemeral (if only externs are present and thus
there is not corresponding ELF section), but will be "upgraded" to actual ELF
section as symbols are resolved or new non-extern definitions are appended.
Additional care is taken to not duplicate extern entries in sections like
.kconfig and .ksyms.
Given libbpf requires BTF type to always be present for .kconfig/.ksym
externs, linker extends this requirement to all the externs, even those that
are supposed to be resolved during static linking and which won't be visible
to libbpf. With BTF information always present, static linker will check not
just ELF symbol matches, but entire BTF type signature match as well. That
logic is stricter that BPF CO-RE checks. It probably should be re-used by
.ksym resolution logic in libbpf as well, but that's left for follow up
patches.
To make it unnecessary to rewrite ELF symbols and minimize BTF type
rewriting/removal, ELF symbols that correspond to externs initially will be
updated in place once they are resolved. Similarly for BTF type info, VAR/FUNC
and var_secinfo's (sec_vars in struct bpf_linker) are staying stable, but
types they point to might get replaced when extern is resolved. This might
leave some left-over types (even though we try to minimize this for common
cases of having extern funcs with not argument names vs concrete function with
names properly specified). That can be addresses later with a generic BTF
garbage collection. That's left for a follow up as well.
Given BTF type appending phase is separate from ELF symbol
appending/resolution, special struct glob_sym->underlying_btf_id variable is
used to communicate resolution and rewrite decisions. 0 means
underlying_btf_id needs to be appended (it's not yet in final linker->btf), <0
values are used for temporary storage of source BTF type ID (not yet
rewritten), so -glob_sym->underlying_btf_id is BTF type id in obj-btf. But by
the end of linker_append_btf() phase, that underlying_btf_id will be remapped
and will always be > 0. This is the uglies part of the whole process, but
keeps the other parts much simpler due to stability of sec_var and VAR/FUNC
types, as well as ELF symbol, so please keep that in mind while reviewing.
BTF-defined maps require some extra custom logic and is addressed separate in
the next patch, so that to keep this one smaller and easier to review.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-12-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:40 +0000 (11:13 -0700)]
libbpf: Tighten BTF type ID rewriting with error checking
It should never fail, but if it does, it's better to know about this rather
than end up with nonsensical type IDs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-11-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:39 +0000 (11:13 -0700)]
libbpf: Extend sanity checking ELF symbols with externs validation
Add logic to validate extern symbols, plus some other minor extra checks, like
ELF symbol #0 validation, general symbol visibility and binding validations.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-10-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:38 +0000 (11:13 -0700)]
libbpf: Make few internal helpers available outside of libbpf.c
Make skip_mods_and_typedefs(), btf_kind_str(), and btf_func_linkage() helpers
available outside of libbpf.c, to be used by static linker code.
Also do few cleanups (error code fixes, comment clean up, etc) that don't
deserve their own commit.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-9-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:37 +0000 (11:13 -0700)]
libbpf: Factor out symtab and relos sanity checks
Factor out logic for sanity checking SHT_SYMTAB and SHT_REL sections into
separate sections. They are already quite extensive and are suffering from too
deep indentation. Subsequent changes will extend SYMTAB sanity checking
further, so it's better to factor each into a separate function.
No functional changes are intended.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-8-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:36 +0000 (11:13 -0700)]
libbpf: Refactor BTF map definition parsing
Refactor BTF-defined maps parsing logic to allow it to be nicely reused by BPF
static linker. Further, at least for BPF static linker, it's important to know
which attributes of a BPF map were defined explicitly, so provide a bit set
for each known portion of BTF map definition. This allows BPF static linker to
do a simple check when dealing with extern map declarations.
The same capabilities allow to distinguish attributes explicitly set to zero
(e.g., __uint(max_entries, 0)) vs the case of not specifying it at all (no
max_entries attribute at all). Libbpf is currently not utilizing that, but it
could be useful for backwards compatibility reasons later.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-7-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:35 +0000 (11:13 -0700)]
libbpf: Allow gaps in BPF program sections to support overriden weak functions
Currently libbpf is very strict about parsing BPF program instruction
sections. No gaps are allowed between sequential BPF programs within a given
ELF section. Libbpf enforced that by keeping track of the next section offset
that should start a new BPF (sub)program and cross-checks that by searching
for a corresponding STT_FUNC ELF symbol.
But this is too restrictive once we allow to have weak BPF programs and link
together two or more BPF object files. In such case, some weak BPF programs
might be "overridden" by either non-weak BPF program with the same name and
signature, or even by another weak BPF program that just happened to be linked
first. That, in turn, leaves BPF instructions of the "lost" BPF (sub)program
intact, but there is no corresponding ELF symbol, because no one is going to
be referencing it.
Libbpf already correctly handles such cases in the sense that it won't append
such dead code to actual BPF programs loaded into kernel. So the only change
that needs to be done is to relax the logic of parsing BPF instruction
sections. Instead of assuming next BPF (sub)program section offset, iterate
available STT_FUNC ELF symbols to discover all available BPF subprograms and
programs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-6-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:34 +0000 (11:13 -0700)]
libbpf: Mark BPF subprogs with hidden visibility as static for BPF verifier
Define __hidden helper macro in bpf_helpers.h, which is a short-hand for
__attribute__((visibility("hidden"))). Add libbpf support to mark BPF
subprograms marked with __hidden as static in BTF information to enforce BPF
verifier's static function validation algorithm, which takes more information
(caller's context) into account during a subprogram validation.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-5-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:33 +0000 (11:13 -0700)]
libbpf: Suppress compiler warning when using SEC() macro with externs
When used on externs SEC() macro will trigger compilation warning about
inapplicable `__attribute__((used))`. That's expected for extern declarations,
so suppress it with the corresponding _Pragma.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-4-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:32 +0000 (11:13 -0700)]
bpftool: Dump more info about DATASEC members
Dump succinct information for each member of DATASEC: its kinds and name. This
is extremely helpful to see at a quick glance what is inside each DATASEC of
a given BTF. Without this, one has to jump around BTF data to just find out
the name of a VAR or FUNC. DATASEC's var_secinfo member is special in that
regard because it doesn't itself contain the name of the member, delegating
that to the referenced VAR and FUNC kinds. Other kinds, like
STRUCT/UNION/FUNC/ENUM, encode member names directly and thus are clearly
identifiable in BTF dump.
The new output looks like this:
[35] DATASEC '.bss' size=0 vlen=6
type_id=8 offset=0 size=4 (VAR 'input_bss1')
type_id=13 offset=0 size=4 (VAR 'input_bss_weak')
type_id=16 offset=0 size=4 (VAR 'output_bss1')
type_id=17 offset=0 size=4 (VAR 'output_data1')
type_id=18 offset=0 size=4 (VAR 'output_rodata1')
type_id=20 offset=0 size=8 (VAR 'output_sink1')
[36] DATASEC '.data' size=0 vlen=2
type_id=9 offset=0 size=4 (VAR 'input_data1')
type_id=14 offset=0 size=4 (VAR 'input_data_weak')
[37] DATASEC '.kconfig' size=0 vlen=2
type_id=25 offset=0 size=4 (VAR 'LINUX_KERNEL_VERSION')
type_id=28 offset=0 size=1 (VAR 'CONFIG_BPF_SYSCALL')
[38] DATASEC '.ksyms' size=0 vlen=1
type_id=30 offset=0 size=1 (VAR 'bpf_link_fops')
[39] DATASEC '.rodata' size=0 vlen=2
type_id=12 offset=0 size=4 (VAR 'input_rodata1')
type_id=15 offset=0 size=4 (VAR 'input_rodata_weak')
[40] DATASEC 'license' size=0 vlen=1
type_id=24 offset=0 size=4 (VAR 'LICENSE')
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-3-andrii@kernel.org
Andrii Nakryiko [Fri, 23 Apr 2021 18:13:31 +0000 (11:13 -0700)]
bpftool: Support dumping BTF VAR's "extern" linkage
Add dumping of "extern" linkage for BTF VAR kind. Also shorten
"global-allocated" to "global" to be in line with FUNC's "global".
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-2-andrii@kernel.org
Alexei Starovoitov [Fri, 23 Apr 2021 16:58:22 +0000 (09:58 -0700)]
Merge branch 'Simplify bpf_snprintf verifier code'
Florent Revest says:
====================
Alexei requested a couple of cleanups to the bpf_snprintf and
ARG_PTR_TO_CONST_STR verifier code.
https://lore.kernel.org/bpf/CABRcYmL_SMT80UTyV98bRsOzW0wBd1sZcYUpTrcOAV+9m+YoWQ@mail.gmail.com/T/#t
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Florent Revest [Thu, 22 Apr 2021 23:55:43 +0000 (01:55 +0200)]
bpf: Remove unnecessary map checks for ARG_PTR_TO_CONST_STR
reg->type is enforced by check_reg_type() and map should never be NULL
(it would already have been dereferenced anyway) so these checks are
unnecessary.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210422235543.4007694-3-revest@chromium.org
Florent Revest [Thu, 22 Apr 2021 23:55:42 +0000 (01:55 +0200)]
bpf: Notify user if we ever hit a bpf_snprintf verifier bug
In check_bpf_snprintf_call(), a map_direct_value_addr() of the fmt map
should never fail because it has already been checked by
ARG_PTR_TO_CONST_STR. But if it ever fails, it's better to error out
with an explicit debug message rather than silently fail.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210422235543.4007694-2-revest@chromium.org
Li RongQing [Wed, 14 Apr 2021 05:39:12 +0000 (13:39 +0800)]
xsk: Align XDP socket batch size with DPDK
DPDK default burst size is 32, however, kernel xsk sendto
syscall can not handle all 32 at one time, and return with
error.
So make kernel XDP socket batch size larger to avoid
unnecessary syscall fail and context switch which will help
to increase performance.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/1618378752-4191-1-git-send-email-lirongqing@baidu.com
Tiezhu Yang [Thu, 22 Apr 2021 03:36:00 +0000 (11:36 +0800)]
bpf, doc: Fix some invalid links in bpf_devel_QA.rst
There exist some errors "404 Not Found" when I click the link
of "MAINTAINERS" [1], "samples/bpf/" [2] and "selftests" [3]
in the documentation "HOWTO interact with BPF subsystem" [4].
As Alexei Starovoitov suggested, just remove "MAINTAINERS" and
"samples/bpf/" links and use correct link of "selftests".
[1] https://www.kernel.org/doc/html/MAINTAINERS
[2] https://www.kernel.org/doc/html/samples/bpf/
[3] https://www.kernel.org/doc/html/tools/testing/selftests/bpf/
[4] https://www.kernel.org/doc/html/latest/bpf/bpf_devel_QA.html
Fixes:
542228384888 ("bpf, doc: convert bpf_devel_QA.rst to use RST formatting")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/1619062560-30483-1-git-send-email-yangtiezhu@loongson.cn
Martin Willi [Mon, 19 Apr 2021 14:15:59 +0000 (16:15 +0200)]
net, xdp: Update pkt_type if generic XDP changes unicast MAC
If a generic XDP program changes the destination MAC address from/to
multicast/broadcast, the skb->pkt_type is updated to properly handle
the packet when passed up the stack. When changing the MAC from/to
the NICs MAC, PACKET_HOST/OTHERHOST is not updated, though, making
the behavior different from that of native XDP.
Remember the PACKET_HOST/OTHERHOST state before calling the program
in generic XDP, and update pkt_type accordingly if the destination
MAC address has changed. As eth_type_trans() assumes a default
pkt_type of PACKET_HOST, restore that before calling it.
The use case for this is when a XDP program wants to push received
packets up the stack by rewriting the MAC to the NICs MAC, for
example by cluster nodes sharing MAC addresses.
Fixes:
297249569932 ("net: fix generic XDP to handle if eth header was mangled")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210419141559.8611-1-martin@strongswan.org
Jiri Olsa [Tue, 20 Apr 2021 13:24:28 +0000 (15:24 +0200)]
selftests/bpf: Add docs target as all dependency
Currently docs target is make dependency for TEST_GEN_FILES,
which makes tests to be rebuilt every time you run make.
Adding docs as all target dependency, so when running make
on top of built selftests it will show just:
$ make
make[1]: Nothing to be done for 'docs'.
After cleaning docs, only docs is rebuilt:
$ make docs-clean
CLEAN eBPF_helpers-manpage
CLEAN eBPF_syscall-manpage
$ make
GEN ...selftests/bpf/bpf-helpers.rst
GEN ...selftests/bpf/bpf-helpers.7
GEN ...selftests/bpf/bpf-syscall.rst
GEN ...selftests/bpf/bpf-syscall.2
$ make
make[1]: Nothing to be done for 'docs'.
Fixes:
a01d935b2e09 ("tools/bpf: Remove bpf-helpers from bpftool docs")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210420132428.15710-1-jolsa@kernel.org
Alexei Starovoitov [Tue, 20 Apr 2021 01:23:34 +0000 (18:23 -0700)]
Merge branch 'bpf: refine retval for bpf_get_task_stack helper'
Dave Marchevsky says:
====================
Similarly to the bpf_get_stack helper, bpf_get_task_stack's return value
can be more tightly bound by the verifier - it's the number of bytes
written to a user-supplied buffer, or a negative error value. Currently
the verifier believes bpf_task_get_stack's retval bounds to be unknown,
requiring extraneous bounds checking to remedy.
Adding it to do_refine_retval_range fixes the issue, as evidenced by
new selftests which fail to load if retval bounds are not refined.
v2: Addressed comment nit in patch 3
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Dave Marchevsky [Fri, 16 Apr 2021 20:47:04 +0000 (13:47 -0700)]
bpf/selftests: Add bpf_get_task_stack retval bounds test_prog
Add a libbpf test prog which feeds bpf_get_task_stack's return value
into seq_write after confirming it's positive. No attempt to bound the
value from above is made.
Load will fail if verifier does not refine retval range based on buf sz
input to bpf_get_task_stack.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210416204704.2816874-4-davemarchevsky@fb.com
Dave Marchevsky [Fri, 16 Apr 2021 20:47:03 +0000 (13:47 -0700)]
bpf/selftests: Add bpf_get_task_stack retval bounds verifier test
Add a bpf_iter test which feeds bpf_get_task_stack's return value into
seq_write after confirming it's positive. No attempt to bound the value
from above is made.
Load will fail if verifier does not refine retval range based on
buf sz input to bpf_get_task_stack.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210416204704.2816874-3-davemarchevsky@fb.com
Dave Marchevsky [Fri, 16 Apr 2021 20:47:02 +0000 (13:47 -0700)]
bpf: Refine retval for bpf_get_task_stack helper
Verifier can constrain the min/max bounds of bpf_get_task_stack's return
value more tightly than the default tnum_unknown. Like bpf_get_stack,
return value is num bytes written into a caller-supplied buf, or error,
so do_refine_retval_range will work.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210416204704.2816874-2-davemarchevsky@fb.com
Yaqi Chen [Fri, 16 Apr 2021 15:48:03 +0000 (23:48 +0800)]
samples/bpf: Fix broken tracex1 due to kprobe argument change
>From commit
c0bbbdc32feb ("__netif_receive_skb_core: pass skb by
reference"), the first argument passed into __netif_receive_skb_core
has changed to reference of a skb pointer.
This commit fixes by using bpf_probe_read_kernel.
Signed-off-by: Yaqi Chen <chendotjs@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210416154803.37157-1-chendotjs@gmail.com
Alexei Starovoitov [Mon, 19 Apr 2021 22:27:37 +0000 (15:27 -0700)]
Merge branch 'Add a snprintf eBPF helper'
Florent Revest says:
====================
We have a usecase where we want to audit symbol names (if available) in
callback registration hooks. (ex: fentry/nf_register_net_hook)
A few months back, I proposed a bpf_kallsyms_lookup series but it was
decided in the reviews that a more generic helper, bpf_snprintf, would
be more useful.
This series implements the helper according to the feedback received in
https://lore.kernel.org/bpf/
20201126165748.
1748417-1-revest@google.com/T/#u
- A new arg type guarantees the NULL-termination of string arguments and
lets us pass format strings in only one arg
- A new helper is implemented using that guarantee. Because the format
string is known at verification time, the format string validation is
done by the verifier
- To implement a series of tests for bpf_snprintf, the logic for
marshalling variadic args in a fixed-size array is reworked as per:
https://lore.kernel.org/bpf/
20210310015455.
1095207-1-revest@chromium.org/T/#u
---
Changes in v5:
- Fixed the bpf_printf_buf_used counter logic in try_get_fmt_tmp_buf
- Added a couple of extra incorrect specifiers tests
- Call test_snprintf_single__destroy unconditionally
- Fixed a C++-style comment
---
Changes in v4:
- Moved bpf_snprintf, bpf_printf_prepare and bpf_printf_cleanup to
kernel/bpf/helpers.c so that they get built without CONFIG_BPF_EVENTS
- Added negative test cases (various invalid format strings)
- Renamed put_fmt_tmp_buf() as bpf_printf_cleanup()
- Fixed a mistake that caused temporary buffers to be unconditionally
freed in bpf_printf_prepare
- Fixed a mistake that caused missing 0 character to be ignored
- Fixed a warning about integer to pointer conversion
- Misc cleanups
---
Changes in v3:
- Simplified temporary buffer acquisition with try_get_fmt_tmp_buf()
- Made zero-termination check more consistent
- Allowed NULL output_buffer
- Simplified the BPF_CAST_FMT_ARG macro
- Three new test cases: number padding, simple string with no arg and
string length extraction only with a NULL output buffer
- Clarified helper's description for edge cases (eg: str_size == 0)
- Lots of cosmetic changes
---
Changes in v2:
- Extracted the format validation/argument sanitization in a generic way
for all printf-like helpers.
- bpf_snprintf's str_size can now be 0
- bpf_snprintf is now exposed to all BPF program types
- We now preempt_disable when using a per-cpu temporary buffer
- Addressed a few cosmetic changes
====================
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Florent Revest [Mon, 19 Apr 2021 15:52:43 +0000 (17:52 +0200)]
selftests/bpf: Add a series of tests for bpf_snprintf
The "positive" part tests all format specifiers when things go well.
The "negative" part makes sure that incorrect format strings fail at
load time.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-7-revest@chromium.org
Florent Revest [Mon, 19 Apr 2021 15:52:42 +0000 (17:52 +0200)]
libbpf: Introduce a BPF_SNPRINTF helper macro
Similarly to BPF_SEQ_PRINTF, this macro turns variadic arguments into an
array of u64, making it more natural to call the bpf_snprintf helper.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-6-revest@chromium.org
Florent Revest [Mon, 19 Apr 2021 15:52:41 +0000 (17:52 +0200)]
libbpf: Initialize the bpf_seq_printf parameters array field by field
When initializing the __param array with a one liner, if all args are
const, the initial array value will be placed in the rodata section but
because libbpf does not support relocation in the rodata section, any
pointer in this array will stay NULL.
Fixes:
c09add2fbc5a ("tools/libbpf: Add bpf_iter support")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-5-revest@chromium.org
Florent Revest [Mon, 19 Apr 2021 15:52:40 +0000 (17:52 +0200)]
bpf: Add a bpf_snprintf helper
The implementation takes inspiration from the existing bpf_trace_printk
helper but there are a few differences:
To allow for a large number of format-specifiers, parameters are
provided in an array, like in bpf_seq_printf.
Because the output string takes two arguments and the array of
parameters also takes two arguments, the format string needs to fit in
one argument. Thankfully, ARG_PTR_TO_CONST_STR is guaranteed to point to
a zero-terminated read-only map so we don't need a format string length
arg.
Because the format-string is known at verification time, we also do
a first pass of format string validation in the verifier logic. This
makes debugging easier.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-4-revest@chromium.org
Florent Revest [Mon, 19 Apr 2021 15:52:39 +0000 (17:52 +0200)]
bpf: Add a ARG_PTR_TO_CONST_STR argument type
This type provides the guarantee that an argument is going to be a const
pointer to somewhere in a read-only map value. It also checks that this
pointer is followed by a zero character before the end of the map value.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-3-revest@chromium.org
Florent Revest [Mon, 19 Apr 2021 15:52:38 +0000 (17:52 +0200)]
bpf: Factorize bpf_trace_printk and bpf_seq_printf
Two helpers (trace_printk and seq_printf) have very similar
implementations of format string parsing and a third one is coming
(snprintf). To avoid code duplication and make the code easier to
maintain, this moves the operations associated with format string
parsing (validation and argument sanitization) into one generic
function.
The implementation of the two existing helpers already drifted quite a
bit so unifying them entailed a lot of changes:
- bpf_trace_printk always expected fmt[fmt_size] to be the terminating
NULL character, this is no longer true, the first 0 is terminating.
- bpf_trace_printk now supports %% (which produces the percentage char).
- bpf_trace_printk now skips width formating fields.
- bpf_trace_printk now supports the X modifier (capital hexadecimal).
- bpf_trace_printk now supports %pK, %px, %pB, %pi4, %pI4, %pi6 and %pI6
- argument casting on 32 bit has been simplified into one macro and
using an enum instead of obscure int increments.
- bpf_seq_printf now uses bpf_trace_copy_string instead of
strncpy_from_kernel_nofault and handles the %pks %pus specifiers.
- bpf_seq_printf now prints longs correctly on 32 bit architectures.
- both were changed to use a global per-cpu tmp buffer instead of one
stack buffer for trace_printk and 6 small buffers for seq_printf.
- to avoid per-cpu buffer usage conflict, these helpers disable
preemption while the per-cpu buffer is in use.
- both helpers now support the %ps and %pS specifiers to print symbols.
The implementation is also moved from bpf_trace.c to helpers.c because
the upcoming bpf_snprintf helper will be made available to all BPF
programs and will need it.
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-2-revest@chromium.org
Alexei Starovoitov [Thu, 15 Apr 2021 23:50:22 +0000 (16:50 -0700)]
Merge branch 'bpf: tools: support build selftests/bpf with clang'
Yonghong Song says:
====================
To build kernel with clang, people typically use
make -j60 LLVM=1 LLVM_IAS=1
LLVM_IAS=1 is not required for non-LTO build but
is required for LTO build. In my environment,
I am always having LLVM_IAS=1 regardless of
whether LTO is enabled or not.
After kernel is build with clang, the following command
can be used to build selftests with clang:
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1
I am using latest bpf-next kernel code base and
latest clang built from source from
https://github.com/llvm/llvm-project.git
Using earlier version of llvm may have compilation errors, see
tools/testing/selftests/bpf
due to continuous development in llvm bpf features and selftests
to use these features.
To run bpf selftest properly, you need have certain necessary
kernel configs like at:
bpf-next:tools/testing/selftests/bpf/config
(not that this is not a complete .config file and some other configs
might still be needed.)
Currently, using the above command, some compilations
still use gcc and there are also compilation errors and warnings.
This patch set intends to fix these issues.
Patch #1 and #2 fixed the issue so clang/clang++ is
used instead of gcc/g++. Patch #3 fixed a compilation
failure. Patch #4 and #5 fixed various compiler warnings.
Changelog:
v2 -> v3:
. more test environment description in cover letter. (Sedat)
. use a different fix, but similar to other use in selftests/bpf
Makefile, to exclude header files from CXX compilation command
line. (Andrii)
. fix codes instead of adding -Wno-format-security. (Andrii)
v1 -> v2:
. add -Wno-unused-command-line-argument and -Wno-format-security
for clang only as (1). gcc does not exhibit those
warnings, and (2). -Wno-unused-command-line-argument is
only supported by clang. (Sedat)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yonghong Song [Tue, 13 Apr 2021 15:34:35 +0000 (08:34 -0700)]
bpftool: Fix a clang compilation warning
With clang compiler:
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
# build selftests/bpf or bpftool
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1
make -j60 -C tools/bpf/bpftool LLVM=1 LLVM_IAS=1
the following compilation warning showed up,
net.c:160:37: warning: comparison of integers of different signs: '__u32' (aka 'unsigned int') and 'int' [-Wsign-compare]
for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
^~~~~~~~~~~~~~~~~
.../tools/include/uapi/linux/netlink.h:99:24: note: expanded from macro 'NLMSG_OK'
(nlh)->nlmsg_len <= (len))
~~~~~~~~~~~~~~~~ ^ ~~~
In this particular case, "len" is defined as "int" and (nlh)->nlmsg_len is "unsigned int".
The macro NLMSG_OK is defined as below in uapi/linux/netlink.h.
#define NLMSG_OK(nlh,len) ((len) >= (int)sizeof(struct nlmsghdr) && \
(nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \
(nlh)->nlmsg_len <= (len))
The clang compiler complains the comparision "(nlh)->nlmsg_len <= (len))",
but in bpftool/net.c, it is already ensured that "len > 0" must be true.
So theoretically the compiler could deduce that comparison of
"(nlh)->nlmsg_len" and "len" is okay, but this really depends on compiler
internals. Let us add an explicit type conversion (from "int" to "unsigned int")
for "len" in NLMSG_OK to silence this warning right now.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153435.3029635-1-yhs@fb.com
Yonghong Song [Tue, 13 Apr 2021 15:34:29 +0000 (08:34 -0700)]
selftests/bpf: Silence clang compilation warnings
With clang compiler:
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1
Some linker flags are not used/effective for some binaries and
we have warnings like:
warning: -lelf: 'linker' input unused [-Wunused-command-line-argument]
We also have warnings like:
.../selftests/bpf/prog_tests/ns_current_pid_tgid.c:74:57: note: treat the string as an argument to avoid this
if (CHECK(waitpid(cpid, &wstatus, 0) == -1, "waitpid", strerror(errno)))
^
"%s",
.../selftests/bpf/test_progs.h:129:35: note: expanded from macro 'CHECK'
_CHECK(condition, tag, duration, format)
^
.../selftests/bpf/test_progs.h:108:21: note: expanded from macro '_CHECK'
fprintf(stdout, ##format); \
^
The first warning can be silenced with clang option -Wno-unused-command-line-argument.
For the second warning, source codes are modified as suggested by the compiler
to silence the warning. Since gcc does not support the option
-Wno-unused-command-line-argument and the warning only happens with clang
compiler, the option -Wno-unused-command-line-argument is enabled only when
clang compiler is used.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153429.3029377-1-yhs@fb.com
Yonghong Song [Tue, 13 Apr 2021 15:34:24 +0000 (08:34 -0700)]
selftests/bpf: Fix test_cpp compilation failure with clang
With clang compiler:
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1
the test_cpp build failed due to the failure:
warning: treating 'c-header' input as 'c++-header' when in C++ mode, this behavior is deprecated [-Wdeprecated]
clang-13: error: cannot specify -o when generating multiple output files
test_cpp compilation flag looks like:
clang++ -g -Og -rdynamic -Wall -I<...> ... \
-Dbpf_prog_load=bpf_prog_test_load -Dbpf_load_program=bpf_test_load_program \
test_cpp.cpp <...>/test_core_extern.skel.h <...>/libbpf.a <...>/test_stub.o \
-lcap -lelf -lz -lrt -lpthread -o <...>/test_cpp
The clang++ compiler complains the header file in the command line and
also failed the compilation due to this.
Let us remove the header file from the command line which is not intended
any way, and this fixed the compilation problem.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153424.3028986-1-yhs@fb.com
Yonghong Song [Tue, 13 Apr 2021 15:34:19 +0000 (08:34 -0700)]
tools: Allow proper CC/CXX/... override with LLVM=1 in Makefile.include
selftests/bpf/Makefile includes tools/scripts/Makefile.include.
With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixed the case if CC/AR/LD/CXX/STRIP is allowed to be
overridden, it will be written to clang/llvm-ar/..., instead of
gcc binaries. The definition of CC_NO_CLANG is also relocated
to the place after the above CC is defined.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153419.3028165-1-yhs@fb.com
Yonghong Song [Tue, 13 Apr 2021 15:34:13 +0000 (08:34 -0700)]
selftests: Set CC to clang in lib.mk if LLVM is set
selftests/bpf/Makefile includes lib.mk. With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixed lib.mk issue which sets CC to gcc in all cases.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
Alexei Starovoitov [Thu, 15 Apr 2021 14:18:17 +0000 (07:18 -0700)]
libbpf: Remove unused field.
relo->processed is set, but not used. Remove it.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210415141817.53136-1-alexei.starovoitov@gmail.com
zuoqilin [Wed, 14 Apr 2021 14:16:39 +0000 (22:16 +0800)]
tools/testing: Remove unused variable
Remove unused variable "ret2".
Signed-off-by: zuoqilin <zuoqilin@yulong.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414141639.1446-1-zuoqilin1@163.com
Florent Revest [Wed, 14 Apr 2021 15:56:32 +0000 (17:56 +0200)]
selftests/bpf: Fix the ASSERT_ERR_PTR macro
It is just missing a ';'. This macro is not used by any test yet.
Fixes:
22ba36351631 ("selftests/bpf: Move and extend ASSERT_xxx() testing macros")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210414155632.737866-1-revest@chromium.org
Toke Høiland-Jørgensen [Tue, 13 Apr 2021 09:16:07 +0000 (11:16 +0200)]
selftests/bpf: Add tests for target information in bpf_link info queries
Extend the fexit_bpf2bpf test to check that the info for the bpf_link
returned by the kernel matches the expected values.
While we're updating the test, change existing uses of CHEC() to use the
much easier to read ASSERT_*() macros.
v2:
- Convert last CHECK() call and get rid of 'duration' var
- Split ASSERT_OK_PTR() checks to two separate if statements
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413091607.58945-2-toke@redhat.com
Toke Høiland-Jørgensen [Tue, 13 Apr 2021 09:16:06 +0000 (11:16 +0200)]
bpf: Return target info when a tracing bpf_link is queried
There is currently no way to discover the target of a tracing program
attachment after the fact. Add this information to bpf_link_info and return
it when querying the bpf_link fd.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413091607.58945-1-toke@redhat.com
Ilya Leoshkevich [Tue, 13 Apr 2021 19:00:43 +0000 (21:00 +0200)]
bpf: Generate BTF_KIND_FLOAT when linking vmlinux
pahole v1.21 supports the --btf_gen_floats flag, which makes it
generate the information about the floating-point types [1].
Adjust link-vmlinux.sh to pass this flag to pahole in case it's
supported, which is determined using a simple version check.
[1] https://lore.kernel.org/dwarves/YHRiXNX1JUF2Az0A@kernel.org/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413190043.21918-1-iii@linux.ibm.com
Pedro Tammela [Mon, 12 Apr 2021 19:24:32 +0000 (16:24 -0300)]
libbpf: Clarify flags in ringbuf helpers
In 'bpf_ringbuf_reserve()' we require the flag to '0' at the moment.
For 'bpf_ringbuf_{discard,submit,output}' a flag of '0' might send a
notification to the process if needed.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210412192434.944343-1-pctammela@mojatatu.com
Cong Wang [Thu, 8 Apr 2021 03:05:56 +0000 (20:05 -0700)]
sock_map: Fix a potential use-after-free in sock_map_close()
The last refcnt of the psock can be gone right after
sock_map_remove_links(), so sk_psock_stop() could trigger a UAF.
The reason why I placed sk_psock_stop() there is to avoid RCU read
critical section, and more importantly, some callee of
sock_map_remove_links() is supposed to be called with RCU read lock,
we can not simply get rid of RCU read lock here. Therefore, the only
choice we have is to grab an additional refcnt with sk_psock_get()
and put it back after sk_psock_stop().
Fixes:
799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: syzbot+7b6548ae483d6f4c64ae@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210408030556.45134-1-xiyou.wangcong@gmail.com
Cong Wang [Wed, 7 Apr 2021 03:21:11 +0000 (20:21 -0700)]
skmsg: Pass psock pointer to ->psock_update_sk_prot()
Using sk_psock() to retrieve psock pointer from sock requires
RCU read lock, but we already get psock pointer before calling
->psock_update_sk_prot() in both cases, so we can just pass it
without bothering sk_psock().
Fixes:
8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
Reported-by: syzbot+320a3bc8d80f478c37e4@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+320a3bc8d80f478c37e4@syzkaller.appspotmail.com
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210407032111.33398-1-xiyou.wangcong@gmail.com
Daniel Borkmann [Mon, 12 Apr 2021 15:19:00 +0000 (17:19 +0200)]
bpf: Sync bpf headers in tooling infrastucture
Synchronize tools/include/uapi/linux/bpf.h which was missing changes
from various commits:
-
f3c45326ee71 ("bpf: Document PROG_TEST_RUN limitations")
-
e5e35e754c28 ("bpf: BPF-helper for MTU checking add length input")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Joe Stringer [Sat, 10 Apr 2021 17:45:48 +0000 (10:45 -0700)]
bpf: Document PROG_TEST_RUN limitations
Per net/bpf/test_run.c, particular prog types have additional
restrictions around the parameters that can be provided, so document
these in the header.
I didn't bother documenting the limitation on duration for raw
tracepoints since that's an output parameter anyway.
Tested with ./tools/testing/selftests/bpf/test_doc_build.sh.
Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210410174549.816482-1-joe@cilium.io
Andrii Nakryiko [Fri, 9 Apr 2021 06:47:06 +0000 (23:47 -0700)]
Merge branch 'bpf/selftests: page size fixes'
Yauheni Kaliuta says:
====================
A set of fixes for selftests to make them working on systems with PAGE_SIZE > 4K
+ cleanup (version) and ringbuf_multi extention.
---
v3->v4:
- zero initialize BPF programs' static variables;
- add bpf_map__inner_map to libbpf.map in alphabetical order;
- add bpf_map__set_inner_map_fd test to ringbuf_multi;
v2->v3:
- reorder: move version removing patch first to keep main patches in
one group;
- rename "selftests/bpf: pass page size from userspace in sockopt_sk"
as suggested;
- convert sockopt_sk test to use ASSERT macros;
- set page size from userspace
- split patches to pairs userspace/bpf. It's easier to check that
every conversion works as expected;
v1->v2:
- add missed 'selftests/bpf: test_progs/sockopt_sk: Convert to use BPF skeleton'
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Yauheni Kaliuta [Thu, 8 Apr 2021 06:13:10 +0000 (09:13 +0300)]
selftests/bpf: ringbuf_multi: Test bpf_map__set_inner_map_fd
Test map__set_inner_map_fd() interaction with map-in-map
initialization. Use hashmap of maps just to make it different to
existing array of maps.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-9-yauheni.kaliuta@redhat.com
Yauheni Kaliuta [Thu, 8 Apr 2021 06:13:09 +0000 (09:13 +0300)]
selftests/bpf: ringbuf_multi: Use runtime page size
Set bpf table sizes dynamically according to the runtime page size
value.
Do not switch to ASSERT macros, keep CHECK, for consistency with the
rest of the test. Can be a separate cleanup patch.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-8-yauheni.kaliuta@redhat.com
Andrii Nakryiko [Thu, 8 Apr 2021 06:13:08 +0000 (09:13 +0300)]
libbpf: Add bpf_map__inner_map API
The API gives access to inner map for map in map types (array or
hash of map). It will be used to dynamically set max_entries in it.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-7-yauheni.kaliuta@redhat.com
Yauheni Kaliuta [Thu, 8 Apr 2021 06:13:07 +0000 (09:13 +0300)]
selftests/bpf: ringbuf: Use runtime page size
Replace hardcoded 4096 with runtime value in the userspace part of
the test and set bpf table sizes dynamically according to the value.
Do not switch to ASSERT macros, keep CHECK, for consistency with the
rest of the test. Can be a separate cleanup patch.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-6-yauheni.kaliuta@redhat.com
Yauheni Kaliuta [Thu, 8 Apr 2021 06:13:06 +0000 (09:13 +0300)]
selftests/bpf: mmap: Use runtime page size
Replace hardcoded 4096 with runtime value in the userspace part of
the test and set bpf table sizes dynamically according to the value.
Do not switch to ASSERT macros, keep CHECK, for consistency with the
rest of the test. Can be a separate cleanup patch.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-5-yauheni.kaliuta@redhat.com
Yauheni Kaliuta [Thu, 8 Apr 2021 06:13:05 +0000 (09:13 +0300)]
selftests/bpf: Pass page size from userspace in map_ptr
Use ASSERT to check result but keep CHECK where format was used to
report error.
Use bpf_map__set_max_entries() to set map size dynamically from
userspace according to page size.
Zero-initialize the variable in bpf prog, otherwise it will cause
problems on some versions of Clang.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-4-yauheni.kaliuta@redhat.com
Yauheni Kaliuta [Thu, 8 Apr 2021 06:13:04 +0000 (09:13 +0300)]
selftests/bpf: Pass page size from userspace in sockopt_sk
Since there is no convenient way for bpf program to get PAGE_SIZE
from inside of the kernel, pass the value from userspace.
Zero-initialize the variable in bpf prog, otherwise it will cause
problems on some versions of Clang.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-3-yauheni.kaliuta@redhat.com
Yauheni Kaliuta [Thu, 8 Apr 2021 06:13:03 +0000 (09:13 +0300)]
selftests/bpf: test_progs/sockopt_sk: Convert to use BPF skeleton
Switch the test to use BPF skeleton to save some boilerplate and
make it easy to access bpf program bss segment.
The latter will be used to pass PAGE_SIZE from userspace since there
is no convenient way for bpf program to get it from inside of the
kernel.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-2-yauheni.kaliuta@redhat.com
Yauheni Kaliuta [Thu, 8 Apr 2021 06:13:02 +0000 (09:13 +0300)]
selftests/bpf: test_progs/sockopt_sk: Remove version
As pointed by Andrii Nakryiko, _version is useless now, remove it.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-1-yauheni.kaliuta@redhat.com
Muhammad Usama Anjum [Mon, 5 Apr 2021 19:49:04 +0000 (00:49 +0500)]
bpf, inode: Remove second initialization of the bpf_preload_lock
bpf_preload_lock is already defined with DEFINE_MUTEX(). There is no
need to initialize it again. Remove the extraneous initialization.
Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210405194904.GA148013@LEGION
Cong Wang [Sat, 3 Apr 2021 05:27:15 +0000 (22:27 -0700)]
bpf, udp: Remove some pointless comments
These comments in udp_bpf_update_proto() are copied from the
original TCP code and apparently do not apply to UDP. Just
remove them.
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210403052715.13854-1-xiyou.wangcong@gmail.com
Hengqi Chen [Mon, 5 Apr 2021 04:01:19 +0000 (12:01 +0800)]
libbpf: Fix KERNEL_VERSION macro
Add missing ')' for KERNEL_VERSION macro.
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210405040119.802188-1-hengqi.chen@gmail.com
Martin KaFai Lau [Sat, 3 Apr 2021 00:29:21 +0000 (17:29 -0700)]
bpf: selftests: Specify CONFIG_DYNAMIC_FTRACE in the testing config
The tracing test and the recent kfunc call test require
CONFIG_DYNAMIC_FTRACE. This patch adds it to the config file.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210403002921.3419721-1-kafai@fb.com
Yang Yingliang [Fri, 2 Apr 2021 01:26:34 +0000 (09:26 +0800)]
libbpf: Remove redundant semi-colon
Remove redundant semi-colon in finalize_btf_ext().
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210402012634.1965453-1-yangyingliang@huawei.com
Wan Jiabing [Thu, 1 Apr 2021 07:20:37 +0000 (15:20 +0800)]
bpf: Remove repeated struct btf_type declaration
struct btf_type is declared twice. One is declared at 35th line. The below
one is not needed, hence remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210401072037.995849-1-wanjiabing@vivo.com
Wan Jiabing [Thu, 1 Apr 2021 06:46:37 +0000 (14:46 +0800)]
bpf, cgroup: Delete repeated struct bpf_prog declaration
struct bpf_prog is declared twice. There is one declaration which is
independent on the macro at 18th line. So the below one is not needed
though. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210401064637.993327-1-wanjiabing@vivo.com
He Fengqing [Wed, 31 Mar 2021 07:51:35 +0000 (07:51 +0000)]
bpf: Remove unused parameter from ___bpf_prog_run
'stack' parameter is not used in ___bpf_prog_run() after
f696b8f471ec
("bpf: split bpf core interpreter"), the base address have been set to
FP reg. So consequently remove it.
Signed-off-by: He Fengqing <hefengqing@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210331075135.3850782-1-hefengqing@huawei.com
John Fastabend [Thu, 1 Apr 2021 22:25:56 +0000 (15:25 -0700)]
bpf, selftests: test_maps generating unrecognized data section
With a relatively recent clang master branch test_map skips a section,
libbpf: elf: skipping unrecognized data section(5) .rodata.str1.1
the cause is some pointless strings from bpf_printks in the BPF program
loaded during testing. After just removing the prints to fix above error
Daniel points out the program is a bit pointless and could be simply the
empty program returning SK_PASS.
Here we do just that and return simply SK_PASS. This program is used with
test_maps selftests to test insert/remove of a program into the sockmap
and sockhash maps. Its not testing actual functionality of the TCP
sockmap programs, these are tested from test_sockmap. So we shouldn't
lose in test coverage and fix above warnings. This original test was
added before test_sockmap existed and has been copied around ever since,
clean it up now.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/161731595664.74613.1603087410166945302.stgit@john-XPS-13-9370
Eric Dumazet [Fri, 2 Apr 2021 18:10:37 +0000 (11:10 -0700)]
tcp: reorder tcp_congestion_ops for better cache locality
Group all the often used fields in the first cache line,
to reduce cache line misses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 2 Apr 2021 18:07:46 +0000 (11:07 -0700)]
net: reorganize fields in netns_mib
Order fields to increase locality for most used protocols.
udplite and icmp are moved at the end.
Same for proc_net_devsnmp6 which is not used in fast path.
This potentially saves one cache line miss for typical TCP/UDP over IPv4/IPv6.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 2 Apr 2021 11:44:42 +0000 (14:44 +0300)]
nfc: pn533: prevent potential memory corruption
If the "type_a->nfcid_len" is too large then it would lead to memory
corruption in pn533_target_found_type_a() when we do:
memcpy(nfc_tgt->nfcid1, tgt_type_a->nfcid_data, nfc_tgt->nfcid1_len);
Fixes:
c3b1e1e8a76f ("NFC: Export NFCID1 from pn533")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Apr 2021 21:25:47 +0000 (14:25 -0700)]
Merge branch 'dpaa2-rx-copybreak'
Ioana Ciornei says:
====================
dpaa2-eth: add rx copybreak support
DMA unmapping, allocating a new buffer and DMA mapping it back on the
refill path is really not that efficient. Proper buffer recycling (page
pool, flipping the page and using the other half) cannot be done for
DPAA2 since it's not a ring based controller but it rather deals with
multiple queues which all get their buffers from the same buffer pool on
Rx.
To circumvent these limitations, add support for Rx copybreak in
dpaa2-eth.
Below you can find a summary of the tests that were run to end up
with the default rx copybreak value of 512.
A bit about the setup - a LS2088A SoC, 8 x Cortex A72 @ 1.8GHz, IPfwd
zero loss test @ 20Gbit/s throughput. I tested multiple frame sizes to
get an idea where is the break even point.
Here are 2 sets of results, (1) is the baseline and (2) is just
allocating a new skb for all frames sizes received (as if the copybreak
was even to the MTU). All numbers are in Mpps.
64 128 256 512 640 768 896
(1) 3.23 3.23 3.24 3.21 3.1 2.76 2.71
(2) 3.95 3.88 3.79 3.62 3.3 3.02 2.65
It seems that even for 512 bytes frame sizes it's comfortably better when
allocating a new skb. After that, we see diminishing rewards or even worse.
Changes in v2:
- properly marked dpaa2_eth_copybreak as static
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Fri, 2 Apr 2021 09:55:32 +0000 (12:55 +0300)]
dpaa2-eth: export the rx copybreak value as an ethtool tunable
It's useful, especially for debugging purposes, to have the Rx copybreak
value changeable at runtime. Export it as an ethtool tunable.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Fri, 2 Apr 2021 09:55:31 +0000 (12:55 +0300)]
dpaa2-eth: add rx copybreak support
DMA unmapping, allocating a new buffer and DMA mapping it back on the
refill path is really not that efficient. Proper buffer recycling (page
pool, flipping the page and using the other half) cannot be done for
DPAA2 since it's not a ring based controller but it rather deals with
multiple queues which all get their buffers from the same buffer pool on
Rx.
To circumvent these limitations, add support for Rx copybreak. For small
sized packets instead of creating a skb around the buffer in which the
frame was received, allocate a new sk buffer altogether, copy the
contents of the frame and release the initial page back into the buffer
pool.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Fri, 2 Apr 2021 09:55:30 +0000 (12:55 +0300)]
dpaa2-eth: rename dpaa2_eth_xdp_release_buf into dpaa2_eth_recycle_buf
Rename the dpaa2_eth_xdp_release_buf function into dpaa2_eth_recycle_buf
since in the next patches we'll be using the same recycle mechanism for
the normal stack path beside for XDP_DROP.
Also, rename the array which holds the buffers to be recycled so that it
does not have any reference to XDP.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Apr 2021 21:21:51 +0000 (14:21 -0700)]
Merge branch 'mptcp-misc'
Mat Martineau says:
====================
MPTCP: Miscellaneous changes
Here is a collection of patches from the MPTCP tree:
Patches 1 and 2 add some helpful MIB counters for connection
information.
Patch 3 cleans up some unnecessary checks.
Patch 4 is a new feature, support for the MP_TCPRST option. This option
is used when resetting one subflow within a MPTCP connection, and
provides a reason code that the recipient can use when deciding how to
adapt to the lost subflow.
Patches 5-7 update the existing MPTCP selftests to improve timeout
handling and to share better information when tests fail.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthieu Baerts [Thu, 1 Apr 2021 23:19:47 +0000 (16:19 -0700)]
selftests: mptcp: dump more info on mpjoin errors
Very occasionally, MPTCP selftests fail. Yeah, I saw that at least once!
Here we provide more details in case of errors with mptcp_join.sh script
like it was done with mptcp_connect.sh, see
commit
767389c8dd55 ("selftests: mptcp: dump more info on errors")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthieu Baerts [Thu, 1 Apr 2021 23:19:46 +0000 (16:19 -0700)]
selftests: mptcp: init nstat history
Not to be impacted by packets sent between sub-tests.
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthieu Baerts [Thu, 1 Apr 2021 23:19:45 +0000 (16:19 -0700)]
selftests: mptcp: launch mptcp_connect with timeout
'mptcp_connect' already has a timeout for poll() but in some cases, it
is not enough.
With "timeout" tool, we will force the command to fail if it doesn't
finish on time. Thanks to that, the script will continue and display
details about the current state before marking the test as failed.
Displaying this state is very important to be able to understand the
issue. Best to have our CI reporting the issue than just "the test
hanged".
Note that in mptcp_connect.sh, we were using a long timeout to validate
the fact we cannot create a socket if a sysctl is set. We don't need
this timeout.
In diag.sh, we want to send signals to mptcp_connect instances that have
been started in the netns. But we cannot send this signal to 'timeout'
otherwise that will stop the timeout and messages telling us SIGUSR1 has
been received will be printed. Instead of trying to find the right PID
and storing them in an array, we can simply use the output of
'ip netns pids' which is all the PIDs we want to send signal to.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/160
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Thu, 1 Apr 2021 23:19:44 +0000 (16:19 -0700)]
mptcp: add mptcp reset option support
The MPTCP reset option allows to carry a mptcp-specific error code that
provides more information on the nature of a connection reset.
Reset option data received gets stored in the subflow context so it can
be sent to userspace via the 'subflow closed' netlink event.
When a subflow is closed, the desired error code that should be sent to
the peer is also placed in the subflow context structure.
If a reset is sent before subflow establishment could complete, e.g. on
HMAC failure during an MP_JOIN operation, the mptcp skb extension is
used to store the reset information.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 1 Apr 2021 23:19:43 +0000 (16:19 -0700)]
mptcp: remove unneeded check on first subflow
Currently we explicitly check for the first subflow being
NULL in a couple of places, even if we don't need any
special actions in such scenario.
Just drop the unneeded checks, to avoid confusion.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 1 Apr 2021 23:19:42 +0000 (16:19 -0700)]
mptcp: add active MPC mibs
We are not currently tracking the active MPTCP connection
attempts. Let's add the related counters.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 1 Apr 2021 23:19:41 +0000 (16:19 -0700)]
mptcp: add mib for token creation fallback
If the MPTCP protocol is unable to create a new token,
the socket fallback to plain TCP, let's keep track
of such events via a specific MIB.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Apr 2021 21:18:33 +0000 (14:18 -0700)]
Merge branch 'ionic-ptp'
Shannon Nelson says:
====================
ionic: add PTP and hw clock support
This patchset adds support for accessing the DSC hardware clock and
for offloading PTP timestamping.
Tx packet timestamping happens through a separate Tx queue set up with
expanded completion descriptors that can report the timestamp.
Rx timestamping can happen either on all queues, or on a separate
timestamping queue when specific filtering is requested. Again, the
timestamps are reported with the expanded completion descriptors.
The timestamping offload ability is advertised but not enabled until an
OS service asks for it. At that time the driver's queues are reconfigured
to use the different completion descriptors and the private processing
queues as needed.
Reading the raw clock value comes through a new pair of values in the
device info registers in BAR0. These high and low values are interpreted
with help from new clock mask, mult, and shift values in the device
identity information.
First we add the ability to detect new queue features, then the handling
of the new descriptor sizes. After adding the new interface structures,
we start adding the support code, saving the advertising to the stack
for last.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:10 +0000 (10:56 -0700)]
ionic: advertise support for hardware timestamps
Let the network stack know we've got support for timestamping
the packets.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:09 +0000 (10:56 -0700)]
ionic: ethtool ptp stats
Add the new hwstamp stats to our ethtool stats output.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:08 +0000 (10:56 -0700)]
ionic: add ethtool support for PTP
Add the get_ts_info() callback for ethtool support of
timestamping information.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:07 +0000 (10:56 -0700)]
ionic: add and enable tx and rx timestamp handling
The Tx and Rx timestamped packets are handled through separate
queues. Here we set them up, service them, and tear them down
along with the normal Tx and Rx queues.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:06 +0000 (10:56 -0700)]
ionic: set up hw timestamp queues
We do hardware timestamping through a separate Tx queue,
and optionally through a separate Rx queue. These queues
are allocated, freed, and tracked separately from the basic
queue arrays.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:05 +0000 (10:56 -0700)]
ionic: add rx filtering for hw timestamp steering
Add handling of the new Rx packet classification filter type.
This simple bit of classification allows for steering packets
to a separate Rx queue for processing.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:04 +0000 (10:56 -0700)]
ionic: link in the new hw timestamp code
These are changes to compile and link the new code, but no
new feature support is available or advertised yet.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:03 +0000 (10:56 -0700)]
ionic: add hw timestamp support files
This adds the file of code for supporting Tx and Rx hardware
timestamps and the raw clock interface, but does not yet link
it in for compiling or use.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:02 +0000 (10:56 -0700)]
ionic: split adminq post and wait calls
Split the wait part out of adminq_post_wait() into a separate
function so that a caller can have finer grain control over
the sequencing of operations and locking.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:01 +0000 (10:56 -0700)]
ionic: add hw timestamp structs to interface
The interface for hardware timestamping includes a new FW
request, device identity fields, Tx and Rx queue feature bits, a
new Rx filter type, the beginnings of Rx packet classifications,
and hardware timestamp registers.
If the IONIC_ETH_HW_TIMESTAMP bit is shown in the
ionic_lif_config features bit string, then we have support
for the hw clock registers. If the IONIC_RXQ_F_HWSTAMP and
IONIC_TXQ_F_HWSTAMP features are shown in the ionic_q_identity
features, then the queues can support HW timestamps on packets.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:56:00 +0000 (10:56 -0700)]
ionic: add handling of larger descriptors
In preparating for hardware timestamping, we need to support
large Tx and Rx completion descriptors. Here we add the new
queue feature ids and handling for the completion descriptor
sizes.
We only are adding support for the Rx 2x sized completion
descriptors in the general Rx queues for now as we will be
using it for PTP Rx support, and we don't have an immediate
use for the large descriptors in the general Tx queues yet;
it will be used in a special Tx queues added in one of the
next few patches.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Thu, 1 Apr 2021 17:55:59 +0000 (10:55 -0700)]
ionic: add new queue features to interface
Add queue feature extensions to prepare for features that
can be queue specific, in addition to the general queue
features already defined. While we're here, change the
existing feature ids from #defines to enum.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Apr 2021 18:03:07 +0000 (11:03 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-04-01
The following pull-request contains BPF updates for your *net-next* tree.
We've added 68 non-merge commits during the last 7 day(s) which contain
a total of 70 files changed, 2944 insertions(+), 1139 deletions(-).
The main changes are:
1) UDP support for sockmap, from Cong.
2) Verifier merge conflict resolution fix, from Daniel.
3) xsk selftests enhancements, from Maciej.
4) Unstable helpers aka kernel func calling, from Martin.
5) Batches ops for LPM map, from Pedro.
6) Fix race in bpf_get_local_storage, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>