Andrii Nakryiko [Thu, 16 Sep 2021 01:58:33 +0000 (18:58 -0700)]
libbpf: Allow skipping attach_func_name in bpf_program__set_attach_target()
Allow using bpf_program__set_attach_target() to set only the target attach
program FD, while letting libbpf use the target attach function name from
SEC() definition. This might be useful for some scenarios where
bpf_object contains multiple related freplace BPF programs intended to
replace different sub-programs in target BPF program. In such case all
programs will have the same attach_prog_fd, but different
attach_func_name. It's convenient to specify such target function names
declaratively in SEC() definitions, but attach_prog_fd is a dynamic
runtime setting.
To simplify such a scenario, allow bpf_program__set_attach_target() to
delay BTF ID resolution until BPF program load time when given a NULL
attach_func_name. In that case the behavior is similar to using
bpf_object_open_opts.attach_prog_fd (which is marked deprecated since
v0.7), but gives the user more control over what is attached to what.
Such a setup allows BPF programs to be attached to different target
attach_prog_fds, while the target functions are still declaratively
recorded in BPF source code in SEC() definitions.
Selftests changes in the next patch should make this more obvious.
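The name fallback can be modeled in plain C. This is a hypothetical sketch, not libbpf code: `resolve_attach_func_name()` is an invented helper, and it assumes the target function name is whatever follows the `freplace/` prefix in the SEC() name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not libbpf's implementation: choose the target
 * function name for an freplace program. An explicit runtime name wins;
 * with NULL, fall back to whatever follows "freplace/" in the SEC()
 * name. Returns NULL when neither source provides a name. */
const char *resolve_attach_func_name(const char *explicit_name,
				     const char *sec_name)
{
	static const char prefix[] = "freplace/";
	const size_t plen = sizeof(prefix) - 1;

	if (explicit_name)
		return explicit_name;
	if (strncmp(sec_name, prefix, plen) == 0 && sec_name[plen] != '\0')
		return sec_name + plen;
	return NULL;
}
```

With real libbpf, each freplace program in the object would get `bpf_program__set_attach_target(prog, target_prog_fd, NULL)`: all programs share the runtime FD while their function names stay in SEC().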
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-5-andrii@kernel.org
Andrii Nakryiko [Thu, 16 Sep 2021 01:58:32 +0000 (18:58 -0700)]
libbpf: Deprecated bpf_object_open_opts.relaxed_core_relocs
It's irrelevant and hasn't been doing anything for a long while now.
Deprecate it.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-4-andrii@kernel.org
Andrii Nakryiko [Thu, 16 Sep 2021 01:58:31 +0000 (18:58 -0700)]
selftests/bpf: Stop using relaxed_core_relocs which has no effect
The relaxed_core_relocs option hasn't had any effect for a while now, so
stop specifying it. The next patch marks it as deprecated.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-3-andrii@kernel.org
Andrii Nakryiko [Thu, 16 Sep 2021 01:58:30 +0000 (18:58 -0700)]
libbpf: Use pre-setup sec_def in libbpf_find_attach_btf_id()
Don't perform another search for sec_def inside
libbpf_find_attach_btf_id(), as each recognized bpf_program already has
prog->sec_def set.
Also remove unnecessary NULL check for prog->sec_name, as it can never
be NULL.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-2-andrii@kernel.org
Matteo Croce [Tue, 14 Sep 2021 23:54:00 +0000 (01:54 +0200)]
bpf: Update bpf_get_smp_processor_id() documentation
BPF programs run with migration disabled regardless of preemption, as
they are protected by migrate_disable(). Update the uapi documentation
accordingly.
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210914235400.59427-1-mcroce@linux.microsoft.com
Grant Seltzer [Wed, 15 Sep 2021 02:19:52 +0000 (22:19 -0400)]
libbpf: Add sphinx code documentation comments
This adds comments above five functions in btf.h that document
their uses. The comments are in a format that Doxygen and Sphinx
can pick up and render; they are rendered at libbpf.readthedocs.org.
Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210915021951.117186-1-grantseltzer@gmail.com
Yonghong Song [Wed, 15 Sep 2021 06:10:36 +0000 (23:10 -0700)]
selftests/bpf: Skip btf_tag test if btf_tag attribute not supported
Commit c240ba287890 ("selftests/bpf: Add a test with a bpf program with
btf_tag attributes") added a btf_tag selftest to test BTF_KIND_TAG
generation from C source code and kernel validation of the generated
BTF types.
But if an old clang (clang 13 or earlier) is used, the
following compiler warning may be seen:
progs/tag.c:23:20: warning: unknown attribute 'btf_tag' ignored
and the test itself is marked OK. The compiler warning is bad,
and the test itself shouldn't be marked OK.
This patch adds a check for btf_tag attribute support.
If btf_tag is not supported by clang, the attribute is
not used in the code and the test is marked as skipped.
For example, with clang 13:
./test_progs -t btf_tag
#21 btf_tag:SKIP
Summary: 1/0 PASSED, 1 SKIPPED, 0 FAILED
The selftests/README.rst is updated to clarify when the btf_tag
test may be skipped.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210915061036.2577971-1-yhs@fb.com
Alexei Starovoitov [Wed, 15 Sep 2021 01:45:53 +0000 (18:45 -0700)]
Merge branch 'bpf: add support for new btf kind BTF_KIND_TAG'
Yonghong Song says:
====================
LLVM14 added support for a new C attribute ([1])
__attribute__((btf_tag("arbitrary_str")))
This attribute will be emitted to DWARF ([2]) and pahole
will convert it to BTF. For the bpf target, the
attribute will be emitted to BTF directly ([3], [4]).
The attribute is intended to provide additional
information for
- struct/union type or struct/union member
- static/global variables
- static/global function or function parameter.
This new attribute can be used to add attributes
to kernel code, e.g., pre- or post-conditions,
allow/deny info, or any other info in which only
the kernel is interested. Such attributes will
be processed by the clang frontend and emitted to
DWARF, then converted to BTF by pahole. Ultimately
the verifier can use this information for
verification purposes.
The new attribute can also be used for bpf
programs, e.g., tagging with __user attributes
for function parameters, specifying global
function preconditions, etc. Such information
may help the verifier detect bugs in user
programs.
After this series, the pahole DWARF->BTF converter
will be enhanced to support the new LLVM tag
for the btf_tag attribute. With pahole support,
we will then try to add a few real use cases,
e.g., __user/__rcu tagging, allow/deny lists,
some kernel function preconditions, etc.,
in the kernel.
In the rest of the series, Patches 1-2 add
kernel support, Patches 3-4 add
libbpf support, Patch 5 adds bpftool
support, Patches 6-10 add various selftests,
and Patch 11 adds documentation for the new kind.
[1] https://reviews.llvm.org/D106614
[2] https://reviews.llvm.org/D106621
[3] https://reviews.llvm.org/D106622
[4] https://reviews.llvm.org/D109560
Changelog:
v2 -> v3:
- put NR_BTF_KINDS and BTF_KIND_MAX into enum as well
- check component_idx earlier (check_meta stage) in kernel
- add more tests
- fix misc nits
v1 -> v2:
- BTF ELF format changed in llvm ([4] above),
so make an across-the-board change to use the new format.
- Clarified in commit message that BTF_KIND_TAG
is not emitted by bpftool btf dump format c.
- Fix various comments from Andrii.
====================
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yonghong Song [Tue, 14 Sep 2021 22:31:03 +0000 (15:31 -0700)]
docs/bpf: Add documentation for BTF_KIND_TAG
Add BTF_KIND_TAG documentation in btf.rst.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223103.249100-1-yhs@fb.com
Yonghong Song [Tue, 14 Sep 2021 22:30:58 +0000 (15:30 -0700)]
selftests/bpf: Add a test with a bpf program with btf_tag attributes
Add a bpf program with btf_tag attributes. The program is
loaded successfully with the kernel. With the command
bpftool btf dump file ./tag.o
the following dump shows that tags are properly encoded:
[8] STRUCT 'key_t' size=12 vlen=3
'a' type_id=2 bits_offset=0
'b' type_id=2 bits_offset=32
'c' type_id=2 bits_offset=64
[9] TAG 'tag1' type_id=8 component_id=-1
[10] TAG 'tag2' type_id=8 component_id=-1
[11] TAG 'tag1' type_id=8 component_id=1
[12] TAG 'tag2' type_id=8 component_id=1
...
[21] FUNC_PROTO '(anon)' ret_type_id=2 vlen=1
'x' type_id=2
[22] FUNC 'foo' type_id=21 linkage=static
[23] TAG 'tag1' type_id=22 component_id=0
[24] TAG 'tag2' type_id=22 component_id=0
[25] TAG 'tag1' type_id=22 component_id=-1
[26] TAG 'tag2' type_id=22 component_id=-1
...
[29] VAR 'total' type_id=27, linkage=global
[30] TAG 'tag1' type_id=29 component_id=-1
[31] TAG 'tag2' type_id=29 component_id=-1
If an old clang compiler, which does not support btf_tag attribute,
is used, these btf_tag attributes will be silently ignored.
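The component_id column in the dump above selects what a TAG annotates: -1 tags the struct, func, or var itself, while N >= 0 picks struct member N (e.g. [11] tags member 'b' of key_t) or function parameter N (e.g. [23] tags parameter 'x' of foo). A small C model of that rule; the enum and function names are invented for this sketch, and rejecting non-(-1) indexes on variables is an assumption of the model.

```c
/* Simplified model of how a TAG record's component_idx is interpreted.
 * All names here are hypothetical. */
enum tag_owner { OWNER_STRUCT, OWNER_FUNC, OWNER_VAR };
enum tag_target { TARGET_INVALID = -1, TARGET_WHOLE, TARGET_MEMBER, TARGET_PARAM };

/* vlen: member count for a struct, parameter count (from FUNC_PROTO)
 * for a func; ignored for a var. */
enum tag_target classify_tag(enum tag_owner owner, int component_idx,
			     unsigned int vlen)
{
	if (component_idx == -1)
		return TARGET_WHOLE;
	if (owner == OWNER_VAR || component_idx < 0 ||
	    (unsigned int)component_idx >= vlen)
		return TARGET_INVALID;
	return owner == OWNER_STRUCT ? TARGET_MEMBER : TARGET_PARAM;
}
```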
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223058.248949-1-yhs@fb.com
Yonghong Song [Tue, 14 Sep 2021 22:30:52 +0000 (15:30 -0700)]
selftests/bpf: Test BTF_KIND_TAG for deduplication
Add unit tests for BTF_KIND_TAG deduplication for
- struct and struct member
- variable
- func and func argument
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223052.248535-1-yhs@fb.com
Yonghong Song [Tue, 14 Sep 2021 22:30:47 +0000 (15:30 -0700)]
selftests/bpf: Add BTF_KIND_TAG unit tests
Test good and bad variants of BTF_KIND_TAG encoding.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223047.248223-1-yhs@fb.com
Yonghong Song [Tue, 14 Sep 2021 22:30:41 +0000 (15:30 -0700)]
selftests/bpf: Change NAME_NTH/IS_NAME_NTH for BTF_KIND_TAG format
BTF_KIND_TAG ELF format has a component_idx which might have value -1.
test_btf may confuse it with common_type.name, as NAME_NTH checks that the
high 16 bits are 0xffff. Change the NAME_NTH high 16-bit marker to
0xfffe so it won't be confused with component_idx.
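The collision is easy to see with the bit patterns: component_idx -1 is stored as 0xffffffff, whose high 16 bits equal the old 0xffff marker. The macro names below mirror the subject line, but their bodies are a sketch written for this note rather than copied from test_btf.c.

```c
#include <stdint.h>

/* Old marker: high 16 bits 0xffff, which a component_idx of -1
 * (0xffffffff) also matches. */
#define OLD_IS_NAME_NTH(X)   (((uint32_t)(X) & 0xffff0000u) == 0xffff0000u)

/* New marker: high 16 bits 0xfffe, low 16 bits carry the name index. */
#define NAME_NTH(N)          (0xfffe0000u | (uint32_t)(N))
#define IS_NAME_NTH(X)       (((uint32_t)(X) & 0xffff0000u) == 0xfffe0000u)
#define GET_NAME_NTH_IDX(X)  ((uint32_t)(X) & 0x0000ffffu)
```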
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223041.248009-1-yhs@fb.com
Yonghong Song [Tue, 14 Sep 2021 22:30:36 +0000 (15:30 -0700)]
selftests/bpf: Test libbpf API function btf__add_tag()
Add btf_write tests with btf__add_tag() function.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223036.247560-1-yhs@fb.com
Yonghong Song [Tue, 14 Sep 2021 22:30:31 +0000 (15:30 -0700)]
bpftool: Add support for BTF_KIND_TAG
Add bpftool support to dump BTF_KIND_TAG information.
The new bpftool will be used in later patches to dump
BTF in the test bpf program object file.
Currently, the tags are not emitted with
bpftool btf dump file <path> format c
and are silently ignored. The tag information is
mostly used in the kernel for verification purposes, and the kernel
uses its own BTF to check it. If these tags were added
to vmlinux.h, they would be encoded in the program's BTF, but
they would not be used by the kernel, at least for now.
So let us delay adding these tags to format C header files
until there is a real need.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223031.246951-1-yhs@fb.com
Yonghong Song [Tue, 14 Sep 2021 22:30:25 +0000 (15:30 -0700)]
libbpf: Add support for BTF_KIND_TAG
Add BTF_KIND_TAG support for parsing and dedup.
Also add sanitization for BTF_KIND_TAG: if BTF_KIND_TAG is not
supported by the kernel, sanitize it to INTs.
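In-place sanitization to INT works because a TAG record and an INT record have the same size: the common btf_type header plus one extra 32-bit word. The sketch below is a simplified model of that idea, not libbpf's sanitizer; the struct, macro, and function names are invented, and the kind numbers assume the uapi values as of this series.

```c
#include <stdint.h>

/* Simplified stand-in for a BTF type record: common header plus the one
 * extra 32-bit word that both INT (encoding) and TAG (component_idx)
 * carry. Kind numbers assumed from uapi linux/btf.h of this era. */
enum { KIND_INT = 1, KIND_TAG = 17 };

struct type_rec {
	uint32_t name_off;
	uint32_t info;		/* kind in bits 24-28, vlen in bits 0-15 */
	uint32_t size_or_type;	/* INT: byte size; TAG: tagged type id */
	uint32_t extra;		/* INT: encoding word; TAG: component_idx */
};

#define INFO_ENC(kind, kflag, vlen) \
	(((uint32_t)!!(kflag) << 31) | ((uint32_t)(kind) << 24) | ((uint32_t)(vlen) & 0xffff))
#define INFO_KIND(info)	(((info) >> 24) & 0x1f)
#define INT_ENC(enc, off, bits) \
	(((uint32_t)(enc) << 24) | ((uint32_t)(off) << 16) | (uint32_t)(bits))
#define INT_BITS(word)	((word) & 0xff)

/* If the kernel lacks TAG support, rewrite a TAG record in place as a
 * 1-byte, 8-bit INT. Both layouts have the same size, so the offsets of
 * all later records (and all type IDs) stay valid. */
void sanitize_tag(struct type_rec *t, int kernel_has_tag)
{
	if (kernel_has_tag || INFO_KIND(t->info) != KIND_TAG)
		return;
	t->info = INFO_ENC(KIND_INT, 0, 0);
	t->size_or_type = 1;
	t->extra = INT_ENC(0, 0, 8);
}
```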
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223025.246687-1-yhs@fb.com
Yonghong Song [Tue, 14 Sep 2021 22:30:20 +0000 (15:30 -0700)]
libbpf: Rename btf_{hash,equal}_int to btf_{hash,equal}_int_tag
This patch renames functions btf_{hash,equal}_int() to
btf_{hash,equal}_int_tag() so they can be reused for
BTF_KIND_TAG support. There is no functionality change for
this patch.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223020.245829-1-yhs@fb.com
Yonghong Song [Tue, 14 Sep 2021 22:30:15 +0000 (15:30 -0700)]
bpf: Support for new btf kind BTF_KIND_TAG
LLVM14 added support for a new C attribute ([1])
__attribute__((btf_tag("arbitrary_str")))
This attribute will be emitted to DWARF ([2]) and pahole
will convert it to BTF. For the bpf target, the
attribute will be emitted to BTF directly ([3], [4]).
The attribute is intended to provide additional
information for
- struct/union type or struct/union member
- static/global variables
- static/global function or function parameter.
For the Linux kernel, the btf_tag can be applied
in various places to specify user pointers,
function pre- or post-conditions, function
allow/deny in certain contexts, etc. Such information
will be encoded in vmlinux BTF and can be used
by the verifier.
The btf_tag can also be applied to bpf programs
to help global verifiable functions, e.g.,
specifying preconditions, etc.
This patch adds basic parsing and checking support
in the kernel for the new BTF_KIND_TAG kind.
[1] https://reviews.llvm.org/D106614
[2] https://reviews.llvm.org/D106621
[3] https://reviews.llvm.org/D106622
[4] https://reviews.llvm.org/D109560
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223015.245546-1-yhs@fb.com
Yonghong Song [Tue, 14 Sep 2021 22:30:09 +0000 (15:30 -0700)]
btf: Change BTF_KIND_* macros to enums
Change BTF_KIND_* macros to enums so they are encoded in dwarf and
appear in vmlinux.h. This will make it easier for bpf programs
to use these constants without macro definitions.
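Because enumerators, unlike #defines, survive into DWARF and therefore into vmlinux.h, a BPF program can use them directly. The sketch below shows the enum form with the NR_BTF_KINDS/BTF_KIND_MAX trailer mentioned in the v3 changelog; the values are modeled on uapi linux/btf.h at the time of this series, and the name table and `btf_kind_str()` are illustrative additions styled after bpftool's dump output.

```c
/* BTF kinds as an enum, modeled on uapi linux/btf.h of this era. */
enum {
	BTF_KIND_UNKN		= 0,
	BTF_KIND_INT		= 1,
	BTF_KIND_PTR		= 2,
	BTF_KIND_ARRAY		= 3,
	BTF_KIND_STRUCT		= 4,
	BTF_KIND_UNION		= 5,
	BTF_KIND_ENUM		= 6,
	BTF_KIND_FWD		= 7,
	BTF_KIND_TYPEDEF	= 8,
	BTF_KIND_VOLATILE	= 9,
	BTF_KIND_CONST		= 10,
	BTF_KIND_RESTRICT	= 11,
	BTF_KIND_FUNC		= 12,
	BTF_KIND_FUNC_PROTO	= 13,
	BTF_KIND_VAR		= 14,
	BTF_KIND_DATASEC	= 15,
	BTF_KIND_FLOAT		= 16,
	BTF_KIND_TAG		= 17,

	NR_BTF_KINDS,
	BTF_KIND_MAX		= NR_BTF_KINDS - 1,
};

/* Sized from the enum, so a new kind added before NR_BTF_KINDS grows
 * the table automatically. */
static const char *const btf_kind_names[NR_BTF_KINDS] = {
	[BTF_KIND_UNKN] = "UNKNOWN",	[BTF_KIND_INT] = "INT",
	[BTF_KIND_PTR] = "PTR",		[BTF_KIND_ARRAY] = "ARRAY",
	[BTF_KIND_STRUCT] = "STRUCT",	[BTF_KIND_UNION] = "UNION",
	[BTF_KIND_ENUM] = "ENUM",	[BTF_KIND_FWD] = "FWD",
	[BTF_KIND_TYPEDEF] = "TYPEDEF",	[BTF_KIND_VOLATILE] = "VOLATILE",
	[BTF_KIND_CONST] = "CONST",	[BTF_KIND_RESTRICT] = "RESTRICT",
	[BTF_KIND_FUNC] = "FUNC",	[BTF_KIND_FUNC_PROTO] = "FUNC_PROTO",
	[BTF_KIND_VAR] = "VAR",		[BTF_KIND_DATASEC] = "DATASEC",
	[BTF_KIND_FLOAT] = "FLOAT",	[BTF_KIND_TAG] = "TAG",
};

const char *btf_kind_str(int kind)
{
	if (kind < 0 || kind > BTF_KIND_MAX)
		return "UNKNOWN";
	return btf_kind_names[kind];
}
```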
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223009.245307-1-yhs@fb.com
Andrii Nakryiko [Tue, 14 Sep 2021 16:22:28 +0000 (09:22 -0700)]
selftests/bpf: Fix .gitignore to not ignore test_progs.c
List all possible test_progs flavors explicitly to avoid accidentally
ignoring valid source code files. In this case, test_progs.c was still
ignored after the recent fix 809ed84de8b3 ("selftests/bpf: Whitelist
test_progs.h from .gitignore"), which added an exception only for
test_progs.h.
Fixes: 74b5a5968fe8 ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210914162228.3995740-1-andrii@kernel.org
Jie Meng [Mon, 13 Sep 2021 21:13:37 +0000 (14:13 -0700)]
bpf,x64 Emit IMUL instead of MUL for x86-64
IMUL allows for multiple operands, so saving and restoring rax/rdx is no
longer needed. Signedness of the operands doesn't matter here because
we only keep the lower 32/64 bits of the product for 32/64-bit
multiplications.
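The signedness argument can be checked in C: in two's complement, the low N bits of an N-by-N-bit product are identical whether the operands are read as signed or unsigned. This sketch only verifies that arithmetic claim; it is not JIT code. The signed 64-bit variant uses `__int128`, a GCC/Clang extension, to avoid signed-overflow undefined behavior.

```c
#include <stdint.h>

/* Low half of the product read as unsigned (what MUL leaves in rax). */
uint64_t mul64_low_unsigned(uint64_t a, uint64_t b)
{
	return a * b; /* unsigned wraparound is well defined in C */
}

/* Low half read as signed (what two-operand IMUL keeps). Form the exact
 * product in __int128 and truncate; the uint64_t -> int64_t casts assume
 * two's complement, as on GCC/Clang. */
uint64_t mul64_low_signed(uint64_t a, uint64_t b)
{
	__int128 p = (__int128)(int64_t)a * (int64_t)b;
	return (uint64_t)p;
}

uint32_t mul32_low_unsigned(uint32_t a, uint32_t b)
{
	return a * b;
}

uint32_t mul32_low_signed(uint32_t a, uint32_t b)
{
	int64_t p = (int64_t)(int32_t)a * (int32_t)b; /* exact in 64 bits */
	return (uint32_t)p;
}
```

For example, 0xfffffffe * 7 gives 0xfffffff2 either way: as signed it is -2 * 7 = -14, and as unsigned it is 7 * 2^32 - 14 reduced mod 2^32.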
Signed-off-by: Jie Meng <jmeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210913211337.1564014-1-jmeng@fb.com
Alexei Starovoitov [Tue, 14 Sep 2021 22:49:24 +0000 (15:49 -0700)]
Merge branch 'libbpf: Streamline internal BPF program sections handling'
Andrii Nakryiko says:
====================
This small patch set performs internal refactorings around libbpf BPF program
ELF section definitions' handling. These are preparatory changes for further
changes that will make libbpf's BPF program section handling more strict, but also
pluggable and customizable, as part of the libbpf 1.0 effort. See individual
patches for details.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Tue, 14 Sep 2021 01:47:33 +0000 (18:47 -0700)]
libbpf: Minimize explicit iterator of section definition array
Remove almost all the code that explicitly iterated over BPF program section
definitions in favor of using find_sec_def(). The only remaining user of
section_defs is libbpf_get_type_names(), which has to iterate over all of them
to construct its result.
Having one internal API entry point for section definitions will
simplify further refactorings around libbpf's program section
definitions parsing.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-5-andrii@kernel.org
Andrii Nakryiko [Tue, 14 Sep 2021 01:47:32 +0000 (18:47 -0700)]
libbpf: Simplify BPF program auto-attach code
Remove the need to explicitly pass bpf_sec_def for auto-attachable BPF
programs, as it is already recorded at bpf_object__open() time for all
recognized types of BPF programs. This further reduces the number of explicit
calls to find_sec_def(), simplifying further refactorings.
No functional changes are done by this patch.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-4-andrii@kernel.org
Andrii Nakryiko [Tue, 14 Sep 2021 01:47:31 +0000 (18:47 -0700)]
libbpf: Ensure BPF prog types are set before relocations
Refactor bpf_object__open() sequencing to perform BPF program type
detection based on SEC() definitions before we get to relocation
collection. This makes more information about each BPF program available
by the time we get to, say, struct_ops relocation gathering. This, in
turn, simplifies struct_ops logic and removes the need to
perform an extra find_sec_def() resolution.
With this patch libbpf will require all struct_ops BPF programs to be
marked with SEC("struct_ops") or SEC("struct_ops/xxx") annotations.
Real-world applications are already doing that through something like
the selftests' BPF_STRUCT_OPS() macro. This change streamlines libbpf's
internal handling of SEC() definitions and is in the spirit of
upcoming libbpf-1.0 section strictness changes ([0]).
[0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#stricter-and-more-uniform-bpf-program-section-name-sec-handling
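The two accepted annotation forms can be expressed as a small predicate. `is_struct_ops_sec()` is a hypothetical helper written for this note, not a libbpf API; it treats the rule as "exactly `struct_ops`, or `struct_ops/` followed by anything", which is an assumption about how strict the matching is.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libbpf's API): accept exactly "struct_ops"
 * or "struct_ops/<anything>", mirroring the two annotation forms named
 * in the commit message. */
bool is_struct_ops_sec(const char *sec)
{
	static const char pfx[] = "struct_ops";
	const size_t n = sizeof(pfx) - 1;

	if (strncmp(sec, pfx, n) != 0)
		return false;
	/* Reject look-alikes such as "struct_opsx". */
	return sec[n] == '\0' || sec[n] == '/';
}
```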
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-3-andrii@kernel.org
Andrii Nakryiko [Tue, 14 Sep 2021 01:47:30 +0000 (18:47 -0700)]
selftests/bpf: Update selftests to always provide "struct_ops" SEC
Update struct_ops selftests to always specify "struct_ops" section
prefix. Libbpf will require a proper BPF program type to be set in the next
patch, so this prevents tests from breaking.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-2-andrii@kernel.org
Rafael David Tinoco [Sun, 12 Sep 2021 06:48:44 +0000 (03:48 -0300)]
libbpf: Introduce legacy kprobe events support
Allow kprobe tracepoint events to be created through the legacy interface, as
dynamic kprobe PMU support, used by default, was only added in v4.17.
Store the legacy kprobe name in struct bpf_perf_link, instead of creating
a new "subclass" of bpf_perf_link. This is OK as it's just two new
fields, which are also going to be reused for legacy uprobe support in
follow-up patches.
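The legacy interface is driven by text commands written to the tracefs `kprobe_events` file, using the documented `p:GRP/EVENT SYM+offs` (kprobe) and `r:GRP/EVENT SYM+0` (kretprobe) syntax. The formatter below is a sketch; the function names are hypothetical and the group/event names are the caller's choice, whereas libbpf generates its own unique event names, which this sketch does not reproduce.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build a kprobe_events definition line. Returns the string length, or
 * -1 on invalid input or truncation. */
int format_kprobe_event(char *buf, size_t sz, bool retprobe,
			const char *group, const char *event,
			const char *func, size_t offset)
{
	int n;

	/* '/' separates group from event and ' ' separates fields. */
	if (strchr(group, '/') || strchr(event, '/') || strchr(event, ' '))
		return -1;
	/* Return probes are only allowed at offset 0 ("SYM[+0]"). */
	if (retprobe && offset)
		return -1;
	n = snprintf(buf, sz, "%c:%s/%s %s+0x%zx", retprobe ? 'r' : 'p',
		     group, event, func, offset);
	return (n < 0 || (size_t)n >= sz) ? -1 : n;
}

/* The matching removal command, "-:GRP/EVENT". */
int format_kprobe_event_removal(char *buf, size_t sz, const char *group,
				const char *event)
{
	int n = snprintf(buf, sz, "-:%s/%s", group, event);

	return (n < 0 || (size_t)n >= sz) ? -1 : n;
}
```

In the general legacy flow, the definition is written to `kprobe_events`, the new event's numeric ID is read from `events/GRP/EVENT/id`, and that ID is passed as `attr.config` with `PERF_TYPE_TRACEPOINT` to `perf_event_open()`. Writing the removal line cleans up afterwards. libbpf's exact steps may differ in detail.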
Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210912064844.3181742-1-rafaeldtinoco@gmail.com
Andrii Nakryiko [Mon, 13 Sep 2021 22:23:09 +0000 (15:23 -0700)]
libbpf: Make libbpf_version.h non-auto-generated
Turn the previously auto-generated libbpf_version.h header into a normal
header file. This prevents various tricky Makefile integration issues and
simplifies the overall build process, and it also allows extending the header
with more versioning-related APIs in the future.
To prevent libbpf.map and libbpf_version.h from accidentally getting out of
sync, the Makefile checks their consistency at build time.
Simultaneously with this change, bump libbpf.map to v0.6.
Also undo adding libbpf's output directory to the include path for
kernel/bpf/preload, bpftool, and resolve_btfids, which is not necessary
because libbpf_version.h is just a normal header like any other.
Fixes: 0b46b7550560 ("libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210913222309.3220849-1-andrii@kernel.org
Daniel Borkmann [Fri, 10 Sep 2021 09:19:00 +0000 (11:19 +0200)]
bpf, selftests: Replicate tailcall limit test for indirect call case
The tailcall_3 test program uses bpf_tail_call_static(), for which the JIT
patches in a direct jump. Add a new tailcall_6 test program replicating
exactly the same test, but ensuring that bpf_tail_call() uses a map
index about which the verifier cannot make assumptions this time.
In other words, on the x86-64 JIT this now covers both JIT images
with emit_bpf_tail_call_direct() emission and JIT images
with emit_bpf_tail_call_indirect() emission.
# echo 1 > /proc/sys/net/core/bpf_jit_enable
# ./test_progs -t tailcalls
#136/1 tailcalls/tailcall_1:OK
#136/2 tailcalls/tailcall_2:OK
#136/3 tailcalls/tailcall_3:OK
#136/4 tailcalls/tailcall_4:OK
#136/5 tailcalls/tailcall_5:OK
#136/6 tailcalls/tailcall_6:OK
#136/7 tailcalls/tailcall_bpf2bpf_1:OK
#136/8 tailcalls/tailcall_bpf2bpf_2:OK
#136/9 tailcalls/tailcall_bpf2bpf_3:OK
#136/10 tailcalls/tailcall_bpf2bpf_4:OK
#136/11 tailcalls/tailcall_bpf2bpf_5:OK
#136 tailcalls:OK
Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED
# echo 0 > /proc/sys/net/core/bpf_jit_enable
# ./test_progs -t tailcalls
#136/1 tailcalls/tailcall_1:OK
#136/2 tailcalls/tailcall_2:OK
#136/3 tailcalls/tailcall_3:OK
#136/4 tailcalls/tailcall_4:OK
#136/5 tailcalls/tailcall_5:OK
#136/6 tailcalls/tailcall_6:OK
[...]
For the interpreter, the tailcall_1-6 tests pass as well. The later
tailcall_bpf2bpf_* tests fail due to the lack of bpf2bpf + tailcall
support in the interpreter, so this is expected.
Also, manual inspection shows that both loaded programs from tailcall_3
and tailcall_6 test case emit the expected opcodes:
* tailcall_3 disasm, emit_bpf_tail_call_direct():
[...]
b: push %rax
c: push %rbx
d: push %r13
f: mov %rdi,%rbx
12: movabs $0xffff8d3f5afb0200,%r13
1c: mov %rbx,%rdi
1f: mov %r13,%rsi
22: xor %edx,%edx _
24: mov -0x4(%rbp),%eax | limit check
2a: cmp $0x20,%eax |
2d: ja 0x0000000000000046 |
2f: add $0x1,%eax |
32: mov %eax,-0x4(%rbp) |_
38: nopl 0x0(%rax,%rax,1)
3d: pop %r13
3f: pop %rbx
40: pop %rax
41: jmpq 0xffffffffffffe377
[...]
* tailcall_6 disasm, emit_bpf_tail_call_indirect():
[...]
47: movabs $0xffff8d3f59143a00,%rsi
51: mov %edx,%edx
53: cmp %edx,0x24(%rsi)
56: jbe 0x0000000000000093 _
58: mov -0x4(%rbp),%eax | limit check
5e: cmp $0x20,%eax |
61: ja 0x0000000000000093 |
63: add $0x1,%eax |
66: mov %eax,-0x4(%rbp) |_
6c: mov 0x110(%rsi,%rdx,8),%rcx
74: test %rcx,%rcx
77: je 0x0000000000000093
79: pop %rax
7a: mov 0x30(%rcx),%rcx
7e: add $0xb,%rcx
82: callq 0x000000000000008e
87: pause
89: lfence
8c: jmp 0x0000000000000087
8e: mov %rcx,(%rsp)
92: retq
[...]
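Both listings contain the same limit check: load the counter from -0x4(%rbp), skip the tail call if it is above 0x20 (an unsigned `ja`), otherwise increment and store it back. A C model of just that instruction sequence (not the JIT source) makes the resulting budget explicit. Because the comparison happens before the increment, counter values 0 through 32 all pass, so a chain gets 33 tail calls.

```c
/* Limit constant as it appears in the disassembly (cmp $0x20,%eax). */
#define TAIL_CALL_LIMIT 0x20

/* Model of: mov -0x4(%rbp),%eax; cmp $0x20,%eax; ja out;
 *           add $0x1,%eax; mov %eax,-0x4(%rbp)
 * Returns 1 and bumps the counter if the tail call may proceed,
 * 0 if it falls through. */
int tail_call_allowed(unsigned int *tcc)
{
	if (*tcc > TAIL_CALL_LIMIT)	/* ja: unsigned "above" */
		return 0;
	(*tcc)++;
	return 1;
}

/* Number of consecutive tail calls a chain gets, starting from 0. */
unsigned int count_tail_calls(void)
{
	unsigned int tcc = 0, n = 0;

	while (tail_call_allowed(&tcc))
		n++;
	return n;
}
```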
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Paul Chaignon <paul@cilium.io>
Link: https://lore.kernel.org/bpf/CAM1=_QRyRVCODcXo_Y6qOm1iT163HoiSj8U2pZ8Rj3hzMTT=HQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210910091900.16119-1-daniel@iogearbox.net
Alexei Starovoitov [Mon, 13 Sep 2021 17:53:50 +0000 (10:53 -0700)]
Merge branch 'bpf: introduce bpf_get_branch_snapshot'
Song Liu says:
====================
Changes v6 => v7:
1. Improve/fix intel_pmu_snapshot_branch_stack() logic. (Peter).
Changes v5 => v6:
1. Add local_irq_save/restore to intel_pmu_snapshot_branch_stack. (Peter)
2. Remove buf and size check in bpf_get_branch_snapshot, move flags check
to later in the function. (Peter, Andrii)
3. Revise comments for bpf_get_branch_snapshot in bpf.h (Andrii)
Changes v4 => v5:
1. Modify perf_snapshot_branch_stack_t to save some memcpy. (Andrii)
2. Minor fixes in selftests. (Andrii)
Changes v3 => v4:
1. Do not reshuffle intel_pmu_disable_all(). Use some inline to save LBR
entries. (Peter)
2. Move static_call(perf_snapshot_branch_stack) to the helper. (Alexei)
3. Add argument flags to bpf_get_branch_snapshot. (Andrii)
4. Make MAX_BRANCH_SNAPSHOT an enum (Andrii). And rename it as
PERF_MAX_BRANCH_SNAPSHOT
5. Make bpf_get_branch_snapshot similar to bpf_read_branch_records.
(Andrii)
6. Move the test target function to bpf_testmod. Updated kallsyms_find_next
to work properly with modules. (Andrii)
Changes v2 => v3:
1. Fix the use of static_call. (Peter)
2. Limit the use to perfmon version >= 2. (Peter)
3. Modify intel_pmu_snapshot_branch_stack() to use intel_pmu_disable_all
and intel_pmu_enable_all().
Changes v1 => v2:
1. Rename the helper as bpf_get_branch_snapshot;
2. Fix/simplify the use of static_call;
3. Instead of percpu variables, let intel_pmu_snapshot_branch_stack output
branch records to an output argument of type perf_branch_snapshot.
Branch stack can be very useful in understanding software events. For
example, when a long function, e.g. sys_perf_event_open, returns an errno,
it is not obvious why the function failed. Branch stack could provide very
helpful information in such scenarios.
This set adds support to read branch stack with a new BPF helper,
bpf_get_branch_snapshot(). Currently, this is only supported on Intel systems.
It is also possible to support the same feature for PowerPC.
The hardware that records the branch stack is not stopped automatically on
software events. Therefore, it is necessary to stop it in software as soon
as possible. Otherwise, the hardware buffers/registers will be flushed. One of
the key design considerations in this set is to minimize the number of branch
record entries between when the event triggers and when the hardware recorder
is stopped.
Based on this goal, current design is different from the discussions in
original RFC [1]:
1) Static call is used when supported, to save function pointer
dereference;
2) intel_pmu_lbr_disable_all is used instead of perf_pmu_disable(),
because the latter uses about 10 entries before stopping LBR.
With current code, on Intel CPU, LBR is stopped after 7 branch entries
after fexit triggers:
ID: 0 from bpf_get_branch_snapshot+18 to intel_pmu_snapshot_branch_stack+0
ID: 1 from __brk_limit+477143934 to bpf_get_branch_snapshot+0
ID: 2 from __brk_limit+477192263 to __brk_limit+477143880 # trampoline
ID: 3 from __bpf_prog_enter+34 to __brk_limit+477192251
ID: 4 from migrate_disable+60 to __bpf_prog_enter+9
ID: 5 from __bpf_prog_enter+4 to migrate_disable+0
ID: 6 from bpf_testmod_loop_test+20 to __bpf_prog_enter+0
ID: 7 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
ID: 8 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
...
[1] https://lore.kernel.org/bpf/20210818012937.2522409-1-songliubraving@fb.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Song Liu [Fri, 10 Sep 2021 18:33:52 +0000 (11:33 -0700)]
selftests/bpf: Add test for bpf_get_branch_snapshot
This test uses bpf_get_branch_snapshot from an fexit program. The test uses
a target function (bpf_testmod_loop_test) and compares the records against
kallsyms. If not enough records match kallsyms, the test fails.
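A simplified model of the matching idea: count branch records that stay within the target function's address range, such as the loop-back branches "bpf_testmod_loop_test+20 -> +13" in the LBR dump from the cover letter. The struct and function names are hypothetical, the range end is assumed to come from the next kallsyms symbol (cf. kallsyms_find_next in the changelog), and the selftest's actual matching rule may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified branch record: source and target addresses. */
struct branch_rec {
	uint64_t from;
	uint64_t to;
};

/* Count records whose source and target both lie in [start, end). */
size_t count_in_func(const struct branch_rec *recs, size_t n,
		     uint64_t start, uint64_t end)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (recs[i].from >= start && recs[i].from < end &&
		    recs[i].to >= start && recs[i].to < end)
			hits++;
	return hits;
}
```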
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210910183352.3151445-4-songliubraving@fb.com
Song Liu [Fri, 10 Sep 2021 18:33:51 +0000 (11:33 -0700)]
bpf: Introduce helper bpf_get_branch_snapshot
Introduce bpf_get_branch_snapshot(), which allows a tracing program to get
branch trace from hardware (e.g. Intel LBR). To use the feature, the
user needs to create a perf_event with proper branch_record filtering
on each CPU, and then call bpf_get_branch_snapshot in the bpf function.
On Intel CPUs, the VLBR event (raw event 0x1b00) can be used for this.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210910183352.3151445-3-songliubraving@fb.com
Song Liu [Fri, 10 Sep 2021 18:33:50 +0000 (11:33 -0700)]
perf: Enable branch record for software events
The typical way to access branch records (e.g. Intel LBR) is via a hardware
perf_event. For CPUs with FREEZE_LBRS_ON_PMI support, a PMI can capture
reliable LBR. On the other hand, LBR can also be useful in non-PMI
scenarios. For example, in a kretprobe or bpf fexit program, LBR can
provide a lot of information on what happened in the function. Add an API
to use branch records from software.
Note that, when the software event triggers, it is necessary to stop the
branch record hardware asap. Therefore, static_call is used to remove some
branch instructions in this process.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210910183352.3151445-2-songliubraving@fb.com
Vadim Fedorenko [Thu, 9 Sep 2021 22:04:09 +0000 (01:04 +0300)]
selftests/bpf: Test new __sk_buff field hwtstamp
Analogous to the gso_segs selftests introduced in commit d9ff286a0f59
("bpf: allow BPF programs access skb_shared_info->gso_segs field").
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210909220409.8804-3-vfedorenko@novek.ru
Vadim Fedorenko [Thu, 9 Sep 2021 22:04:08 +0000 (01:04 +0300)]
bpf: Add hardware timestamp field to __sk_buff
BPF programs may want to know hardware timestamps if the NIC supports
such timestamping.
Expose this data as the hwtstamp field of __sk_buff, the same way as
gso_segs/gso_size. This field can be accessed from the same
programs as the tstamp field, but it is read-only. An explicit test
to deny access to padding data is added to bpf_skb_is_valid_access.
Also update BPF_PROG_TEST_RUN tests of the feature.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210909220409.8804-2-vfedorenko@novek.ru
Daniel Borkmann [Fri, 10 Sep 2021 19:15:32 +0000 (21:15 +0200)]
Merge branch 'bpf-xsk-selftests'
Magnus Karlsson says:
====================
This patch set facilitates adding new tests as well as describing
existing ones in the xsk selftests suite and adds 3 new test suites at
the end. The idea is to isolate the run-time that executes the test
from the actual implementation of the test. Today, implementing a test
amounts to adding test-specific if-statements all around the run-time,
which is neither scalable nor amenable to reuse. This patch set instead
introduces a test specification that is the only thing that a test
fills in. The run-time then gets this specification and acts upon it
completely unaware of what test it is executing. This way, we can get
rid of all test specific if-statements from the run-time and the
implementation of the test can be contained in a single function. This
hopefully makes it easier to add tests and for users to understand
what the test accomplishes.
As a recap of what the run-time does: each test is based on the
run-time launching two threads and connecting a veth link between the
two threads. Each thread opens an AF_XDP socket on that veth interface
and one of them sends traffic that the other one receives and
validates. Each thread has its own umem. Note that this behavior is
not changed by this patch set.
A test specification consists of several items. Most importantly:
* Two packet streams. One for Tx thread that specifies what traffic to
send and one for the Rx thread that specifies what that thread
should receive. If it receives exactly what is specified, the test
passes, otherwise it fails. A packet stream can also specify which
buffers in the umem should be used by the Rx and Tx threads.
* What kind of AF_XDP sockets it should create and bind to what
interfaces
* How many times it should repeat the socket creation and destruction
* The name of the test
The interface for the test spec is the following:
void test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx,
		    struct ifobject *ifobj_rx, enum test_mode mode);
/* Reset everything but the interface specifications and the mode */
void test_spec_reset(struct test_spec *test);
void test_spec_set_name(struct test_spec *test, const char *name);
Packet streams have the following interfaces:
struct pkt *pkt_stream_get_pkt(struct pkt_stream *pkt_stream, u32 pkt_nb);
struct pkt *pkt_stream_get_next_rx_pkt(struct pkt_stream *pkt_stream);
struct pkt_stream *pkt_stream_generate(struct xsk_umem_info *umem,
				       u32 nb_pkts, u32 pkt_len);
void pkt_stream_delete(struct pkt_stream *pkt_stream);
struct pkt_stream *pkt_stream_clone(struct xsk_umem_info *umem,
				    struct pkt_stream *pkt_stream);
/* Replaces all packets in the stream */
void pkt_stream_replace(struct test_spec *test, u32 nb_pkts, u32 pkt_len);
/* Replaces every other packet in the stream */
void pkt_stream_replace_half(struct test_spec *test, u32 pkt_len, u32 offset);
/* For creating custom made packet streams */
void pkt_stream_generate_custom(struct test_spec *test, struct pkt *pkts,
				u32 nb_pkts);
/* Restores the default packet stream */
void pkt_stream_restore_default(struct test_spec *test);
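To make the "replace every other packet" idea concrete, here is a hypothetical, self-contained model of a packet stream. The struct layouts, the demo_ names, the explicit frame_size parameter and the choice of replacing the odd-numbered packets are all assumptions; the real selftest types carry more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_pkt {
	uint64_t addr; /* umem address of the packet */
	uint32_t len;
	bool valid;
};

struct demo_pkt_stream {
	uint32_t nb_pkts;
	struct demo_pkt *pkts;
};

/* One packet per frame, each placed at the start of its frame. */
static struct demo_pkt_stream *demo_pkt_stream_generate(uint32_t nb_pkts,
							uint32_t pkt_len,
							uint32_t frame_size)
{
	struct demo_pkt_stream *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	s->pkts = calloc(nb_pkts, sizeof(*s->pkts));
	if (!s->pkts) {
		free(s);
		return NULL;
	}
	s->nb_pkts = nb_pkts;
	for (uint32_t i = 0; i < nb_pkts; i++) {
		s->pkts[i].addr = (uint64_t)i * frame_size;
		s->pkts[i].len = pkt_len;
		s->pkts[i].valid = true;
	}
	return s;
}

/* Replace every other packet (the odd-numbered ones here) with a packet of
 * pkt_len bytes placed offset bytes into its frame. */
static void demo_pkt_stream_replace_half(struct demo_pkt_stream *s,
					 uint32_t pkt_len, uint32_t offset,
					 uint32_t frame_size)
{
	for (uint32_t i = 1; i < s->nb_pkts; i += 2) {
		s->pkts[i].addr = (uint64_t)i * frame_size + offset;
		s->pkts[i].len = pkt_len;
	}
}

static void demo_pkt_stream_delete(struct demo_pkt_stream *s)
{
	if (!s)
		return;
	free(s->pkts);
	free(s);
}
```

With 4096-byte frames and an offset of 4096 - 32, every second packet starts 32 bytes before the end of its frame, which is the situation the unaligned test below sets up.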
In the most basic case, a test can then be described like this
(provided the test specification has been created before calling the
function):
static bool testapp_aligned(struct test_spec *test)
{
	test_spec_set_name(test, "RUN_TO_COMPLETION");
	testapp_validate_traffic(test);
	return true;
}
Running the same test in unaligned mode would then look like this:
static bool testapp_unaligned(struct test_spec *test)
{
	if (!hugepages_present(test->ifobj_tx)) {
		ksft_test_result_skip("No 2M huge pages present.\n");
		return false;
	}
	test_spec_set_name(test, "UNALIGNED_MODE");
	test->ifobj_tx->umem->unaligned_mode = true;
	test->ifobj_rx->umem->unaligned_mode = true;
	/* Let half of the packets straddle a buffer boundary */
	pkt_stream_replace_half(test, PKT_SIZE,
				XSK_UMEM__DEFAULT_FRAME_SIZE - 32);
	/* Populate fill ring with addresses in the packet stream */
	test->ifobj_rx->pkt_stream->use_addr_for_fill = true;
	testapp_validate_traffic(test);
	pkt_stream_restore_default(test);
	return true;
}
3 of the last 4 patches in the set add 3 new test suites, one for
unaligned mode, one for testing the rejection of tricky invalid
descriptors plus the acceptance of some valid ones in the Tx ring, and
one for testing 2K frame sizes (the default is 4K).
What is left to do for follow-up patches:
* Convert the statistics tests to the new framework.
* Implement a way of registering new tests without having the enum
test_type. Once this has been done (together with the previous
bullet), all the test types can be dropped from the header
file. This means that we should be able to add tests by just writing
a single function with a new test specification, which is one of the
goals.
* Introduce functions for manipulating parts of the test or interface
spec instead of direct manipulations such as
test->ifobj_rx->pkt_stream->use_addr_for_fill = true; which is kind
of awkward.
* Move the run-time and its interface to its own .c and .h files. Then
we can have all the tests in a separate file.
* Better error reporting if a test fails. Today it does not state which
test fails and might not continue executing the rest of the tests due
to this failure. Failures are not propagated upwards through the
functions, so a failed test will also be counted as a passed test, which
messes up the stats counting. This needs to be changed.
* Add an option to run a specific test instead of all of them
* Introduce pacing of sent packets so that they are never dropped
by the receiver even if it is stalled for some reason. If you run
the current tests on a heavily loaded system, they might fail in SKB
mode due to packets being dropped by the driver on Tx. Though I have
never seen it, it might happen.
v1 -> v2:
* Fixed a number of spelling errors [Maciej]
* Fixed use after free bug in pkt_stream_replace() [Maciej]
* pkt_stream_set -> pkt_stream_generate_custom [Maciej]
* Fixed formatting problem in testapp_invalid_desc() [Maciej]
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Magnus Karlsson [Tue, 7 Sep 2021 07:19:28 +0000 (09:19 +0200)]
selftests: xsk: Add tests for 2K frame size
Add tests for 2K frame size. Both a standard send and receive test and
one testing for invalid descriptors when the frame size is 2K.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-21-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:27 +0000 (09:19 +0200)]
selftests: xsk: Add tests for invalid xsk descriptors
Add tests for invalid xsk descriptors in the Tx ring. A number of
handcrafted nasty invalid descriptors are created and submitted to the
tx ring to check that they are validated correctly. Corner case valid
ones are also sent. The tests are run for both aligned and unaligned
mode.
pkt_stream_set() is introduced to be able to create a hand-crafted
packet stream where every single packet is specified in detail.
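The kind of rule these tests probe can be sketched as follows. This is a deliberately simplified, hypothetical model of Tx descriptor validation, not the kernel's code: in aligned mode a buffer must stay inside one frame, while in unaligned mode it only has to stay inside the umem. It ignores other conditions the kernel may check, such as descriptor options or page contiguity, and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_desc {
	uint64_t addr;
	uint32_t len;
};

/* Aligned mode: the buffer must lie inside the umem and inside one frame. */
static bool demo_valid_aligned(const struct demo_desc *d, uint64_t umem_size,
			       uint32_t frame_size)
{
	uint64_t last;

	if (d->addr >= umem_size)
		return false;
	last = d->len ? d->addr + d->len - 1 : d->addr;
	if (last >= umem_size)
		return false;
	/* First and last byte must fall in the same frame. */
	return d->addr / frame_size == last / frame_size;
}

/* Unaligned mode: any placement is fine as long as it stays in the umem. */
static bool demo_valid_unaligned(const struct demo_desc *d, uint64_t umem_size)
{
	if (d->addr >= umem_size)
		return false;
	return d->addr + d->len <= umem_size;
}
```

The same descriptor can therefore be valid with 4K frames and invalid with 2K frames, which is one reason to run the invalid-descriptor tests for both frame sizes.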
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-20-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:26 +0000 (09:19 +0200)]
selftests: xsk: Eliminate test specific if-statement in test runner
Eliminate a test specific if-statement for the RX_FILL_EMPTY stats
test that is present in the test runner. We can do this as we now have
the use_addr_for_fill option. Just create an empty Rx packet stream
and indicate that the test runner should use the addresses in it to
populate the fill ring. As there are no packets in the stream, the
fill ring will be empty and we will get the error stats that we want
to test.
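The trick relies on the fill ring being populated from the packet stream. A hypothetical model of that step, where the names and the default fill policy are assumptions, shows why an empty stream yields an empty fill ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_stream {
	uint32_t nb_pkts;
	const uint64_t *addrs; /* umem address of each packet */
	bool use_addr_for_fill;
};

/* Write up to ring_size addresses into the fill ring; return how many. */
static uint32_t demo_populate_fill_ring(uint64_t *ring, uint32_t ring_size,
					const struct demo_stream *s,
					uint32_t frame_size)
{
	uint32_t n = 0;

	if (s->use_addr_for_fill) {
		/* Only the buffers named in the stream are offered. */
		for (; n < s->nb_pkts && n < ring_size; n++)
			ring[n] = s->addrs[n];
	} else {
		/* Assumed default: offer the first ring_size frames in order. */
		for (; n < ring_size; n++)
			ring[n] = (uint64_t)n * frame_size;
	}
	return n;
}
```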
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-19-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:25 +0000 (09:19 +0200)]
selftests: xsk: Add test for unaligned mode
Add a test for unaligned mode in which packet buffers can be placed
anywhere within the umem. Some packets are made to straddle page
boundaries in order to check for correctness. On the Tx side, buffers
are now allocated according to the addresses found in the packet
stream. Thus, the placement of buffers can be controlled with the
boolean use_addr_for_fill in the packet stream.
One new pkt_stream interface is introduced: pkt_stream_replace_half()
that replaces every other packet in the default packet stream with the
specified new packet. The constant DEFAULT_OFFSET is also
introduced. It specifies at what offset from the start of a chunk a Tx
packet is placed by the sending thread. This is just to be able to
test that it is possible to send packets at an offset not equal to
zero.
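Whether a packet straddles a page boundary is simple arithmetic on its first and last byte. Assuming 4 KiB pages and a 64-byte packet (both assumptions here), a packet placed 32 bytes before the end of frame 1 starts at 4096 + 4064 = 8160 and ends at byte 8223, crossing the boundary at 8192. A minimal sketch of that check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u /* assumed 4 KiB pages */

/* True if the bytes [addr, addr + len) touch more than one page. */
static bool demo_straddles_page(uint64_t addr, uint32_t len)
{
	if (len == 0)
		return false;
	return addr / DEMO_PAGE_SIZE != (addr + len - 1) / DEMO_PAGE_SIZE;
}
```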
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-18-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:24 +0000 (09:19 +0200)]
selftests: xsk: Introduce replacing the default packet stream
Introduce the concept of a default packet stream that is the set of
packets sent by most tests. Then add the ability to replace it, for a
test that would like to send or receive something else, through the use
of the function pkt_stream_replace(), and to restore it afterwards with
pkt_stream_restore_default(). These are then used to convert the
STAT_TEST_TX_INVALID to use these new APIs.
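A hypothetical sketch of the replace/restore pattern, with explicit ownership so that a replacement stream is freed exactly once and the default stream is never freed by a test. The types and demo_ names are assumptions, not the selftest's real code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_stream {
	uint32_t nb_pkts;
	uint32_t pkt_len;
};

struct demo_test_spec {
	struct demo_stream *default_stream; /* owned by the runner */
	struct demo_stream *stream;         /* used by the current test */
};

static struct demo_stream *demo_stream_new(uint32_t nb_pkts, uint32_t pkt_len)
{
	struct demo_stream *s = malloc(sizeof(*s));

	if (s) {
		s->nb_pkts = nb_pkts;
		s->pkt_len = pkt_len;
	}
	return s;
}

/* Free the current stream unless it is the default one. */
static void demo_stream_drop_current(struct demo_test_spec *t)
{
	if (t->stream != t->default_stream)
		free(t->stream);
}

static int demo_stream_replace(struct demo_test_spec *t, uint32_t nb_pkts,
			       uint32_t pkt_len)
{
	struct demo_stream *s = demo_stream_new(nb_pkts, pkt_len);

	if (!s)
		return -1;
	demo_stream_drop_current(t);
	t->stream = s;
	return 0;
}

static void demo_stream_restore_default(struct demo_test_spec *t)
{
	demo_stream_drop_current(t);
	t->stream = t->default_stream;
}
```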
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-17-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:23 +0000 (09:19 +0200)]
selftests: xsk: Allow for invalid packets
Allow for invalid packets to be sent. The Rx thread verifies that
these are not received. Put another way, if they are
received, the test will fail. This feature will be used to eliminate
an if statement for a stats test and will also be used by other tests
in later patches. The previous code could only deal with valid
packets.
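On the Rx side this amounts to matching received packets only against the valid entries of the stream. The sketch below is loosely modelled on the pkt_stream_get_next_rx_pkt() name from the cover letter, but its body and all demo_ names are hypothetical; it compares only lengths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_pkt {
	uint32_t len;
	bool valid; /* false: the kernel is expected to drop this packet */
};

struct demo_pkt_stream {
	uint32_t nb_pkts;
	uint32_t current; /* index of the next packet to consider */
	struct demo_pkt *pkts;
};

/* Next packet Rx should receive, skipping invalid ones; NULL when done. */
static struct demo_pkt *demo_get_next_rx_pkt(struct demo_pkt_stream *s)
{
	while (s->current < s->nb_pkts) {
		struct demo_pkt *pkt = &s->pkts[s->current++];

		if (pkt->valid)
			return pkt;
	}
	return NULL;
}

/* Validate one received packet (here just its length) against the stream. */
static bool demo_rx_check(struct demo_pkt_stream *s, uint32_t rx_len)
{
	struct demo_pkt *expected = demo_get_next_rx_pkt(s);

	/* An unexpected or extra packet is also a failure. */
	return expected && expected->len == rx_len;
}
```

If an invalid packet does get through, it is compared against the next valid packet and the check fails.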
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-16-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:22 +0000 (09:19 +0200)]
selftests: xsk: Eliminate MAX_SOCKS define
Remove the MAX_SOCKS define as it will always be one for the foreseeable
future and the code does not work for any other case anyway.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-15-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:21 +0000 (09:19 +0200)]
selftests: xsk: Make pthreads local scope
Make the pthread_t variables local scope instead of global. No reason
for them to be global.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-14-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:20 +0000 (09:19 +0200)]
selftests: xsk: Make xdp_flags and bind_flags local
Make xdp_flags and bind_flags local instead of global by moving them
into the interface object. These flags decide if the socket should be
created in SKB mode or in DRV mode and therefore they are sticky and
will survive a test_spec_reset. Since every test is first run in SKB
mode then in DRV mode, this change only happens once. With this
change, the configured_mode global variable can also be
eradicated. The first test_spec_init() also becomes superfluous and
can be eliminated.
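"Sticky" fields that survive a reset can be implemented by saving them, clearing the object and writing them back. A hypothetical sketch: the flag values and the field set of demo_ifobject are assumptions, standing in for the real SKB/DRV mode flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values standing in for the SKB/DRV mode settings. */
#define DEMO_MODE_SKB 0x1u
#define DEMO_MODE_DRV 0x2u

struct demo_ifobject {
	uint32_t xdp_flags;  /* sticky: survives a reset */
	uint32_t bind_flags; /* sticky: survives a reset */
	bool use_poll;       /* per-test settings, cleared by a reset */
	bool rx_on;
	bool tx_on;
};

static void demo_ifobject_reset(struct demo_ifobject *ifobj)
{
	uint32_t xdp_flags = ifobj->xdp_flags;
	uint32_t bind_flags = ifobj->bind_flags;

	memset(ifobj, 0, sizeof(*ifobj));
	ifobj->xdp_flags = xdp_flags;
	ifobj->bind_flags = bind_flags;
}
```

The mode is switched once, from SKB to DRV, between the two passes over all tests; every per-test reset in between leaves it alone.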
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-13-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:19 +0000 (09:19 +0200)]
selftests: xsk: Specify number of sockets to create
Add the ability in the test specification to specify the number of
sockets to create. The default is one socket. This is then used to
remove test specific if-statements around the bpf_res tests.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-12-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:18 +0000 (09:19 +0200)]
selftests: xsk: Replace second_step global variable
Replace the second_step global variable with a test specification
variable called total_steps that a test can set to indicate how
many times the packet stream should be sent without reinitializing any
sockets. This eliminates test specific code in the test runner around
the bidirectional test.
The total_steps variable is 1 by default as most tests only need a
single round of packets.
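A hypothetical sketch of how a test-agnostic runner can honor total_steps: sockets are set up once and the packet stream is sent total_steps times. The counters and demo_ names are assumptions, and the sketch leaves out whatever the real runner does between steps.

```c
#include <assert.h>
#include <stdint.h>

struct demo_test_spec {
	uint32_t total_steps; /* times to send the stream; 1 by default */
	uint32_t nb_pkts;
};

/* Counters standing in for the runner's real work. */
static uint32_t demo_sockets_created;
static uint32_t demo_pkts_sent;

static void demo_setup_sockets(void)
{
	demo_sockets_created++;
}

static void demo_send_stream(const struct demo_test_spec *test)
{
	demo_pkts_sent += test->nb_pkts;
}

/* No special case for the bidirectional test: it just sets total_steps. */
static void demo_run_test(const struct demo_test_spec *test)
{
	demo_setup_sockets();
	for (uint32_t step = 0; step < test->total_steps; step++)
		demo_send_stream(test); /* sockets are reused, not re-created */
}
```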
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-11-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:17 +0000 (09:19 +0200)]
selftests: xsk: Introduce rx_on and tx_on in ifobject
Introduce rx_on and tx_on in the ifobject so that we can describe if
the thread should create a socket with only tx, rx, or both. This
eliminates some test specific if statements from the code. We can also
eliminate the flow vector structure now as this is fully specified
by the tx_on and rx_on variables.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-10-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:16 +0000 (09:19 +0200)]
selftests: xsk: Add use_poll to ifobject
Add a use_poll option to the ifobject so that we do not need to use a
test specific if-statement in the test runner.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-9-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:15 +0000 (09:19 +0200)]
selftests: xsk: Introduce test name in test spec
Introduce the test name in the test specification. This is so that we can
set the name locally in the test function and simplify the logic for
printing out test results.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-8-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:14 +0000 (09:19 +0200)]
selftests: xsk: Make frame_size configurable
Make the frame size configurable instead of it being hard coded to a
default. This is a property of the umem and will make it possible to
implement tests for different umem frame sizes in a later patch.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-7-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:13 +0000 (09:19 +0200)]
selftests: xsk: Move rxqsize into xsk_socket_info
Move the global variable rxqsize to struct xsk_socket_info as it
describes the size of a ring in that struct. By default, it is set to
the size dictated by libbpf.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-6-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:12 +0000 (09:19 +0200)]
selftests: xsk: Move num_frames and frame_headroom to xsk_umem_info
Move the global variables num_frames and frame_headroom to struct
xsk_umem_info. They describe properties of the umem so no reason for
them to be global.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-5-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:11 +0000 (09:19 +0200)]
selftests: xsk: Introduce test specifications
Introduce a test specification to be able to concisely describe a
test. Currently, a test is implemented by sprinkling test specific if
statements here and there, which is not scalable or easy to
understand. The end goal with this patch set is to come to the point
in which a test is completely specified by a test specification that
can easily be constructed in a single function so that new tests can
be added without too much trouble. This test specification will be run
by a test runner that has no idea about tests. It just executes what
the test specification states.
This patch introduces the test specification and, as a start, puts the
two interface objects in there, one containing the packet stream to be
sent and the other one the packet stream that is supposed to be
received for a test to pass. The global variables containing these can
then be eliminated. The following patches will convert each existing
test into a test specification and add the needed fields into it and
the functionality in the test runner that acts on the test
specification. At the end, the test runner should contain no test
specific code and each test should be described in a single simple
function.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-4-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:10 +0000 (09:19 +0200)]
selftests: xsk: Introduce type for thread function
Introduce a typedef of the thread function so this can be passed to
init_iface() in order to simplify that function.
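A minimal sketch of the idea using POSIX threads: once the thread function has a type, it can be stored in the interface object and passed to an init function like any other parameter. The typedef name thread_func_t, the demo_ names and the worker body are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumed name for the thread-function type; matches pthread start routines. */
typedef void *(*thread_func_t)(void *arg);

struct demo_ifobject {
	thread_func_t func_ptr; /* worker to run for this interface */
	int packets_seen;
};

/* Hypothetical worker: a real one would send or receive packets. */
static void *demo_worker_rx(void *arg)
{
	struct demo_ifobject *ifobj = arg;

	ifobj->packets_seen = 42;
	return NULL;
}

/* init_iface()-style setup: the thread function is just a parameter. */
static void demo_init_iface(struct demo_ifobject *ifobj, thread_func_t func)
{
	ifobj->func_ptr = func;
	ifobj->packets_seen = 0;
}

/* Run the interface's worker to completion in its own thread. */
static int demo_run_iface(struct demo_ifobject *ifobj)
{
	pthread_t t; /* local scope, not a global */

	if (pthread_create(&t, NULL, ifobj->func_ptr, ifobj))
		return -1;
	return pthread_join(t, NULL);
}
```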
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-3-magnus.karlsson@gmail.com
Magnus Karlsson [Tue, 7 Sep 2021 07:19:09 +0000 (09:19 +0200)]
selftests: xsk: Simplify xsk and umem arrays
Simplify the xsk_info and umem_info allocation by allocating them
upfront in an array, instead of allocating an array of pointers to
future creations of these. Allocating them upfront also has the
advantage that configuration information can be stored in these
structures instead of relying on global variables. With the previous
structure, xsk_info and umem_info were created too late to be able to
store most configuration information. This will be used to eliminate
most global variables in later patches in this series.
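A hypothetical sketch of allocating the per-socket structures upfront as arrays of structs rather than arrays of pointers, so configuration can be written into them before any socket exists. The fields and demo_ names are assumptions, as is the one-umem-per-socket pairing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_umem_info {
	uint32_t num_frames;
	uint32_t frame_size;
};

struct demo_xsk_info {
	uint32_t rxqsize;
	struct demo_umem_info *umem;
};

struct demo_ifobject {
	size_t nb_sockets;
	struct demo_xsk_info *xsk_arr;   /* allocated upfront, one per socket */
	struct demo_umem_info *umem_arr; /* one umem per socket */
};

static int demo_ifobject_alloc(struct demo_ifobject *ifobj, size_t nb_sockets)
{
	ifobj->xsk_arr = calloc(nb_sockets, sizeof(*ifobj->xsk_arr));
	ifobj->umem_arr = calloc(nb_sockets, sizeof(*ifobj->umem_arr));
	if (!ifobj->xsk_arr || !ifobj->umem_arr) {
		free(ifobj->xsk_arr);
		free(ifobj->umem_arr);
		ifobj->xsk_arr = NULL;
		ifobj->umem_arr = NULL;
		return -1;
	}
	ifobj->nb_sockets = nb_sockets;
	for (size_t i = 0; i < nb_sockets; i++)
		ifobj->xsk_arr[i].umem = &ifobj->umem_arr[i];
	return 0;
}

static void demo_ifobject_free(struct demo_ifobject *ifobj)
{
	free(ifobj->xsk_arr);
	free(ifobj->umem_arr);
	ifobj->xsk_arr = NULL;
	ifobj->umem_arr = NULL;
	ifobj->nb_sockets = 0;
}
```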
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210907071928.9750-2-magnus.karlsson@gmail.com
Quentin Monnet [Wed, 8 Sep 2021 21:32:26 +0000 (14:32 -0700)]
libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations
Introduce a macro LIBBPF_DEPRECATED_SINCE(major, minor, message) to prepare
the deprecation of two API functions. This macro marks functions as deprecated
once libbpf's version reaches the major/minor values passed as arguments.
As part of this change libbpf_version.h header is added with recorded major
(LIBBPF_MAJOR_VERSION) and minor (LIBBPF_MINOR_VERSION) libbpf version macros.
They are now part of libbpf public API and can be relied upon by user code.
libbpf_version.h is installed system-wide alongside other libbpf public headers.
Due to this new build-time auto-generated header, in-kernel applications
relying on libbpf (resolve_btfids, bpftool, bpf_preload) are updated to
include libbpf's output directory as part of a list of include search paths.
A better fix would be to use libbpf's make_install target to install public API
headers, but that clean up is left out as a future improvement. The build
changes were tested by building kernel (with KBUILD_OUTPUT and O= specified
explicitly), bpftool, libbpf, selftests/bpf, and resolve_btfids builds. No
problems were detected.
Note that because of the constraints of the C preprocessor we have to write
a few lines of macro magic for each version used to prepare deprecation (0.6
for now).
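The "macro magic" can be illustrated with a self-contained sketch of the same technique: compare the library version against a threshold in #if, define one helper per scheduled version, and pick the helper by token pasting. All MYLIB_ names, the version numbers and the mylib_load() function are hypothetical; only the general approach follows the description above.

```c
#include <assert.h>

/* Hypothetical library version (libbpf reads LIBBPF_MAJOR_VERSION and
 * LIBBPF_MINOR_VERSION from the generated libbpf_version.h). */
#define MYLIB_MAJOR_VERSION 0
#define MYLIB_MINOR_VERSION 5

#define MYLIB_VERSION_GEQ(major, minor)                                  \
	(MYLIB_MAJOR_VERSION > (major) ||                                \
	 (MYLIB_MAJOR_VERSION == (major) && MYLIB_MINOR_VERSION >= (minor)))

#define MYLIB_DEPRECATED(msg) __attribute__((deprecated(msg)))

/* A macro body cannot contain #if, so each version used for scheduling a
 * deprecation needs its own pre-evaluated helper like this one. */
#if MYLIB_VERSION_GEQ(0, 6)
#define MYLIB_MARK_DEPRECATED_0_6(X) X
#else
#define MYLIB_MARK_DEPRECATED_0_6(X)
#endif

/* Paste major/minor into the helper's name; stringify them for the message. */
#define MYLIB_DEPRECATED_SINCE(major, minor, msg)                        \
	MYLIB_MARK_DEPRECATED_ ## major ## _ ## minor                    \
		(MYLIB_DEPRECATED("mylib v" #major "." #minor "+: " msg))

/* Silent at version 0.5; callers get a deprecation warning from 0.6 on. */
MYLIB_DEPRECATED_SINCE(0, 6, "use mylib_load_v2() instead")
int mylib_load(void);

int mylib_load(void)
{
	return 1;
}
```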
Also, use LIBBPF_DEPRECATED_SINCE() to schedule deprecation of
btf__get_from_id() and btf__load(), which are replaced by
btf__load_from_kernel_by_id() and btf__load_into_kernel(), respectively,
starting from future libbpf v0.6. This is part of libbpf 1.0 effort ([0]).
[0] Closes: https://github.com/libbpf/libbpf/issues/278
Co-developed-by: Quentin Monnet <quentin@isovalent.com>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210908213226.1871016-1-andrii@kernel.org
Andrii Nakryiko [Tue, 7 Sep 2021 22:10:23 +0000 (15:10 -0700)]
libbpf: Fix build with latest gcc/binutils with LTO
After updating to binutils 2.35, the build began to fail with an
assembler error. A bug was opened on the Red Hat Bugzilla a few days
later for the same issue.
Work around the problem by using the new `symver` attribute (introduced
in GCC 10) as needed instead of assembler directives.
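A hedged sketch of the pattern: emit the GCC 10 `symver` function attribute when it is available and fall back to a top-level `.symver` assembler directive otherwise. The attribute is seen by the compiler itself rather than hidden in inline assembly, which is presumably why it behaves better under LTO. The macro and symbol names, the SHARED guard and the MYLIB_0.2 version node are assumptions for illustration; a real shared build would also need a linker version script declaring that node.

```c
#include <assert.h>

#if defined(SHARED) && defined(__GNUC__) && __GNUC__ >= 10
/* GCC 10+: attach the versioned name to the function via an attribute. */
#define DEFAULT_VERSION(internal_name, api_name, version) \
	__attribute__((symver(#api_name "@@" #version)))
#elif defined(SHARED)
/* Older compilers: fall back to a top-level assembler directive. */
#define DEFAULT_VERSION(internal_name, api_name, version) \
	__asm__(".symver " #internal_name "," #api_name "@@" #version);
#else
/* Static or test builds: no symbol versioning at all. */
#define DEFAULT_VERSION(internal_name, api_name, version)
#endif

/* In a SHARED build this is exported as mylib_add@@MYLIB_0.2. */
DEFAULT_VERSION(mylib_add_v0_2, mylib_add, MYLIB_0.2)
int mylib_add_v0_2(int a, int b)
{
	return a + b;
}
```

Built without SHARED, the macro expands to nothing and mylib_add_v0_2() is an ordinary function.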
This addresses Red Hat ([0]) and OpenSUSE ([1]) bug reports, as well as libbpf
issue ([2]).
[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1863059
[1]: https://bugzilla.opensuse.org/show_bug.cgi?id=1188749
[2]: Closes: https://github.com/libbpf/libbpf/issues/338
Co-developed-by: Patrick McCarty <patrick.mccarty@intel.com>
Co-developed-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Patrick McCarty <patrick.mccarty@intel.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210907221023.2660953-1-andrii@kernel.org
Andrii Nakryiko [Wed, 8 Sep 2021 00:32:10 +0000 (17:32 -0700)]
Merge branch 'Bpf skeleton helper method'
Matt Smith says:
====================
This patch series changes the type of bpf_object_skeleton->data
to const void * and provides a helper method X__elf_bytes(size_t *sz)
for accessing the raw binary data of the compiled embedded BPF object.
The type change enforces the previously implied behavior of immutability
for this field while casting it to (void *) before assignment allows
for compiling with previous versions of the libbpf headers without
compiler warnings.
The helper method allows easier access to the BPF binary object data
and is leveraged to populate the skeleton field. The inclusion of
this helper method will allow users to get access to the data without
needing to populate an entire skeleton first.
Checks are added in the third patch to validate the behavior of the
added method.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Matt Smith [Wed, 1 Sep 2021 19:44:39 +0000 (12:44 -0700)]
selftests/bpf: Add checks for X__elf_bytes() skeleton helper
This patch adds two checks for the X__elf_bytes BPF skeleton helper
method. The first asserts that the pointer returned from the helper
method is valid, the second asserts that the provided size pointer is
set.
Signed-off-by: Matt Smith <alastorze@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210901194439.3853238-4-alastorze@fb.com
Matt Smith [Wed, 1 Sep 2021 19:44:38 +0000 (12:44 -0700)]
bpftool: Provide a helper method for accessing skeleton's embedded ELF data
This adds a skeleton method X__elf_bytes() which returns the binary data of
the compiled and embedded BPF object file. It additionally sets the size of
the return data to the provided size_t pointer argument.
The assignment to s->data is cast to void * to ensure no warning is issued if
compiled with a previous version of libbpf where the bpf_object_skeleton field
is void * instead of const void *.
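A hypothetical sketch of what such a generated helper and its use can look like, for a skeleton named "demo". The 8-byte literal is only a stand-in for the real embedded object file, and the struct mimics an older header in which the data field is a plain void *.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an older header where data is void *, not const void *. */
struct demo_object_skeleton {
	size_t data_sz;
	void *data;
};

/* Sketch of a generated X__elf_bytes() helper for a skeleton "demo".
 * "\x7f" "ELF" is split so the hex escape does not swallow the 'E'. */
static inline const void *demo__elf_bytes(size_t *sz)
{
	*sz = 8;
	return (const void *)"\x7f" "ELF" "\x02\x01\x01\x00";
}

static void demo__fill_skeleton(struct demo_object_skeleton *s)
{
	/* Casting away const keeps this warning-free with the older header. */
	s->data = (void *)demo__elf_bytes(&s->data_sz);
}
```

Callers that only need the raw bytes can call demo__elf_bytes() directly, without building a skeleton first.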
Signed-off-by: Matt Smith <alastorze@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210901194439.3853238-3-alastorze@fb.com
Matt Smith [Wed, 1 Sep 2021 19:44:37 +0000 (12:44 -0700)]
libbpf: Change bpf_object_skeleton data field to const pointer
This change was necessary to enforce the implied contract
that bpf_object_skeleton->data should not be mutated. The data
will be cast to `void *` during assignment to handle the case
where a user is compiling with older libbpf headers to avoid
a compiler warning about `const void *` data being cast to `void *`.
Signed-off-by: Matt Smith <alastorze@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210901194439.3853238-2-alastorze@fb.com
Toke Høiland-Jørgensen [Wed, 1 Sep 2021 11:48:12 +0000 (13:48 +0200)]
libbpf: Don't crash on object files with no symbol tables
If libbpf encounters an ELF file that has been stripped of its symbol
table, it will crash in bpf_object__add_programs() when trying to
dereference the obj->efile.symbols pointer.
Fix this by erroring out of bpf_object__elf_collect() if it is not
able to find the symbol table.
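The shape of the fix, as a hypothetical model: validate during the collection phase so that a later phase can dereference the symbol table unconditionally. The names, the error code and the message text are assumptions, not libbpf's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

struct demo_symtab {
	int nr_syms;
};

struct demo_obj {
	const char *path;
	const struct demo_symtab *symbols; /* NULL for a stripped ELF file */
};

/* Collection phase: fail early with a clear message instead of crashing. */
static int demo_elf_collect(const struct demo_obj *obj)
{
	if (!obj->symbols) {
		fprintf(stderr, "elf: couldn't find symbol table in %s, stripped object file?\n",
			obj->path);
		return -ENOENT;
	}
	return 0;
}

/* Later phase that dereferences the symbol table unconditionally. */
static int demo_add_programs(const struct demo_obj *obj)
{
	return obj->symbols->nr_syms;
}

static int demo_open(const struct demo_obj *obj)
{
	int err = demo_elf_collect(obj);

	if (err)
		return err;
	return demo_add_programs(obj);
}
```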
v2:
- Move check into bpf_object__elf_collect() and add nice error message
Fixes: 6245947c1b3c ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210901114812.204720-1-toke@redhat.com
Neil Spring [Tue, 31 Aug 2021 03:33:56 +0000 (20:33 -0700)]
bpf: Permit ingress_ifindex in bpf_prog_test_run_xattr
bpf_prog_test_run_xattr takes a struct __sk_buff, but did not permit
that __sk_buff to include a nonzero ingress_ifindex.
This patch updates it to allow ingress_ifindex, converts the __sk_buff field
to sk_buff (skb_iif) and back, and tests that the value is present on the
BPF program side. The test sets an unlikely distinct value for ingress_ifindex
(11) from ifindex (1), which is in line with the rest of the synthetic field
tests.
Adding this support allows testing BPF that operates differently on
incoming and outgoing skbs by discriminating on this field.
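A simplified, hypothetical model of the round trip described above: the user-supplied context value is copied into the kernel-side skb before the program runs and copied back out afterwards. Both structs are heavily reduced stand-ins with assumed names, not the real __sk_buff or sk_buff.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-in for the UAPI __sk_buff. */
struct demo_uapi_skb {
	uint32_t ifindex;
	uint32_t ingress_ifindex;
};

/* Reduced stand-in for the kernel's struct sk_buff. */
struct demo_kernel_skb {
	int skb_iif;
};

/* Before the run: copy the user-provided context into the kernel skb. */
static void demo_convert_to_skb(const struct demo_uapi_skb *uskb,
				struct demo_kernel_skb *skb)
{
	skb->skb_iif = (int)uskb->ingress_ifindex;
}

/* After the run: copy the value back so user space can inspect it. */
static void demo_convert_from_skb(const struct demo_kernel_skb *skb,
				  struct demo_uapi_skb *uskb)
{
	uskb->ingress_ifindex = (uint32_t)skb->skb_iif;
}
```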
Signed-off-by: Neil Spring <ntspring@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210831033356.1459316-1-ntspring@fb.com
Linus Torvalds [Sun, 5 Sep 2021 18:56:18 +0000 (11:56 -0700)]
Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git./linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"New features:
- Improvements for the flamegraph python script, including:
- Display perf.data header
- Display PIDs of user stacks
- Added option to change color scheme
- Default to blue/green color scheme to improve accessibility
- Correctly identify kernel stacks when debuginfo is available
- Improvements for 'perf bench futex':
- Add --mlockall parameter
- Add --broadcast and --pi to the 'requeue' sub benchmark
- Add support for PMU aliases.
- Introduce an ARM Coresight ETE decoder.
- Add a 'perf bench' entry for evlist open/close operations, to help
quantify improvements with multithreading 'perf record'.
- Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf
script's python scripting.
- Add a 'perf test' entry for PMU aliases.
- Add a 'perf test' entry for 'perf record/perf report/perf script'
pipe mode.
Fixes:
- perf script dlfilter (API for filtering via dynamically loaded
shared object introduced in v5.14) fixes and a 'perf test' entry
for it.
- Fix get_current_dir_name() compilation on Android.
- Fix issues with asciidoc and double dashes uses.
- Fix memory leaks in the BTF handling code.
- Fix leftover problems in the Documentation from the infrastructure
originally lifted from the git codebase.
- Fix *probe_vfs_getname.sh 'perf test' failures.
- Handle fd gaps in 'perf test's test__dso_data_reopen().
- Make sure to show disassembly warnings for 'perf annotate --stdio'.
- Fix output from pipe to file and vice-versa in 'perf
record/report/script'.
- Correct 'perf data -h' output.
- Fix wrong comm in system-wide mode with 'perf record --delay'.
- Do not allow --for-each-cgroup without cpu in 'perf stat'
- Make 'perf test --skip' work on shell tests.
- Fix libperf's verbose printing.
Misc improvements:
- Preparatory patches for multithreading various 'perf record' phases
(synthesizing, opening, recording, etc).
- Add sparse context/locking annotations in compiler-types.h, also to
help with the multithreading effort.
- Optimize the generation of the arch specific errno tables used in
'perf trace'.
- Optimize libperf's perf_cpu_map__max().
- Improve ARM's CoreSight warnings.
- Report collisions in AUX records.
- Improve warnings for the LLVM 'perf test' entry.
- Improve the PMU events 'perf test' codebase.
- perf test: Do not compare overheads in the zstd comp test
- Better support annotation on ARM.
- Update 'perf trace's cmd string table to decode sys_bpf() first
arg.
Vendor events:
- Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and
Elkhart Lake.
- Update JSON events and metrics for Intel's Cascade Lake and Sky Lake
servers.
Hardware tracing:
- Improvements for the ARM hardware tracing auxtrace support"
* tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (130 commits)
perf tests: Add test for PMU aliases
perf pmu: Add PMU alias support
perf session: Report collisions in AUX records
perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event
perf build: Report failure for testing feature libopencsd
perf cs-etm: Show a warning for an unknown magic number
perf cs-etm: Print the decoder name
perf cs-etm: Create ETE decoder
perf cs-etm: Update OpenCSD decoder for ETE
perf cs-etm: Fix typo
perf cs-etm: Save TRCDEVARCH register
perf cs-etm: Refactor out ETMv4 header saving
perf cs-etm: Initialise architecture based on TRCIDR1
perf cs-etm: Refactor initialisation of decoder params.
tools build: Fix feature detect clean for out of source builds
perf evlist: Add evlist__for_each_entry_from() macro
perf evsel: Handle precise_ip fallback in evsel__open_cpu()
perf evsel: Move bpf_counter__install_pe() to success path in evsel__open_cpu()
perf evsel: Move test_attr__open() to success path in evsel__open_cpu()
perf evsel: Move ignore_missing_thread() to fallback code
...
Linus Torvalds [Sun, 5 Sep 2021 18:50:41 +0000 (11:50 -0700)]
Merge tag 'trace-v5.15' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT
- bootconfig can now start histograms
- bootconfig supports group/all enabling
- histograms now can put values in linear size buckets
- execnames can be passed to synthetic events
- introduce "event probes" that attach to other events and can retrieve
data from pointers of fields, or record fields as different types (a
pointer to a string as a string instead of just a hex number)
- various fixes and clean ups
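A minimal sketch of what "linear size buckets" means: each value is rounded down to the start of a fixed-width bucket, so all values in the same range are counted under one histogram key. The helper name linear_bucket() is hypothetical and this is not the tracing subsystem's code; in the trigger syntax the bucket width is given with a .buckets=SIZE key modifier (e.g. hist:keys=bytes_req.buckets=10).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a raw value to the lower bound of its
 * fixed-width ("linear") bucket. With size == 10, values 0..9 map to 0,
 * 10..19 map to 10, and so on, so they aggregate under one key.
 */
static uint64_t linear_bucket(uint64_t value, uint64_t size)
{
	assert(size > 0); /* a zero bucket width would divide by zero */
	return (value / size) * size;
}
```

With a width of 32, for instance, a value of 137 lands in the bucket starting at 128.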
* tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits)
tracing/doc: Fix table format in histogram code
selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes
selftests/ftrace: Add selftest for testing eprobe events on synthetic events
selftests/ftrace: Add test case to test adding and removing of event probe
selftests/ftrace: Fix requirement check of README file
selftests/ftrace: Add clear_dynamic_events() to test cases
tracing: Add a probe that attaches to trace events
tracing/probes: Reject events which have the same name of existing one
tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs
tracing/probe: Change traceprobe_set_print_fmt() to take a type
tracing/probes: Use struct_size() instead of defining custom macros
tracing/probes: Allow for dot delimiter as well as slash for system names
tracing/probe: Have traceprobe_parse_probe_arg() take a const arg
tracing: Have dynamic events have a ref counter
tracing: Add DYNAMIC flag for dynamic events
tracing: Replace deprecated CPU-hotplug functions.
MAINTAINERS: Add an entry for os noise/latency
tracepoint: Fix kerneldoc comments
bootconfig/tracing/ktest: Update ktest example for boot-time tracing
tools/bootconfig: Use per-group/all enable option in ftrace2bconf script
...
Linus Torvalds [Sun, 5 Sep 2021 18:43:03 +0000 (11:43 -0700)]
Merge tag 'arc-5.15-rc1' of git://git./linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
"Finally a big pile of changes for ARC (atomics/mm). These are from our
internal arc64 tree, preparing mainline for eventual arc64 support.
I'm spreading them out to avoid a tsunami of patches in one release.
- MM rework:
- Implement up to 4 paging levels
- Enable STRICT_MM_TYPECHECKS
- switch pgtable_t back to 'struct page *'
- Atomics rework / implement relaxed accessors
- Retire legacy MMUv1,v2; ARC750 cores
- Fixes for a few other build errors and typos"
* tag 'arc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (33 commits)
ARC: mm: vmalloc sync from kernel to user table to update PMD ...
ARC: mm: support 4 levels of page tables
ARC: mm: support 3 levels of page tables
ARC: mm: switch to asm-generic/pgalloc.h
ARC: mm: switch pgtable_t back to struct page *
ARC: mm: hack to allow 2 level build with 4 level code
ARC: mm: disintegrate pgtable.h into levels and flags
ARC: mm: disintegrate mmu.h (arcv2 bits out)
ARC: mm: move MMU specific bits out of entry code ...
ARC: mm: move MMU specific bits out of ASID allocator
ARC: mm: non-functional code movement/cleanup
ARC: mm: pmd_populate* to use the canonical set_pmd (and drop pmd_set)
ARC: ioremap: use more commonly used PAGE_KERNEL based uncached flag
ARC: mm: Enable STRICT_MM_TYPECHECKS
ARC: mm: Fixes to allow STRICT_MM_TYPECHECKS
ARC: mm: move mmu/cache externs out to setup.h
ARC: mm: remove tlb paranoid code
ARC: mm: use SCRATCH_DATA0 register for caching pgdir in ARCv2 only
ARC: retire MMUv1 and MMUv2 support
ARC: retire ARC750 support
...
Linus Torvalds [Sun, 5 Sep 2021 18:31:23 +0000 (11:31 -0700)]
Merge tag 'riscv-for-linus-5.15-mw0' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- support PC-relative instructions (auipc and branches) in kprobes
- support for forced IRQ threading
- support for the hlt/nohlt kernel command line options, via the
generic idle loop
- show the edge/level triggered behavior of interrupts
in /proc/interrupts
- a handful of cleanups to our address mapping mechanisms
- support for allocating gigantic hugepages via CMA
- support for the undefined behavior sanitizer (UBSAN)
- a handful of cleanups to the VDSO that allow the kernel to build with
LLD.
- support for hugepage migration
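Kprobes usually single-step a copy of the probed instruction from a different address, so PC-relative instructions such as auipc generally have to be simulated using the original address instead. The sketch below shows the arithmetic involved in simulating auipc on RV64 (rd = pc + the sign-extended imm[31:12] << 12); the helper names are hypothetical and this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RV_OPCODE_MASK  0x7fu
#define RV_OPCODE_AUIPC 0x17u

/* True if the 32-bit instruction word is an auipc (opcode 0010111). */
static bool is_auipc(uint32_t insn)
{
	return (insn & RV_OPCODE_MASK) == RV_OPCODE_AUIPC;
}

/* Destination register number, taken from bits [11:7]. */
static unsigned int auipc_rd(uint32_t insn)
{
	return (insn >> 7) & 0x1f;
}

/*
 * Value auipc writes to rd when executed at address pc:
 * pc + sign-extended (imm[31:12] << 12).
 */
static uint64_t auipc_result(uint32_t insn, uint64_t pc)
{
	int64_t offset = (int64_t)(insn & 0xfffff000u);

	assert(is_auipc(insn));
	if (offset & 0x80000000)        /* sign-extend the 32-bit immediate */
		offset -= (int64_t)1 << 32;
	return pc + (uint64_t)offset;
}
```

For example, auipc a0, 0x1 encodes as 0x00001517; executed at 0x80000000 it writes 0x80001000 to a0 (x10).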
* tag 'riscv-for-linus-5.15-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
riscv: add support for hugepage migration
RISC-V: Fix VDSO build for !MMU
riscv: use strscpy to replace strlcpy
riscv: explicitly use symbol offsets for VDSO
riscv: Enable Undefined Behavior Sanitizer UBSAN
riscv: Keep the riscv Kconfig selects sorted
riscv: Support allocating gigantic hugepages using CMA
riscv: fix the global name pfn_base confliction error
riscv: Move early fdt mapping creation in its own function
riscv: Simplify BUILTIN_DTB device tree mapping handling
riscv: Use __maybe_unused instead of #ifdefs around variable declarations
riscv: Get rid of map_size parameter to create_kernel_page_table
riscv: Introduce va_kernel_pa_offset for 32-bit kernel
riscv: Optimize kernel virtual address conversion macro
dt-bindings: riscv: add starfive jh7100 bindings
riscv: Enable GENERIC_IRQ_SHOW_LEVEL
riscv: Enable idle generic idle loop
riscv: Allow forced irq threading
riscv: Implement thread_struct whitelist for hardened usercopy
riscv: kprobes: implement the branch instructions
...
Linus Torvalds [Sun, 5 Sep 2021 18:24:05 +0000 (11:24 -0700)]
Enable '-Werror' by default for all kernel builds
... but make it a config option so that broken environments can disable
it when required.
We really should always have a clean build, and will disable specific
over-eager warnings as required, if we can't fix them. But while I
fairly religiously enforce that in my own tree, it doesn't get enforced
by various build robots that don't necessarily report warnings.
So this just makes '-Werror' a default compiler flag, but allows people
to disable it for their configuration if they have some particular
issues.
Occasionally, new compiler versions end up enabling new warnings, and it
can take a while before we have them fixed (or the warnings disabled if
that is what it takes), so the config option allows for that situation.
Hopefully this will mean that I get fewer pull requests that have new
warnings that were not noticed by various automation we have in place.
Knock wood.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 5 Sep 2021 18:19:15 +0000 (11:19 -0700)]
Merge tag 'usb-5.15-rc1-2' of git://git./linux/kernel/git/gregkh/usb
Pull more USB updates from Greg KH:
"Here are some straggler USB-serial changes for 5.15-rc1.
These were not included in the first pull request as they came in
"late" from Johan and I had missed them in my pull request earlier
this week.
Nothing big in here, just some USB to serial driver updates and fixes.
All of these were in linux-next before I pulled them into my tree, and
have been in linux-next all this week from my tree with no reported
problems"
* tag 'usb-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: pl2303: fix GL type detection
USB: serial: replace symbolic permissions by octal permissions
USB: serial: cp210x: determine fw version for CP2105 and CP2108
USB: serial: cp210x: clean up type detection
USB: serial: cp210x: clean up set-chars request
USB: serial: cp210x: clean up control-request timeout
USB: serial: cp210x: fix flow-control error handling
USB: serial: cp210x: fix control-characters error handling
USB: serial: io_edgeport: drop unused descriptor helper
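Symbolic permission macros such as the kernel's S_IRUGO are only combinations of the POSIX mode bits (include/linux/stat.h defines S_IRUGO as S_IRUSR|S_IRGRP|S_IROTH), so switching to octal literals does not change behavior. The userspace check below confirms the equivalence at compile time; READ_ALL_MODE is a local stand-in, since S_IRUGO itself is not exported to userspace.

```c
#include <assert.h>
#include <sys/stat.h>

/* Local stand-in for the kernel-only S_IRUGO macro. */
#define READ_ALL_MODE (S_IRUSR | S_IRGRP | S_IROTH)

/* Read permission for user, group and other is exactly 0444 ... */
static_assert(READ_ALL_MODE == 0444, "S_IRUGO-style mode must equal 0444");
/* ... and adding owner write (S_IWUSR) gives the familiar 0644. */
static_assert((READ_ALL_MODE | S_IWUSR) == 0644, "mode must equal 0644");
```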
Linus Torvalds [Sun, 5 Sep 2021 17:50:12 +0000 (10:50 -0700)]
Merge tag 'mtd/for-5.15' of git://git./linux/kernel/git/mtd/linux
Pull MTD updates from Miquel Raynal:
"MTD changes:
- blkdevs:
- Simplify the refcounting in blktrans_{open, release}
- Simplify blktrans_getgeo
- Remove blktrans_ref_mutex
- Simplify blktrans_dev_get
- Use lockdep_assert_held
- Don't hold del_mtd_blktrans_dev in blktrans_{open, release}
- ftl:
- Don't cast away the type when calling add_mtd_blktrans_dev (ftl and rfd_ftl)
- Use container_of() rather than cast
- Fix use-after-free
- Add discard support
- Allow use of MTD_RAM for testing purposes
- concat:
- Check _read, _write callbacks existence before assignment
- Judge callback existence based on the master
- maps:
- Maps: remove dead MTD map driver for PMC-Sierra MSP boards
- mtdblock:
- Warn if added for a NAND device
- Add comment about UBI block devices
- Update old JFFS2 mention in Kconfig
- partitions:
- Redboot: convert to YAML
NAND core changes:
- Repair Miquel Raynal's email address in MAINTAINERS
- Fix a couple of spelling mistakes in Kconfig
- bbt: Skip bad blocks when searching for the BBT in NAND
- Remove never changed ret variable
Raw NAND changes:
- cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
- intel: Fix error handling in probe
- omap: Fix kernel doc warning on 'calcuate' typo
- gpmc: Fix the ECC bytes vs. OOB bytes equation
SPI-NAND core changes:
- Properly fill the OOB area.
- Fix comment
SPI-NAND drivers changes:
- macronix: Add Quad support for serial NAND flash"
* tag 'mtd/for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (30 commits)
mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
mtd_blkdevs: simplify the refcounting in blktrans_{open, release}
mtd_blkdevs: simplify blktrans_getgeo
mtd_blkdevs: remove blktrans_ref_mutex
mtd_blkdevs: simplify blktrans_dev_get
mtd/rfd_ftl: don't cast away the type when calling add_mtd_blktrans_dev
mtd/ftl: don't cast away the type when calling add_mtd_blktrans_dev
mtd_blkdevs: use lockdep_assert_held
mtd_blkdevs: don't hold del_mtd_blktrans_dev in blktrans_{open, release}
mtd: rawnand: intel: Fix error handling in probe
mtd: mtdconcat: Check _read, _write callbacks existence before assignment
mtd: mtdconcat: Judge callback existence based on the master
mtd: maps: remove dead MTD map driver for PMC-Sierra MSP boards
mtd: rfd_ftl: use container_of() rather than cast
mtd: rfd_ftl: fix use-after-free
mtd: rfd_ftl: add discard support
mtd: rfd_ftl: allow use of MTD_RAM for testing purposes
mtdblock: Warn if added for a NAND device
mtd: spinand: macronix: Add Quad support for serial NAND flash
mtdblock: Add comment about UBI block devices
...
Geert Uytterhoeven [Sun, 5 Sep 2021 09:30:34 +0000 (11:30 +0200)]
binfmt: a.out: Fix bogus semicolon
fs/binfmt_aout.c: In function ‘load_aout_library’:
fs/binfmt_aout.c:311:27: error: expected ‘)’ before ‘;’ token
311 | MAP_FIXED | MAP_PRIVATE;
| ^
fs/binfmt_aout.c:309:10: error: too few arguments to function ‘vm_mmap’
309 | error = vm_mmap(file, start_addr, ex.a_text + ex.a_data,
| ^~~~~~~
In file included from fs/binfmt_aout.c:12:
include/linux/mm.h:2626:35: note: declared here
2626 | extern unsigned long __must_check vm_mmap(struct file *, unsigned long,
| ^~~~~~~
Fix this by reverting the accidental replacement of a comma by a
semicolon.
Fixes: 42be8b42535183f8 ("binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()")
Reported-by: noreply@ellerman.id.au
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 4 Sep 2021 18:35:47 +0000 (11:35 -0700)]
Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux
Pull MAP_DENYWRITE removal from David Hildenbrand:
"Remove all in-tree usage of MAP_DENYWRITE from the kernel and remove
VM_DENYWRITE.
There are some (minor) user-visible changes:
- We no longer deny write access to shared libraries loaded via legacy
uselib(); this behavior matches modern user space, e.g., dlopen().
- We no longer deny write access to the elf interpreter after exec
completed, treating it just like shared libraries (which it often
is).
- We always deny write access to the file linked via /proc/pid/exe:
sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the
file cannot be denied, and write access to the file will remain
denied until the link is effectively gone (exec, termination,
sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file.
Cross-compiled for a bunch of architectures (alpha, microblaze, i386,
s390x, ...) and verified via ltp that especially the relevant tests
(i.e., creat07 and execve04) continue working as expected"
* tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux:
fs: update documentation of get_write_access() and friends
mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
mm: remove VM_DENYWRITE
binfmt: remove in-tree usage of MAP_DENYWRITE
kernel/fork: always deny write access to current MM exe_file
kernel/fork: factor out replacing the current MM exe_file
binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
Linus Torvalds [Sat, 4 Sep 2021 18:15:50 +0000 (11:15 -0700)]
Merge git://github.com/Paragon-Software-Group/linux-ntfs3
Merge NTFSv3 filesystem from Konstantin Komarov:
"This patch adds an NTFS Read-Write driver to fs/ntfs3.
Having decades of expertise in commercial file systems development and
huge test coverage, we at Paragon Software GmbH want to make our
contribution to the Open Source Community by providing an implementation
of an NTFS Read-Write driver for the Linux Kernel.
This is a fully functional NTFS Read-Write driver. The current version
works with NTFS (including v3.1) and normal/compressed/sparse files and
supports journal replaying.
We plan to support this version after the codebase is merged, adding
new features and fixing bugs. For example, full journaling support
over JBD will be added in later updates"
Link: https://lore.kernel.org/lkml/20210729134943.778917-1-almaz.alexandrovich@paragon-software.com/
Link: https://lore.kernel.org/lkml/aa4aa155-b9b2-9099-b7a2-349d8d9d8fbd@paragon-software.com/
* git://github.com/Paragon-Software-Group/linux-ntfs3: (35 commits)
fs/ntfs3: Change how module init/info messages are displayed
fs/ntfs3: Remove GPL boilerplates from decompress lib files
fs/ntfs3: Remove unnecessary condition checking from ntfs_file_read_iter
fs/ntfs3: Fix integer overflow in ni_fiemap with fiemap_prep()
fs/ntfs3: Restyle comments to better align with kernel-doc
fs/ntfs3: Rework file operations
fs/ntfs3: Remove fat ioctl's from ntfs3 driver for now
fs/ntfs3: Restyle comments to better align with kernel-doc
fs/ntfs3: Fix error handling in indx_insert_into_root()
fs/ntfs3: Potential NULL dereference in hdr_find_split()
fs/ntfs3: Fix error code in indx_add_allocate()
fs/ntfs3: fix an error code in ntfs_get_acl_ex()
fs/ntfs3: add checks for allocation failure
fs/ntfs3: Use kcalloc/kmalloc_array over kzalloc/kmalloc
fs/ntfs3: Do not use driver own alloc wrappers
fs/ntfs3: Use kernel ALIGN macros over driver specific
fs/ntfs3: Restyle comment block in ni_parse_reparse()
fs/ntfs3: Remove unused including <linux/version.h>
fs/ntfs3: Fix fall-through warnings for Clang
fs/ntfs3: Fix one none utf8 char in source file
...
Linus Torvalds [Sat, 4 Sep 2021 17:48:47 +0000 (10:48 -0700)]
Merge tag 'f2fs-for-5.15-rc1' of git://git./linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we've addressed some performance issues such as lock
contention and a misbehaving compress_cache, allowed extent_cache for
compressed files, and added a new sysfs entry to adjust ra_size for fadvise.
In order to diagnose performance issues quickly, we also added an
iostat feature which periodically reports IO latencies.
On the stability side, we've found two memory leaks in the
error path of the compression flow. We've also fixed various corner
cases in fiemap, quota, checkpoint=disable, zstd, and so on.
Enhancements:
- avoid long checkpoint latency by releasing nat_tree_lock
- collect and show iostats periodically
- support extent_cache for compressed files
- add a sysfs entry to manage ra_size given fadvise(POSIX_FADV_SEQUENTIAL)
- report f2fs GC status via sysfs
- add discard_unit=%s in mount option to handle zoned device
Bug fixes:
- fix two memory leaks when an error happens in the compressed IO flow
- fix compress_cache to get the right LBA
- fix fiemap to deal with compressed case correctly
- fix wrong EIO returns due to SBI_NEED_FSCK
- fix missing writes when re-enabling checkpoint
- fix quota deadlock
- fix zstd level mount option
In addition to the above major updates, we've cleaned up several code
paths such as dio, unnecessary operations, debugfs/f2fs/status, sanity
check, and typos"
* tag 'f2fs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
f2fs: should put a page beyond EOF when preparing a write
f2fs: deallocate compressed pages when error happens
f2fs: enable realtime discard iff device supports discard
f2fs: guarantee to write dirty data when enabling checkpoint back
f2fs: fix to unmap pages from userspace process in punch_hole()
f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()
f2fs: fix to account missing .skipped_gc_rwsem
f2fs: adjust unlock order for cleanup
f2fs: Don't create discard thread when device doesn't support realtime discard
f2fs: rebuild nat_bits during umount
f2fs: introduce periodic iostat io latency traces
f2fs: separate out iostat feature
f2fs: compress: do sanity check on cluster
f2fs: fix description about main_blkaddr node
f2fs: convert S_IRUGO to 0444
f2fs: fix to keep compatibility of fault injection interface
f2fs: support fault injection for f2fs_kmem_cache_alloc()
f2fs: compress: allow write compress released file after truncate to zero
f2fs: correct comment in segment.h
f2fs: improve sbi status info in debugfs/f2fs/status
...
Linus Torvalds [Sat, 4 Sep 2021 17:25:26 +0000 (10:25 -0700)]
Merge tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Better client responsiveness when server isn't replying
- Use refcount_t in sunrpc rpc_client refcount tracking
- Add srcaddr and dst_port to the sunrpc sysfs info files
- Add basic support for connection sharing between servers with multiple NICs
Bugfixes and Cleanups:
- Sunrpc tracepoint cleanups
- Disconnect after ib_post_send() errors to avoid deadlocks
- Fix for tearing down rpcrdma_reps
- Fix a potential pNFS layoutget livelock loop
- pNFS layout barrier fixes
- Fix a potential memory corruption in rpc_wake_up_queued_task_set_status()
- Fix reconnection locking
- Fix return value of get_srcport()
- Remove rpcrdma_post_sends()
- Remove pNFS dead code
- Remove copy size restriction for inter-server copies
- Overhaul the NFS callback service
- Clean up sunrpc TCP socket shutdowns
- Always provide aligned buffers to RPC read layers"
* tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (39 commits)
NFS: Always provide aligned buffers to the RPC read layers
NFSv4.1 add network transport when session trunking is detected
SUNRPC enforce creation of no more than max_connect xprts
NFSv4 introduce max_connect mount options
SUNRPC add xps_nunique_destaddr_xprts to xprt_switch_info in sysfs
SUNRPC keep track of number of transports to unique addresses
NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox
SUNRPC: Tweak TCP socket shutdown in the RPC client
SUNRPC: Simplify socket shutdown when not reusing TCP ports
NFSv4.2: remove restriction of copy size for inter-server copy.
NFS: Clean up the synopsis of callback process_op()
NFS: Extract the xdr_init_encode/decode() calls from decode_compound
NFS: Remove unused callback void decoder
NFS: Add a private local dispatcher for NFSv4 callback operations
SUNRPC: Eliminate the RQ_AUTHERR flag
SUNRPC: Set rq_auth_stat in the pg_authenticate() callout
SUNRPC: Add svc_rqst::rq_auth_stat
SUNRPC: Add dst_port to the sysfs xprt info file
SUNRPC: Add srcaddr as a file in sysfs
sunrpc: Fix return value of get_srcport()
...
Linus Torvalds [Fri, 3 Sep 2021 22:55:41 +0000 (15:55 -0700)]
Merge tag 'linux-kselftest-next-5.15-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:
"Fixes to build and test failures:
- openat2 test failure for O_LARGEFILE flag on ARM64
- x86 test build failures related to glibc 2.34 adding support for
variable sized MINSIGSTKSZ and SIGSTKSZ
- removing obsolete configs in sync and cpufreq config files
- minor spelling and duplicate header include cleanups"
* tag 'linux-kselftest-next-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/cpufreq: Rename DEBUG_PI_LIST to DEBUG_PLIST
selftests/sync: Remove the deprecated config SYNC
selftests: safesetid: Fix spelling mistake "cant" -> "can't"
selftests/x86: Fix error: variably modified 'altstack_data' at file scope
kselftest:sched: remove duplicate include in cs_prctl_test.c
selftests: openat2: Fix testing failure for O_LARGEFILE flag
Linus Torvalds [Fri, 3 Sep 2021 22:33:47 +0000 (15:33 -0700)]
Merge tag 'kbuild-v5.15' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add -s option (strict mode) to merge_config.sh to make it fail when
any symbol is redefined.
- Show a warning if a different compiler is used for building external
modules.
- Infer --target from ARCH for CC=clang to let you cross-compile the
kernel without CROSS_COMPILE.
- Make the integrated assembler default (LLVM_IAS=1) for CC=clang.
- Add <linux/stdarg.h> to the kernel source instead of borrowing
<stdarg.h> from the compiler.
- Add Nick Desaulniers as a Kbuild reviewer.
- Drop stale cc-option tests.
- Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG
to handle symbols in inline assembly.
- Show a warning if 'FORCE' is missing for if_changed rules.
- Various cleanups
* tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
kbuild: redo fake deps at include/ksym/*.h
kbuild: clean up objtool_args slightly
modpost: get the *.mod file path more simply
checkkconfigsymbols.py: Fix the '--ignore' option
kbuild: merge vmlinux_link() between ARCH=um and other architectures
kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh
kbuild: merge vmlinux_link() between the ordinary link and Clang LTO
kbuild: remove stale *.symversions
kbuild: remove unused quiet_cmd_update_lto_symversions
gen_compile_commands: extract compiler command from a series of commands
x86: remove cc-option-yn test for -mtune=
arc: replace cc-option-yn uses with cc-option
s390: replace cc-option-yn uses with cc-option
ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild
sparc: move the install rule to arch/sparc/Makefile
security: remove unneeded subdir-$(CONFIG_...)
kbuild: sh: remove unused install script
kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y
kbuild: Switch to 'f' variants of integrated assembler flag
kbuild: Shuffle blank line to improve comment meaning
...
Linus Torvalds [Fri, 3 Sep 2021 22:21:54 +0000 (15:21 -0700)]
Merge branch 'stable/for-linus-5.15-rc0' of git://git./linux/kernel/git/konrad/ibft
Pull ibft fix from Konrad Rzeszutek Wilk:
"An arm64 compile fix for the new code that fixed the iBFT KASLR
handling. I missed the original 0-day build email report"
* 'stable/for-linus-5.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
iscsi_ibft: Fix isa_bus_to_virt not working under ARM
Linus Torvalds [Fri, 3 Sep 2021 18:22:50 +0000 (11:22 -0700)]
Merge tag 'powerpc-5.15-1' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Convert pseries & powernv to use MSI IRQ domains.
- Rework the pseries CPU numbering so that CPUs that are removed, and
later re-added, are given a CPU number on the same node as
previously, when possible.
- Add support for a new more flexible device-tree format for specifying
NUMA distances.
- Convert powerpc to GENERIC_PTDUMP.
- Retire sbc8548 and sbc8641d board support.
- Various other small features and fixes.
Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard,
Cédric Le Goater, Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas,
Fangrui Song, Finn Thain, Gautham R. Shenoy, Hari Bathini, Joel
Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo Bras, Lukas
Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan
Chancellor, Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R.
Sampat, Randy Dunlap, Sebastian Andrzej Siewior, Srikar Dronamraju, Wan
Jiabing, Xiongwei Song, and Zheng Yongjun.
* tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (154 commits)
powerpc/bug: Cast to unsigned long before passing to inline asm
powerpc/ptdump: Fix generic ptdump for 64-bit
KVM: PPC: Fix clearing never mapped TCEs in realmode
powerpc/pseries/iommu: Rename "direct window" to "dma window"
powerpc/pseries/iommu: Make use of DDW for indirect mapping
powerpc/pseries/iommu: Find existing DDW with given property name
powerpc/pseries/iommu: Update remove_dma_window() to accept property name
powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
powerpc/pseries/iommu: Allow DDW windows starting at 0x00
powerpc/pseries/iommu: Add ddw_list_new_entry() helper
powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
powerpc/kernel/iommu: Add new iommu_table_in_use() helper
powerpc/pseries/iommu: Replace hard-coded page shift
powerpc/numa: Update cpu_cpu_map on CPU online/offline
powerpc/numa: Print debug statements only when required
powerpc/numa: convert printk to pr_xxx
powerpc/numa: Drop dbg in favour of pr_debug
powerpc/smp: Enable CACHE domain for shared processor
powerpc/smp: Update cpu_core_map on all PowerPc systems
...
Linus Torvalds [Fri, 3 Sep 2021 18:15:49 +0000 (11:15 -0700)]
Merge tag 'for-5.15/parisc-2' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
"Fix an unaligned-access crash in the bootloader and drop asm/swab.h"
* tag 'for-5.15/parisc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix unaligned-access crash in bootloader
parisc: Drop __arch_swab16(), arch_swab24(), _arch_swab32() and __arch_swab64() functions
Linus Torvalds [Fri, 3 Sep 2021 18:11:54 +0000 (11:11 -0700)]
Merge tag 'mips_5.15' of git://git./linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- converted Pistachio platform to use MIPS generic kernel
- fixes and cleanups
* tag 'mips_5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (29 commits)
MIPS: Malta: fix alignment of the devicetree buffer
MIPS: ingenic: Unconditionally enable clock of CPU #0
MIPS: mscc: ocelot: mark the phy-mode for internal PHY ports
MIPS: mscc: ocelot: disable all switch ports by default
MAINTAINERS: adjust PISTACHIO SOC SUPPORT after its retirement
MIPS: Return true/false (not 1/0) from bool functions
MIPS: generic: Return true/false (not 1/0) from bool functions
MIPS: Make a alias for pistachio_defconfig
MIPS: Retire MACH_PISTACHIO
MIPS: config: generic: Add config for Marduk board
pinctrl: pistachio: Make it as an option
phy: pistachio-usb: Depend on MIPS || COMPILE_TEST
clocksource/drivers/pistachio: Make it selectable for MIPS
clk: pistachio: Make it selectable for generic MIPS kernel
MIPS: DTS: Pistachio add missing cpc and cdmm
MIPS: generic: Allow generating FIT image for Marduk board
MIPS: locking/atomic: Fix atomic{_64,}_sub_if_positive
MIPS: loongson2ef: don't build serial.o unconditionally
MIPS: Replace deprecated CPU-hotplug functions.
MIPS: Alchemy: Fix spelling contraction "cant" -> "can't"
...
Linus Torvalds [Fri, 3 Sep 2021 18:03:00 +0000 (11:03 -0700)]
Merge tag 'for-linus' of git://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:
"A few cleanups and compiler warning fixes for OpenRISC.
Also, this includes dts and defconfig updates to enable Ethernet on
OpenRISC/Litex FPGA SoCs now that the LiteEth driver has gone
upstream"
* tag 'for-linus' of git://github.com/openrisc/linux:
openrisc/litex: Update defconfig
openrisc/litex: Add ethernet device
openrisc/litex: Update uart address
openrisc: Fix compiler warnings in setup
openrisc: rename or32 code & comments to or1k
openrisc: don't printk() unconditionally
Linus Torvalds [Fri, 3 Sep 2021 17:57:25 +0000 (10:57 -0700)]
Merge tag 'livepatching-for-5.15' of git://git./linux/kernel/git/livepatching/livepatching
Pull livepatching update from Petr Mladek.
* tag 'livepatching-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
livepatch: Replace deprecated CPU-hotplug functions.
Linus Torvalds [Fri, 3 Sep 2021 17:44:35 +0000 (10:44 -0700)]
Merge tag 'iommu-updates-v5.15' of git://git./linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- New DART IOMMU driver for Apple Silicon M1 chips
- Optimizations for iommu_[map/unmap] performance
- Selective TLB flush support for the AMD IOMMU driver to make it more
efficient on emulated IOMMUs
- Rework IOVA setup and default domain type setting to move more code
out of IOMMU drivers and to support runtime switching between certain
types of default domains
- VT-d Updates from Lu Baolu:
- Update the virtual command related registers
- Enable Intel IOMMU scalable mode by default
- Preset A/D bits for user space DMA usage
- Allow devices to have more than 32 outstanding PRs
- Various cleanups
- ARM SMMU Updates from Will Deacon:
SMMUv3:
- Minor optimisation to avoid zeroing struct members on CMD submission
- Increased use of batched commands to reduce submission latency
- Refactoring in preparation for ECMDQ support
SMMUv2:
- Fix races when probing devices with identical StreamIDs
- Optimise walk cache flushing for Qualcomm implementations
- Allow deep sleep states for some Qualcomm SoCs with shared clocks
- Various smaller optimizations, cleanups, and fixes
* tag 'iommu-updates-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (85 commits)
iommu/io-pgtable: Abstract iommu_iotlb_gather access
iommu/arm-smmu: Fix missing unlock on error in arm_smmu_device_group()
iommu/vt-d: Add present bit check in pasid entry setup helpers
iommu/vt-d: Use pasid_pte_is_present() helper function
iommu/vt-d: Drop the kernel doc annotation
iommu/vt-d: Allow devices to have more than 32 outstanding PRs
iommu/vt-d: Preset A/D bits for user space DMA usage
iommu/vt-d: Enable Intel IOMMU scalable mode by default
iommu/vt-d: Refactor Kconfig a bit
iommu/vt-d: Remove unnecessary oom message
iommu/vt-d: Update the virtual command related registers
iommu: Allow enabling non-strict mode dynamically
iommu: Merge strictness and domain type configs
iommu: Only log strictness for DMA domains
iommu: Expose DMA domain strictness via sysfs
iommu: Express DMA strictness via the domain type
iommu/vt-d: Prepare for multiple DMA domain types
iommu/arm-smmu: Prepare for multiple DMA domain types
iommu/amd: Prepare for multiple DMA domain types
iommu: Introduce explicit type for non-strict DMA domains
...
Linus Torvalds [Fri, 3 Sep 2021 17:34:44 +0000 (10:34 -0700)]
Merge branch 'stable/for-linus-5.15' of git://git./linux/kernel/git/konrad/swiotlb
Pull swiotlb updates from Konrad Rzeszutek Wilk:
"A new feature called restricted DMA pools. It allows SWIOTLB to
utilize per-device (or per-platform) allocated memory pools instead of
using the global one.
The first big user of this is ARM Confidential Computing where the
memory for DMA operations can be set per platform"
* 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: (23 commits)
swiotlb: use depends on for DMA_RESTRICTED_POOL
of: restricted dma: Don't fail device probe on rmem init failure
of: Move of_dma_set_restricted_buffer() into device.c
powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()
s390/pv: fix the forcing of the swiotlb
swiotlb: Free tbl memory in swiotlb_exit()
swiotlb: Emit diagnostic in swiotlb_exit()
swiotlb: Convert io_default_tlb_mem to static allocation
of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS
swiotlb: add overflow checks to swiotlb_bounce
swiotlb: fix implicit debugfs declarations
of: Add plumbing for restricted DMA pool
dt-bindings: of: Add restricted DMA pool
swiotlb: Add restricted DMA pool initialization
swiotlb: Add restricted DMA alloc/free support
swiotlb: Refactor swiotlb_tbl_unmap_single
swiotlb: Move alloc_size to swiotlb_find_slots
swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
swiotlb: Update is_swiotlb_active to add a struct device argument
swiotlb: Update is_swiotlb_buffer to add a struct device argument
...
Linus Torvalds [Fri, 3 Sep 2021 17:08:28 +0000 (10:08 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
"173 patches.
Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
oom-kill, migration, ksm, percpu, vmstat, and madvise)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
mm/madvise: add MADV_WILLNEED to process_madvise()
mm/vmstat: remove unneeded return value
mm/vmstat: simplify the array size calculation
mm/vmstat: correct some wrong comments
mm/percpu.c: remove obsolete comments of pcpu_chunk_populated()
selftests: vm: add COW time test for KSM pages
selftests: vm: add KSM merging time test
mm: KSM: fix data type
selftests: vm: add KSM merging across nodes test
selftests: vm: add KSM zero page merging test
selftests: vm: add KSM unmerge test
selftests: vm: add KSM merge test
mm/migrate: correct kernel-doc notation
mm: wire up syscall process_mrelease
mm: introduce process_mrelease system call
memblock: make memblock_find_in_range method private
mm/mempolicy.c: use in_task() in mempolicy_slab_node()
mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
mm/mempolicy: advertise new MPOL_PREFERRED_MANY
mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
...
zhangkui [Thu, 2 Sep 2021 22:01:11 +0000 (15:01 -0700)]
mm/madvise: add MADV_WILLNEED to process_madvise()
There is a use case in Android where an app process's memory is swapped
out by process_madvise() with MADV_PAGEOUT, for example to zram or to a
backing device. When the process is scheduled to run again, such as when
it switches to the foreground, the resulting page faults may cause the app
to drop frames.
To reduce the problem, system management software can read ahead the
memory of the process as soon as the app switches to the foreground.
Calling process_madvise() with MADV_WILLNEED meets this need.
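A minimal sketch of how such system management software might issue the read-ahead hint, assuming a pidfd for the target process (for example from pidfd_open(2)). prefetch_process_range() is a hypothetical helper; it uses the raw syscall because older C libraries ship no wrapper. MADV_WILLNEED is only accepted by process_madvise() on kernels carrying this patch (5.15 and later), and process_madvise(2) documents the permission checks on the target (ptrace read access plus CAP_SYS_NICE).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>    /* MADV_WILLNEED */
#include <sys/syscall.h> /* SYS_process_madvise, if the headers know it */
#include <sys/uio.h>     /* struct iovec */
#include <unistd.h>      /* syscall() */

/*
 * Hypothetical helper: ask the kernel to read ahead [addr, addr + len) in
 * the address space of the process referred to by pidfd.
 * Returns the number of bytes advised, or -1 with errno set.
 */
long prefetch_process_range(int pidfd, void *addr, size_t len)
{
#ifdef SYS_process_madvise
	struct iovec iov = { .iov_base = addr, .iov_len = len };

	/* One iovec, MADV_WILLNEED advice; the final "flags" must be 0. */
	return syscall(SYS_process_madvise, pidfd, &iov, 1UL, MADV_WILLNEED, 0U);
#else
	(void)pidfd;
	(void)addr;
	(void)len;
	errno = ENOSYS; /* libc headers predate process_madvise() */
	return -1;
#endif
}
```

Because the advice is applied from outside the app, the app itself does not need to change; the manager only has to know which ranges were paged out.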
Link: https://lkml.kernel.org/r/20210804082010.12482-1-zhangkui@oppo.com
Signed-off-by: zhangkui <zhangkui@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 22:01:08 +0000 (15:01 -0700)]
mm/vmstat: remove unneeded return value
The return values of pagetypeinfo_showfree() and
pagetypeinfo_showblockcount() are now unused. Remove them.
Link: https://lkml.kernel.org/r/20210715122911.15700-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 22:01:05 +0000 (15:01 -0700)]
mm/vmstat: simplify the array size calculation
We can replace the array_num * sizeof(array[0]) with sizeof(array) to
simplify the code.
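For an array object, sizeof(array) equals the element count times sizeof(array[0]) by definition, which is why the substitution is behavior-preserving. A minimal sketch, using a hypothetical items[] array and NR_ITEMS count in place of the actual vmstat array:

```c
#include <stddef.h>

#define NR_ITEMS 8 /* hypothetical element count */

/* Hypothetical fixed-size array standing in for the one in mm/vmstat.c. */
static unsigned long items[NR_ITEMS];

/* Before: element count multiplied by the size of one element. */
size_t array_bytes_old(void)
{
	return NR_ITEMS * sizeof(items[0]);
}

/* After: sizeof applied to the array object itself. */
size_t array_bytes_new(void)
{
	return sizeof(items);
}
```

The shorter form only works on a real array: applied to a pointer, including a function parameter declared with array syntax, sizeof yields the pointer size instead.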
Link: https://lkml.kernel.org/r/20210715122911.15700-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miaohe Lin [Thu, 2 Sep 2021 22:01:03 +0000 (15:01 -0700)]
mm/vmstat: correct some wrong comments
Patch series "Cleanup for vmstat".
This series contains cleanups that remove unneeded return values, correct
wrong comments and simplify the array size calculation. More details can
be found in the respective changelogs.
This patch (of 3):
Correct the wrong fls(mem+1) to fls(mem)+1 and remove the comment that
duplicates the one at quiet_vmstat().
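fls() returns the 1-based position of the most significant set bit (0 for an input of 0), so the two expressions are not interchangeable: fls(mem + 1) equals fls(mem) + 1 only when mem + 1 is a power of two. The sketch below uses a hypothetical portable fls_sketch() in place of the kernel's fls(); old_comment_term() and documented_term() are illustrative names, not kernel functions.

```c
/*
 * Portable stand-in for the kernel's fls(): 1-based index of the most
 * significant set bit, or 0 when x == 0.
 */
unsigned int fls_sketch(unsigned int x)
{
	unsigned int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* The expression the old, wrong comment described. */
unsigned int old_comment_term(unsigned int mem)
{
	return fls_sketch(mem + 1);
}

/* The expression the corrected comment describes. */
unsigned int documented_term(unsigned int mem)
{
	return fls_sketch(mem) + 1;
}
```

For example, with mem = 5 (binary 101) the old form gives fls(6) = 3 while the corrected form gives fls(5) + 1 = 4; with mem = 7 both give 4.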
Link: https://lkml.kernel.org/r/20210715122911.15700-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20210715122911.15700-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jing Xiangfeng [Thu, 2 Sep 2021 22:01:00 +0000 (15:01 -0700)]
mm/percpu.c: remove obsolete comments of pcpu_chunk_populated()
Commit b239f7daf553 ("percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE")
removed the 'for_alloc' parameter, so remove its now-obsolete comment.
Link: https://lkml.kernel.org/r/1630576043-21367-1-git-send-email-jingxiangfeng@huawei.com
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhansaya Bagdauletkyzy [Thu, 2 Sep 2021 22:00:57 +0000 (15:00 -0700)]
selftests: vm: add COW time test for KSM pages
Since merged pages are copied every time they need to be modified, the
write access time differs between shared and non-shared pages. Add the
ksm_cow_time() function, which evaluates the latency of these COW breaks.
First, 4000 pages are allocated and the time required to modify 1 byte in
every other page is measured. Then the pages are merged into 2000 pairs,
and in each pair one page is modified (i.e. the pair is decoupled), causing
a COW break. The time needed to break COW of merged pages is then compared
with the performance of non-shared pages.
The test is run as follows: ./ksm_tests -C
The output:
Total size: 15 MiB
Not merged pages:
Total time: 0.002185489 s
Average speed: 3202.945 MiB/s
Merged pages:
Total time: 0.004386872 s
Average speed: 1595.670 MiB/s
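The measurement described above boils down to timing a write of one byte into every other page. A minimal sketch, assuming a caller-provided region and page size; time_every_other_page_write() is a hypothetical helper, not the selftest's ksm_cow_time() itself. Run once on a non-merged region and once on a KSM-merged one, the difference approximates the COW-break cost.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: write one byte to every other page of
 * [area, area + npages * page_size) and return the elapsed seconds.
 * If those pages are KSM-merged, each write triggers a COW break.
 */
double time_every_other_page_write(char *area, size_t npages, size_t page_size)
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (size_t i = 0; i < npages; i += 2)
		area[i * page_size] = '-';
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```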
Link: https://lkml.kernel.org/r/1d03ee0d1b341959d4b61672c6401d498bff5652.1629386192.git.zhansayabagdaulet@gmail.com
Signed-off-by: Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhansaya Bagdauletkyzy [Thu, 2 Sep 2021 22:00:54 +0000 (15:00 -0700)]
selftests: vm: add KSM merging time test
Patch series "add KSM performance tests", v3.
Extend KSM selftests with a performance benchmark. These tests are not
part of regular regression testing, as they are mainly intended to be used
by developers making changes to the memory management subsystem.
This patch (of 2):
Add a ksm_merge_time() function to measure the time needed for merging and
the resulting speed. The total time is shown in seconds and the speed in
MiB/s. The user must specify the size of the duplicated memory area (in
MiB) before running the test.
The test is run as follows: ./ksm_tests -P -s 100
The output:
Total size: 100 MiB
Total time: 0.201106786 s
Average speed: 497.248 MiB/s
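The reported speed is the merged size divided by the elapsed time: 100 MiB / 0.201106786 s ≈ 497.248 MiB/s, matching the output above. A minimal sketch of that arithmetic; timespec_diff_sec() and mib_per_sec() are hypothetical helper names, not taken from ksm_tests.c.

```c
#define _GNU_SOURCE
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC samples taken around the work. */
double timespec_diff_sec(struct timespec start, struct timespec end)
{
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Throughput in MiB/s for size_mib MiB of memory processed in sec seconds. */
double mib_per_sec(double size_mib, double sec)
{
	return size_mib / sec;
}
```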
Link: https://lkml.kernel.org/r/cover.1629386192.git.zhansayabagdaulet@gmail.com
Link: https://lkml.kernel.org/r/318b946ac80cc9205c89d0962048378f7ce0705b.1629386192.git.zhansayabagdaulet@gmail.com
Signed-off-by: Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhansaya Bagdauletkyzy [Thu, 2 Sep 2021 22:00:51 +0000 (15:00 -0700)]
mm: KSM: fix data type
ksm_stable_node_chains_prune_millisecs is declared as int, but
stable_node_chains_prune_millisecs_store() can store values up to
UINT_MAX in it. Change its type to unsigned int.
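A minimal sketch of why the type matters: the store path parses an unsigned int, and any value above INT_MAX cannot be represented in an int. The variable and function names are hypothetical stand-ins for the KSM tunable and its sysfs store handler.

```c
#include <limits.h> /* INT_MAX, UINT_MAX */

/* Hypothetical stand-ins for ksm_stable_node_chains_prune_millisecs. */
static int prune_ms_before;         /* old type: int */
static unsigned int prune_ms_after; /* new type: unsigned int */

/* Model of the store path: a parsed unsigned int is assigned to the tunable. */
void store_prune_millisecs(unsigned int msecs)
{
	/*
	 * Values above INT_MAX do not fit in an int; the conversion is
	 * implementation-defined and wraps to a negative number with GCC
	 * and Clang.
	 */
	prune_ms_before = (int)msecs;
	prune_ms_after = msecs; /* every parsed value is representable */
}
```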
Link: https://lkml.kernel.org/r/20210806111351.GA71845@asus
Signed-off-by: Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhansaya Bagdauletkyzy [Thu, 2 Sep 2021 22:00:48 +0000 (15:00 -0700)]
selftests: vm: add KSM merging across nodes test
Add check_ksm_numa_merge() function to test that pages on different NUMA
nodes are handled properly. First, two duplicate pages are allocated on two
separate NUMA nodes using the libnuma library. Since each node holds only
one copy, there won't be any shared pages with merge_across_nodes = 0. If
merge_across_nodes is set to 1, the pages are treated as ordinary duplicate
pages and are merged. If NUMA is not enabled in the kernel config or there
are fewer than two NUMA nodes, the test is skipped. The test is run as
follows: ./ksm_tests -N
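merge_across_nodes is one of the KSM tunables exposed under /sys/kernel/mm/ksm/. A minimal sketch of setting such a tunable; ksm_write_knob() is a hypothetical helper, not the selftest's own code. Per the KSM admin guide, merge_across_nodes can only be changed while no KSM pages are shared, so a test would typically write 2 to run (unmerging everything) before flipping it.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write val to /sys/kernel/mm/ksm/<name>, e.g.
 * "merge_across_nodes" or "run". Returns 0 on success, -1 if the knob is
 * missing, not writable, or the kernel rejected the value.
 */
int ksm_write_knob(const char *name, unsigned long val)
{
	char path[256];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "/sys/kernel/mm/ksm/%s", name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%lu", val) > 0;
	/* sysfs reports store errors (such as EBUSY) when the buffer is flushed */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```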
Link: https://lkml.kernel.org/r/071c17b5b04ebb0dfeba137acc495e5dd9d2a719.1626252248.git.zhansayabagdaulet@gmail.com
Signed-off-by: Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhansaya Bagdauletkyzy [Thu, 2 Sep 2021 22:00:45 +0000 (15:00 -0700)]
selftests: vm: add KSM zero page merging test
Add check_ksm_zero_page_merge() function to test that empty pages are
being handled properly. For this, several zero pages are allocated and
merged using madvise. If use_zero_pages is enabled, the pages must be
shared with the special kernel zero pages; otherwise, they are merged as
usual duplicate pages. The test is run as follows: ./ksm_tests -Z
Link: https://lkml.kernel.org/r/6d0caab00d4bdccf5e3791cb95cf6dfd5eb85e45.1626252248.git.zhansayabagdaulet@gmail.com
Signed-off-by: Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhansaya Bagdauletkyzy [Thu, 2 Sep 2021 22:00:42 +0000 (15:00 -0700)]
selftests: vm: add KSM unmerge test
Add check_ksm_unmerge() function to verify that KSM is properly unmerging
shared pages. For this, two duplicate pages are merged first and then
their contents are modified. Since they are not identical anymore, the
pages must be unmerged and the number of merged pages has to be 0. The
test is run as follows: ./ksm_tests -U
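The "number of merged pages" can be read from the pages_shared and pages_sharing counters in /sys/kernel/mm/ksm/. A minimal sketch, assuming no other process uses KSM (the same assumption the series makes) and that ksmd has rescanned after the modification; ksm_read_counter() and ksm_fully_unmerged() are hypothetical helpers.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read /sys/kernel/mm/ksm/<name> into *out.
 * Returns 0 on success, -1 if the file is missing or unreadable.
 */
int ksm_read_counter(const char *name, unsigned long *out)
{
	char path[256];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "/sys/kernel/mm/ksm/%s", name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fscanf(f, "%lu", out);
	fclose(f);
	return n == 1 ? 0 : -1;
}

/* 1 if nothing is merged, 0 if something still is, -1 if unreadable. */
int ksm_fully_unmerged(void)
{
	unsigned long shared, sharing;

	if (ksm_read_counter("pages_shared", &shared) ||
	    ksm_read_counter("pages_sharing", &sharing))
		return -1;
	return shared == 0 && sharing == 0;
}
```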
Link: https://lkml.kernel.org/r/c0f55420440d704d5b094275b4365aa1b2ad46b5.1626252248.git.zhansayabagdaulet@gmail.com
Signed-off-by: Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhansaya Bagdauletkyzy [Thu, 2 Sep 2021 22:00:39 +0000 (15:00 -0700)]
selftests: vm: add KSM merge test
Patch series "add KSM selftests".
Introduce selftests to validate the functionality of KSM. The tests are
run on private anonymous pages. Since some KSM tunables are modified,
their starting values are saved and restored after testing. At the start,
run is set to 2 to ensure that only test pages will be merged (we assume
that no applications make madvise syscalls in the background). If KSM
config is not enabled, all tests are skipped.
This patch (of 4):
Add check_ksm_merge() function to check the basic merging feature of KSM.
First, some number of identical pages are allocated and the MADV_MERGEABLE
advice is given to merge these pages. Then, pages_shared and
pages_sharing values are compared with the expected numbers using
assert_ksm_pages_count() function. The number of pages can be changed
using -p option.
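A minimal sketch of the setup and the expected counts: n identical private anonymous pages are marked MADV_MERGEABLE, and once ksmd has merged them they collapse into one KSM page, giving pages_shared == 1 and pages_sharing == n - 1. This assumes no other KSM users and n no larger than max_page_sharing; map_identical_mergeable() and ksm_counts_match() are hypothetical helpers, not the selftest's functions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map npages private anonymous pages, fill them with the same byte and mark
 * the range MADV_MERGEABLE. Returns the mapping, or NULL on failure
 * (madvise() fails with EINVAL on kernels built without CONFIG_KSM).
 */
char *map_identical_mergeable(size_t npages, size_t page_size, int fill)
{
	size_t len = npages * page_size;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	memset(p, fill, len);
	if (madvise(p, len, MADV_MERGEABLE)) {
		int saved = errno;

		munmap(p, len);
		errno = saved;
		return NULL;
	}
	return p;
}

/* Expected counters after merging npages (npages >= 1) identical pages. */
int ksm_counts_match(unsigned long pages_shared, unsigned long pages_sharing,
		     unsigned long npages)
{
	return pages_shared == 1 && pages_sharing == npages - 1;
}
```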
Link: https://lkml.kernel.org/r/cover.1626252248.git.zhansayabagdaulet@gmail.com
Link: https://lkml.kernel.org/r/90287685c13300972ea84de93d1f3f900373f9fe.1626252248.git.zhansayabagdaulet@gmail.com
Signed-off-by: Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Thu, 2 Sep 2021 22:00:36 +0000 (15:00 -0700)]
mm/migrate: correct kernel-doc notation
Use the expected "Return:" format to prevent a kernel-doc warning.
mm/migrate.c:1157: warning: Excess function parameter 'returns' description in 'next_demotion_node'
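kernel-doc treats every "@name:" line as a parameter description, so documenting the return value as "@returns:" makes it look for a parameter called "returns" and emit the warning above; the return value belongs in a "Return:" section. A minimal sketch with a hypothetical function (example_next_node() and EXAMPLE_NO_NODE are illustrative, not the real next_demotion_node()):

```c
#define EXAMPLE_NO_NODE (-1) /* hypothetical stand-in for NUMA_NO_NODE */

/**
 * example_next_node() - hypothetical helper documented in kernel-doc style
 * @node: the current node id
 * @last: the highest valid node id
 *
 * Describing the result as "@returns:" here would trigger an
 * "Excess function parameter 'returns'" warning.
 *
 * Return: @node + 1, or EXAMPLE_NO_NODE if @node is already @last.
 */
int example_next_node(int node, int last)
{
	return node < last ? node + 1 : EXAMPLE_NO_NODE;
}
```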
Link: https://lkml.kernel.org/r/20210808203151.10632-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>