Daniel T. Lee [Tue, 24 Nov 2020 09:03:05 +0000 (09:03 +0000)]
samples: bpf: Refactor test_cgrp2_sock2 program with libbpf
This commit refactors the existing cgroup program with libbpf bpf
loader. The original test_cgrp2_sock2 has keeped the bpf program
attached to the cgroup hierarchy even after the exit of user program.
To implement the same functionality with libbpf, this commit uses the
BPF_LINK_PINNING to pin the link attachment even after it is closed.
Since this uses LINK instead of ATTACH, detach of bpf program from
cgroup with 'test_cgrp2_sock' is not used anymore.
The code to mount the bpf was added to the .sh file in case the bpff
was not mounted on /sys/fs/bpf. Additionally, to fix the problem that
shell script cannot find the binary object from the current path,
relative path './' has been added in front of binary.
Fixes:
554ae6e792ef3 ("samples/bpf: add userspace example for prohibiting sockets")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201124090310.24374-3-danieltimlee@gmail.com
Daniel T. Lee [Tue, 24 Nov 2020 09:03:04 +0000 (09:03 +0000)]
samples: bpf: Refactor hbm program with libbpf
This commit refactors the existing cgroup programs with libbpf
bpf loader. Since bpf_program__attach doesn't support cgroup program
attachment, this explicitly attaches cgroup bpf program with
bpf_program__attach_cgroup(bpf_prog, cg1).
Also, to change attach_type of bpf program, this uses libbpf's
bpf_program__set_expected_attach_type helper to switch EGRESS to
INGRESS. To keep bpf program attached to the cgroup hierarchy even
after the exit, this commit uses the BPF_LINK_PINNING to pin the link
attachment even after it is closed.
Besides, this program was broken due to the typo of BPF MAP definition.
But this commit solves the problem by fixing this from 'queue_stats' map
struct hvm_queue_stats -> hbm_queue_stats.
Fixes:
36b5d471135c ("selftests/bpf: samples/bpf: Split off legacy stuff from bpf_helpers.h")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201124090310.24374-2-danieltimlee@gmail.com
Andrei Matei [Wed, 25 Nov 2020 03:52:55 +0000 (22:52 -0500)]
bpf: Fix selftest compilation on clang 11
Before this patch, profiler.inc.h wouldn't compile with clang-11 (before
the __builtin_preserve_enum_value LLVM builtin was introduced in
https://reviews.llvm.org/D83242).
Another test that uses this builtin (test_core_enumval) is conditionally
skipped if the compiler is too old. In that spirit, this patch inhibits
part of populate_cgroup_info(), which needs this CO-RE builtin. The
selftests build again on clang-11.
The affected test (the profiler test) doesn't pass on clang-11 because
it's missing https://reviews.llvm.org/D85570, but at least the test suite
as a whole compiles. The test's expected failure is already called out in
the README.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Florian Lehner <dev@der-flo.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201125035255.17970-1-andreimatei1@gmail.com
KP Singh [Tue, 24 Nov 2020 15:12:10 +0000 (15:12 +0000)]
bpf: Add a selftest for bpf_ima_inode_hash
The test does the following:
- Mounts a loopback filesystem and appends the IMA policy to measure
executions only on this file-system. Restricting the IMA policy to
a particular filesystem prevents a system-wide IMA policy change.
- Executes an executable copied to this loopback filesystem.
- Calls the bpf_ima_inode_hash in the bprm_committed_creds hook and
checks if the call succeeded and checks if a hash was calculated.
The test shells out to the added ima_setup.sh script as the setup is
better handled in a shell script and is more complicated to do in the
test program or even shelling out individual commands from C.
The list of required configs (i.e. IMA, SECURITYFS,
IMA_{WRITE,READ}_POLICY) for running this test are also updated.
Suggested-by: Mimi Zohar <zohar@linux.ibm.com> (limit policy rule to loopback mount)
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201124151210.1081188-4-kpsingh@chromium.org
KP Singh [Tue, 24 Nov 2020 15:12:09 +0000 (15:12 +0000)]
bpf: Add a BPF helper for getting the IMA hash of an inode
Provide a wrapper function to get the IMA hash of an inode. This helper
is useful in fingerprinting files (e.g executables on execution) and
using these fingerprints in detections like an executable unlinking
itself.
Since the ima_inode_hash can sleep, it's only allowed for sleepable
LSM hooks.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201124151210.1081188-3-kpsingh@chromium.org
KP Singh [Tue, 24 Nov 2020 15:12:08 +0000 (15:12 +0000)]
ima: Implement ima_inode_hash
This is in preparation to add a helper for BPF LSM programs to use
IMA hashes when attached to LSM hooks. There are LSM hooks like
inode_unlink which do not have a struct file * argument and cannot
use the existing ima_file_hash API.
An inode based API is, therefore, useful in LSM based detections like an
executable trying to delete itself which rely on the inode_unlink LSM
hook.
Moreover, the ima_file_hash function does nothing with the struct file
pointer apart from calling file_inode on it and converting it to an
inode.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20201124151210.1081188-2-kpsingh@chromium.org
Li RongQing [Tue, 24 Nov 2020 07:21:14 +0000 (15:21 +0800)]
libbpf: Add support for canceling cached_cons advance
Add a new function for returning descriptors the user received
after an xsk_ring_cons__peek call. After the application has
gotten a number of descriptors from a ring, it might not be able
to or want to process them all for various reasons. Therefore,
it would be useful to have an interface for returning or
cancelling a number of them so that they are returned to the ring.
This patch adds a new function called xsk_ring_cons__cancel that
performs this operation on nb descriptors counted from the end of
the batch of descriptors that was received through the peek call.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ Magnus Karlsson: rewrote changelog ]
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/1606202474-8119-1-git-send-email-lirongqing@baidu.com
Wedson Almeida Filho [Sat, 21 Nov 2020 01:55:09 +0000 (01:55 +0000)]
bpf: Refactor check_cfg to use a structured loop.
The current implementation uses a number of gotos to implement a loop
and different paths within the loop, which makes the code less readable
than it would be with an explicit while-loop. This patch also replaces a
chain of if/if-elses keyed on the same expression with a switch
statement.
No change in behaviour is intended.
Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201121015509.3594191-1-wedsonaf@google.com
Andrii Nakryiko [Sat, 21 Nov 2020 07:08:29 +0000 (23:08 -0800)]
bpf: Sanitize BTF data pointer after module is loaded
Given .BTF section is not allocatable, it will get trimmed after module is
loaded. BPF system handles that properly by creating an independent copy of
data. But prevent any accidental misused by resetting the pointer to BTF data.
Fixes:
36e68442d1af ("bpf: Load and verify kernel module BTFs")
Suggested-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jessica Yu <jeyu@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/bpf/20201121070829.2612884-2-andrii@kernel.org
Andrii Nakryiko [Sat, 21 Nov 2020 07:08:28 +0000 (23:08 -0800)]
kbuild: Skip module BTF generation for out-of-tree external modules
In some modes of operation, Kbuild allows to build modules without having
vmlinux image around. In such case, generation of module BTF is impossible.
This patch changes the behavior to emit a warning about impossibility of
generating kernel module BTF, instead of breaking the build. This is especially
important for out-of-tree external module builds.
In vmlinux-less mode:
$ make clean
$ make modules_prepare
$ touch drivers/acpi/button.c
$ make M=drivers/acpi
...
CC [M] drivers/acpi/button.o
MODPOST drivers/acpi/Module.symvers
LD [M] drivers/acpi/button.ko
BTF [M] drivers/acpi/button.ko
Skipping BTF generation for drivers/acpi/button.ko due to unavailability of vmlinux
...
$ readelf -S ~/linux-build/default/drivers/acpi/button.ko | grep BTF -A1
... empty ...
Now with normal build:
$ make all
...
LD [M] drivers/acpi/button.ko
BTF [M] drivers/acpi/button.ko
...
$ readelf -S ~/linux-build/default/drivers/acpi/button.ko | grep BTF -A1
[60] .BTF PROGBITS
0000000000000000 00029310
000000000000ab3f 0000000000000000 0 0 1
Fixes:
5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole supports it")
Reported-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Link: https://lore.kernel.org/bpf/20201121070829.2612884-1-andrii@kernel.org
Andrei Matei [Sun, 22 Nov 2020 02:22:05 +0000 (21:22 -0500)]
selftest/bpf: Fix rst formatting in readme
A couple of places in the readme had invalid rst formatting causing the
rendering to be off. This patch fixes them with minimal edits.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201122022205.57229-2-andreimatei1@gmail.com
Andrei Matei [Sun, 22 Nov 2020 02:22:04 +0000 (21:22 -0500)]
selftest/bpf: Fix link in readme
The link was bad because of invalid rst; it was pointing to itself and
was rendering badly.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201122022205.57229-1-andreimatei1@gmail.com
Song Liu [Fri, 20 Nov 2020 00:28:33 +0000 (16:28 -0800)]
bpf: Simplify task_file_seq_get_next()
Simplify task_file_seq_get_next() by removing two in/out arguments: task
and fstruct. Use info->task and info->files instead.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201120002833.2481110-1-songliubraving@fb.com
Yonghong Song [Thu, 19 Nov 2020 07:30:39 +0000 (23:30 -0800)]
bpftool: Add {i,d}tlb_misses support for bpftool profile
Commit
47c09d6a9f67("bpftool: Introduce "prog profile" command")
introduced "bpftool prog profile" command which can be used
to profile bpf program with metrics like # of instructions,
This patch added support for itlb_misses and dtlb_misses.
During an internal bpf program performance evaluation,
I found these two metrics are also very useful. The following
is an example output:
$ bpftool prog profile id 324 duration 3 cycles itlb_misses
1885029 run_cnt
5134686073 cycles
306893 itlb_misses
$ bpftool prog profile id 324 duration 3 cycles dtlb_misses
1827382 run_cnt
4943593648 cycles
5975636 dtlb_misses
$ bpftool prog profile id 324 duration 3 cycles llc_misses
1836527 run_cnt
5019612972 cycles
4161041 llc_misses
From the above, we can see quite some dtlb misses, 3 dtlb misses
perf prog run. This might be something worth further investigation.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201119073039.4060095-1-yhs@fb.com
Andrii Nakryiko [Thu, 19 Nov 2020 01:39:41 +0000 (17:39 -0800)]
Merge branch 'RISC-V selftest/bpf fixes'
Björn Töpel says:
====================
This series contain some fixes for selftests/bpf when building/running
on a RISC-V host. Details can be found in each individual commit.
v2:
Makefile cosmetics. (Andrii)
Simplified unpriv check and added comment. (Andrii)
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Björn Töpel [Wed, 18 Nov 2020 07:16:40 +0000 (08:16 +0100)]
selftests/bpf: Mark tests that require unaligned memory access
A lot of tests require unaligned memory access to work. Mark the tests
as such, so that they can be avoided on unsupported architectures such
as RISC-V.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Luke Nelson <luke.r.nels@gmail.com>
Link: https://lore.kernel.org/bpf/20201118071640.83773-4-bjorn.topel@gmail.com
Björn Töpel [Wed, 18 Nov 2020 07:16:39 +0000 (08:16 +0100)]
selftests/bpf: Avoid running unprivileged tests with alignment requirements
Some architectures have strict alignment requirements. In that case,
the BPF verifier detects if a program has unaligned accesses and
rejects them. A user can pass BPF_F_ANY_ALIGNMENT to a program to
override this check. That, however, will only work when a privileged
user loads a program. An unprivileged user loading a program with this
flag will be rejected prior entering the verifier.
Hence, it does not make sense to load unprivileged programs without
strict alignment when testing the verifier. This patch avoids exactly
that.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Luke Nelson <luke.r.nels@gmail.com>
Link: https://lore.kernel.org/bpf/20201118071640.83773-3-bjorn.topel@gmail.com
Björn Töpel [Wed, 18 Nov 2020 07:16:38 +0000 (08:16 +0100)]
selftests/bpf: Fix broken riscv build
The selftests/bpf Makefile includes system include directories from
the host, when building BPF programs. On RISC-V glibc requires that
__riscv_xlen is defined. This is not the case for "clang -target bpf",
which messes up __WORDSIZE (errno.h -> ... -> wordsize.h) and breaks
the build.
By explicitly defining __risc_xlen correctly for riscv, we can
workaround this.
Fixes:
167381f3eac0 ("selftests/bpf: Makefile fix "missing" headers on build with -idirafter")
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Luke Nelson <luke.r.nels@gmail.com>
Link: https://lore.kernel.org/bpf/20201118071640.83773-2-bjorn.topel@gmail.com
Dmitrii Banshchikov [Tue, 17 Nov 2020 18:45:49 +0000 (18:45 +0000)]
bpf: Add bpf_ktime_get_coarse_ns helper
The helper uses CLOCK_MONOTONIC_COARSE source of time that is less
accurate but more performant.
We have a BPF CGROUP_SKB firewall that supports event logging through
bpf_perf_event_output(). Each event has a timestamp and currently we use
bpf_ktime_get_ns() for it. Use of bpf_ktime_get_coarse_ns() saves ~15-20
ns in time required for event logging.
bpf_ktime_get_ns():
EgressLogByRemoteEndpoint 113.82ns 8.79M
bpf_ktime_get_coarse_ns():
EgressLogByRemoteEndpoint 95.40ns 10.48M
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201117184549.257280-1-me@ubique.spb.ru
KP Singh [Tue, 17 Nov 2020 23:29:29 +0000 (23:29 +0000)]
bpf: Add tests for bpf_bprm_opts_set helper
The test forks a child process, updates the local storage to set/unset
the securexec bit.
The BPF program in the test attaches to bprm_creds_for_exec which checks
the local storage of the current task to set the secureexec bit on the
binary parameters (bprm).
The child then execs a bash command with the environment variable
TMPDIR set in the envp. The bash command returns a different exit code
based on its observed value of the TMPDIR variable.
Since TMPDIR is one of the variables that is ignored by the dynamic
loader when the secureexec bit is set, one should expect the
child execution to not see this value when the secureexec bit is set.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201117232929.2156341-2-kpsingh@chromium.org
KP Singh [Tue, 17 Nov 2020 23:29:28 +0000 (23:29 +0000)]
bpf: Add bpf_bprm_opts_set helper
The helper allows modification of certain bits on the linux_binprm
struct starting with the secureexec bit which can be updated using the
BPF_F_BPRM_SECUREEXEC flag.
secureexec can be set by the LSM for privilege gaining executions to set
the AT_SECURE auxv for glibc. When set, the dynamic linker disables the
use of certain environment variables (like LD_PRELOAD).
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201117232929.2156341-1-kpsingh@chromium.org
Daniel Borkmann [Tue, 17 Nov 2020 21:07:40 +0000 (22:07 +0100)]
Merge branch 'af-xdp-tx-batch'
Magnus Karlsson says:
====================
This patch set improves the performance of mainly the Tx processing of
AF_XDP sockets. Though, patch 3 also improves the Rx path. All in all,
this patch set improves the throughput of the l2fwd xdpsock application
by around 11%. If we just take a look at Tx processing part, it is
improved by 35% to 40%.
Hopefully the new batched Tx interfaces should be of value to other
drivers implementing AF_XDP zero-copy support. But patch #3 is generic
and will improve performance of all drivers when using AF_XDP sockets
(under the premises explained in that patch).
@Daniel. In patch 3, I apply all the padding required to hinder the
adjacency prefetcher to prefetch the wrong things. After this patch
set, I will submit another patch set that introduces
____cacheline_padding_in_smp in include/linux/cache.h according to your
suggestions. The last patch in that patch set will then convert the
explicit paddings that we have now to ____cacheline_padding_in_smp.
v2 -> v3:
* Fixed #pragma warning with clang and defined a loop_unrolled_for macro
for easier readability [lkp, Nick]
* Simplified invalid descriptor handling in xskq_cons_read_desc_batch()
v1 -> v2:
* Removed added parameter in i40e_setup_tx_descriptors and adopted a
simpler solution [Maciej]
* Added test for !xs in xsk_tx_peek_release_desc_batch() [John]
* Simplified return path in xsk_tx_peek_release_desc_batch() [John]
* Dropped patch #1 in v1 that introduced lazy completions. Hopefully
this is not needed when we get busy poll [Jakub]
* Iterate over local variable in xskq_prod_reserve_addr_batch() for
improved performance
* Fixed the fallback path in xsk_tx_peek_release_desc_batch() so that
it also produces a batch of descriptors, albeit by using the slower
(but more general) older code. This improves the performance of the
case when multiple sockets are sharing the same device and queue id.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Magnus Karlsson [Mon, 16 Nov 2020 11:12:47 +0000 (12:12 +0100)]
i40e: Use batched xsk Tx interfaces to increase performance
Use the new batched xsk interfaces for the Tx path in the i40e driver
to improve performance. On my machine, this yields a throughput
increase of 4% for the l2fwd sample app in xdpsock. If we instead just
look at the Tx part, this patch set increases throughput with above
20% for Tx.
Note that I had to explicitly loop unroll the inner loop to get to
this performance level, by using a pragma. It is honored by both clang
and gcc and should be ignored by versions that do not support
it. Using the -funroll-loops compiler command line switch on the
source file resulted in a loop unrolling on a higher level that
lead to a performance decrease instead of an increase.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-6-git-send-email-magnus.karlsson@gmail.com
Magnus Karlsson [Mon, 16 Nov 2020 11:12:46 +0000 (12:12 +0100)]
xsk: Introduce batched Tx descriptor interfaces
Introduce batched descriptor interfaces in the xsk core code for the
Tx path to be used in the driver to write a code path with higher
performance. This interface will be used by the i40e driver in the
next patch. Though other drivers would likely benefit from this new
interface too.
Note that batching is only implemented for the common case when
there is only one socket bound to the same device and queue id. When
this is not the case, we fall back to the old non-batched version of
the function.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-5-git-send-email-magnus.karlsson@gmail.com
Magnus Karlsson [Mon, 16 Nov 2020 11:12:45 +0000 (12:12 +0100)]
xsk: Introduce padding between more ring pointers
Introduce one cache line worth of padding between the consumer pointer
and the flags field as well as between the flags field and the start
of the descriptors in all the lockless rings. This so that the x86 HW
adjacency prefetcher will not prefetch the adjacent pointer/field when
only one pointer/field is going to be used. This improves throughput
performance for the l2fwd sample app with 1% on my machine with HW
prefetching turned on in the BIOS.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-4-git-send-email-magnus.karlsson@gmail.com
Magnus Karlsson [Mon, 16 Nov 2020 11:12:44 +0000 (12:12 +0100)]
i40e: Remove unnecessary sw_ring access from xsk Tx
Remove the unnecessary access to the software ring for the AF_XDP
zero-copy driver. This was used to record the length of the packet so
that the driver Tx completion code could sum this up to produce the
total bytes sent. This is now performed during the transmission of the
packet, so no need to record this in the software ring.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-3-git-send-email-magnus.karlsson@gmail.com
Magnus Karlsson [Mon, 16 Nov 2020 11:12:43 +0000 (12:12 +0100)]
samples/bpf: Increment Tx stats at sending
Increment the statistics over how many Tx packets have been sent at
the time of sending instead of at the time of completion. This as a
completion event means that the buffer has been sent AND returned to
user space. The packet always gets sent shortly after sendto() is
called. The kernel might, for performance reasons, decide to not
return every single buffer to user space immediately after sending,
for example, only after a batch of packets have been
transmitted. Incrementing the number of packets sent at completion,
will in that case be confusing as if you send a single packet, the
counter might show zero for a while even though the packet has been
transmitted.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-2-git-send-email-magnus.karlsson@gmail.com
Alan Maguire [Sun, 15 Nov 2020 10:46:35 +0000 (10:46 +0000)]
libbpf: bpf__find_by_name[_kind] should use btf__get_nr_types()
When operating on split BTF, btf__find_by_name[_kind] will not
iterate over all types since they use btf->nr_types to show
the number of types to iterate over. For split BTF this is
the number of types _on top of base BTF_, so it will
underestimate the number of types to iterate over, especially
for vmlinux + module BTF, where the latter is much smaller.
Use btf__get_nr_types() instead.
Fixes:
ba451366bf44 ("libbpf: Implement basic split BTF support")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1605437195-2175-1-git-send-email-alan.maguire@oracle.com
Martin KaFai Lau [Mon, 16 Nov 2020 20:01:13 +0000 (12:01 -0800)]
bpf: Fix the irq and nmi check in bpf_sk_storage for tracing usage
The intention of the current check is to avoid using bpf_sk_storage
in irq and nmi. Jakub pointed out that the current check cannot
do that. For example, in_serving_softirq() returns true
if the softirq handling is interrupted by hard irq.
Fixes:
8e4597c627fb ("bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201116200113.2868539-1-kafai@fb.com
Santucci Pierpaolo [Mon, 16 Nov 2020 10:30:37 +0000 (11:30 +0100)]
selftest/bpf: Fix IPV6FR handling in flow dissector
From second fragment on, IPV6FR program must stop the dissection of IPV6
fragmented packet. This is the same approach used for IPV4 fragmentation.
This fixes the flow keys calculation for the upper-layer protocols.
Note that according to RFC8200, the first fragment packet must include
the upper-layer header.
Signed-off-by: Santucci Pierpaolo <santucci@epigenesys.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/X7JUzUj34ceE2wBm@santucci.pierpaolo
Jakub Kicinski [Sat, 14 Nov 2020 21:23:01 +0000 (13:23 -0800)]
Merge branch 'ionic-updates'
Shannon Nelson says:
====================
ionic updates
These updates are a bit of code cleaning and a minor
bit of performance tweaking.
v3: convert ionic_lif_quiesce() to void
v2: added void cast on call to ionic_lif_quiesce()
lowered batching threshold
added patch to flatten calls to ionic_lif_rx_mode
added patch to change from_ndo to can_sleep
====================
Link: https://lore.kernel.org/r/20201112182208.46770-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:08 +0000 (10:22 -0800)]
ionic: useful names for booleans
With a few more uses of true and false in function calls, we
need to give them some useful names so we can tell from the
calling point what we're doing.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:07 +0000 (10:22 -0800)]
ionic: change set_rx_mode from_ndo to can_sleep
Instead of having two different ways of expressing the same
sleepability concept, using opposite logic, we can rework the
from_ndo to can_sleep for a more consistent usage.
Fixes:
1800eee16676 ("net: ionic: Replace in_interrupt() usage.")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:06 +0000 (10:22 -0800)]
ionic: flatten calls to ionic_lif_rx_mode
The _ionic_lif_rx_mode() is only used once and really doesn't
need to be broken out.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:05 +0000 (10:22 -0800)]
ionic: use mc sync for multicast filters
We should be using the multicast sync routines for the multicast
filters. Also, let's just flatten the logic a bit and pull
the small unicast routine back into ionic_set_rx_mode().
Fixes:
1800eee16676 ("net: ionic: Replace in_interrupt() usage.")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:04 +0000 (10:22 -0800)]
ionic: batch rx buffer refilling
We don't need to refill the rx descriptors on every napi
if only a few were handled. Waiting until we can batch up
a few together will save us a few Rx cycles.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:03 +0000 (10:22 -0800)]
ionic: add lif quiesce
After the queues are stopped, expressly quiesce the lif.
This assures that even if the queues were in an odd state,
the firmware will close up everything cleanly.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:02 +0000 (10:22 -0800)]
ionic: check for link after netdev registration
Request a link check as soon as the netdev is registered rather
than waiting for the watchdog to go off in order to get the
interface operational a little more quickly.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Thu, 12 Nov 2020 18:22:01 +0000 (10:22 -0800)]
ionic: start queues before announcing link up
Change the order of operations in the link_up handling to be
sure that the queues are up and ready before we announce that
the link is up.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
YueHaibing [Thu, 12 Nov 2020 14:49:36 +0000 (22:49 +0800)]
net: macb: Fix passing zero to 'PTR_ERR'
Check PTR_ERR with IS_ERR to fix this.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20201112144936.54776-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Bulwahn [Fri, 13 Nov 2020 13:50:12 +0000 (14:50 +0100)]
ipv6: remove unused function ipv6_skb_idev()
Commit
bdb7cc643fc9 ("ipv6: Count interface receive statistics on the
ingress netdev") removed all callees for ipv6_skb_idev(). Hence, since
then, ipv6_skb_idev() is unused and make CC=clang W=1 warns:
net/ipv6/exthdrs.c:909:33:
warning: unused function 'ipv6_skb_idev' [-Wunused-function]
So, remove this unused function and a -Wunused-function warning.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lore.kernel.org/r/20201113135012.32499-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 Nov 2020 17:13:40 +0000 (09:13 -0800)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-11-14
1) Add BTF generation for kernel modules and extend BTF infra in kernel
e.g. support for split BTF loading and validation, from Andrii Nakryiko.
2) Support for pointers beyond pkt_end to recognize LLVM generated patterns
on inlined branch conditions, from Alexei Starovoitov.
3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh.
4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage
infra, from Martin KaFai Lau.
5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the
XDP_REDIRECT path, from Lorenzo Bianconi.
6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI
context, from Song Liu.
7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build
for different target archs on same source tree, from Jean-Philippe Brucker.
8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet.
9) Move functionality from test_tcpbpf_user into the test_progs framework so it
can run in BPF CI, from Alexander Duyck.
10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner.
Note that for the fix from Song we have seen a sparse report on context
imbalance which requires changes in sparse itself for proper annotation
detection where this is currently being discussed on linux-sparse among
developers [0]. Once we have more clarification/guidance after their fix,
Song will follow-up.
[0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/
https://lore.kernel.org/linux-sparse/
20201109221345.uklbp3lzgq6g42zb@ltop.local/T/
* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits)
net: mlx5: Add xdp tx return bulking support
net: mvpp2: Add xdp tx return bulking support
net: mvneta: Add xdp tx return bulking support
net: page_pool: Add bulk support for ptr_ring
net: xdp: Introduce bulking for xdp tx return path
bpf: Expose bpf_d_path helper to sleepable LSM hooks
bpf: Augment the set of sleepable LSM hooks
bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
bpf: Rename some functions in bpf_sk_storage
bpf: Folding omem_charge() into sk_storage_charge()
selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
selftests/bpf: Add skb_pkt_end test
bpf: Support for pointers beyond pkt_end.
tools/bpf: Always run the *-clean recipes
tools/bpf: Add bootstrap/ to .gitignore
bpf: Fix NULL dereference in bpf_task_storage
tools/bpftool: Fix build slowdown
tools/runqslower: Build bpftool using HOSTCC
tools/runqslower: Enable out-of-tree build
...
====================
Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steen Hegelund [Thu, 12 Nov 2020 09:22:50 +0000 (10:22 +0100)]
net: phy: mscc: Add PTP support for 2 more VSC PHYs
Add VSC8572 and VSC8574 in the PTP configuration
as they also support PTP.
The relevant datasheets can be found here:
- VSC8572: https://www.microchip.com/wwwproducts/en/VSC8572
- VSC8574: https://www.microchip.com/wwwproducts/en/VSC8574
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Link: https://lore.kernel.org/r/20201112092250.914079-1-steen.hegelund@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Sat, 14 Nov 2020 01:29:00 +0000 (02:29 +0100)]
Merge branch 'xdp-redirect-bulk'
Lorenzo Bianconi says:
====================
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually
run inside the driver NAPI tx completion loop.
Convert mvneta, mvpp2 and mlx5 drivers to xdp_return_frame_bulk APIs.
More details on benchmarks run on mlx5 can be found here:
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/xdp_bulk_return01.org
Changes since v5:
- do not keep looping over ptr_ring if the cache is full but release leftover
pages running page_pool_return_page
Changes since v4:
- fix comments
- introduce xdp_frame_bulk_init utility routine
- compiler annotations for I-cache code layout
- move rcu_read_lock outside fast-path
- mlx5 xdp bulking code optimization
Changes since v3:
- align DEV_MAP_BULK_SIZE to XDP_BULK_QUEUE_SIZE
- refactor page_pool_put_page_bulk to avoid code duplication
Changes since v2:
- move mvneta changes in a dedicated patch
Changes since v1:
- improve comments
- rework xdp_return_frame_bulk routine logic
- move count and xa fields at the beginning of xdp_frame_bulk struct
- invert logic in page_pool_put_page_bulk for loop
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Lorenzo Bianconi [Fri, 13 Nov 2020 11:48:32 +0000 (12:48 +0100)]
net: mlx5: Add xdp tx return bulking support
Convert mlx5 driver to xdp_return_frame_bulk APIs.
XDP_REDIRECT (upstream codepath): 8.9Mpps
XDP_REDIRECT (upstream codepath + bulking APIs): 10.2Mpps
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/250460319fd868b7b5668fc1deca74dd42813a90.1605267335.git.lorenzo@kernel.org
Lorenzo Bianconi [Fri, 13 Nov 2020 11:48:31 +0000 (12:48 +0100)]
net: mvpp2: Add xdp tx return bulking support
Convert mvpp2 driver to xdp_return_frame_bulk APIs.
XDP_REDIRECT (upstream codepath): 1.79Mpps
XDP_REDIRECT (upstream codepath + bulking APIs): 1.93Mpps
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Matteo Croce <mcroce@microsoft.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/0b38c295e58e8ce251ef6b4e2187a2f457f9f7a3.1605267335.git.lorenzo@kernel.org
Lorenzo Bianconi [Fri, 13 Nov 2020 11:48:30 +0000 (12:48 +0100)]
net: mvneta: Add xdp tx return bulking support
Convert mvneta driver to xdp_return_frame_bulk APIs.
XDP_REDIRECT (upstream codepath): 275Kpps
XDP_REDIRECT (upstream codepath + bulking APIs): 284Kpps
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/9af8014006d022fc0fec78cdaa71beb56999750d.1605267335.git.lorenzo@kernel.org
Lorenzo Bianconi [Fri, 13 Nov 2020 11:48:29 +0000 (12:48 +0100)]
net: page_pool: Add bulk support for ptr_ring
Introduce the capability to batch page_pool ptr_ring refill since it is
usually run inside the driver NAPI tx completion loop.
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
Lorenzo Bianconi [Fri, 13 Nov 2020 11:48:28 +0000 (12:48 +0100)]
net: xdp: Introduce bulking for xdp tx return path
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually run
inside the driver NAPI tx completion loop.
The bulk queue size is set to 16 to be aligned to how
XDP_REDIRECT bulking works. The bulk is flushed when
it is full or when mem.id changes.
xdp_frame_bulk is usually stored/allocated on the function
call-stack to avoid locking penalties.
Current implementation considers only page_pool memory model.
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/e190c03eac71b20c8407ae0fc2c399eda7835f49.1605267335.git.lorenzo@kernel.org
Jisheng Zhang [Thu, 12 Nov 2020 01:27:37 +0000 (09:27 +0800)]
net: stmmac: platform: use optional clk/reset get APIs
Use the devm_reset_control_get_optional() and devm_clk_get_optional()
rather than open coding them.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Link: https://lore.kernel.org/r/20201112092606.5173aa6f@xhacker.debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 11 Nov 2020 21:56:29 +0000 (22:56 +0100)]
r8169: improve rtl_tx
We can simplify the for() condition and eliminate variable tx_left.
The change also considers that tp->cur_tx may be incremented by a
racing rtl8169_start_xmit().
In addition replace the write to tp->dirty_tx and the following
smp_mb() with an equivalent call to smp_store_mb(). This implicitly
adds a WRITE_ONCE() to the write.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/c2e19e5e-3d3f-d663-af32-13c3374f5def@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 11 Nov 2020 21:14:27 +0000 (22:14 +0100)]
r8169: use READ_ONCE in rtl_tx_slots_avail
tp->dirty_tx and tp->cur_tx may be changed by a racing rtl_tx() or
rtl8169_start_xmit(). Use READ_ONCE() to annotate the races and ensure
that the compiler doesn't use cached values.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/5676fee3-f6b4-84f2-eba5-c64949a371ad@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 13 Nov 2020 23:37:10 +0000 (15:37 -0800)]
Merge branch 'net-ipa-two-fixes'
Alex Elder says:
====================
net: ipa: two fixes
This small series makes two fixes to the IPA code:
- While reviewing something else I found that one of the resource
limits on the SDM845 used the wrong value. The first patch
fixes this. The correct value allocates more resources of this
type for IPA to use, and otherwise does not change behavior.
- When the IPA-resident microcontroller starts up it generates an
event, which triggers an AP interrupt. The event merely
provides some information for logging, which we don't support.
We already ignore the event, and that's harmless. So this
patch explicitly ignores it rather than issuing a warning when
it occurs.
====================
Link: https://lore.kernel.org/r/20201112121157.19784-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Thu, 12 Nov 2020 12:20:00 +0000 (06:20 -0600)]
net: ipa: ignore the microcontroller log event
The IPA-resident microcontroller has the ability to log various
activity in an area of IPA shared memory. When the microcontroller
starts it generates an event to the AP to provide information about
the log.
We don't support reading this log, and we can safely ignore the
event. So do that rather than treating the log info event we
receive as "unsupported."
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Thu, 12 Nov 2020 12:11:56 +0000 (06:11 -0600)]
net: ipa: fix source packet contexts limit
I have discovered that the maximum number of source packet contexts
configured for SDM845 is incorrect. Fix this error.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 13 Nov 2020 23:33:37 +0000 (15:33 -0800)]
Merge branch 'sfc-further-ef100-encap-tso-features'
Edward Cree says:
====================
sfc: further EF100 encap TSO features
This series adds support for GRE and GRE_CSUM TSO on EF100 NICs, as
well as improving the handling of UDP tunnel TSO.
====================
Link: https://lore.kernel.org/r/eda2de73-edf2-8b92-edb9-099ebda09ebc@solarflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Edward Cree [Thu, 12 Nov 2020 15:20:05 +0000 (15:20 +0000)]
sfc: support GRE TSO on EF100
We can treat SKB_GSO_GRE almost exactly the same as UDP tunnels, except
that we don't want to edit the outer UDP len (as there isn't one).
For SKB_GSO_GRE_CSUM, we have to use GSO_PARTIAL as the device doesn't
support offload of non-UDP outer L4 checksums.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Edward Cree [Thu, 12 Nov 2020 15:19:47 +0000 (15:19 +0000)]
sfc: correctly support non-partial GSO_UDP_TUNNEL_CSUM on EF100
By asking the HW for the correct edits, we can make UDP tunnel TSO
work without needing GSO_PARTIAL. So don't specify it in our
netdev->gso_partial_features.
However, retain GSO_PARTIAL support, as this will be used for other
protocols later.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Edward Cree [Thu, 12 Nov 2020 15:19:26 +0000 (15:19 +0000)]
sfc: extend bitfield macros to 19 fields
Our TSO descriptors got even more fussy.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin Habets <mhabets@solarflare.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Jakub Kicinski [Fri, 13 Nov 2020 23:13:45 +0000 (15:13 -0800)]
Merge branch 'net-ipa-gsi-register-consolidation'
Alex Elder says:
====================
net: ipa: GSI register consolidation
This series rearranges and consolidates some GSI register
definitions. Its general aim is to make things more
consistent, by:
- Using enumerated types to define the values held in GSI register
fields
- Defining field values in "gsi_reg.h", together with the
definition of the register (and field) that holds them
- Format enumerated type members consistently, with hexidecimal
numeric values, and assignments aligned on the same column
There is one checkpatch "CHECK" warning requesting a blank line; I
ignored that because my intention was to group certain definitions.
====================
Link: https://lore.kernel.org/r/20201110215922.23514-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:22 +0000 (15:59 -0600)]
net: ipa: use enumerated types for GSI field values
Replace constants defined with an "_FVAL" suffix with values defined
in enumerated types, to be consistent with other usage in the driver.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:21 +0000 (15:59 -0600)]
net: ipa: move GSI command opcode values into "gsi_reg.h"
The gsi_ch_cmd_opcode, gsi_evt_cmd_opcode, and gsi_generic_cmd_opcode
enumerated types are values that fields in the GSI command registers
can take on. Move their definitions out of "gsi.c" and into "gsi_reg.h",
alongside the definition of registers they are associated with.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:20 +0000 (15:59 -0600)]
net: ipa: move GSI error values into "gsi_reg.h"
The gsi_err_code and gsi_err_type enumerated types are values that
fields in the GSI ERROR_LOG register can take on. Move their
definitions out of "gsi.c" and into "gsi_reg.h", alongside the
definition of the ERROR_LOG register offset and field symbols.
Drop the "_ERR" suffix in the names of the gsi_err_code members.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:19 +0000 (15:59 -0600)]
net: ipa: move channel type values into "gsi_reg.h"
The gsi_channel_type enumerated type define values used for the
channel type/protocol for event rings and channels. Move its
definition out of "gsi.c" and into "gsi_reg.h", alongside the
definition of the CH_C_CNTXT_0 register offset and its fields.
Add a comment near the definition of the EV_CH_E_CNTXT_0 register
indicating this type is used for its EV_CHTYPE field.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:18 +0000 (15:59 -0600)]
net: ipa: use common value for channel type and protocol
The numeric values that represent the event ring channel type are
identical to the values that represent the matching protocol used
for a channel. Use a new gsi_channel_type enumerated type to
represent the values programmed for both cases, using "CHANNEL_TYPE"
in member names in place of "EVT_CHTYPE" and "CHANNEL_PROTOCOL".
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 10 Nov 2020 21:59:17 +0000 (15:59 -0600)]
net: ipa: define GSI interrupt types with enums
Define the GSI global interrupt types with an enumerated type whose
values are the bit positions representing the global interrupt types.
Similarly, define the GSI general interrupt types with an enumerated
type whose values are the bit positions of general interrupt types.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wenlin Kang [Thu, 12 Nov 2020 09:34:42 +0000 (17:34 +0800)]
tipc: fix -Wstringop-truncation warnings
Replace strncpy() with strscpy(), fixes the following warning:
In function 'bearer_name_validate',
inlined from 'tipc_enable_bearer' at net/tipc/bearer.c:246:7:
net/tipc/bearer.c:141:2: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
strncpy(name_copy, name, TIPC_MAX_BEARER_NAME);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Link: https://lore.kernel.org/r/20201112093442.8132-1-wenlin.kang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 13 Nov 2020 20:03:21 +0000 (12:03 -0800)]
Merge tag 'mac80211-next-for-net-next-2020-11-13' of git://git./linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Some updates:
* injection/radiotap updates for new test capabilities
* remove WDS support - even years ago when we turned
it off by default it was already basically unusable
* support for HE (802.11ax) rates for beacons
* support for some vendor-specific HE rates
* many other small features/cleanups
* tag 'mac80211-next-for-net-next-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (21 commits)
nl80211: fix kernel-doc warning in the new SAE attribute
cfg80211: remove WDS code
mac80211: remove WDS-related code
rt2x00: remove WDS code
b43legacy: remove WDS code
b43: remove WDS code
carl9170: remove WDS code
ath9k: remove WDS code
wireless: remove CONFIG_WIRELESS_WDS
mac80211: assure that certain drivers adhere to DONT_REORDER flag
mac80211: don't overwrite QoS TID of injected frames
mac80211: adhere to Tx control flag that prevents frame reordering
mac80211: add radiotap flag to assure frames are not reordered
mac80211: save HE oper info in BSS config for mesh
cfg80211: add support to configure HE MCS for beacon rate
nl80211: fix beacon tx rate mask validation
nl80211/cfg80211: fix potential infinite loop
cfg80211: Add support to calculate and report 4096-QAM HE rates
cfg80211: Add support to configure SAE PWE value to drivers
ieee80211: Add definition for WFA DPP
...
====================
Link: https://lore.kernel.org/r/20201113101148.25268-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
KP Singh [Fri, 13 Nov 2020 00:59:30 +0000 (00:59 +0000)]
bpf: Expose bpf_d_path helper to sleepable LSM hooks
Sleepable hooks are never called from an NMI/interrupt context, so it
is safe to use the bpf_d_path helper in LSM programs attaching to these
hooks.
The helper is not restricted to sleepable programs and merely uses the
list of sleepable hooks as the initial subset of LSM hooks where it can
be used.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201113005930.541956-3-kpsingh@chromium.org
KP Singh [Fri, 13 Nov 2020 00:59:29 +0000 (00:59 +0000)]
bpf: Augment the set of sleepable LSM hooks
Update the set of sleepable hooks with the ones that do not trigger
a warning with might_fault() when exercised with the correct kernel
config options enabled, i.e.
DEBUG_ATOMIC_SLEEP=y
LOCKDEP=y
PROVE_LOCKING=y
This means that a sleepable LSM eBPF program can be attached to these
LSM hooks. A new helper method bpf_lsm_is_sleepable_hook is added and
the set is maintained locally in bpf_lsm.c
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201113005930.541956-2-kpsingh@chromium.org
Alexei Starovoitov [Fri, 13 Nov 2020 02:39:28 +0000 (18:39 -0800)]
Merge branch 'bpf: Enable bpf_sk_storage for FENTRY/FEXIT/RAW_TP'
Martin KaFai says:
====================
This set is to allow the FENTRY/FEXIT/RAW_TP tracing program to use
bpf_sk_storage. The first two patches are a cleanup. The last patch is
tests. Patch 3 has the required kernel changes to
enable bpf_sk_storage for FENTRY/FEXIT/RAW_TP.
Please see individual patch for details.
v2:
- Rename some of the function prefix from sk_storage to bpf_sk_storage
- Use prefix check instead of substr check
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin KaFai Lau [Thu, 12 Nov 2020 21:13:20 +0000 (13:13 -0800)]
bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
This patch tests storing the task's related info into the
bpf_sk_storage by fentry/fexit tracing at listen, accept,
and connect. It also tests the raw_tp at inet_sock_set_state.
A negative test is done by tracing the bpf_sk_storage_free()
and using bpf_sk_storage_get() at the same time. It ensures
this bpf program cannot load.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201112211320.2587537-1-kafai@fb.com
Martin KaFai Lau [Thu, 12 Nov 2020 21:13:13 +0000 (13:13 -0800)]
bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
This patch enables the FENTRY/FEXIT/RAW_TP tracing program to use
the bpf_sk_storage_(get|delete) helper, so those tracing programs
can access the sk's bpf_local_storage and the later selftest
will show some examples.
The bpf_sk_storage is currently used in bpf-tcp-cc, tc,
cg sockops...etc which is running either in softirq or
task context.
This patch adds bpf_sk_storage_get_tracing_proto and
bpf_sk_storage_delete_tracing_proto. They will check
in runtime that the helpers can only be called when serving
softirq or running in a task context. That should enable
most common tracing use cases on sk.
During the load time, the new tracing_allowed() function
will ensure the tracing prog using the bpf_sk_storage_(get|delete)
helper is not tracing any bpf_sk_storage*() function itself.
The sk is passed as "void *" when calling into bpf_local_storage.
This patch only allows tracing a kernel function.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201112211313.2587383-1-kafai@fb.com
Martin KaFai Lau [Thu, 12 Nov 2020 21:13:07 +0000 (13:13 -0800)]
bpf: Rename some functions in bpf_sk_storage
Rename some of the functions currently prefixed with sk_storage
to bpf_sk_storage. That will make the next patch have fewer
prefix check and also bring the bpf_sk_storage.c to a more
consistent function naming.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201112211307.2587021-1-kafai@fb.com
Martin KaFai Lau [Thu, 12 Nov 2020 21:13:01 +0000 (13:13 -0800)]
bpf: Folding omem_charge() into sk_storage_charge()
sk_storage_charge() is the only user of omem_charge().
This patch simplifies it by folding omem_charge() into
sk_storage_charge().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201112211301.2586255-1-kafai@fb.com
Jakub Kicinski [Fri, 13 Nov 2020 00:54:48 +0000 (16:54 -0800)]
Merge https://git./linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Fri, 13 Nov 2020 00:42:12 +0000 (01:42 +0100)]
Merge branch 'bpf-ptrs-beyond-pkt-end'
Alexei Starovoitov says:
====================
v1->v2:
- removed set-but-unused variable.
- added Jiri's Tested-by.
In some cases LLVM uses the knowledge that branch is taken to optimze the code
which causes the verifier to reject valid programs.
Teach the verifier to recognize that
r1 = skb->data;
r1 += 10;
r2 = skb->data_end;
if (r1 > r2) {
here r1 points beyond packet_end and subsequent
if (r1 > r2) // always evaluates to "true".
}
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov [Wed, 11 Nov 2020 03:12:13 +0000 (19:12 -0800)]
selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
Add few assembly tests for packet comparison.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201111031213.25109-4-alexei.starovoitov@gmail.com
Alexei Starovoitov [Wed, 11 Nov 2020 03:12:12 +0000 (19:12 -0800)]
selftests/bpf: Add skb_pkt_end test
Add a test that currently makes LLVM generate assembly code:
$ llvm-objdump -S skb_pkt_end.o
0000000000000000 <main_prog>:
; if (skb_shorter(skb, ETH_IPV4_TCP_SIZE))
0: 61 12 50 00 00 00 00 00 r2 = *(u32 *)(r1 + 80)
1: 61 14 4c 00 00 00 00 00 r4 = *(u32 *)(r1 + 76)
2: bf 43 00 00 00 00 00 00 r3 = r4
3: 07 03 00 00 36 00 00 00 r3 += 54
4: b7 01 00 00 00 00 00 00 r1 = 0
5: 2d 23 02 00 00 00 00 00 if r3 > r2 goto +2 <LBB0_2>
6: 07 04 00 00 0e 00 00 00 r4 += 14
; if (skb_shorter(skb, ETH_IPV4_TCP_SIZE))
7: bf 41 00 00 00 00 00 00 r1 = r4
0000000000000040 <LBB0_2>:
8: b4 00 00 00 ff ff ff ff w0 = -1
; if (!(ip = get_iphdr(skb)))
9: 2d 23 05 00 00 00 00 00 if r3 > r2 goto +5 <LBB0_6>
; proto = ip->protocol;
10: 71 12 09 00 00 00 00 00 r2 = *(u8 *)(r1 + 9)
; if (proto != IPPROTO_TCP)
11: 56 02 03 00 06 00 00 00 if w2 != 6 goto +3 <LBB0_6>
; if (tcp->dest != 0)
12: 69 12 16 00 00 00 00 00 r2 = *(u16 *)(r1 + 22)
13: 56 02 01 00 00 00 00 00 if w2 != 0 goto +1 <LBB0_6>
; return tcp->urg_ptr;
14: 69 10 26 00 00 00 00 00 r0 = *(u16 *)(r1 + 38)
0000000000000078 <LBB0_6>:
; }
15: 95 00 00 00 00 00 00 00 exit
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201111031213.25109-3-alexei.starovoitov@gmail.com
Alexei Starovoitov [Wed, 11 Nov 2020 03:12:11 +0000 (19:12 -0800)]
bpf: Support for pointers beyond pkt_end.
This patch adds the verifier support to recognize inlined branch conditions.
The LLVM knows that the branch evaluates to the same value, but the verifier
couldn't track it. Hence causing valid programs to be rejected.
The potential LLVM workaround: https://reviews.llvm.org/D87428
can have undesired side effects, since LLVM doesn't know that
skb->data/data_end are being compared. LLVM has to introduce extra boolean
variable and use inline_asm trick to force easier for the verifier assembly.
Instead teach the verifier to recognize that
r1 = skb->data;
r1 += 10;
r2 = skb->data_end;
if (r1 > r2) {
here r1 points beyond packet_end and
subsequent
if (r1 > r2) // always evaluates to "true".
}
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201111031213.25109-2-alexei.starovoitov@gmail.com
Guillaume Nault [Wed, 11 Nov 2020 15:05:35 +0000 (16:05 +0100)]
selftests: set conf.all.rp_filter=0 in bareudp.sh
When working on the rp_filter problem, I didn't realise that disabling
it on the network devices didn't cover all cases: rp_filter could also
be enabled globally in the namespace, in which case it would drop
packets, even if the net device has rp_filter=0.
Fixes:
1ccd58331f6f ("selftests: disable rp_filter when testing bareudp")
Fixes:
bbbc7aa45eef ("selftests: add test script for bareudp tunnels")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/f2d459346471f163b239aa9d63ce3e2ba9c62895.1605107012.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Nov 2020 23:55:25 +0000 (15:55 -0800)]
Merge branch 'mlxsw-spectrum-prepare-for-xm-implementation-prefix-insertion-and-removal'
Ido Schimmel says:
====================
mlxsw: spectrum: Prepare for XM implementation - prefix insertion and removal
Jiri says:
This is a preparation patchset for follow-up support of boards with
extended mezzanine (XM), which is going to allow extended (scale-wise)
router offload.
XM requires a separate PRM register named XMDR to be used instead of
RALUE to insert/update/remove FIB entries. Therefore, this patchset
extends the previously introduces low-level ops to be able to have
XM-specific FIB entry config implementation.
Currently the existing original RALUE implementation is moved to "basic"
low-level ops.
Unlike legacy router, insertion/update/removal of FIB entries into XM
could be done in bulks up to 4 items in a single PRM register write.
That is why this patchset implements "an op context", that allows the
future XM ops implementation to squash multiple FIB events to single
register write. For that, the way in which the FIB events are processed
by the work queue has to be changed.
The conversion from 1:1 FIB event - work callback call to event queue is
implemented in patch #3.
Patch #4 introduces "an op context" that will allow in future to squash
multiple FIB events into one XMDR register write. Patch #12 converts it
from stack to be allocated per instance.
Existing RALUE manipulations are pushed to ops in patch #10.
Patch #13 is introducing a possibility for low-level implementation to
have per FIB entry private memory.
The rest of the patches are either cosmetics or smaller preparations.
====================
Link: https://lore.kernel.org/r/20201110094900.1920158-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:49:00 +0000 (11:49 +0200)]
mlxsw: spectrum_router: Introduce FIB entry update op
Follow-up patchset introducing XMDR implementation is going to need
to distinguish write and update ops. Therefore introduce "update op"
and call "write op" only when new FIB entry is inserted.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:59 +0000 (11:48 +0200)]
mlxsw: spectrum_router: Track FIB entry committed state and skip uncommitted on delete
In case bulking is used, the entry that was previously added may not
be yet committed to the HW as it waits in the queue for bulk send. For
such entries, skip the deletion.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:58 +0000 (11:48 +0200)]
mlxsw: spectrum_router: Introduce fib_entry priv for low-level ops
Prepare for the low-level ops that need to store some data alongside
the fib_entry and introduce a per-fib_entry priv for ll ops.
The priv is reference counted as in the follow-up patch it is going
to be saved in pack() function and used later on in commit() even in
case the related fib_entry gets freed in the middle.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:57 +0000 (11:48 +0200)]
mlxsw: spectrum_router: Have FIB entry op context allocated for the instance
Get the max size needed for FIB entry op context and allocate it once
for the instance. Use it repeatedly from the scheduled work.
By this, allow to extend the context to hold more data than it is wise
to do when it was on the stack. Make sure to signalize that the context
needs to be initialized in case families of subsequent FIB entries differ.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:56 +0000 (11:48 +0200)]
mlxsw: spectrum_router: Prepare work context for possible bulking
For XMDR register it is possible to carry multiple FIB entry
operations in a single write. However the FW does not restrict mixing
the types of operations, make the code easier and indicate the bulking
is ok only in case the bulk contains FIB operations of the same family
and event.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:55 +0000 (11:48 +0200)]
mlxsw: spectrum: Push RALUE packing and writing into low-level router ops
With follow-up introduction of XM implementation, XMDR register is
going to be optionally used instead of RALUE register. Push the RALUE
packing helpers and write call into low-level router ops.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:54 +0000 (11:48 +0200)]
mlxsw: spectrum_router: Use RALUE pack helper from abort function
Unify the RALUE register payload packing and use the
__mlxsw_sp_fib_entry_ralue_pack() helper from
__mlxsw_sp_router_set_abort_trap().
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:53 +0000 (11:48 +0200)]
mlxsw: reg: Allow to pass NULL pointer to mlxsw_reg_ralue_pack4/6()
In preparation for the change that is going to be done in the next
patch, allow to pass NULL pointer to mlxsw_reg_ralue_pack4() and
mlxsw_reg_ralue_pack6() helpers.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:52 +0000 (11:48 +0200)]
mlxsw: spectrum_router: Pass destination IP as a pointer to mlxsw_reg_ralue_pack4()
Instead of passing destination IP as a u32 value, pass it as pointer to
u32. Avoid using local variable for the pointer store.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:51 +0000 (11:48 +0200)]
mlxsw: spectrum: Export RALUE pack helper and use it from IPIP
As the RALUE packing is going to be put into op, make the user from
IPIP code use the same helper as the router code does.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:50 +0000 (11:48 +0200)]
mlxsw: spectrum_router: Push out RALUE pack into separate helper
As the RALUE packing is going to be pushed into an op, in preparation
for that push the code into a separate function in the meantime.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:49 +0000 (11:48 +0200)]
mlxsw: spectrum: Propagate context from work handler containing RALUE payload
Currently, RALUE payload is defined locally in the function that is
calling the register write. With introduction of alternative register to
RALUE, XMDR, it has to be possible to put multiple FIB entry
operations into single register write.
So in order to prepare for that, have per-work entry operation context
and propagate it all the way down to the functions writing RALUE.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:48 +0000 (11:48 +0200)]
mlxsw: spectrum_router: Introduce FIB event queue instead of separate works
Currently, every FIB event is queued-up as a separate work to be
processed. However, that allows to process only one FIB entry per work
callback.
In preparation of future XMDR register bulking of multiple FIB entries,
convert to FIB event queue. Implement this by a list_head, adding new
events to the end of the list in the FIB notify callback. That allows to
process multiple events from the list inside the work callback.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:47 +0000 (11:48 +0200)]
mlxsw: spectrum_router: Use RALUE-independent op arg
Since the write/delete of FIB entry is going to be implemented by XMDR
register for XM implementation, introduce RALUE-independent enum for op
so the enum could be used in both RALUE and XMDR.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Tue, 10 Nov 2020 09:48:46 +0000 (11:48 +0200)]
mlxsw: spectrum_router: Pass non-register proto enum to __mlxsw_sp_router_set_abort_trap()
Don't pass RALXX register enum and rather pass enum mlxsw_sp_l3proto
to __mlxsw_sp_router_set_abort_trap(). This is in preparation to fib
entry pack implementation by XMDR register.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Tue, 10 Nov 2020 17:57:46 +0000 (09:57 -0800)]
net: kcov: don't select SKB_EXTENSIONS when there is no NET
Fix kconfig warning when CONFIG_NET is not set/enabled:
WARNING: unmet direct dependencies detected for SKB_EXTENSIONS
Depends on [n]: NET [=n]
Selected by [y]:
- KCOV [=y] && ARCH_HAS_KCOV [=y] && (CC_HAS_SANCOV_TRACE_PC [=y] || GCC_PLUGINS [=n])
Fixes:
6370cc3bbd8a ("net: add kcov handle to skb extensions")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20201110175746.11437-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Nov 2020 22:58:27 +0000 (14:58 -0800)]
Merge branch 'net-switch-further-drivers-to-core-functionality-for-handling-per-cpu-byte-packet-counters'
Heiner Kallweit says:
====================
net: switch further drivers to core functionality for handling per-cpu byte/packet counters
Switch further drivers to core functionality for handling per-cpu
byte/packet counters. All changes are compile-tested only.
====================
Link: https://lore.kernel.org/r/5fbe3a1f-6625-eadc-b1c9-f76f78debb94@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Tue, 10 Nov 2020 19:51:03 +0000 (20:51 +0100)]
net: usb: switch to dev_get_tstats64 and remove usbnet_get_stats64 alias
Replace usbnet_get_stats64() with new identical core function
dev_get_tstats64() in all users and remove usbnet_get_stats64().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>