linux-2.6-microblaze.git
3 years agolibbpf: Print hint when PERF_EVENT_IOC_SET_BPF returns -EPROTO
Song Liu [Thu, 23 Jul 2020 18:06:46 +0000 (11:06 -0700)]
libbpf: Print hint when PERF_EVENT_IOC_SET_BPF returns -EPROTO

The kernel prevents potential unwinder warnings and crashes by blocking
BPF program with bpf_get_[stack|stackid] on perf_event without
PERF_SAMPLE_CALLCHAIN, or with exclude_callchain_[kernel|user]. Print a
hint message in libbpf to help the user debug such issues.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-4-songliubraving@fb.com
3 years agobpf: Fail PERF_EVENT_IOC_SET_BPF when bpf_get_[stack|stackid] cannot work
Song Liu [Thu, 23 Jul 2020 18:06:45 +0000 (11:06 -0700)]
bpf: Fail PERF_EVENT_IOC_SET_BPF when bpf_get_[stack|stackid] cannot work

bpf_get_[stack|stackid] on perf_events with precise_ip uses callchain
attached to perf_sample_data. If this callchain is not presented, do not
allow attaching BPF program that calls bpf_get_[stack|stackid] to this
event.

In the error case, -EPROTO is returned so that libbpf can identify this
error and print proper hint message.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-3-songliubraving@fb.com
3 years agobpf: Separate bpf_get_[stack|stackid] for perf events BPF
Song Liu [Thu, 23 Jul 2020 18:06:44 +0000 (11:06 -0700)]
bpf: Separate bpf_get_[stack|stackid] for perf events BPF

Calling get_perf_callchain() on perf_events from PEBS entries may cause
unwinder errors. To fix this issue, the callchain is fetched early. Such
perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.

Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
also cause unwinder errors. To fix this, add separate version of these
two helpers, bpf_get_[stack|stackid]_pe. These two hepers use callchain in
bpf_perf_event_data_kern->data->callchain.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-2-songliubraving@fb.com
3 years agoMerge branch 'bpf_iter-for-map-elems'
Alexei Starovoitov [Fri, 24 Jul 2020 05:05:46 +0000 (22:05 -0700)]
Merge branch 'bpf_iter-for-map-elems'

Yonghong Song says:

====================
Bpf iterator has been implemented for task, task_file,
bpf_map, ipv6_route, netlink, tcp and udp so far.

For map elements, there are two ways to traverse all elements from
user space:
  1. using BPF_MAP_GET_NEXT_KEY bpf subcommand to get elements
     one by one.
  2. using BPF_MAP_LOOKUP_BATCH bpf subcommand to get a batch of
     elements.
Both these approaches need to copy data from kernel to user space
in order to do inspection.

This patch implements bpf iterator for map elements.
User can have a bpf program in kernel to run with each map element,
do checking, filtering, aggregation, modifying values etc.
without copying data to user space.

Patch #1 and #2 are refactoring. Patch #3 implements readonly/readwrite
buffer support in verifier. Patches #4 - #7 implements map element
support for hash, percpu hash, lru hash lru percpu hash, array,
percpu array and sock local storage maps. Patches #8 - #9 are libbpf
and bpftool support. Patches #10 - #13 are selftests for implemented
map element iterators.

Changelogs:
  v3 -> v4:
    . fix a kasan failure triggered by a failed bpf_iter link_create,
      not just free_link but need cleanup_link. (Alexei)
  v2 -> v3:
    . rebase on top of latest bpf-next
  v1 -> v2:
    . support to modify map element values. (Alexei)
    . map key/values can be used with helper arguments
      for those arguments with ARG_PTR_TO_MEM or
      ARG_PTR_TO_INIT_MEM register type. (Alexei)
    . remove usused variable. (kernel test robot)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoselftests/bpf: Add a test for out of bound rdonly buf access
Yonghong Song [Thu, 23 Jul 2020 18:41:24 +0000 (11:41 -0700)]
selftests/bpf: Add a test for out of bound rdonly buf access

If the bpf program contains out of bound access w.r.t. a
particular map key/value size, the verification will be
still okay, e.g., it will be accepted by verifier. But
it will be rejected during link_create time. A test
is added here to ensure link_create failure did happen
if out of bound access happened.
  $ ./test_progs -n 4
  ...
  #4/23 rdonly-buf-out-of-bound:OK
  ...

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184124.591700-1-yhs@fb.com
3 years agoselftests/bpf: Add a test for bpf sk_storage_map iterator
Yonghong Song [Thu, 23 Jul 2020 18:41:22 +0000 (11:41 -0700)]
selftests/bpf: Add a test for bpf sk_storage_map iterator

Added one test for bpf sk_storage_map_iterator.
  $ ./test_progs -n 4
  ...
  #4/22 bpf_sk_storage_map:OK
  ...

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184122.591591-1-yhs@fb.com
3 years agoselftests/bpf: Add test for bpf array map iterators
Yonghong Song [Thu, 23 Jul 2020 18:41:21 +0000 (11:41 -0700)]
selftests/bpf: Add test for bpf array map iterators

Two subtests are added.
  $ ./test_progs -n 4
  ...
  #4/20 bpf_array_map:OK
  #4/21 bpf_percpu_array_map:OK
  ...

The bpf_array_map subtest also tested bpf program
changing array element values and send key/value
to user space through bpf_seq_write() interface.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184121.591367-1-yhs@fb.com
3 years agoselftests/bpf: Add test for bpf hash map iterators
Yonghong Song [Thu, 23 Jul 2020 18:41:20 +0000 (11:41 -0700)]
selftests/bpf: Add test for bpf hash map iterators

Two subtests are added.
  $ ./test_progs -n 4
  ...
  #4/18 bpf_hash_map:OK
  #4/19 bpf_percpu_hash_map:OK
  ...

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184120.590916-1-yhs@fb.com
3 years agotools/bpftool: Add bpftool support for bpf map element iterator
Yonghong Song [Thu, 23 Jul 2020 18:41:19 +0000 (11:41 -0700)]
tools/bpftool: Add bpftool support for bpf map element iterator

The optional parameter "map MAP" can be added to "bpftool iter"
command to create a bpf iterator for map elements. For example,
  bpftool iter pin ./prog.o /sys/fs/bpf/p1 map id 333

For map element bpf iterator "map MAP" parameter is required.
Otherwise, bpf link creation will return an error.

Quentin Monnet kindly provided bash-completion implementation
for new "map MAP" option.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184119.590799-1-yhs@fb.com
3 years agotools/libbpf: Add support for bpf map element iterator
Yonghong Song [Thu, 23 Jul 2020 18:41:17 +0000 (11:41 -0700)]
tools/libbpf: Add support for bpf map element iterator

Add map_fd to bpf_iter_attach_opts and flags to
bpf_link_create_opts. Later on, bpftool or selftest
will be able to create a bpf map element iterator
by passing map_fd to the kernel during link
creation time.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184117.590673-1-yhs@fb.com
3 years agobpf: Implement bpf iterator for sock local storage map
Yonghong Song [Thu, 23 Jul 2020 18:41:16 +0000 (11:41 -0700)]
bpf: Implement bpf iterator for sock local storage map

The bpf iterator for bpf sock local storage map
is implemented. User space interacts with sock
local storage map with fd as a key and storage value.
In kernel, passing fd to the bpf program does not
really make sense. In this case, the sock itself is
passed to bpf program.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184116.590602-1-yhs@fb.com
3 years agobpf: Implement bpf iterator for array maps
Yonghong Song [Thu, 23 Jul 2020 18:41:15 +0000 (11:41 -0700)]
bpf: Implement bpf iterator for array maps

The bpf iterators for array and percpu array
are implemented. Similar to hash maps, for percpu
array map, bpf program will receive values
from all cpus.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184115.590532-1-yhs@fb.com
3 years agobpf: Implement bpf iterator for hash maps
Yonghong Song [Thu, 23 Jul 2020 18:41:14 +0000 (11:41 -0700)]
bpf: Implement bpf iterator for hash maps

The bpf iterators for hash, percpu hash, lru hash
and lru percpu hash are implemented. During link time,
bpf_iter_reg->check_target() will check map type
and ensure the program access key/value region is
within the map defined key/value size limit.

For percpu hash and lru hash maps, the bpf program
will receive values for all cpus. The map element
bpf iterator infrastructure will prepare value
properly before passing the value pointer to the
bpf program.

This patch set supports readonly map keys and
read/write map values. It does not support deleting
map elements, e.g., from hash tables. If there is
a user case for this, the following mechanism can
be used to support map deletion for hashtab, etc.
  - permit a new bpf program return value, e.g., 2,
    to let bpf iterator know the map element should
    be removed.
  - since bucket lock is taken, the map element will be
    queued.
  - once bucket lock is released after all elements under
    this bucket are traversed, all to-be-deleted map
    elements can be deleted.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184114.590470-1-yhs@fb.com
3 years agobpf: Add bpf_prog iterator
Alexei Starovoitov [Thu, 2 Jul 2020 01:10:18 +0000 (18:10 -0700)]
bpf: Add bpf_prog iterator

It's mostly a copy paste of commit 6086d29def80 ("bpf: Add bpf_map iterator")
that is use to implement bpf_seq_file opreations to traverse all bpf programs.

v1->v2: Tweak to use build time btf_id

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
3 years agobpf: Implement bpf iterator for map elements
Yonghong Song [Thu, 23 Jul 2020 18:41:12 +0000 (11:41 -0700)]
bpf: Implement bpf iterator for map elements

The bpf iterator for map elements are implemented.
The bpf program will receive four parameters:
  bpf_iter_meta *meta: the meta data
  bpf_map *map:        the bpf_map whose elements are traversed
  void *key:           the key of one element
  void *value:         the value of the same element

Here, meta and map pointers are always valid, and
key has register type PTR_TO_RDONLY_BUF_OR_NULL and
value has register type PTR_TO_RDWR_BUF_OR_NULL.
The kernel will track the access range of key and value
during verification time. Later, these values will be compared
against the values in the actual map to ensure all accesses
are within range.

A new field iter_seq_info is added to bpf_map_ops which
is used to add map type specific information, i.e., seq_ops,
init/fini seq_file func and seq_file private data size.
Subsequent patches will have actual implementation
for bpf_map_ops->iter_seq_info.

In user space, BPF_ITER_LINK_MAP_FD needs to be
specified in prog attr->link_create.flags, which indicates
that attr->link_create.target_fd is a map_fd.
The reason for such an explicit flag is for possible
future cases where one bpf iterator may allow more than
one possible customization, e.g., pid and cgroup id for
task_file.

Current kernel internal implementation only allows
the target to register at most one required bpf_iter_link_info.
To support the above case, optional bpf_iter_link_info's
are needed, the target can be extended to register such link
infos, and user provided link_info needs to match one of
target supported ones.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184112.590360-1-yhs@fb.com
3 years agobpf: Fix pos computation for bpf_iter seq_ops->start()
Yonghong Song [Wed, 22 Jul 2020 19:51:56 +0000 (12:51 -0700)]
bpf: Fix pos computation for bpf_iter seq_ops->start()

Currently, the pos pointer in bpf iterator map/task/task_file
seq_ops->start() is always incremented.
This is incorrect. It should be increased only if
*pos is 0 (for SEQ_START_TOKEN) since these start()
function actually returns the first real object.
If *pos is not 0, it merely found the object
based on the state in seq->private, and not really
advancing the *pos. This patch fixed this issue
by only incrementing *pos if it is 0.

Note that the old *pos calculation, although not
correct, does not affect correctness of bpf_iter
as bpf_iter seq_file->read() does not support llseek.

This patch also renamed "mid" in bpf_map iterator
seq_file private data to "map_id" for better clarity.

Fixes: 6086d29def80 ("bpf: Add bpf_map iterator")
Fixes: eaaacd23910f ("bpf: Add task and task/file iterator targets")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722195156.4029817-1-yhs@fb.com
3 years agobpf: Support readonly/readwrite buffers in verifier
Yonghong Song [Thu, 23 Jul 2020 18:41:11 +0000 (11:41 -0700)]
bpf: Support readonly/readwrite buffers in verifier

Readonly and readwrite buffer register states
are introduced. Totally four states,
PTR_TO_RDONLY_BUF[_OR_NULL] and PTR_TO_RDWR_BUF[_OR_NULL]
are supported. As suggested by their respective
names, PTR_TO_RDONLY_BUF[_OR_NULL] are for
readonly buffers and PTR_TO_RDWR_BUF[_OR_NULL]
for read/write buffers.

These new register states will be used
by later bpf map element iterator.

New register states share some similarity to
PTR_TO_TP_BUFFER as it will calculate accessed buffer
size during verification time. The accessed buffer
size will be later compared to other metrics during
later attach/link_create time.

Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf
iterator programs, PTR_TO_RDONLY_BUF_OR_NULL or
PTR_TO_RDWR_BUF_OR_NULL reg_types can be set at
prog->aux->bpf_ctx_arg_aux, and bpf verifier will
retrieve the values during btf_ctx_access().
Later bpf map element iterator implementation
will show how such information will be assigned
during target registeration time.

The verifier is also enhanced such that PTR_TO_RDONLY_BUF
can be passed to ARG_PTR_TO_MEM[_OR_NULL] helper argument, and
PTR_TO_RDWR_BUF can be passed to ARG_PTR_TO_MEM[_OR_NULL] or
ARG_PTR_TO_UNINIT_MEM.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184111.590274-1-yhs@fb.com
3 years agoselftests/bpf: Test BPF socket lookup and reuseport with connections
Jakub Sitnicki [Wed, 22 Jul 2020 16:17:20 +0000 (18:17 +0200)]
selftests/bpf: Test BPF socket lookup and reuseport with connections

Cover the case when BPF socket lookup returns a socket that belongs to a
reuseport group, and the reuseport group contains connected UDP sockets.

Ensure that the presence of connected UDP sockets in reuseport group does
not affect the socket lookup result. Socket selected by reuseport should
always be used as result in such case.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/bpf/20200722161720.940831-3-jakub@cloudflare.com
3 years agobpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t
Yonghong Song [Thu, 23 Jul 2020 18:41:10 +0000 (11:41 -0700)]
bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t

This patch refactored target bpf_iter_init_seq_priv_t callback
function to accept additional information. This will be needed
in later patches for map element targets since a particular
map should be passed to traverse elements for that particular
map. In the future, other information may be passed to target
as well, e.g., pid, cgroup id, etc. to customize the iterator.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com
3 years agobpf: Refactor bpf_iter_reg to have separate seq_info member
Yonghong Song [Thu, 23 Jul 2020 18:41:09 +0000 (11:41 -0700)]
bpf: Refactor bpf_iter_reg to have separate seq_info member

There is no functionality change for this patch.
Struct bpf_iter_reg is used to register a bpf_iter target,
which includes information for both prog_load, link_create
and seq_file creation.

This patch puts fields related seq_file creation into
a different structure. This will be useful for map
elements iterator where one iterator covers different
map types and different map types may have different
seq_ops, init/fini private_data function and
private_data size.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184109.590030-1-yhs@fb.com
3 years agoudp: Don't discard reuseport selection when group has connections
Jakub Sitnicki [Wed, 22 Jul 2020 16:17:19 +0000 (18:17 +0200)]
udp: Don't discard reuseport selection when group has connections

When BPF socket lookup prog selects a socket that belongs to a reuseport
group, and the reuseport group has connected sockets in it, the socket
selected by reuseport will be discarded, and socket returned by BPF socket
lookup will be used instead.

Modify this behavior so that the socket selected by reuseport running after
BPF socket lookup always gets used. Ignore the fact that the reuseport
group might have connections because it is only relevant when scoring
sockets during regular hashtable-based lookup.

Fixes: 72f7e9440e9b ("udp: Run SK_LOOKUP BPF program on socket lookup")
Fixes: 6d4201b1386b ("udp6: Run SK_LOOKUP BPF program on socket lookup")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/bpf/20200722161720.940831-2-jakub@cloudflare.com
3 years agotools/bpftool: Strip BPF .o files before skeleton generation
Andrii Nakryiko [Wed, 22 Jul 2020 04:38:04 +0000 (21:38 -0700)]
tools/bpftool: Strip BPF .o files before skeleton generation

Strip away DWARF info from .bpf.o files, before generating BPF skeletons.
This reduces bpftool binary size from 3.43MB to 2.58MB.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200722043804.2373298-1-andriin@fb.com
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
David S. Miller [Sun, 26 Jul 2020 00:49:04 +0000 (17:49 -0700)]
Merge git://git./linux/kernel/git/netdev/net

The UDP reuseport conflict was a little bit tricky.

The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.

At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.

This requires moving the reuseport_has_conns() logic into the callers.

While we are here, get rid of inline directives as they do not belong
in foo.c files.

The other changes were cases of more straightforward overlapping
modifications.

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'riscv-for-linus-5.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 25 Jul 2020 21:42:11 +0000 (14:42 -0700)]
Merge tag 'riscv-for-linus-5.8-rc7' of git://git./linux/kernel/git/riscv/linux into master

Pull RISC-V fixes from Palmer Dabbelt:
 "A few more fixes this week:

   - A fix to avoid using SBI calls during kasan initialization, as the
     SBI calls themselves have not been probed yet.

   - Three fixes related to systems with multiple memory regions"

* tag 'riscv-for-linus-5.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Parse all memory blocks to remove unusable memory
  RISC-V: Do not rely on initrd_start/end computed during early dt parsing
  RISC-V: Set maximum number of mapped pages correctly
  riscv: kasan: use local_tlb_flush_all() to avoid uninitialized __sbi_rfence

3 years agoMerge tag 'x86-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 25 Jul 2020 21:25:47 +0000 (14:25 -0700)]
Merge tag 'x86-urgent-2020-07-25' of git://git./linux/kernel/git/tip/tip into master

Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - Fix a section end page alignment assumption that was causing
     crashes

   - Fix ORC unwinding on freshly forked tasks which haven't executed
     yet and which have empty user task stacks

   - Fix the debug.exception-trace=1 sysctl dumping of user stacks,
     which was broken by recent maccess changes"

* tag 'x86-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/dumpstack: Dump user space code correctly again
  x86/stacktrace: Fix reliable check for empty user task stacks
  x86/unwind/orc: Fix ORC for newly forked tasks
  x86, vmlinux.lds: Page-align end of ..page_aligned sections

3 years agoMerge tag 'perf-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 25 Jul 2020 20:55:38 +0000 (13:55 -0700)]
Merge tag 'perf-urgent-2020-07-25' of git://git./linux/kernel/git/tip/tip into master

Pull uprobe fix from Ingo Molnar:
 "Fix an interaction/regression between uprobes based shared library
  tracing & GDB"

* tag 'perf-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix GDB regression

3 years agoMerge tag 'timers-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 25 Jul 2020 20:27:12 +0000 (13:27 -0700)]
Merge tag 'timers-urgent-2020-07-25' of git://git./linux/kernel/git/tip/tip into master

Pull timer fix from Ingo Molnar:
 "Fix a suspend/resume regression (crash) on TI AM3/AM4 SoC's"

* tag 'timers-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/timer-ti-dm: Fix suspend and resume for am3 and am4

3 years agoMerge tag 'sched-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 25 Jul 2020 20:24:40 +0000 (13:24 -0700)]
Merge tag 'sched-urgent-2020-07-25' of git://git./linux/kernel/git/tip/tip into master

Pull scheduler fixes from Ingo Molnar:
 "Fix a race introduced by the recent loadavg race fix, plus add a debug
  check for a hard to debug case of bogus wakeup function flags"

* tag 'sched-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Warn if garbage is passed to default_wake_function()
  sched: Fix race against ptrace_freeze_trace()

3 years agoMerge tag 'efi-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 25 Jul 2020 20:18:42 +0000 (13:18 -0700)]
Merge tag 'efi-urgent-2020-07-25' of git://git./linux/kernel/git/tip/tip into master

Pull EFI fixes from Ingo Molnar:
 "Various EFI fixes:

   - Fix the layering violation in the use of the EFI runtime services
     availability mask in users of the 'efivars' abstraction

   - Revert build fix for GCC v4.8 which is no longer supported

   - Clean up some x86 EFI stub details, some of which are borderline
     bugs that copy around garbage into padding fields - let's fix these
     out of caution.

   - Fix build issues while working on RISC-V support

   - Avoid --whole-archive when linking the stub on arm64"

* tag 'efi-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Revert "efi/x86: Fix build with gcc 4"
  efi/efivars: Expose RT service availability via efivars abstraction
  efi/libstub: Move the function prototypes to header file
  efi/libstub: Fix gcc error around __umoddi3 for 32 bit builds
  efi/libstub/arm64: link stub lib.a conditionally
  efi/x86: Only copy upto the end of setup_header
  efi/x86: Remove unused variables

3 years agoMerge tag '5.8-rc6-cifs-fix' of git://git.samba.org/sfrench/cifs-2.6 into master
Linus Torvalds [Sat, 25 Jul 2020 19:53:46 +0000 (12:53 -0700)]
Merge tag '5.8-rc6-cifs-fix' of git://git.samba.org/sfrench/cifs-2.6 into master

Pull cifs fix from Steve French:
 "A fix for a recently discovered regression in rename to older servers
  caused by a recent patch"

* tag '5.8-rc6-cifs-fix' of git://git.samba.org/sfrench/cifs-2.6:
  Revert "cifs: Fix the target file was deleted when rename failed."

3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net into master
Linus Torvalds [Sat, 25 Jul 2020 18:50:59 +0000 (11:50 -0700)]
Merge git://git./linux/kernel/git/netdev/net into master

Pull networking fixes from David Miller:

 1) Fix RCU locaking in iwlwifi, from Johannes Berg.

 2) mt76 can access uninitialized NAPI struct, from Felix Fietkau.

 3) Fix race in updating pause settings in bnxt_en, from Vasundhara
    Volam.

 4) Propagate error return properly during unbind failures in ax88172a,
    from George Kennedy.

 5) Fix memleak in adf7242_probe, from Liu Jian.

 6) smc_drv_probe() can leak, from Wang Hai.

 7) Don't muck with the carrier state if register_netdevice() fails in
    the bonding driver, from Taehee Yoo.

 8) Fix memleak in dpaa_eth_probe, from Liu Jian.

 9) Need to check skb_put_padto() return value in hsr_fill_tag(), from
    Murali Karicheri.

10) Don't lose ionic RSS hash settings across FW update, from Shannon
    Nelson.

11) Fix clobbered SKB control block in act_ct, from Wen Xu.

12) Missing newlink in "tx_timeout" sysfs output, from Xiongfeng Wang.

13) IS_UDPLITE cleanup a long time ago, incorrectly handled
    transformations involving UDPLITE_RECV_CC. From Miaohe Lin.

14) Unbalanced locking in netdevsim, from Taehee Yoo.

15) Suppress false-positive error messages in qed driver, from Alexander
    Lobakin.

16) Out of bounds read in ax25_connect and ax25_sendmsg, from Peilin Ye.

17) Missing SKB release in cxgb4's uld_send(), from Navid Emamdoost.

18) Uninitialized value in geneve_changelink(), from Cong Wang.

19) Fix deadlock in xen-netfront, from Andera Righi.

19) flush_backlog() frees skbs with IRQs disabled, so should use
    dev_kfree_skb_irq() instead of kfree_skb(). From Subash Abhinov
    Kasiviswanathan.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
  drivers/net/wan: lapb: Corrected the usage of skb_cow
  dev: Defer free of skbs in flush_backlog
  qrtr: orphan socket in qrtr_release()
  xen-netfront: fix potential deadlock in xennet_remove()
  flow_offload: Move rhashtable inclusion to the source file
  geneve: fix an uninitialized value in geneve_changelink()
  bonding: check return value of register_netdevice() in bond_newlink()
  tcp: allow at most one TLP probe per flight
  AX.25: Prevent integer overflows in connect and sendmsg
  cxgb4: add missing release on skb in uld_send()
  net: atlantic: fix PTP on AQC10X
  AX.25: Prevent out-of-bounds read in ax25_sendmsg()
  sctp: shrink stream outq when fails to do addstream reconf
  sctp: shrink stream outq only when new outcnt < old outcnt
  AX.25: Fix out-of-bounds read in ax25_connect()
  enetc: Remove the mdio bus on PF probe bailout
  net: ethernet: ti: add NETIF_F_HW_TC hw feature flag for taprio offload
  net: ethernet: ave: Fix error returns in ave_init
  drivers/net/wan/x25_asy: Fix to make it work
  ipvs: fix the connection sync failed in some cases
  ...

3 years agoriscv: Parse all memory blocks to remove unusable memory
Atish Patra [Wed, 15 Jul 2020 23:30:09 +0000 (16:30 -0700)]
riscv: Parse all memory blocks to remove unusable memory

Currently, maximum physical memory allowed is equal to -PAGE_OFFSET.
That's why we remove any memory blocks spanning beyond that size. However,
it is done only for memblock containing linux kernel which will not work
if there are multiple memblocks.

Process all memory blocks to figure out how much memory needs to be removed
and remove at the end instead of updating the memblock list in place.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoRISC-V: Do not rely on initrd_start/end computed during early dt parsing
Atish Patra [Wed, 15 Jul 2020 23:30:08 +0000 (16:30 -0700)]
RISC-V: Do not rely on initrd_start/end computed during early dt parsing

Currently, initrd_start/end are computed during early_init_dt_scan
but used during arch_setup. We will get the following panic if initrd is used
and CONFIG_DEBUG_VIRTUAL is turned on.

[    0.000000] ------------[ cut here ]------------
[    0.000000] kernel BUG at arch/riscv/mm/physaddr.c:33!
[    0.000000] Kernel BUG [#1]
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc4-00015-ged0b226fed02 #886
[    0.000000] epc: ffffffe0002058d2 ra : ffffffe0000053f0 sp : ffffffe001001f40
[    0.000000]  gp : ffffffe00106e250 tp : ffffffe001009d40 t0 : ffffffe00107ee28
[    0.000000]  t1 : 0000000000000000 t2 : ffffffe000a2e880 s0 : ffffffe001001f50
[    0.000000]  s1 : ffffffe0001383e8 a0 : ffffffe00c087e00 a1 : 0000000080200000
[    0.000000]  a2 : 00000000010bf000 a3 : ffffffe00106f3c8 a4 : ffffffe0010bf000
[    0.000000]  a5 : ffffffe000000000 a6 : 0000000000000006 a7 : 0000000000000001
[    0.000000]  s2 : ffffffe00106f068 s3 : ffffffe00106f070 s4 : 0000000080200000
[    0.000000]  s5 : 0000000082200000 s6 : 0000000000000000 s7 : 0000000000000000
[    0.000000]  s8 : 0000000080011010 s9 : 0000000080012700 s10: 0000000000000000
[    0.000000]  s11: 0000000000000000 t3 : 000000000001fe30 t4 : 000000000001fe30
[    0.000000]  t5 : 0000000000000000 t6 : ffffffe00107c471
[    0.000000] status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
[    0.000000] random: get_random_bytes called from print_oops_end_marker+0x22/0x46 with crng_init=0

To avoid the error, initrd_start/end can be computed from phys_initrd_start/size
in setup itself. It also improves the initrd placement by aligning the start
and size with the page size.

Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code")
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodrivers/net/wan: lapb: Corrected the usage of skb_cow
Xie He [Fri, 24 Jul 2020 16:33:47 +0000 (09:33 -0700)]
drivers/net/wan: lapb: Corrected the usage of skb_cow

This patch fixed 2 issues with the usage of skb_cow in LAPB drivers
"lapbether" and "hdlc_x25":

1) After skb_cow fails, kfree_skb should be called to drop a reference
to the skb. But in both drivers, kfree_skb is not called.

2) skb_cow should be called before skb_push so that is can ensure the
safety of skb_push. But in "lapbether", it is incorrectly called after
skb_push.

More details about these 2 issues:

1) The behavior of calling kfree_skb on failure is also the behavior of
netif_rx, which is called by this function with "return netif_rx(skb);".
So this function should follow this behavior, too.

2) In "lapbether", skb_cow is called after skb_push. This results in 2
logical issues:
   a) skb_push is not protected by skb_cow;
   b) An extra headroom of 1 byte is ensured after skb_push. This extra
      headroom has no use in this function. It also has no use in the
      upper-layer function that this function passes the skb to
      (x25_lapb_receive_frame in net/x25/x25_dev.c).
So logically skb_cow should instead be called before skb_push.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-dsa-mv88e6xxx-port-mtu-support'
David S. Miller [Sat, 25 Jul 2020 03:03:28 +0000 (20:03 -0700)]
Merge branch 'net-dsa-mv88e6xxx-port-mtu-support'

Chris Packham says:

====================
net: dsa: mv88e6xxx: port mtu support

This series connects up the mv88e6xxx switches to the dsa infrastructure for
configuring the port MTU. The first patch is also a bug fix which might be a
candiatate for stable.

I've rebased this series on top of net-next/master to pick up Andrew's change
for the gigabit switches. Patch 1 and 2 are unchanged (aside from adding
Andrew's Reviewed-by). Patch 3 is reworked to make use of the existing mtu
support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: mv88e6xxx: Use chip-wide max frame size for MTU
Chris Packham [Thu, 23 Jul 2020 23:21:22 +0000 (11:21 +1200)]
net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU

Some of the chips in the mv88e6xxx family don't support jumbo
configuration per port. But they do have a chip-wide max frame size that
can be used. Use this to approximate the behaviour of configuring a port
based MTU.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: mv88e6xxx: Support jumbo configuration on 6190/6190X
Chris Packham [Thu, 23 Jul 2020 23:21:21 +0000 (11:21 +1200)]
net: dsa: mv88e6xxx: Support jumbo configuration on 6190/6190X

The MV88E6190 and MV88E6190X both support per port jumbo configuration
just like the other GE switches. Install the appropriate ops.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: mv88e6xxx: MV88E6097 does not support jumbo configuration
Chris Packham [Thu, 23 Jul 2020 23:21:20 +0000 (11:21 +1200)]
net: dsa: mv88e6xxx: MV88E6097 does not support jumbo configuration

The MV88E6097 chip does not support configuring jumbo frames. Prior to
commit 5f4366660d65 only the 6352, 6351, 6165 and 6320 chips configured
jumbo mode. The refactor accidentally added the function for the 6097.
Remove the erroneous function pointer assignment.

Fixes: 5f4366660d65 ("net: dsa: mv88e6xxx: Refactor setting of jumbo frames")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodev: Defer free of skbs in flush_backlog
Subash Abhinov Kasiviswanathan [Thu, 23 Jul 2020 17:31:48 +0000 (11:31 -0600)]
dev: Defer free of skbs in flush_backlog

IRQs are disabled when freeing skbs in input queue.
Use the IRQ safe variant to free skbs here.

Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoRISC-V: Set maximum number of mapped pages correctly
Atish Patra [Wed, 15 Jul 2020 23:30:07 +0000 (16:30 -0700)]
RISC-V: Set maximum number of mapped pages correctly

Currently, maximum number of mapper pages are set to the pfn calculated
from the memblock size of the memblock containing kernel. This will work
until that memblock spans the entire memory. However, it will be set to
a wrong value if there are multiple memblocks defined in kernel
(e.g. with efi runtime services).

Set the the maximum value to the pfn calculated from dram size.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoMerge tag 'pci-v5.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas...
Linus Torvalds [Sat, 25 Jul 2020 01:30:24 +0000 (18:30 -0700)]
Merge tag 'pci-v5.8-fixes-2' of git://git./linux/kernel/git/helgaas/pci into master

Pull PCI fixes from Bjorn Helgaas:

 - Reject invalid IRQ 0 command line argument for virtio_mmio because
   IRQ 0 now generates warnings (Bjorn Helgaas)

 - Revert "PCI/PM: Assume ports without DLL Link Active train links in
   100 ms", which broke nouveau (Bjorn Helgaas)

* tag 'pci-v5.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  Revert "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
  virtio-mmio: Reject invalid IRQ 0 command line argument

3 years agoqrtr: orphan socket in qrtr_release()
Cong Wang [Fri, 24 Jul 2020 16:45:51 +0000 (09:45 -0700)]
qrtr: orphan socket in qrtr_release()

We have to detach sock from socket in qrtr_release(),
otherwise skb->sk may still reference to this socket
when the skb is released in tun->queue, particularly
sk->sk_wq still points to &sock->wq, which leads to
a UAF.

Reported-and-tested-by: syzbot+6720d64f31c081c2f708@syzkaller.appspotmail.com
Fixes: 28fb4e59a47d ("net: qrtr: Expose tunneling endpoint to user space")
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hix5hd2_gmac: Remove unneeded cast from memory allocation
Wang Hai [Fri, 24 Jul 2020 13:46:30 +0000 (21:46 +0800)]
net: hix5hd2_gmac: Remove unneeded cast from memory allocation

Remove casting the values returned by memory allocation function.

Coccinelle emits WARNING:

./drivers/net/ethernet/hisilicon/hix5hd2_gmac.c:1027:9-23: WARNING:
 casting value returned by memory allocation function to (struct sg_desc *) is useless.

This issue was detected by using the Coccinelle software.

Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'wireless-drivers-2020-07-24' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Sat, 25 Jul 2020 00:26:09 +0000 (17:26 -0700)]
Merge tag 'wireless-drivers-2020-07-24' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.8

Second set of fixes for v5.8, and hopefully also the last. Three
important regressions fixed.

ath9k

* fix a regression which broke support for all ath9k usb devices

ath10k

* fix a regression which broke support for all QCA4019 AHB devices

iwlwifi

* fix a regression which broke support for some Killer Wireless-AC 1550 cards
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'l2tp-avoid-multiple-assignment-remove-BUG_ON'
David S. Miller [Sat, 25 Jul 2020 00:19:14 +0000 (17:19 -0700)]
Merge branch 'l2tp-avoid-multiple-assignment-remove-BUG_ON'

Tom Parkin says:

====================
l2tp: avoid multiple assignment, remove BUG_ON

l2tp hasn't been kept up to date with the static analysis checks offered
by checkpatch.pl.

This patchset builds on the series: "l2tp: cleanup checkpatch.pl
warnings" and "l2tp: further checkpatch.pl cleanups" to resolve some of
the remaining checkpatch warnings in l2tp.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agol2tp: WARN_ON rather than BUG_ON in l2tp_session_free
Tom Parkin [Fri, 24 Jul 2020 15:31:57 +0000 (16:31 +0100)]
l2tp: WARN_ON rather than BUG_ON in l2tp_session_free

l2tp_session_free called BUG_ON if the tunnel magic feather value wasn't
correct.  The intent of this was to catch lifetime bugs; for example
early tunnel free due to incorrect use of reference counts.

Since the tunnel magic feather being wrong indicates either early free
or structure corruption, we can avoid doing more damage by simply
leaving the tunnel structure alone.  If the tunnel refcount isn't
dropped when it should be, the tunnel instance will remain in the
kernel, resulting in the tunnel structure and socket leaking.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agol2tp: remove BUG_ON refcount value in l2tp_session_free
Tom Parkin [Fri, 24 Jul 2020 15:31:56 +0000 (16:31 +0100)]
l2tp: remove BUG_ON refcount value in l2tp_session_free

l2tp_session_free is only called by l2tp_session_dec_refcount when the
reference count reaches zero, so it's of limited value to validate the
reference count value in l2tp_session_free itself.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agol2tp: WARN_ON rather than BUG_ON in l2tp_session_queue_purge
Tom Parkin [Fri, 24 Jul 2020 15:31:55 +0000 (16:31 +0100)]
l2tp: WARN_ON rather than BUG_ON in l2tp_session_queue_purge

l2tp_session_queue_purge is used during session shutdown to drop any
skbs queued for reordering purposes according to L2TP dataplane rules.

The BUG_ON in this function checks the session magic feather in an
attempt to catch lifetime bugs.

Rather than crashing the kernel with a BUG_ON, we can simply WARN_ON and
refuse to do anything more -- in the worst case this could result in a
leak.  However this is highly unlikely given that the session purge only
occurs from codepaths which have obtained the session by means of a lookup
via. the parent tunnel and which check the session "dead" flag to
protect against shutdown races.

While we're here, have l2tp_session_queue_purge return void rather than
an integer, since neither of the callsites checked the return value.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agol2tp: don't BUG_ON seqfile checks in l2tp_ppp
Tom Parkin [Fri, 24 Jul 2020 15:31:54 +0000 (16:31 +0100)]
l2tp: don't BUG_ON seqfile checks in l2tp_ppp

checkpatch advises that WARN_ON and recovery code are preferred over
BUG_ON which crashes the kernel.

l2tp_ppp has a BUG_ON check of struct seq_file's private pointer in
pppol2tp_seq_start prior to accessing data through that pointer.

Rather than crashing, we can simply bail out early and return NULL in
order to terminate the seq file processing in much the same way as we do
when reaching the end of tunnel/session instances to render.

Retain a WARN_ON to help trace possible bugs in this area.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agol2tp: don't BUG_ON session magic checks in l2tp_ppp
Tom Parkin [Fri, 24 Jul 2020 15:31:53 +0000 (16:31 +0100)]
l2tp: don't BUG_ON session magic checks in l2tp_ppp

checkpatch advises that WARN_ON and recovery code are preferred over
BUG_ON which crashes the kernel.

l2tp_ppp.c's BUG_ON checks of the l2tp session structure's "magic" field
occur in code paths where it's reasonably easy to recover:

 * In the case of pppol2tp_sock_to_session, we can return NULL and the
   caller will bail out appropriately.  There is no change required to
   any of the callsites of this function since they already handle
   pppol2tp_sock_to_session returning NULL.

 * In the case of pppol2tp_session_destruct we can just avoid
   decrementing the reference count on the suspect session structure.
   In the worst case scenario this results in a memory leak, which is
   preferable to a crash.

Convert these uses of BUG_ON to WARN_ON accordingly.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agol2tp: remove BUG_ON in l2tp_tunnel_closeall
Tom Parkin [Fri, 24 Jul 2020 15:31:52 +0000 (16:31 +0100)]
l2tp: remove BUG_ON in l2tp_tunnel_closeall

l2tp_tunnel_closeall is only called from l2tp_core.c, and it's easy
to statically analyse the code path calling it to validate that it
should never be passed a NULL tunnel pointer.

Having a BUG_ON checking the tunnel pointer triggers a checkpatch
warning.  Since the BUG_ON is of no value, remove it to avoid the
warning.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agol2tp: remove BUG_ON in l2tp_session_queue_purge
Tom Parkin [Fri, 24 Jul 2020 15:31:51 +0000 (16:31 +0100)]
l2tp: remove BUG_ON in l2tp_session_queue_purge

l2tp_session_queue_purge is only called from l2tp_core.c, and it's easy
to statically analyse the code paths calling it to validate that it
should never be passed a NULL session pointer.

Having a BUG_ON checking the session pointer triggers a checkpatch
warning.  Since the BUG_ON is of no value, remove it to avoid the
warning.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agol2tp: WARN_ON rather than BUG_ON in l2tp_dfs_seq_start
Tom Parkin [Fri, 24 Jul 2020 15:31:50 +0000 (16:31 +0100)]
l2tp: WARN_ON rather than BUG_ON in l2tp_dfs_seq_start

l2tp_dfs_seq_start had a BUG_ON to catch a possible programming error in
l2tp_dfs_seq_open.

Since we can easily bail out of l2tp_dfs_seq_start, prefer to do that
and flag the error with a WARN_ON rather than crashing the kernel.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agol2tp: avoid multiple assignments
Tom Parkin [Fri, 24 Jul 2020 15:31:49 +0000 (16:31 +0100)]
l2tp: avoid multiple assignments

checkpatch warns about multiple assignments.

Update l2tp accordingly.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'icmp6-support-rfc-4884'
David S. Miller [Sat, 25 Jul 2020 00:12:41 +0000 (17:12 -0700)]
Merge branch 'icmp6-support-rfc-4884'

Willem de Bruijn says:

====================
icmp6: support rfc 4884

Extend the feature merged earlier this week for IPv4 to IPv6.

I expected this to be a single patch, but patch 1 seemed better to be
stand-alone

patch 1: small fix in length calculation
patch 2: factor out ipv4-specific
patch 3: add ipv6

changes v1->v2: add missing static keyword in patch 3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoicmp6: support rfc 4884
Willem de Bruijn [Fri, 24 Jul 2020 13:03:10 +0000 (09:03 -0400)]
icmp6: support rfc 4884

Extend the rfc 4884 read interface introduced for ipv4 in
commit eba75c587e81 ("icmp: support rfc 4884") to ipv6.

Add socket option SOL_IPV6/IPV6_RECVERR_RFC4884.

Changes v1->v2:
  - make ipv6_icmp_error_rfc4884 static (file scope)

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoicmp: prepare rfc 4884 for ipv6
Willem de Bruijn [Fri, 24 Jul 2020 13:03:09 +0000 (09:03 -0400)]
icmp: prepare rfc 4884 for ipv6

The RFC 4884 spec is largely the same between IPv4 and IPv6.
Factor out the IPv4 specific parts in preparation for IPv6 support:

- icmp types supported

- icmp header size, and thus offset to original datagram start

- datagram length field offset in icmp(6)hdr.

- datagram length field word size: 4B for IPv4, 8B for IPv6.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoicmp: revise rfc4884 tests
Willem de Bruijn [Fri, 24 Jul 2020 13:03:08 +0000 (09:03 -0400)]
icmp: revise rfc4884 tests

1) Only accept packets with original datagram len field >= header len.

The extension header must start after the original datagram headers.
The embedded datagram len field is compared against the 128B minimum
stipulated by RFC 4884. It is unlikely that headers extend beyond
this. But as we know the exact header length, check explicitly.

2) Remove the check that datagram length must be <= 576B.

This is a send constraint. There is no value in testing this on rx.
Within private networks it may be known safe to send larger packets.
Process these packets.

This test was also too lax. It compared original datagram length
rather than entire icmp packet length. The stand-alone fix would be:

  -       if (hlen + skb->len > 576)
  +       if (-skb_network_offset(skb) + skb->len > 576)

Fixes: eba75c587e81 ("icmp: support rfc 4884")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agosctp: remove redundant initialization of variable status
Colin Ian King [Fri, 24 Jul 2020 13:09:19 +0000 (14:09 +0100)]
sctp: remove redundant initialization of variable status

The variable status is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.  Also put the variable declarations into
reverse christmas tree order.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoxen-netfront: fix potential deadlock in xennet_remove()
Andrea Righi [Fri, 24 Jul 2020 08:59:10 +0000 (10:59 +0200)]
xen-netfront: fix potential deadlock in xennet_remove()

There's a potential race in xennet_remove(); this is what the driver is
doing upon unregistering a network device:

  1. state = read bus state
  2. if state is not "Closed":
  3.    request to set state to "Closing"
  4.    wait for state to be set to "Closing"
  5.    request to set state to "Closed"
  6.    wait for state to be set to "Closed"

If the state changes to "Closed" immediately after step 1 we are stuck
forever in step 4, because the state will never go back from "Closed" to
"Closing".

Make sure to check also for state == "Closed" in step 4 to prevent the
deadlock.

Also add a 5 sec timeout any time we wait for the bus state to change,
to avoid getting stuck forever in wait_event().

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: openvswitch: fixes potential deadlock in dp cleanup code
Eelco Chaudron [Fri, 24 Jul 2020 08:20:59 +0000 (10:20 +0200)]
net: openvswitch: fixes potential deadlock in dp cleanup code

The previous patch introduced a deadlock, this patch fixes it by making
sure the work is canceled without holding the global ovs lock. This is
done by moving the reorder processing one layer up to the netns level.

Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage")
Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com
Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com
Reviewed-by: Paolo <pabeni@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agosctp: fix slab-out-of-bounds in SCTP_DELAYED_SACK processing
Christoph Hellwig [Fri, 24 Jul 2020 06:48:55 +0000 (08:48 +0200)]
sctp: fix slab-out-of-bounds in SCTP_DELAYED_SACK processing

This sockopt accepts two kinds of parameters, using struct
sctp_sack_info and struct sctp_assoc_value. The mentioned commit didn't
notice an implicit cast from the smaller (latter) struct to the bigger
one (former) when copying the data from the user space, which now leads
to an attempt to write beyond the buffer (because it assumes the storing
buffer is bigger than the parameter itself).

Fix it by allocating a sctp_sack_info on stack and filling it out based
on the small struct for the compat case.

Changelog stole from an earlier patch from Marcelo Ricardo Leitner.

Fixes: ebb25defdc17 ("sctp: pass a kernel pointer to sctp_setsockopt_delayed_ack")
Reported-by: syzbot+0e4699d000d8b874d8dc@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Fri, 24 Jul 2020 23:39:28 +0000 (16:39 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2020-07-23

This series contains updates to ice driver only.

Jake refactors ice_discover_caps() to reduce the number of AdminQ calls
made. Splits ice_parse_caps() to separate functions to update function
and device capabilities separately to allow for updating outside of
initialization.

Akeem adds power management support.

Paul G refactors FC and FEC code to aid in restoring of PHY settings
on media insertion. Implements lenient mode and link override support.
Adds link debug info and formats existing debug info to be more
readable. Adds support to check and report additional autoneg
capabilities. Implements the capability to detect media cage in order to
differentiate AUI types as Direct Attach or backplane.

Bruce implements Total Port Shutdown for devices that support it.

Lev renames low_power_ctrl field to lower_power_ctrl_an to be more
descriptive of the field.

Doug reports AOC types as media type fiber.

Paul S adds code to handle 1G SGMII PHY type.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomISDN: Don't try to print a sockptr_t from debug logging code.
David S. Miller [Fri, 24 Jul 2020 23:36:13 +0000 (16:36 -0700)]
mISDN: Don't try to print a sockptr_t from debug logging code.

drivers/isdn/mISDN/socket.c: In function ‘data_sock_setsockopt’:
./include/linux/kern_levels.h:5:18: warning: format ‘%p’ expects argument of type ‘void *’, but argument 6 has type ‘sockptr_t’ [-Wformat=]
    5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
      |                  ^~~~~~
./include/linux/kern_levels.h:15:20: note: in expansion of macro ‘KERN_SOH’
   15 | #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
      |                    ^~~~~~~~
drivers/isdn/mISDN/socket.c:410:10: note: in expansion of macro ‘KERN_DEBUG’
  410 |   printk(KERN_DEBUG "%s(%p, %d, %x, %p, %d)\n", __func__, sock,
      |          ^~~~~~~~~~
drivers/isdn/mISDN/socket.c:410:38: note: format string is defined here
  410 |   printk(KERN_DEBUG "%s(%p, %d, %x, %p, %d)\n", __func__, sock,
      |                                     ~^
      |                                      |
      |                                      void *

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'nfsd-5.8-2' of git://linux-nfs.org/~bfields/linux into master
Linus Torvalds [Fri, 24 Jul 2020 23:27:54 +0000 (16:27 -0700)]
Merge tag 'nfsd-5.8-2' of git://linux-nfs.org/~bfields/linux into master

Pull nfsd fix from Bruce Fields:
 "Just one fix for a NULL dereference if someone happens to read
  /proc/fs/nfsd/client/../state at the wrong moment"

* tag 'nfsd-5.8-2' of git://linux-nfs.org/~bfields/linux:
  nfsd4: fix NULL dereference in nfsd/clients display code

3 years agoMerge branch 'get-rid-of-the-address_space-override-in-setsockopt-v2'
David S. Miller [Fri, 24 Jul 2020 22:41:54 +0000 (15:41 -0700)]
Merge branch 'get-rid-of-the-address_space-override-in-setsockopt-v2'

Christoph Hellwig says:

====================
get rid of the address_space override in setsockopt v2

setsockopt is the last place in architecture-independ code that still
uses set_fs to force the uaccess routines to operate on kernel pointers.

This series adds a new sockptr_t type that can contained either a kernel
or user pointer, and which has accessors that do the right thing, and
then uses it for setsockopt, starting by refactoring some low-level
helpers and moving them over to it before finally doing the main
setsockopt method.

Note that apparently the eBPF selftests do not even cover this path, so
the series has been tested with a testing patch that always copies the
data first and passes a kernel pointer.  This is something that works for
most common sockopts (and is something that the ePBF support relies on),
but unfortunately in various corner cases we either don't use the passed
in length, or in one case actually copy data back from setsockopt, or in
case of bpfilter straight out do not work with kernel pointers at all.

Against net-next/master.

Changes since v1:
 - check that users don't pass in kernel addresses
 - more bpfilter cleanups
 - cosmetic mptcp tweak
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: optimize the sockptr_t for unified kernel/user address spaces
Christoph Hellwig [Thu, 23 Jul 2020 06:09:08 +0000 (08:09 +0200)]
net: optimize the sockptr_t for unified kernel/user address spaces

For architectures like x86 and arm64 we don't need the separate bit to
indicate that a pointer is a kernel pointer as the address spaces are
unified.  That way the sockptr_t can be reduced to a union of two
pointers, which leads to nicer calling conventions.

The only caveat is that we need to check that users don't pass in kernel
address and thus gain access to kernel memory.  Thus the USER_SOCKPTR
helper is replaced with a init_user_sockptr function that does this check
and returns an error if it fails.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: pass a sockptr_t into ->setsockopt
Christoph Hellwig [Thu, 23 Jul 2020 06:09:07 +0000 (08:09 +0200)]
net: pass a sockptr_t into ->setsockopt

Rework the remaining setsockopt code to pass a sockptr_t instead of a
plain user pointer.  This removes the last remaining set_fs(KERNEL_DS)
outside of architecture specific code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154]
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/tcp: switch do_tcp_setsockopt to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:09:06 +0000 (08:09 +0200)]
net/tcp: switch do_tcp_setsockopt to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/tcp: switch ->md5_parse to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:09:05 +0000 (08:09 +0200)]
net/tcp: switch ->md5_parse to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/udp: switch udp_lib_setsockopt to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:09:04 +0000 (08:09 +0200)]
net/udp: switch udp_lib_setsockopt to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ipv6: switch do_ipv6_setsockopt to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:09:03 +0000 (08:09 +0200)]
net/ipv6: switch do_ipv6_setsockopt to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ipv6: factor out a ipv6_set_opt_hdr helper
Christoph Hellwig [Thu, 23 Jul 2020 06:09:02 +0000 (08:09 +0200)]
net/ipv6: factor out a ipv6_set_opt_hdr helper

Factour out a helper to set the IPv6 option headers from
do_ipv6_setsockopt.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ipv6: switch ipv6_flowlabel_opt to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:09:01 +0000 (08:09 +0200)]
net/ipv6: switch ipv6_flowlabel_opt to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Note that the get case is pretty weird in that it actually copies data
back to userspace from setsockopt.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ipv6: split up ipv6_flowlabel_opt
Christoph Hellwig [Thu, 23 Jul 2020 06:09:00 +0000 (08:09 +0200)]
net/ipv6: split up ipv6_flowlabel_opt

Split ipv6_flowlabel_opt into a subfunction for each action and a small
wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ipv6: switch ip6_mroute_setsockopt to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:59 +0000 (08:08 +0200)]
net/ipv6: switch ip6_mroute_setsockopt to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ipv4: switch do_ip_setsockopt to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:58 +0000 (08:08 +0200)]
net/ipv4: switch do_ip_setsockopt to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ipv4: merge ip_options_get and ip_options_get_from_user
Christoph Hellwig [Thu, 23 Jul 2020 06:08:57 +0000 (08:08 +0200)]
net/ipv4: merge ip_options_get and ip_options_get_from_user

Use the sockptr_t type to merge the versions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ipv4: switch ip_mroute_setsockopt to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:56 +0000 (08:08 +0200)]
net/ipv4: switch ip_mroute_setsockopt to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobpfilter: switch bpfilter_ip_set_sockopt to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:55 +0000 (08:08 +0200)]
bpfilter: switch bpfilter_ip_set_sockopt to sockptr_t

This is mostly to prepare for cleaning up the callers, as bpfilter by
design can't handle kernel pointers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetfilter: switch nf_setsockopt to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:54 +0000 (08:08 +0200)]
netfilter: switch nf_setsockopt to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetfilter: switch xt_copy_counters to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:53 +0000 (08:08 +0200)]
netfilter: switch xt_copy_counters to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetfilter: remove the unused user argument to do_update_counters
Christoph Hellwig [Thu, 23 Jul 2020 06:08:52 +0000 (08:08 +0200)]
netfilter: remove the unused user argument to do_update_counters

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/xfrm: switch xfrm_user_policy to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:51 +0000 (08:08 +0200)]
net/xfrm: switch xfrm_user_policy to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: switch sock_set_timeout to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:50 +0000 (08:08 +0200)]
net: switch sock_set_timeout to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: switch sock_set_timeout to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:49 +0000 (08:08 +0200)]
net: switch sock_set_timeout to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: switch sock_setbindtodevice to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:48 +0000 (08:08 +0200)]
net: switch sock_setbindtodevice to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: switch copy_bpf_fprog_from_user to sockptr_t
Christoph Hellwig [Thu, 23 Jul 2020 06:08:47 +0000 (08:08 +0200)]
net: switch copy_bpf_fprog_from_user to sockptr_t

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: add a new sockptr_t type
Christoph Hellwig [Thu, 23 Jul 2020 06:08:46 +0000 (08:08 +0200)]
net: add a new sockptr_t type

Add a uptr_t type that can hold a pointer to either a user or kernel
memory region, and simply helpers to copy to and from it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobpfilter: reject kernel addresses
Christoph Hellwig [Thu, 23 Jul 2020 06:08:45 +0000 (08:08 +0200)]
bpfilter: reject kernel addresses

The bpfilter user mode helper processes the optval address using
process_vm_readv.  Don't send it kernel addresses fed under
set_fs(KERNEL_DS) as that won't work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/bpfilter: split __bpfilter_process_sockopt
Christoph Hellwig [Thu, 23 Jul 2020 06:08:44 +0000 (08:08 +0200)]
net/bpfilter: split __bpfilter_process_sockopt

Split __bpfilter_process_sockopt into a low-level send request routine and
the actual setsockopt hook to split the init time ping from the actual
setsockopt processing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobpfilter: fix up a sparse annotation
Christoph Hellwig [Thu, 23 Jul 2020 06:08:43 +0000 (08:08 +0200)]
bpfilter: fix up a sparse annotation

The __user doesn't make sense when casting to an integer type, just
switch to a uintptr_t cast which also removes the need for the __force.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'TC-datapath-hash-api'
David S. Miller [Fri, 24 Jul 2020 22:23:31 +0000 (15:23 -0700)]
Merge branch 'TC-datapath-hash-api'

Ariel Levkovich says:

====================
TC datapath hash api

Hash based packet classification allows user to set up rules that
provide load balancing of traffic across multiple vports and
for ECMP path selection while keeping the number of rule at minimum.

Instead of matching on exact flow spec, which requires a rule per
flow, user can define rules based on a their hash value and distribute
the flows to different buckets. The number of rules
in this case will be constant and equal to the number of buckets.

The series introduces an extention to the cls flower classifier
and allows user to add rules that match on the hash value that
is stored in skb->hash while assuming the value was set prior to
the classification.

Setting the skb->hash can be done in various ways and is not defined
in this series - for example:
1. By the device driver upon processing an rx packet.
2. Using tc action bpf with a program which computes and sets the
skb->hash value.

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

v3 -> v4:
 *Drop hash setting code leaving only the classidication parts.
  Setting the hash will be possible via existing tc action bpf.

v2 -> v3:
 *Split hash algorithm option into 2 different actions.
  Asym_l4 available via act_skbedit and bpf via new act_hash.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/sched: cls_flower: Add hash info to flow classification
Ariel Levkovich [Wed, 22 Jul 2020 22:03:01 +0000 (01:03 +0300)]
net/sched: cls_flower: Add hash info to flow classification

Adding new cls flower keys for hash value and hash
mask and dissect the hash info from the skb into
the flow key towards flow classication.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/flow_dissector: add packet hash dissection
Ariel Levkovich [Wed, 22 Jul 2020 22:03:00 +0000 (01:03 +0300)]
net/flow_dissector: add packet hash dissection

Retreive a hash value from the SKB and store it
in the dissector key for future matching.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: hyperv: dump TX indirection table to ethtool regs
Chi Song [Fri, 24 Jul 2020 04:14:26 +0000 (21:14 -0700)]
net: hyperv: dump TX indirection table to ethtool regs

An imbalanced TX indirection table causes netvsc to have low
performance. This table is created and managed during runtime. To help
better diagnose performance issues caused by imbalanced tables, it needs
make TX indirection tables visible.

Because TX indirection table is driver specified information, so
display it via ethtool register dump.

Signed-off-by: Chi Song <chisong@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoflow_offload: Move rhashtable inclusion to the source file
Herbert Xu [Fri, 24 Jul 2020 00:50:22 +0000 (10:50 +1000)]
flow_offload: Move rhashtable inclusion to the source file

I noticed that touching linux/rhashtable.h causes lib/vsprintf.c to
be rebuilt.  This dependency came through a bogus inclusion in the
file net/flow_offload.h.  This patch moves it to the right place.

This patch also removes a lingering rhashtable inclusion in cls_api
created by the same commit.

Fixes: 4e481908c51b ("flow_offload: move tc indirect block to...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'akpm' into master (patches from Andrew)
Linus Torvalds [Fri, 24 Jul 2020 21:24:35 +0000 (14:24 -0700)]
Merge branch 'akpm' into master (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "Subsystems affected by this patch series: mm/pagemap, mm/shmem,
  mm/hotfixes, mm/memcg, mm/hugetlb, mailmap, squashfs, scripts,
  io-mapping, MAINTAINERS, and gdb"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  scripts/gdb: fix lx-symbols 'gdb.error' while loading modules
  MAINTAINERS: add KCOV section
  io-mapping: indicate mapping failure
  scripts/decode_stacktrace: strip basepath from all paths
  squashfs: fix length field overlap check in metadata reading
  mailmap: add entry for Mike Rapoport
  khugepaged: fix null-pointer dereference due to race
  mm/hugetlb: avoid hardcoding while checking if cma is enabled
  mm: memcg/slab: fix memory leak at non-root kmem_cache destroy
  mm/memcg: fix refcount error while moving and swapping
  mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages()
  mm: initialize return of vm_insert_pages
  vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way
  mm/mmap.c: close race between munmap() and expand_upwards()/downwards()

3 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into...
Linus Torvalds [Fri, 24 Jul 2020 21:19:00 +0000 (14:19 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs into master

Pull xtensa csum regression fix from Al Viro:
 "Max Filippov caught a breakage introduced in xtensa this cycle
  by the csum_and_copy_..._user() series.

  Cut'n'paste from the wrong source - the check that belongs
  in csum_and_copy_to_user() ended up both there and in
  csum_and_copy_from_user()"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  xtensa: fix access check in csum_and_copy_from_user

3 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux...
Linus Torvalds [Fri, 24 Jul 2020 21:16:12 +0000 (14:16 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux into master

Pull arm64 fix from Will Deacon:
 "Fix compat vDSO build flags for recent versions of clang to tell it
  where to find the assembler"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: vdso32: Fix '--prefix=' value for newer versions of clang