Eelco Chaudron [Thu, 20 Feb 2020 13:26:24 +0000 (13:26 +0000)]
libbpf: Bump libpf current version to v0.0.8
New development cycles starts, bump to v0.0.8.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/158220518424.127661.8278643006567775528.stgit@xdp-tutorial
Andrii Nakryiko [Thu, 20 Feb 2020 06:26:35 +0000 (22:26 -0800)]
libbpf: Relax check whether BTF is mandatory
If BPF program is using BTF-defined maps, BTF is required only for
libbpf itself to process map definitions. If after that BTF fails to
be loaded into kernel (e.g., if it doesn't support BTF at all), this
shouldn't prevent valid BPF program from loading. Existing
retry-without-BTF logic for creating maps will succeed to create such
maps without any problems. So, presence of .maps section shouldn't make
BTF required for kernel. Update the check accordingly.
Validated by ensuring simple BPF program with BTF-defined maps is still
loaded on old kernel without BTF support and map is correctly parsed and
created.
Fixes:
abd29c931459 ("libbpf: allow specifying map definitions using BTF")
Reported-by: Julia Kartseva <hex@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200220062635.1497872-1-andriin@fb.com
Alexei Starovoitov [Wed, 19 Feb 2020 20:55:14 +0000 (12:55 -0800)]
selftests/bpf: Fix build of sockmap_ktls.c
The selftests fails to build with:
tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c: In function ‘test_sockmap_ktls_disconnect_after_delete’:
tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c:72:37: error: ‘TCP_ULP’ undeclared (first use in this function)
72 | err = setsockopt(cli, IPPROTO_TCP, TCP_ULP, "tls", strlen("tls"));
| ^~~~~~~
Similar to commit that fixes build of sockmap_basic.c on systems with old
/usr/include fix the build of sockmap_ktls.c
Fixes:
d1ba1204f2ee ("selftests/bpf: Test unhashing kTLS socket after removing from map")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200219205514.3353788-1-ast@kernel.org
Yonghong Song [Wed, 19 Feb 2020 00:42:36 +0000 (16:42 -0800)]
selftests/bpf: Change llvm flag -mcpu=probe to -mcpu=v3
The latest llvm supports cpu version v3, which is cpu version v1
plus some additional 64bit jmp insns and 32bit jmp insn support.
In selftests/bpf Makefile, the llvm flag -mcpu=probe did runtime
probe into the host system. Depending on compilation environments,
it is possible that runtime probe may fail, e.g., due to
memlock issue. This will cause generated code with cpu version v1.
This may cause confusion as the same compiler and the same C code
generates different byte codes in different environment.
Let us change the llvm flag -mcpu=probe to -mcpu=v3 so the
generated code will be the same regardless of the compilation
environment.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200219004236.2291125-1-yhs@fb.com
Alexei Starovoitov [Wed, 19 Feb 2020 22:37:36 +0000 (14:37 -0800)]
Merge branch 'bpf_read_branch_records'
Daniel Xu says:
====================
Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile guided optimizations. perf has had
branch record support for a while but the data collection can be a bit
coarse grained.
We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.
Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.
Changes in v8:
- Use globals instead of perf buffer
- Call test_perf_branches__detach() before destroying skeleton
- Fix typo in docs
Changes in v7:
- Const-ify and static-ify local var
- Documentation formatting
Changes in v6:
- Move #ifdef a little to avoid unused variable warnings on !x86
- Test negative condition in selftest (-EINVAL on improperly configured
perf event)
- Skip positive condition selftest on setups that don't support branch
records
Changes in v5:
- Rename bpf_perf_prog_read_branches() -> bpf_read_branch_records()
- Rename BPF_F_GET_BR_SIZE -> BPF_F_GET_BRANCH_RECORDS_SIZE
- Squash tools/ bpf.h sync into selftest commit
Changes in v4:
- Add BPF_F_GET_BR_SIZE flag
- Return -ENOENT on unsupported architectures
- Only accept initialized memory in helper
- Check buffer size is multiple of sizeof(struct perf_branch_entry)
- Use bpf skeleton in selftest
- Add commit messages
- Spelling and formatting
Changes in v3:
- Document filling unused buffer with zero
- Formatting fixes
- Rebase
Changes in v2:
- Change to a bpf helper instead of context access
- Avoid mentioning Intel specific things
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Xu [Tue, 18 Feb 2020 03:04:32 +0000 (19:04 -0800)]
selftests/bpf: Add bpf_read_branch_records() selftest
Add a selftest to test:
* default bpf_read_branch_records() behavior
* BPF_F_GET_BRANCH_RECORDS_SIZE flag behavior
* error path on non branch record perf events
* using helper to write to stack
* using helper to write to global
On host with hardware counter support:
# ./test_progs -t perf_branches
#27/1 perf_branches_hw:OK
#27/2 perf_branches_no_hw:OK
#27 perf_branches:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
On host without hardware counter support (VM):
# ./test_progs -t perf_branches
#27/1 perf_branches_hw:OK
#27/2 perf_branches_no_hw:OK
#27 perf_branches:OK
Summary: 1/2 PASSED, 1 SKIPPED, 0 FAILED
Also sync tools/include/uapi/linux/bpf.h.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200218030432.4600-3-dxu@dxuuu.xyz
Daniel Xu [Tue, 18 Feb 2020 03:04:31 +0000 (19:04 -0800)]
bpf: Add bpf_read_branch_records() helper
Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile guided optimizations. perf has had
branch record support for a while but the data collection can be a bit
coarse grained.
We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.
Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200218030432.4600-2-dxu@dxuuu.xyz
Daniel Borkmann [Wed, 19 Feb 2020 15:54:05 +0000 (16:54 +0100)]
Merge branch 'bpf-skmsg-simplify-restore'
Jakub Sitnicki says:
====================
This series has been split out from "Extend SOCKMAP to store listening
sockets" [0]. I think it stands on its own, and makes the latter series
smaller, which will make the review easier, hopefully.
The essence is that we don't need to do a complicated dance in
sk_psock_restore_proto, if we agree that the contract with tcp_update_ulp
is to restore callbacks even when the socket doesn't use ULP. This is what
tcp_update_ulp currently does, and we just make use of it.
Series is accompanied by a test for a particularly tricky case of restoring
callbacks when we have both sockmap and tls callbacks configured in
sk->sk_prot.
[0] https://lore.kernel.org/bpf/
20200127131057.150941-1-jakub@cloudflare.com/
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Sitnicki [Mon, 17 Feb 2020 12:15:30 +0000 (12:15 +0000)]
selftests/bpf: Test unhashing kTLS socket after removing from map
When a TCP socket gets inserted into a sockmap, its sk_prot callbacks get
replaced with tcp_bpf callbacks built from regular tcp callbacks. If TLS
gets enabled on the same socket, sk_prot callbacks get replaced once again,
this time with kTLS callbacks built from tcp_bpf callbacks.
Now, we allow removing a socket from a sockmap that has kTLS enabled. After
removal, socket remains with kTLS configured. This is where things things
get tricky.
Since the socket has a set of sk_prot callbacks that are a mix of kTLS and
tcp_bpf callbacks, we need to restore just the tcp_bpf callbacks to the
original ones. At the moment, it comes down to the the unhash operation.
We had a regression recently because tcp_bpf callbacks were not cleared in
this particular scenario of removing a kTLS socket from a sockmap. It got
fixed in commit
4da6a196f93b ("bpf: Sockmap/tls, during free we may call
tcp_bpf_unhash() in loop").
Add a test that triggers the regression so that we don't reintroduce it in
the future.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200217121530.754315-4-jakub@cloudflare.com
Jakub Sitnicki [Mon, 17 Feb 2020 12:15:29 +0000 (12:15 +0000)]
bpf, sk_msg: Don't clear saved sock proto on restore
There is no need to clear psock->sk_proto when restoring socket protocol
callbacks in sk->sk_prot. The psock is about to get detached from the sock
and eventually destroyed. At worst we will restore the protocol callbacks
and the write callback twice.
This makes reasoning about psock state easier. Once psock is initialized,
we can count on psock->sk_proto always being set.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200217121530.754315-3-jakub@cloudflare.com
Jakub Sitnicki [Mon, 17 Feb 2020 12:15:28 +0000 (12:15 +0000)]
bpf, sk_msg: Let ULP restore sk_proto and write_space callback
We don't need a fallback for when the socket is not using ULP.
tcp_update_ulp handles this case exactly the same as we do in
sk_psock_restore_proto. Get rid of the duplicated code.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200217121530.754315-2-jakub@cloudflare.com
Song Liu [Fri, 14 Feb 2020 23:41:46 +0000 (15:41 -0800)]
bpf: Allow bpf_perf_event_read_value in all BPF programs
bpf_perf_event_read_value() is NMI safe. Enable it for all BPF programs.
This can be used in fentry/fexit to profile BPF program and individual
kernel function with hardware counters.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200214234146.2910011-1-songliubraving@fb.com
YueHaibing [Tue, 18 Feb 2020 06:21:54 +0000 (14:21 +0800)]
net: ena: remove set but not used variable 'hash_key'
drivers/net/ethernet/amazon/ena/ena_com.c: In function ena_com_hash_key_allocate:
drivers/net/ethernet/amazon/ena/ena_com.c:1070:50:
warning: variable hash_key set but not used [-Wunused-but-set-variable]
commit
6a4f7dc82d1e ("net: ena: rss: do not allocate key when not supported")
introduced this, but not used, so remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 17 Feb 2020 20:07:19 +0000 (14:07 -0600)]
net: netlink: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 17 Feb 2020 20:02:36 +0000 (14:02 -0600)]
net: switchdev: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 17 Feb 2020 20:01:11 +0000 (14:01 -0600)]
bpf, sockmap: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 17 Feb 2020 19:59:41 +0000 (13:59 -0600)]
NFC: digital: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 17 Feb 2020 19:58:16 +0000 (13:58 -0600)]
net: usb: cdc-phonet: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Mon, 17 Feb 2020 16:03:11 +0000 (16:03 +0000)]
net: phy: allow bcm84881 to be a module
Now that the phylib module loading issue has been resolved, we can
allow this PHY driver to be built as a module.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 22:50:25 +0000 (14:50 -0800)]
Merge branch 'net-smc-next'
Ursula Braun says:
====================
net/smc: patches 2020-02-17
here are patches for SMC making termination tasks more perfect.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 17 Feb 2020 15:24:55 +0000 (16:24 +0100)]
net/smc: reduce port_event scheduling
IB event handlers schedule the port event worker for further
processing of port state changes. This patch reduces the number of
schedules to avoid duplicate processing of the same port change.
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 17 Feb 2020 15:24:54 +0000 (16:24 +0100)]
net/smc: simplify normal link termination
smc_lgr_terminate() and smc_lgr_terminate_sched() both result in soft
link termination, smc_lgr_terminate_sched() is scheduling a worker for
this task. Take out complexity by always using the termination worker
and getting rid of smc_lgr_terminate() completely.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 17 Feb 2020 15:24:53 +0000 (16:24 +0100)]
net/smc: remove unused parameter of smc_lgr_terminate()
The soft parameter of smc_lgr_terminate() is not used and obsolete.
Remove it.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 17 Feb 2020 15:24:52 +0000 (16:24 +0100)]
net/smc: do not delete lgr from list twice
When 2 callers call smc_lgr_terminate() at the same time
for the same lgr, one gets the lgr_lock and deletes the lgr from the
list and releases the lock. Then the second caller gets the lock and
tries to delete it again.
In smc_lgr_terminate() add a check if the link group lgr is already
deleted from the link group list and prevent to try to delete it a
second time.
And add a check if the lgr is marked as freeing, which means that a
termination is already pending.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 17 Feb 2020 15:24:51 +0000 (16:24 +0100)]
net/smc: use termination worker under send_lock
smc_tx_rdma_write() is called under the send_lock and should not call
smc_lgr_terminate() directly. Call smc_lgr_terminate_sched() instead
which schedules a worker.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Mon, 17 Feb 2020 15:24:50 +0000 (16:24 +0100)]
net/smc: improve smc_lgr_cleanup()
smc_lgr_cleanup() is called during termination processing, there is no
need to send a DELETE_LINK at that time. A DELETE_LINK should have been
sent before the termination is initiated, if needed.
And remove the extra call to wake_up(&lnk->wr_reg_wait) because
smc_llc_link_inactive() already calls the related helper function
smc_wr_wakeup_reg_wait().
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 22:42:54 +0000 (14:42 -0800)]
Merge branch 'mlxsw-Reduce-dependency-between-bridge-and-router-code'
Ido Schimmel says:
====================
mlxsw: Reduce dependency between bridge and router code
This patch set reduces the dependency between the bridge and the router
code in preparation for RTNL removal from the route insertion path in
mlxsw.
The motivation and solution are explained in detail in patch #3. The
main idea is that we need to stop special-casing the VXLAN devices with
regards to the reference counting of the FIDs. Otherwise, we can bump
into the situation described in patch #3, where the routing code calls
into the bridge code which calls back into the routing code. After
adding a mutex to protect router data structures to remove RTNL
dependency, this can result in an AA deadlock.
Patches #1 and #2 are preparations. They convert the FIDs to use
'refcount_t' for reference counting in order to catch over/under flows
and add extack to the bridge creation function.
Patches #3-#5 reduce the dependency between the bridge and the router
code. First, by having the VXLAN device take a reference on the FID in
patch #3 and then by removing unnecessary code following the change in
patch #3.
Patches #6-#10 adjust existing selftests and add new ones to exercise
the new code paths.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 17 Feb 2020 14:29:40 +0000 (16:29 +0200)]
selftests: mlxsw: vxlan: Add test for error path
Test that when two VXLAN tunnels with conflicting configurations (i.e.,
different TTL) are enslaved to the same VLAN-aware bridge, then the
enslavement of a port to the bridge is denied.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 17 Feb 2020 14:29:39 +0000 (16:29 +0200)]
selftests: mlxsw: vxlan: Adjust test to recent changes
After recent changes, the VXLAN tunnel will be offloaded regardless if
any local ports are member in the FID or not. Adjust the test to make
sure the tunnel is offloaded in this case.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 17 Feb 2020 14:29:38 +0000 (16:29 +0200)]
selftests: mlxsw: extack: Test creation of multiple VLAN-aware bridges
The driver supports a single VLAN-aware bridge. Test that the
enslavement of a port to the second VLAN-aware bridge fails with an
extack.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 17 Feb 2020 14:29:37 +0000 (16:29 +0200)]
selftests: mlxsw: extack: Test bridge creation with VXLAN
Test that creation of a bridge (both VLAN-aware and VLAN-unaware) fails
with an extack when a VXLAN device with an unsupported configuration is
already enslaved to it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 17 Feb 2020 14:29:36 +0000 (16:29 +0200)]
selftests: mlxsw: Remove deprecated test
The addition of a VLAN on a bridge slave prompts the driver to have the
local port in question join the FID corresponding to this VLAN.
Before recent changes, the operation of joining the FID would also mean
that the driver would enable VXLAN tunneling if a VXLAN device was also
member in the VLAN. In case the configuration of the VXLAN tunnel was
not supported, an extack error would be returned.
Since the operation of joining the FID no longer means that VXLAN
tunneling is potentially enabled, the test is no longer relevant. Remove
it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 17 Feb 2020 14:29:35 +0000 (16:29 +0200)]
mlxsw: spectrum: Reduce dependency between bridge and router code
Commit
f40be47a3e40 ("mlxsw: spectrum_router: Do not force specific
configuration order") added a call from the routing code to the bridge
code in order to handle the case where VNI should be set on a FID
following the joining of the router port to the FID.
This is no longer required, as previous patches made VXLAN devices
explicitly take a reference on the FID and set VNI on it.
Therefore, remove the unnecessary call and simply have the RIF take a
reference on the FID without checking if VNI should also be set on it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 17 Feb 2020 14:29:34 +0000 (16:29 +0200)]
mlxsw: spectrum_switchdev: Remove VXLAN checks during FID membership
As explained in previous patch, VXLAN devices now take a reference on
the FID and not only local ports. Therefore, there is no need for local
ports to check if they need to set a VNI on the FID when they join the
FID.
Remove these unnecessary checks.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 17 Feb 2020 14:29:33 +0000 (16:29 +0200)]
mlxsw: spectrum_switchdev: Have VXLAN device take reference on FID
Up until now only local ports and the router port (which is also a local
port) took a reference on the corresponding FID (Filtering Identifier)
when joining a bridge. For example:
192.0.2.1/24
br0
|
+------+------+
| |
swp1 vxlan0
In this case the reference count of the FID will be '2'. Since the VXLAN
device does not take a reference on the FID, whenever a local port joins
the bridge it needs to check if a VXLAN device is already enslaved. If
the VXLAN device should be mapped to the FID in question, then the VXLAN
device's VNI is set on the FID.
Beside the fact that this scheme special-cases the VXLAN device, it also
creates an unnecessary dependency between the routing and bridge code:
1. [R] IP address is added on 'br0', which prompts the creation of a RIF
and a backing FID
2. [B] VNI is enabled on backing FID
3. [R] Host route corresponding to VXLAN device's source address is
promoted to perform NVE decapsulation
[R] - Routing code
[B] - Bridge code
This back and forth dependency will become problematic when a lock is
added in the routing code instead of relying on RTNL, as it will result
in an AA deadlock.
Instead, have the VXLAN device take a reference on the FID just like all
the other netdev members of the bridge. In order to correctly handle the
case where VXLAN devices are already enslaved to the bridge when it is
offloaded, walk the bridge's slaves and replay the configuration.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 17 Feb 2020 14:29:32 +0000 (16:29 +0200)]
mlxsw: spectrum_switchdev: Propagate extack to bridge creation function
Propagate extack to bridge creation function so that error messages
could be passed to user space via netlink instead of printing them to
kernel log.
A subsequent patch will pass the new extack argument to more functions.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 17 Feb 2020 14:29:31 +0000 (16:29 +0200)]
mlxsw: spectrum_fid: Use 'refcount_t' for FID reference counting
'refcount_t' is very useful for catching over/under flows. Convert the
FID (Filtering Identifier) objects to use it instead of 'unsigned int'
for their reference count.
A subsequent patch in the series will change the way VXLAN devices hold
/ release the FID reference, which is why the conversion is made now.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Feb 2020 13:45:01 +0000 (14:45 +0100)]
net: bridge: teach ndo_dflt_bridge_getlink() more brport flags
This enables ndo_dflt_bridge_getlink() to report a bridge port's
offload settings for multicast and broadcast flooding.
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 22:35:30 +0000 (14:35 -0800)]
Merge branch 'sfc-couple-more-ARFS-tidy-ups'
Edward Cree says:
====================
couple more ARFS tidy-ups
Tie up some loose ends from the recent ARFS work.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 17 Feb 2020 13:43:28 +0000 (13:43 +0000)]
sfc: move some ARFS code out of headers
efx_filter_rfs_expire() is a work-function, so it being inline makes no
sense. It's only ever used in efx_channels.c, so move it there.
While we're at it, clean out some related unused cruft.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 17 Feb 2020 13:43:10 +0000 (13:43 +0000)]
sfc: only schedule asynchronous filter work if needed
Prevent excessive CPU time spent running a workitem with nothing to do.
We avoid any races by keeping the same check in efx_filter_rfs_expire().
Suggested-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 17 Feb 2020 12:27:58 +0000 (13:27 +0100)]
net: vlan: suppress "failed to kill vid" warnings
When a real dev unregisters, vlan_device_event() also unregisters all
of its vlan interfaces. For each VID this ends up in __vlan_vid_del(),
which attempts to remove the VID from the real dev's VLAN filter.
But the unregistering real dev might no longer be able to issue the
required IOs, and return an error. Subsequently we raise a noisy warning
msg that is not appropriate for this situation: the real dev is being
torn down anyway, there shouldn't be any worry about cleanly releasing
all of its HW-internal resources.
So to avoid scaring innocent users, suppress this warning when the
failed deletion happens on an unregistering device.
While at it also convert the raw pr_warn() to a more fitting
netdev_warn().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Mon, 17 Feb 2020 10:58:27 +0000 (12:58 +0200)]
net: stmmac: Get rid of custom STMMAC_DEVICE() macro
Since PCI core provides a generic PCI_DEVICE_DATA() macro,
replace STMMAC_DEVICE() with former one.
No functional change intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 22:17:02 +0000 (14:17 -0800)]
Merge branch 'Remove-rtnl-lock-dependency-from-flow_action-infra'
Vlad Buslov says:
====================
Remove rtnl lock dependency from flow_action infra
Currently, TC flow_action infrastructure code obtain rtnl lock before
accessing action state in tc_setup_flow_action() function and releases
it afterwards. This behavior is not supposed to impact TC filter
insertion rate because filling flow_action representation is only a
small part of creating new filter and expensive operations (hardware
offload callbacks, classifiers, cls API code that creates chains and
classifiers instances) already support unlocked execution. However,
typical vswitch implementation might need to also dump TC filters
concurrently, for example to age out unused flows or update flow
counters. TC dump is fully serialized and holds rtnl lock during its
whole execution in kernel space. As such, it can significantly impact
concurrent tasks that try to intermittently obtain rtnl lock when
filling intermediate representation for new filter offload (performance
evaluation at the end of this mail).
Refactor flow_action cls API infrastructure and its dependencies to not
rely on rtnl lock for synchronization. Patch set overview:
- Refactor tc_setup_flow_action() to obtain action tcf_lock when
accessing action state. Fix its dependencies to not obtain tcf_lock
themselves and assume that caller already holds it (needs to be done
in same patch to prevent deadlock) and not to call sleeping functions
(needs to be done in same patch to prevent "sleeping while atomic"
dmesg warnings).
- Refactor action helper functions to require tcf_lock instead of rtnl.
Internally, all of the actions already use tcf_lock for
synchronization to accommodate unlocked classifier API, so this change
relies on already existing functionality.
- Remove rtnl lock and "rtnl_held" argument from tc_setup_flow_action()
function.
To test the change, multiple concurrent TC instances are invoked with
following command:
time ls add* | xargs -n 1 -P 100 sudo tc -b
Ten batch files with following typical rules (100k each) are used:
filter add dev ens1f0_0 protocol ip ingress prio 1 handle 1 flower
src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 src_ip 192.168.111.1
dst_ip 192.168.111.2 ip_proto udp dst_port 1 src_port 1 action
tunnel_key set id 1 src_ip 2.2.2.2 dst_ip 2.2.2.3 dst_port 4789
no_percpu action mirred egress redirect dev vxlan1 no_percpu
TC dump of same device is called in infinite loop from five concurrent
instances:
while true do tc -s filter show dev $NIC ingress >/dev/null done
Results obtained on current net-next commit
9f68e3655aae ("Merge tag
'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm"):
| net-next | this change
---------------+----------+-------------
TC add | 6.3s | 6.3s
TC add + dump | 29.3s | 6.8s
Test results confirm significant impact of concurrent TC dump. The
impact is almost fully mitigated by proposed change (differences can be
attributed to contention for chain and tp locks between add and dump TC
instances).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 17 Feb 2020 10:12:12 +0000 (12:12 +0200)]
net: sched: don't take rtnl lock during flow_action setup
Refactor tc_setup_flow_action() function not to use rtnl lock and remove
'rtnl_held' argument that is no longer needed.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 17 Feb 2020 10:12:11 +0000 (12:12 +0200)]
net: sched: refactor ct action helpers to require tcf_lock
In order to remove rtnl lock dependency from flow_action representation
translator, change rtnl_dereference() to rcu_dereference_protected() in ct
action helpers that provide external access to zone and action values. This
is safe to do because the functions are not called from anywhere else
outside flow_action infrastructure which was modified to obtain tcf_lock
when accessing action data in one of previous patches in the series.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 17 Feb 2020 10:12:10 +0000 (12:12 +0200)]
net: sched: refactor police action helpers to require tcf_lock
In order to remove rtnl lock dependency from flow_action representation
translator, change rcu_dereference_bh_rtnl() to rcu_dereference_protected()
in police action helpers that provide external access to rate and burst
values. This is safe to do because the functions are not called from
anywhere else outside flow_action infrastructure which was modified to
obtain tcf_lock when accessing action data in one of previous patches in
the series.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Mon, 17 Feb 2020 10:12:09 +0000 (12:12 +0200)]
net: sched: lock action when translating it to flow_action infra
In order to remove dependency on rtnl lock, take action's tcfa_lock when
constructing its representation as flow_action_entry structure.
Refactor tcf_sample_get_group() to assume that caller holds tcf_lock and
don't take it manually. This callback is only called from flow_action infra
representation translator which now calls it with tcf_lock held, so this
refactoring is necessary to prevent deadlock.
Allocate memory with GFP_ATOMIC flag for ip_tunnel_info copy because
tcf_tunnel_info_copy() is only called from flow_action representation infra
code with tcf_lock spinlock taken.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 04:04:42 +0000 (20:04 -0800)]
Merge branch 'mvneta-xdp-ethtool-stats'
Lorenzo Bianconi says:
====================
add xdp ethtool stats to mvneta driver
Rework mvneta stats accounting in order to introduce xdp ethtool
statistics in the mvneta driver.
Introduce xdp_redirect, xdp_pass, xdp_drop and xdp_tx counters to
ethtool statistics.
Fix skb_alloc_error and refill_error ethtool accounting
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Sun, 16 Feb 2020 21:07:33 +0000 (22:07 +0100)]
net: mvneta: get rid of xdp_ret in mvneta_swbm_rx_frame
Get rid of xdp_ret in mvneta_swbm_rx_frame routine since now
we can rely on xdp_stats to flush in case of xdp_redirect
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Sun, 16 Feb 2020 21:07:32 +0000 (22:07 +0100)]
net: mvneta: introduce xdp counters to ethtool
Add xdp_redirect, xdp_pass, xdp_drop and xdp_tx counters
to ethtool statistics
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Sun, 16 Feb 2020 21:07:31 +0000 (22:07 +0100)]
net: mvneta: rely on struct mvneta_stats in mvneta_update_stats routine
Introduce mvneta_stats structure in mvneta_update_stats routine signature
in order to collect all the rx stats and update them at the end at the
napi loop. mvneta_stats will be reused adding xdp statistics support to
ethtool.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Sun, 16 Feb 2020 21:07:30 +0000 (22:07 +0100)]
net: mvneta: rely on open-coding updating stats for non-xdp and tx path
In oreder to avoid unnecessary instructions rely on open-coding updating
per-cpu stats in mvneta_tx/mvneta_xdp_submit_frame and mvneta_rx_hwbm
routines. This patch will be used to add xdp support to ethtool for the
mvneta driver
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Sun, 16 Feb 2020 21:07:29 +0000 (22:07 +0100)]
net: mvneta: move refill_err and skb_alloc_err in per-cpu stats
mvneta_ethtool_update_stats routine is currently reporting
skb_alloc_error and refill_error only for the first rx queue.
Fix the issue moving skb_alloc_err and refill_err in
mvneta_pcpu_stats structure.
Moreover this patch will be used to introduce xdp statistics
to ethtool for the mvneta driver
Fixes:
17a96da62716 ("net: mvneta: discriminate error cause for missed packet")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 04:00:22 +0000 (20:00 -0800)]
Merge branch 'mv88e6xxx-Add-SERDES-PCS-registers-to-ethtool-dump'
Andrew Lunn says:
====================
mv88e6xxx: Add SERDES/PCS registers to ethtool -d
ethtool -d will dump the registers of an interface. For mv88e6xxx
switch ports, this dump covers the port specific registers. Extend
this with the SERDES/PCS registers, if a port has a SERDES.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 16 Feb 2020 17:54:15 +0000 (18:54 +0100)]
net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d
The mv88e6390 has upto 8 sets of PCS registers, depending on how ports
9 and 10 are configured. The can be spread over 8 ports. If a port has
a PCS register set, return it along with the port registers. The
register space is sparse, so hard code a list of registers which will
be returned. It can later be extended, if needed, by append to the end
of the list.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 16 Feb 2020 17:54:14 +0000 (18:54 +0100)]
net: dsa: mv88e6xxx: Add 6352 family PCS registers to ethtool -d
The mv88e6352 has one PCS which can be used for 1000BaseX or
SGMII. Add the registers to the dump for the port which the PCS is
associated to.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 16 Feb 2020 17:54:13 +0000 (18:54 +0100)]
net: dsa: mv88e6xxx: Allow PCS registers to be retrieved via ethtool
ethtool provides a generic mechanism for a driver to return the
registers of an ethernet device. DSA uses this to give the port
registers associated with an interfaces. Extend this to allow PCS
registers to also be returned, if the port has a PCS associated to it.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 03:51:21 +0000 (19:51 -0800)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2020-02-15
This series contains updates to ice driver only.
Brett adds support for "Queue in Queue" (QinQ) support, by supporting
S-tag & C-tag VLAN traffic by disabling pruning when there are no 0x8100
VLAN interfaces currently on top of the PF. Also refactored the port
VLAN configuration to re-use the common code for enabling and disabling
a port VLAN in single function. Added a helper function to determine if
the VF link is up. Fixed how the port VLAN configures the priority bits
for a VF interface. Fixed the port VLAN to only see its own broadcast
and multicast traffic. Added support to enable and disable all receive
queues, by refactoring adding a new function to do the necessary steps
to enable/disable a queue with the necessary read flush. Fixed how we
set the mapping mode for transmit and receive queues. Added support for
VF queues to handle LAN overflow events. Fixed and refactored how
receive queues get disabled for VFs, which was being handled one queue
at at time, so improve it to handle when the VF is requesting more than
one queue to be disabled. Fixed how the virtchnl_queue_select bitmap is
validated.
Finally a patch not authored by Brett, Bruce cleans up "fallthrough"
comments which are unnecessary. Also replaces the "fallthough" comments
with the GCC reserved word fallthrough, along with other GCC compiler
fixes. Add missing function header comment regarding a function
argument that was missing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 03:48:22 +0000 (19:48 -0800)]
Merge branch 'sonic-next'
Finn Thain says:
====================
Improvements for SONIC ethernet drivers
Now that the necessary sonic driver fixes have been merged, and the merge
window has closed again, I'm sending the remainder of my sonic driver
patch queue.
A couple of these patches will have to be applied in sequence to avoid
'git am' rejects. The others are independent and could have been submitted
individually. Please let me know if I should do that.
The complete sonic driver patch queue was tested on National Semiconductor
hardware (macsonic), qemu-system-m68k (macsonic) and qemu-system-mips64el
(jazzsonic).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Sat, 15 Feb 2020 21:03:32 +0000 (08:03 +1100)]
net/macsonic: Remove interrupt handler wrapper
On m68k, local irqs remain enabled while interrupt handlers execute.
Therefore the macsonic driver has had to disable interrupts to avoid
re-entering sonic_interrupt().
As of commit
865ad2f2201d ("net/sonic: Add mutual exclusion for accessing
shared state"), sonic_interrupt() became re-entrant, and its wrapper
became redundant.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Sat, 15 Feb 2020 21:03:32 +0000 (08:03 +1100)]
net/sonic: Start packet transmission immediately
Give the transmit command as soon as the transmit descriptor is ready.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Sat, 15 Feb 2020 21:03:32 +0000 (08:03 +1100)]
net/sonic: Remove explicit memory barriers
The explicit memory barriers are redundant now that proper locking and
MMIO accessors have been employed.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Sat, 15 Feb 2020 21:03:32 +0000 (08:03 +1100)]
net/sonic: Remove redundant netif_start_queue() call
The transmit queue must be running already otherwise sonic_send_packet()
would not have been called. If the queue was stopped by the interrupt
handler, the interrupt handler will restart it again.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Sat, 15 Feb 2020 21:03:32 +0000 (08:03 +1100)]
net/sonic: Remove redundant next_tx variable
The eol_tx variable is the one that matters to the tx algorithm because
packets are always placed at the end of the list. The next_tx variable
just confuses things so remove it.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Sat, 15 Feb 2020 21:03:32 +0000 (08:03 +1100)]
net/sonic: Refactor duplicated code
No functional change.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Sat, 15 Feb 2020 21:03:32 +0000 (08:03 +1100)]
net/sonic: Remove obsolete comment
The comment is meaningless since mark_bh() was removed a long time ago.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 03:44:41 +0000 (19:44 -0800)]
Merge branch 'sh_eth-get-rid-of-the-dedicated-regiseter-mapping-for-RZ-A1-R7S72100'
Sergei Shtylyov says:
====================
sh_eth: get rid of the dedicated regiseter mapping for RZ/A1 (R7S72100)
Here's a set of 5 patches against DaveM's 'net-next.git' repo.
I changed my mind about the RZ/A1 SoC needing its own register
map -- now that we don't depend on the register map array in order
to determine whether a given register exists any more, we can add
a new flag to determine if the GECMR exists (this register is
present only on true GEther chips, not RZ/A1). We also need to
add the sh_eth_cpu_data::* flag checks where they were missing
so far: in the ethtool API for the register dump.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 15 Feb 2020 20:14:44 +0000 (23:14 +0300)]
sh_eth: use Gigabit register map for R7S72100
The register maps for the Gigabit controllers and the Ether one used on
RZ/A1 (AKA R7S72100) are identical except for GECMR which is only present
on the true GEther controllers. We no longer use the register map arrays
to determine if a given register exists, and have added the GECMR flag to
the 'struct sh_eth_cpu_data' in the previous patch, so we're ready to drop
the R7S72100 specific register map -- this saves 216 bytes of object code
(ARM gcc 4.8.5).
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 15 Feb 2020 20:13:45 +0000 (23:13 +0300)]
sh_eth: add sh_eth_cpu_data::gecmr flag
Not all Ether controllers having the Gigabit register layout have GECMR --
RZ/A1 (AKA R7S72100) actually has the same layout but no Gigabit speed
support and hence no GECMR. In the past, the new register map table was
added for this SoC, now I think we should have used the existing Gigabit
table with the differences (such as GECMR) covered by the mere flags in
the 'struct sh_eth_cpu_data'. Add such flag for GECMR -- and then we can
get rid of the R7S72100 specific layout in the next patch...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 15 Feb 2020 20:10:53 +0000 (23:10 +0300)]
sh_eth: check sh_eth_cpu_data::no_xdfar when dumping registers
When adding the sh_eth_cpu_data::no_xdfar flag I forgot to add the flag
check to __sh_eth_get_regs(), causing the non-existing RDFAR/TDFAR to be
considered for dumping on the R-Car gen1/2 SoCs (the register offset check
has the final say here)...
Fixes:
4c1d45850d5 ("sh_eth: add sh_eth_cpu_data::cexcr flag")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 15 Feb 2020 20:09:35 +0000 (23:09 +0300)]
sh_eth: check sh_eth_cpu_data::cexcr when dumping registers
When adding the sh_eth_cpu_data::cexcr flag I forgot to add the flag
check to __sh_eth_get_regs(), causing the non-existing RX packet counter
registers to be considered for dumping on the R7S72100 SoC (the register
offset sanity check has the final say here)...
Fixes:
4c1d45850d5 ("sh_eth: add sh_eth_cpu_data::cexcr flag")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 15 Feb 2020 20:08:20 +0000 (23:08 +0300)]
sh_eth: check sh_eth_cpu_data::no_tx_cntrs when dumping registers
When adding the sh_eth_cpu_data::no_tx_cntrs flag I forgot to add the
flag check to __sh_eth_get_regs(), causing the non-existing TX counter
registers to be considered for dumping on the R7S72100 SoC (the register
offset sanity check has the final say here)...
Fixes:
ce9134dff6d9 ("sh_eth: add sh_eth_cpu_data::no_tx_cntrs flag")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 03:39:45 +0000 (19:39 -0800)]
Merge branch 'Pause-updates-for-phylib-and-phylink'
Russell King says:
====================
Pause updates for phylib and phylink
Currently, phylib resolves the speed and duplex settings, which MAC
drivers use directly. phylib also extracts the "Pause" and "AsymPause"
bits from the link partner's advertisement, and stores them in struct
phy_device's pause and asym_pause members with no further processing.
It is left up to each MAC driver to implement decoding for this
information.
phylink converted drivers are able to take advantage of code therein
which resolves the pause advertisements for the MAC driver, but this
does nothing for unconverted drivers. It also does not allow us to
make use of hardware-resolved pause states offered by several PHYs.
This series aims to address this by:
1. Providing a generic implementation, linkmode_resolve_pause(), that
takes the ethtool linkmode arrays for the link partner and local
advertisements, decoding the result to whether pause frames should
be allowed to be transmitted or received and acted upon. I call
this the pause enablement state.
2. Providing a phylib implementation, phy_get_pause(), which allows
MAC drivers to request the pause enablement state from phylib.
3. Providing a generic linkmode_set_pause() for setting the pause
advertisement according to the ethtool tx/rx flags - note that this
design has some shortcomings, see the comments in the kerneldoc for
this function.
4. Remove the ability in phylink to set the pause states for fixed
links, which brings them into line with how we deal with the speed
and duplex parameters; we can reintroduce this later if anyone
requires it. This could be a userspace-visible change.
5. Split application of manual pause enablement state from phylink's
resolution of the same to allow use of phylib's new phy_get_pause()
interface by phylink, and make that switch.
6. Resolve the fixed-link pause enablement state using the generic
linkmode_resolve_pause() helper introduced earlier. This, in
connection with the previous commits, allows us to kill the
MLO_PAUSE_SYM and MLO_PAUSE_ASYM flags.
7. make phylink's ethtool pause setting implementation update the
pause advertisement in the same way that phylib does, with the
same caveats that are present there (as mentioned above w.r.t
linkmode_set_pause()).
8. create a more accurate initial configuration for MACs, used when
phy_start() is called or a SFP is detected. In particular, this
ensures that the pause bits seen by MAC drivers in state->pause
are accurate for SGMII.
9. finally, update the kerneldoc descriptions for mac_config() for
the above changes.
This series has been build-tested against net-next; the boot tested
patches are in my "phy" branch against v5.5 plus the queued phylink
changes that were merged for 5.6.
The next series will introduce the ability for phylib drivers to
provide hardware resolved pause enablement state. These patches can
be found in my "phy" branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 15 Feb 2020 15:50:09 +0000 (15:50 +0000)]
net: phylink: clarify flow control settings in documentation
Clarify the expected flow control settings operation in the phylink
documentation for each negotiation mode.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 15 Feb 2020 15:50:03 +0000 (15:50 +0000)]
net: phylink: improve initial mac configuration
Improve the initial MAC configuration so we get a configuration which
more represents the final operating mode, in particular with respect
to the flow control settings.
We do this by:
1) more fully initialising our phy state, so we can use this as the
initial state for PHY based connections.
2) reading the fixed link state.
3) ensuring that in-band mode has sane pause settings for SGMII vs
802.3z negotiation modes.
In all three cases, we ensure that state->link is false, just in case
any MAC drivers have other ideas by mis-using this member, and we also
take account of manual pause mode configuration at this point.
This avoids MLO_PAUSE_AN being seen in mac_config() when operating in
PHY, fixed mode or inband SGMII mode, thereby giving cleaner semantics
to the pause flags. As a result of this, the pause flags now indicate
in a mode-independent way what is required from a mac_config()
implementation.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 15 Feb 2020 15:49:58 +0000 (15:49 +0000)]
net: phylink: allow ethtool -A to change flow control advertisement
When ethtool -A is used to change the pause modes, the pause
advertisement is not being changed, but the documentation in
uapi/linux/ethtool.h says we should be. Add that capability to
phylink.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 15 Feb 2020 15:49:53 +0000 (15:49 +0000)]
net: phylink: resolve fixed link flow control
Resolve the fixed link flow control using the recently introduced
linkmode_resolve_pause() helper, which we use in
phylink_get_fixed_state() only when operating in full duplex mode.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 15 Feb 2020 15:49:48 +0000 (15:49 +0000)]
net: phylink: use phylib resolved flow control modes
Use the new phy_get_pause() helper to get the resolved pause modes for
a PHY rather than resolving the pause modes ourselves. We temporarily
retain our pause mode resolution for causes where there is no PHY
attached, e.g. for fixed-link modes.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 15 Feb 2020 15:49:43 +0000 (15:49 +0000)]
net: phylink: ensure manual flow control is selected appropriately
Split the application of manually controlled flow control modes from
phylink_resolve_flow(), so that we can use alternative providers of
flow control resolution.
We also want to clear the MLO_PAUSE_AN flag when autoneg is disabled,
since flow control can't be negotiated in this circumstance.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 15 Feb 2020 15:49:37 +0000 (15:49 +0000)]
net: phylink: remove pause mode ethtool setting for fixed links
Remove the ability for ethtool -A to change the pause settings for
fixed links; if this is really required, we can reinstate it later.
Andrew Lunn agrees: "So I think it is safe to not implement ethtool
-A, at least until somebody has a real use case for it."
Lets avoid making things too complex for use cases that aren't being
used.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 15 Feb 2020 15:49:32 +0000 (15:49 +0000)]
net: add linkmode helper for setting flow control advertisement
Add a linkmode helper to set the flow control advertisement in an
ethtool linkmode mask according to the tx/rx capabilities. This
implementation is moved from phylib, and documented with an
analysis of its shortcomings.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 15 Feb 2020 15:49:27 +0000 (15:49 +0000)]
net: add helpers to resolve negotiated flow control
Add a couple of helpers to resolve negotiated flow control. Two helpers
are provided:
- linkmode_resolve_pause() which takes the link partner and local
advertisements, and decodes whether we should enable TX or RX pause
at the MAC. This is useful outside of phylib, e.g. in phylink.
- phy_get_pause(), which returns the TX/RX enablement status for the
current negotiation results of the PHY.
This allows us to centralise the flow control resolution, rather than
spreading it around.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 15 Feb 2020 23:57:36 +0000 (23:57 +0000)]
net: linkmode: make linkmode_test_bit() take const pointer
linkmode_test_bit() does not modify the address; test_bit() is also
declared const volatile for the same reason. There's no need for
linkmode_test_bit() to be any different, and allows implementation of
helpers that take a const linkmode pointer.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 03:36:06 +0000 (19:36 -0800)]
Merge branch 'r8169-series-with-further-smaller-improvements'
Heiner Kallweit says:
====================
r8169: series with further smaller improvements
Nothing too exciting. This series includes further smaller
improvements.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 15 Feb 2020 13:54:14 +0000 (14:54 +0100)]
r8169: improve statistics of missed rx packets
Register RxMissed exists on few early chip versions only, however all
chip versions have the number of missed RX packets in the hardware
counters. Therefore remove using RxMissed and get the number of missed
RX packets from the hardware stats.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 15 Feb 2020 13:52:58 +0000 (14:52 +0100)]
r8169: improve rtl_jumbo_config
Merge enabling and disabling jumbo packets to one function to make
the code a little simpler.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 15 Feb 2020 13:52:05 +0000 (14:52 +0100)]
r8169: improve rtl8169_get_mac_version
Currently code snippet (RTL_R32(tp, TxConfig) >> 20) & 0xfcf is used
in few places to extract the chip XID. Change the code to do the XID
extraction only once.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 15 Feb 2020 13:50:29 +0000 (14:50 +0100)]
r8169: add helper rtl_pci_commit
In few places we do a PCI commit by reading an arbitrary chip register.
It's not always obvious that the read is meant to be a PCI commit,
therefore add a helper for it.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 15 Feb 2020 13:49:37 +0000 (14:49 +0100)]
r8169: simplify setting netdev features
Setting dev->features a few lines later allows to simplify the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 15 Feb 2020 13:48:49 +0000 (14:48 +0100)]
r8169: remove setting PCI_CACHE_LINE_SIZE in rtl_hw_start_8169
This is done for all RTL8169 chip versions in rtl8169_init_phy already.
Therefore we can remove it here.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 15 Feb 2020 13:48:03 +0000 (14:48 +0100)]
r8169: remove unneeded check from rtl_link_chg_patch
rtl_link_chg_patch() can be called from rtl_open() to rtl8169_close()
only. And in rtl8169_close() phy_stop() ensures that this function
isn't called afterwards. So we don't need this check.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matteo Croce [Sat, 15 Feb 2020 13:20:56 +0000 (14:20 +0100)]
openvswitch: add TTL decrement action
New action to decrement TTL instead of setting it to a fixed value.
This action will decrement the TTL and, in case of expired TTL, drop it
or execute an action passed via a nested attribute.
The default TTL expired action is to drop the packet.
Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.
Tested with a corresponding change in the userspace:
# ovs-dpctl dump-flows
in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},1
in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},2
in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1
# ping -c1 192.168.0.2 -t 42
IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
# ping -c1 192.168.0.2 -t 120
IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
# ping -c1 192.168.0.2 -t 1
#
Co-developed-by: Bindiya Kurle <bindiyakurle@gmail.com>
Signed-off-by: Bindiya Kurle <bindiyakurle@gmail.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 15 Feb 2020 00:32:29 +0000 (16:32 -0800)]
net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278
Either port 5 or port 8 can be used on a 7278 device, make sure that
port 5 also gets configured properly for 2Gb/sec in that case.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Roy [Fri, 14 Feb 2020 23:30:50 +0000 (15:30 -0800)]
tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.
This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.
For applications using epoll, returning sk_err along with the result
of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a spurious wakeup.
Consider a multi-threaded application using epoll. A thread may awaken
with EPOLLIN but another thread may already be reading. The
spuriously-awoken thread does not necessarily know that another thread
'won'; rather, it may be possible that it was woken up due to the
presence of an error if there is no data. A zerocopy read receiving 0
bytes thus would need to be followed up by recvmsg to be sure.
Instead, we return sk_err directly with zerocopy, so the application
can avoid this extra system call.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Roy [Fri, 14 Feb 2020 23:30:49 +0000 (15:30 -0800)]
tcp-zerocopy: Return inq along with tcp receive zerocopy.
This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.
For applications using edge-triggered epoll, returning inq along with
the result of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking,
since normally we would need to perform a recvmsg() call for every
successful small RPC read via TCP receive zerocopy, returning inq can
reduce the number of system calls performed by approximately half.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 03:01:50 +0000 (19:01 -0800)]
Merge branch 'Enhance-virtio-vsock-connection-semantics'
Sebastien Boeuf says:
====================
Enhance virtio-vsock connection semantics
This series improves the semantics behind the way virtio-vsock server
accepts connections coming from the client. Whenever the server
receives a connection request from the client, if it is bound to the
socket but not yet listening, it will answer with a RST packet. The
point is to ensure each request from the client is quickly processed
so that the client can decide about the strategy of retrying or not.
The series includes along with the improvement patch a new test to
ensure the behavior is consistent across all hypervisors drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sebastien Boeuf [Fri, 14 Feb 2020 11:48:02 +0000 (12:48 +0100)]
tools: testing: vsock: Test when server is bound but not listening
Whenever the server side of vsock is binding to the socket, but not
listening yet, we expect the behavior from the client to be identical to
what happens when the server is not even started.
This new test runs the server side so that it binds to the socket
without ever listening to it. The client side will try to connect and
should receive an ECONNRESET error.
This new test provides a way to validate the previously introduced patch
for making sure the server side will always answer with a RST packet in
case the client requested a new connection.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sebastien Boeuf [Fri, 14 Feb 2020 11:48:01 +0000 (12:48 +0100)]
net: virtio_vsock: Enhance connection semantics
Whenever the vsock backend on the host sends a packet through the RX
queue, it expects an answer on the TX queue. Unfortunately, there is one
case where the host side will hang waiting for the answer and might
effectively never recover if no timeout mechanism was implemented.
This issue happens when the guest side starts binding to the socket,
which insert a new bound socket into the list of already bound sockets.
At this time, we expect the guest to also start listening, which will
trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem
occurs if the host side queued a RX packet and triggered an interrupt
right between the end of the binding process and the beginning of the
listening process. In this specific case, the function processing the
packet virtio_transport_recv_pkt() will find a bound socket, which means
it will hit the switch statement checking for the sk_state, but the
state won't be changed into TCP_LISTEN yet, which leads the code to pick
the default statement. This default statement will only free the buffer,
while it should also respond to the host side, by sending a packet on
its TX queue.
In order to simply fix this unfortunate chain of events, it is important
that in case the default statement is entered, and because at this stage
we know the host side is waiting for an answer, we must send back a
packet containing the operation VIRTIO_VSOCK_OP_RST.
One could say that a proper timeout mechanism on the host side will be
enough to avoid the backend to hang. But the point of this patch is to
ensure the normal use case will be provided with proper responsiveness
when it comes to establishing the connection.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Feb 2020 03:00:22 +0000 (19:00 -0800)]
Merge tag 'mac80211-next-for-net-next-2020-02-14' of git://git./linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
A few big new things:
* 802.11 frame encapsulation offload support
* more HE (802.11ax) support, including some for 6 GHz band
* powersave in hwsim, for better testing
Of course as usual there are various cleanups and small fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>