linux-2.6-microblaze.git
5 years agonetfilter: nf_conntrack: ensure that CONNTRACK_LOCKS is power of 2
Li RongQing [Tue, 26 Feb 2019 09:20:52 +0000 (17:20 +0800)]
netfilter: nf_conntrack: ensure that CONNTRACK_LOCKS is power of 2

CONNTRACK_LOCKS is divisor when computer array index, if it is power of
2, compiler will optimize modulo operation as bitwise AND, or else
modulo will lower performance.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nf_tables: check the result of dereferencing base_chain->stats
Li RongQing [Tue, 26 Feb 2019 09:13:56 +0000 (17:13 +0800)]
netfilter: nf_tables: check the result of dereferencing base_chain->stats

Check the result of dereferencing base_chain->stats, instead of result
of this_cpu_ptr with NULL.

base_chain->stats maybe be changed to NULL when a chain is updated and a
new NULL counter can be attached.

And we do not need to check returning of this_cpu_ptr since
base_chain->stats is from percpu allocator if it is non-NULL,
this_cpu_ptr returns a valid value.

And fix two sparse error by replacing rcu_access_pointer and
rcu_dereference with READ_ONCE under rcu_read_lock.

Thanks for Eric's help to finish this patch.

Fixes: 009240940e84c1 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slave
David Ahern [Tue, 26 Feb 2019 00:21:14 +0000 (16:21 -0800)]
netfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slave

Followup to a173f066c7cf ("netfilter: bridge: Don't sabotage nf_hook
calls from an l3mdev"). Some packets (e.g., ndisc) do not have the skb
device flipped to the l3mdev (e.g., VRF) device. Update ip_sabotage_in
to not drop packets for slave devices too. Currently, neighbor
solicitation packets for 'dev -> bridge (addr) -> vrf' setups are getting
dropped. This patch enables IPv6 communications for bridges with an
address that are enslaved to a VRF.

Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agoipvs: get sctphdr by sctphoff in sctp_csum_check
Xin Long [Mon, 25 Feb 2019 11:27:43 +0000 (19:27 +0800)]
ipvs: get sctphdr by sctphoff in sctp_csum_check

sctp_csum_check() is called by sctp_s/dnat_handler() where it calls
skb_make_writable() to ensure sctphdr to be linearized.

So there's no need to get sctphdr by calling skb_header_pointer()
in sctp_csum_check().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: convert the proto argument from u8 to u16
Li RongQing [Fri, 22 Feb 2019 13:45:52 +0000 (21:45 +0800)]
netfilter: convert the proto argument from u8 to u16

The proto in struct xt_match and struct xt_target is u16, when
calling xt_check_target/match, their proto argument is u8,
and will cause truncation, it is harmless to ip packet, since
ip proto is u8

if a etable's match/target has proto that is u16, will cause
the check failure.

and convert be16 to short in bridge/netfilter/ebtables.c

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nft_tunnel: Add dst_cache support
wenxu [Fri, 22 Feb 2019 09:00:43 +0000 (17:00 +0800)]
netfilter: nft_tunnel: Add dst_cache support

The metadata_dst does not initialize the dst_cache field, this causes
problems to ip_md_tunnel_xmit() since it cannot use this cache, hence,
Triggering a route lookup for every packet.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: conntrack: tcp: only close if RST matches exact sequence
Florian Westphal [Thu, 21 Feb 2019 16:09:31 +0000 (17:09 +0100)]
netfilter: conntrack: tcp: only close if RST matches exact sequence

TCP resets cause instant transition from established to closed state
provided the reset is in-window.  Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.

Main problem for conntrack is that its a middlebox, i.e.  whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).

Therefore we can't simply flag RSTs with non-exact match as invalid.

This updates RST processing as follows:

1. If the connection is in a state other than ESTABLISHED, nothing is
   changed, RST is subject to normal in-window check.

2. If the RSTs sequence number either matches exactly RCV.NXT,
   connection state moves to CLOSE.

3. The same applies if the RST sequence number aligns with a previous
   packet in the same direction.

In all other cases, the connection remains in ESTABLISHED state.
If the normal-in-window check passes, the timeout will be lowered
to that of CLOSE.

If the peer sends a challenge ack, connection timeout will be reset.

If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.

If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.

Packetdrill test case:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001

// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001

// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R  2:2(0) ack 1001 win 260

// 2nd reset, but too far ahead sequence.  Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260

// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260

// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501

// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260

// ACK
0.300 < . 1001:1001(0) ack 1001 win 257

// more data could be exchanged here, connection
// is still established

// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002

// Close the connection without reading outstanding data
0.700 close(4) = 0

// so one more reset.  Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.

With patch, this generates following conntrack events:
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]

Without patch, first RST moves connection to close, whereas socket state
does not change until FIN is received.
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]

Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agoipvs: change some data types from int to bool
Andrea Claudi [Sat, 16 Feb 2019 15:39:53 +0000 (16:39 +0100)]
ipvs: change some data types from int to bool

Change the data type of the following variables from int to bool
across ipvs code:

  - found
  - loop
  - need_full_dest
  - need_full_svc
  - payload_csum

Also change the following functions to use bool full_entry param
instead of int:

  - ip_vs_genl_parse_dest()
  - ip_vs_genl_parse_service()

This patch does not change any functionality but makes the source
code slightly easier to read.

Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nft_set_hash: remove nft_hash_key()
Pablo Neira Ayuso [Mon, 25 Feb 2019 13:13:43 +0000 (14:13 +0100)]
netfilter: nft_set_hash: remove nft_hash_key()

hashtable is never used for 2-byte keys, remove nft_hash_key().

Fixes: e240cd0df481 ("netfilter: nf_tables: place all set backends in one single module")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nft_set_hash: bogus element self comparison from deactivation path
Pablo Neira Ayuso [Mon, 25 Feb 2019 13:13:42 +0000 (14:13 +0100)]
netfilter: nft_set_hash: bogus element self comparison from deactivation path

Use the element from the loop iteration, not the same element we want to
deactivate otherwise this branch always evaluates true.

Fixes: 6c03ae210ce3 ("netfilter: nft_set_hash: add non-resizable hashtable implementation")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nft_set_hash: fix lookups with fixed size hash on big endian
Pablo Neira Ayuso [Mon, 25 Feb 2019 13:13:41 +0000 (14:13 +0100)]
netfilter: nft_set_hash: fix lookups with fixed size hash on big endian

Call jhash_1word() for the 4-bytes key case from the insertion and
deactivation path, otherwise big endian arch set lookups fail.

Fixes: 446a8268b7f5 ("netfilter: nft_set_hash: add lookup variant for fixed size hashtable")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: remove unneeded switch fall-through
Li RongQing [Fri, 22 Feb 2019 08:58:44 +0000 (16:58 +0800)]
netfilter: remove unneeded switch fall-through

Empty case is fine and does not switch fall-through

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: conntrack: avoid same-timeout update
Florian Westphal [Thu, 21 Feb 2019 14:38:29 +0000 (15:38 +0100)]
netfilter: conntrack: avoid same-timeout update

No need to dirty a cache line if timeout is unchanged.
Also, WARN() is useless here: we crash on 'skb->len' access
if skb is NULL.

Last, ct->timeout is u32, not 'unsigned long' so adapt the
function prototype accordingly.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.h
Florian Westphal [Tue, 19 Feb 2019 16:38:27 +0000 (17:38 +0100)]
netfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.h

The l3proto name is gone, its header file is the last trace.
While at it, also remove nf_nat_core.h, its very small and all users
include nf_nat.h too.

before:
   text    data     bss     dec     hex filename
  22948    1612    4136   28696    7018 nf_nat.ko

after removal of l3proto register/unregister functions:
   text    data     bss     dec     hex filename
  22196    1516    4136   27848    6cc8 nf_nat.ko

checkpatch complains about overly long lines, but line breaks
do not make things more readable and the line length gets smaller
here, not larger.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nat: remove l3proto struct
Florian Westphal [Tue, 19 Feb 2019 16:38:26 +0000 (17:38 +0100)]
netfilter: nat: remove l3proto struct

All l3proto function pointers have been removed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nat: remove csum_recalc hook
Florian Westphal [Tue, 19 Feb 2019 16:38:25 +0000 (17:38 +0100)]
netfilter: nat: remove csum_recalc hook

We can now use direct calls.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nat: remove csum_update hook
Florian Westphal [Tue, 19 Feb 2019 16:38:24 +0000 (17:38 +0100)]
netfilter: nat: remove csum_update hook

We can now use direct calls.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nat: remove l3 manip_pkt hook
Florian Westphal [Tue, 19 Feb 2019 16:38:23 +0000 (17:38 +0100)]
netfilter: nat: remove l3 manip_pkt hook

We can now use direct calls.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nat: remove nf_nat_l4proto.h
Florian Westphal [Tue, 19 Feb 2019 16:38:22 +0000 (17:38 +0100)]
netfilter: nat: remove nf_nat_l4proto.h

after ipv4/6 nat tracker merge, there are no external callers, so
make last function static and remove the header.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nat: merge nf_nat_ipv4,6 into nat core
Florian Westphal [Tue, 19 Feb 2019 16:38:21 +0000 (17:38 +0100)]
netfilter: nat: merge nf_nat_ipv4,6 into nat core

before:
   text    data     bss     dec     hex filename
  16566    1576    4136   22278    5706 nf_nat.ko
   3598     844       0    4442    115a nf_nat_ipv6.ko
   3187     844       0    4031     fbf nf_nat_ipv4.ko

after:
   text    data     bss     dec     hex filename
  22948    1612    4136   28696    7018 nf_nat.ko

... with ipv4/v6 nat now provided directly via nf_nat.ko.

Also changes:
       ret = nf_nat_ipv4_fn(priv, skb, state);
       if (ret != NF_DROP && ret != NF_STOLEN &&
into
if (ret != NF_ACCEPT)
return ret;

everywhere.

The nat hooks never should return anything other than
ACCEPT or DROP (and the latter only in rare error cases).

The original code uses multi-line ANDing including assignment-in-if:
        if (ret != NF_DROP && ret != NF_STOLEN &&
           !(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
            (ct = nf_ct_get(skb, &ctinfo)) != NULL) {

I removed this while moving, breaking those in separate conditionals
and moving the assignments into extra lines.

checkpatch still generates some warnings:
 1. Overly long lines (of moved code).
    Breaking them is even more ugly. so I kept this as-is.
 2. use of extern function declarations in a .c file.
    This is necessary evil, we must call
    nf_nat_l3proto_register() from the nat core now.
    All l3proto related functions are removed later in this series,
    those prototypes are then removed as well.

v2: keep empty nf_nat_ipv6_csum_update stub for CONFIG_IPV6=n case.
v3: remove IS_ENABLED(NF_NAT_IPV4/6) tests, NF_NAT_IPVx toggles
    are removed here.
v4: also get rid of the assignments in conditionals.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nat: move nlattr parse and xfrm session decode to core
Florian Westphal [Tue, 19 Feb 2019 16:38:20 +0000 (17:38 +0100)]
netfilter: nat: move nlattr parse and xfrm session decode to core

None of these functions calls any external functions, moving them allows
to avoid both the indirection and a need to export these symbols.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nat: merge ipv4 and ipv6 masquerade functionality
Florian Westphal [Tue, 19 Feb 2019 16:38:19 +0000 (17:38 +0100)]
netfilter: nat: merge ipv4 and ipv6 masquerade functionality

Before:
   text    data     bss     dec     hex filename
  13916    1412    4128   19456    4c00 nf_nat.ko
   4510     968       4    5482    156a nf_nat_ipv4.ko
   5146     944       8    6098    17d2 nf_nat_ipv6.ko

After:
   text    data     bss     dec     hex filename
  16566    1576    4136   22278    5706 nf_nat.ko
   3187     844       0    4031     fbf nf_nat_ipv4.ko
   3598     844       0    4442    115a nf_nat_ipv6.ko

... so no drastic changes in combined size.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: ebtables: remove BUGPRINT messages
Florian Westphal [Mon, 18 Feb 2019 23:37:21 +0000 (00:37 +0100)]
netfilter: ebtables: remove BUGPRINT messages

They are however frequently triggered by syzkaller, so remove them.

ebtables userspace should never trigger any of these, so there is little
value in making them pr_debug (or ratelimited).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nf_conntrack_amanda: add support for STATE streams
Florian Tham [Mon, 18 Feb 2019 08:55:46 +0000 (09:55 +0100)]
netfilter: nf_conntrack_amanda: add support for STATE streams

The Amanda CONNECT command has been updated to establish an optional
fourth connection [0]. Previously, a CONNECT command would look like:

    CONNECT DATA port0 MESG port1 INDEX port2

nf_conntrack_amanda analyses the CONNECT command string in order to
learn the port numbers of the related DATA, MESG and INDEX streams. As
of amanda v3.4, the CONNECT command can advertise an additional port:

    CONNECT DATA port0 MESG port1 INDEX port2 STATE port3

The new STATE stream is not handled, thus the connection on the STATE
port cannot be established.

The patch adds support for STATE streams to the amanda conntrack helper.

I tested with max_expected = 3, leaving the other patch hunks
unmodified. Amanda reports "connection refused" and aborts. After I set
max_expected to 4, the backup completes successfully.

[0] https://github.com/zmanda/amanda/commit/3b8384fc9f2941e2427f44c3aee29f561ed67894#diff-711e502fc81a65182c0954765b42919eR456

Signed-off-by: Florian Tham <tham@fidion.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nft_compat: use .release_ops and remove list of extension
Pablo Neira Ayuso [Wed, 13 Feb 2019 12:18:36 +0000 (13:18 +0100)]
netfilter: nft_compat: use .release_ops and remove list of extension

Add .release_ops, that is called in case of error at a later stage in
the expression initialization path, ie. .select_ops() has been already
set up operations and that needs to be undone. This allows us to unwind
.select_ops from the error path, ie. release the dynamic operations for
this extension.

Moreover, allocate one single operation instead of recycling them, this
comes at the cost of consuming a bit more memory per rule, but it
simplifies the infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonet: sched: pie: fix 64-bit division
Leslie Monis [Wed, 27 Feb 2019 01:00:06 +0000 (06:30 +0530)]
net: sched: pie: fix 64-bit division

Use div_u64() to resolve build failures on 32-bit platforms.

Fixes: 3f7ae5f3dc52 ("net: sched: pie: add more cases to auto-tune alpha and beta")
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Use RCU_POINTER_INITIALIZER() to init static variable
Li RongQing [Mon, 25 Feb 2019 02:43:06 +0000 (10:43 +0800)]
net: Use RCU_POINTER_INITIALIZER() to init static variable

This pointer is RCU protected, so proper primitives should be used.

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'tcp-cleanups'
David S. Miller [Tue, 26 Feb 2019 21:16:03 +0000 (13:16 -0800)]
Merge branch 'tcp-cleanups'

Eric Dumazet says:

====================
tcp: cleanups for linux-5.1

This small patch series cleanups few things, and add a small
timewait optimization for hosts not using md5.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: remove tcp_queue argument from tso_fragment()
Eric Dumazet [Tue, 26 Feb 2019 17:49:13 +0000 (09:49 -0800)]
tcp: remove tcp_queue argument from tso_fragment()

tso_fragment() is only called for packets still in write queue.

Remove the tcp_queue parameter to make this more obvious,
even if the comment clearly states this.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: use tcp_md5_needed for timewait sockets
Eric Dumazet [Tue, 26 Feb 2019 17:49:12 +0000 (09:49 -0800)]
tcp: use tcp_md5_needed for timewait sockets

This might speedup tcp_twsk_destructor() a bit,
avoiding a cache line miss.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: convert tcp_md5_needed to static_branch API
Eric Dumazet [Tue, 26 Feb 2019 17:49:11 +0000 (09:49 -0800)]
tcp: convert tcp_md5_needed to static_branch API

We prefer static_branch_unlikely() over static_key_false() these days.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: get rid of __tcp_add_write_queue_tail()
Eric Dumazet [Tue, 26 Feb 2019 17:49:10 +0000 (09:49 -0800)]
tcp: get rid of __tcp_add_write_queue_tail()

This helper is only used from tcp_add_write_queue_tail(), and does
not make the code more readable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: get rid of tcp_check_send_head()
Eric Dumazet [Tue, 26 Feb 2019 17:49:09 +0000 (09:49 -0800)]
tcp: get rid of tcp_check_send_head()

This helper is used only once, and its name is no longer relevant.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotc-testing: gitignore, ignore local tdc config file
Vlad Buslov [Tue, 26 Feb 2019 15:37:09 +0000 (17:37 +0200)]
tc-testing: gitignore, ignore local tdc config file

Comment in tdc_config.py recommends putting customizations in
tdc_config_local.py file that wasn't included in gitignore. Add the local
config file to gitignore.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: fix typo in walker_check_empty()
Vlad Buslov [Tue, 26 Feb 2019 15:34:40 +0000 (17:34 +0200)]
net: sched: fix typo in walker_check_empty()

Function walker_check_empty() incorrectly verifies that tp pointer is not
NULL, instead of actual filter pointer. Fix conditional to check the right
pointer. Adjust filter pointer naming accordingly to other cls API
functions.

Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: pie: fix mistake in reference link
Leslie Monis [Tue, 26 Feb 2019 10:23:31 +0000 (15:53 +0530)]
net: sched: pie: fix mistake in reference link

Fix the incorrect reference link to RFC 8033

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: remove set but not used variable 'autoneg_status'
YueHaibing [Mon, 25 Feb 2019 02:03:28 +0000 (02:03 +0000)]
mlxsw: spectrum: remove set but not used variable 'autoneg_status'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function 'mlxsw_sp_port_get_link_ksettings':
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3062:5: warning:
 variable 'autoneg_status' set but not used [-Wunused-but-set-variable]

It's not used since commit 475b33cb66c9 ("mlxsw: spectrum: Remove unsupported
eth_proto_lp_advertise field in PTYS")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'vxlan-create-and-changelink-extack-support'
David S. Miller [Tue, 26 Feb 2019 16:54:37 +0000 (08:54 -0800)]
Merge branch 'vxlan-create-and-changelink-extack-support'

Roopa Prabhu says:

====================
vxlan: create and changelink extack support

This series adds extack support to changelink paths.
In the process re-factors flag sets to a separate helper.
Also adds some changelink testcases to rtnetlink.sh

(This series was initially part of another series that
tried to support changelink for more attributes.
But after some feedback from sabrina, i have dropped the
'support changelink for more attributes' part because some
of them cannot be supported today or may require additional
use-case handling code. These can be done separately
as and when we see the need for it.)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotools: selftests: rtnetlink: add testcases for vxlan flag sets
Roopa Prabhu [Tue, 26 Feb 2019 06:03:02 +0000 (22:03 -0800)]
tools: selftests: rtnetlink: add testcases for vxlan flag sets

This patch extends rtnetlink.sh to cover some vxlan flag
netlink attribute sets.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovxlan: add extack support for create and changelink
Roopa Prabhu [Tue, 26 Feb 2019 06:03:01 +0000 (22:03 -0800)]
vxlan: add extack support for create and changelink

This patch adds extack coverage in vxlan link
create and changelink paths. Introduces a new helper
vxlan_nl2flags to consolidate flag attribute validation.

thanks to Johannes Berg for some tips to construct the
generic vxlan flag extack strings.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'devlink-make-ethtool-compat-reliable'
David S. Miller [Tue, 26 Feb 2019 16:49:06 +0000 (08:49 -0800)]
Merge branch 'devlink-make-ethtool-compat-reliable'

Jakub Kicinski says:

====================
devlink: make ethtool compat reliable

This is a follow up to the series which added device flash
updates via devlink. I went with the approach of adding a
new NDO in the end. It seems to end up looking cleaner.

First patch removes the option to build devlink as a module.
Users can still decide to not build it, but the module option
ends up not being worth the maintenance cost.

Next two patches add a NDO which can be used to ask the driver
to return a devlink instance associated with a given netdev,
instead of iterating over devlink ports. Drivers which implement
this NDO must take into account the potential impact on the
visibility of the devlink instance.

With the new NDO in place we can remove NFP ethtool flash update
code.

Fifth patch makes sure we hold a reference to dev while
callbacks are active.

Last but not least the NULL-check of devlink->ops is moved
to instance allocation time.

Last but not least missing checks for devlink->ops are added.
There is currently no driver registering devlink without ops,
so can just fix this in -next.

v2 (Michal): add netdev_to_devlink() in patch 3.
v3 (Florian):
 - add missing checks for devlink->ops;
 - move locking/holding into devlink_compat_ functions.
v4 (Jiri):
 - hold devlink_mutex around callbacks (patch 2);
 - require non-NULL ops (patch 6).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: require non-NULL ops for devlink instances
Jakub Kicinski [Tue, 26 Feb 2019 03:34:07 +0000 (19:34 -0800)]
devlink: require non-NULL ops for devlink instances

Commit 76726ccb7f46 ("devlink: add flash update command") and
commit 2d8dc5bbf4e7 ("devlink: Add support for reload")
access devlink ops without NULL-checking. There is, however, no
driver which would pass in NULL ops, so let's just make that
a requirement. Remove the now unnecessary NULL-checking.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: hold a reference to the netdevice around ethtool compat
Jakub Kicinski [Tue, 26 Feb 2019 03:34:06 +0000 (19:34 -0800)]
devlink: hold a reference to the netdevice around ethtool compat

When ethtool is calling into devlink compat code make sure we have
a reference on the netdevice on which the operation was invoked.

v3: move the hold/lock logic into devlink_compat_* functions (Florian)

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: remove ethtool flashing fallback
Jakub Kicinski [Tue, 26 Feb 2019 03:34:05 +0000 (19:34 -0800)]
nfp: remove ethtool flashing fallback

Now that devlink fallback will be called reliably, we can remove
the ethtool flashing code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: add .ndo_get_devlink
Jakub Kicinski [Tue, 26 Feb 2019 03:34:04 +0000 (19:34 -0800)]
nfp: add .ndo_get_devlink

Support getting devlink instance from a new NDO.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: create a special NDO for getting the devlink instance
Jakub Kicinski [Tue, 26 Feb 2019 03:34:03 +0000 (19:34 -0800)]
devlink: create a special NDO for getting the devlink instance

Instead of iterating over all devlink ports add a NDO which
will return the devlink instance from the driver.

v2: add the netdev_to_devlink() helper (Michal)
v3: check that devlink has ops (Florian)
v4: hold devlink_mutex (Jiri)

Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: devlink: turn devlink into a built-in
Jakub Kicinski [Tue, 26 Feb 2019 03:34:02 +0000 (19:34 -0800)]
net: devlink: turn devlink into a built-in

Being able to build devlink as a module causes growing pains.
First all drivers had to add a meta dependency to make sure
they are not built in when devlink is built as a module.  Now
we are struggling to invoke ethtool compat code reliably.

Make devlink code built-in, users can still not build it at
all but the dynamically loadable module option is removed.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: remove unused struct inet_frag_queue.fragments field
Peter Oskolkov [Tue, 26 Feb 2019 01:43:46 +0000 (17:43 -0800)]
net: remove unused struct inet_frag_queue.fragments field

Now that all users of struct inet_frag_queue have been converted
to use 'rb_fragments', remove the unused 'fragments' field.

Build with `make allyesconfig` succeeded. ip_defrag selftest passed.

Signed-off-by: Peter Oskolkov <posk@google.com>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: wan: z85230: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 25 Feb 2019 15:06:24 +0000 (23:06 +0800)]
net: wan: z85230: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in z8530_tx_done() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: wan: cosa: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 25 Feb 2019 15:05:41 +0000 (23:05 +0800)]
net: wan: cosa: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in cosa_net_tx_done() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: wan: sbni: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 25 Feb 2019 15:03:40 +0000 (23:03 +0800)]
net: wan: sbni: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in send_complete() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: wan: ixp4xx_hss: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 25 Feb 2019 15:02:57 +0000 (23:02 +0800)]
net: wan: ixp4xx_hss: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in hss_hdlc_txdone_irq() when
skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: wan: wanxl: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 25 Feb 2019 15:01:50 +0000 (23:01 +0800)]
net: wan: wanxl: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in wanxl_tx_intr() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: lmc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 25 Feb 2019 14:57:40 +0000 (22:57 +0800)]
net: lmc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in lmc_interrupt() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.

Delete a redundant comment line in lmc_interrupt().

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'pie-next'
David S. Miller [Mon, 25 Feb 2019 22:21:03 +0000 (14:21 -0800)]
Merge branch 'pie-next'

Leslie Monis says:

====================
net: sched: pie: align PIE implementation with RFC 8033

The current implementation of the PIE queuing discipline is according to the
IETF draft [http://tools.ietf.org/html/draft-pan-aqm-pie-00] and the paper
[PIE: A Lightweight Control Scheme to Address the Bufferbloat Problem].
However, a lot of necessary modifications and enhancements have been proposed
in RFC 8033, which have not yet been incorporated in the source code of Linux.
This patch series helps in achieving the same.

Performance tests carried out using Flent [https://flent.org/]

Changes from v2 to v3:
  - Used div_u64() instead of direct division after explicit type casting as
    recommended by David

Changes from v1 to v2:
  - Excluded the patch setting PIE dynamically active/inactive as the test
    results were unsatisfactory
  - Fixed a scaling issue when adding more auto-tuning cases which caused
    local variables to underflow
  - Changed the long if/else chain to a loop as suggested by Stephen
  - Changed the position of the accu_prob variable in the pie_vars
    structure as recommended by Stephen
====================

Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: pie: update references
Mohit P. Tahiliani [Mon, 25 Feb 2019 19:10:01 +0000 (00:40 +0530)]
net: sched: pie: update references

RFC 8033 replaces the IETF draft for PIE

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: pie: add derandomization mechanism
Mohit P. Tahiliani [Mon, 25 Feb 2019 19:10:00 +0000 (00:40 +0530)]
net: sched: pie: add derandomization mechanism

Random dropping of packets to achieve latency control may
introduce outlier situations where packets are dropped too
close to each other or too far from each other. This can
cause the real drop percentage to temporarily deviate from
the intended drop probability. In certain scenarios, such
as a small number of simultaneous TCP flows, these
deviations can cause significant deviations in link
utilization and queuing latency.

RFC 8033 suggests using a derandomization mechanism to avoid
these deviations.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: pie: add more cases to auto-tune alpha and beta
Mohit P. Tahiliani [Mon, 25 Feb 2019 19:09:59 +0000 (00:39 +0530)]
net: sched: pie: add more cases to auto-tune alpha and beta

The current implementation scales the local alpha and beta
variables in the calculate_probability function by the same
amount for all values of drop probability below 1%.

RFC 8033 suggests using additional cases for auto-tuning
alpha and beta when the drop probability is less than 1%.

In order to add more auto-tuning cases, MAX_PROB must be
scaled by u64 instead of u32 to prevent underflow when
scaling the local alpha and beta variables in the
calculate_probability function.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: pie: change initial value of pie_vars->burst_time
Mohit P. Tahiliani [Mon, 25 Feb 2019 19:09:58 +0000 (00:39 +0530)]
net: sched: pie: change initial value of pie_vars->burst_time

RFC 8033 suggests an initial value of 150 milliseconds for
the maximum time allowed for a burst of packets.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: pie: change default value of pie_params->tupdate
Mohit P. Tahiliani [Mon, 25 Feb 2019 19:09:57 +0000 (00:39 +0530)]
net: sched: pie: change default value of pie_params->tupdate

RFC 8033 suggests a default value of 15 milliseconds for the
update interval.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: pie: change default value of pie_params->target
Mohit P. Tahiliani [Mon, 25 Feb 2019 19:09:56 +0000 (00:39 +0530)]
net: sched: pie: change default value of pie_params->target

RFC 8033 suggests a default value of 15 milliseconds for the
target queue delay.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: pie: change value of QUEUE_THRESHOLD
Mohit P. Tahiliani [Mon, 25 Feb 2019 19:09:55 +0000 (00:39 +0530)]
net: sched: pie: change value of QUEUE_THRESHOLD

RFC 8033 recommends a value of 16384 bytes for the queue
threshold.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: acl: Use struct_size() in kzalloc()
Gustavo A. R. Silva [Mon, 25 Feb 2019 19:01:32 +0000 (13:01 -0600)]
mlxsw: spectrum: acl: Use struct_size() in kzalloc()

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL)

Notice that, in this case, variable alloc_size is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'aquantia-hwmon'
David S. Miller [Mon, 25 Feb 2019 22:16:22 +0000 (14:16 -0800)]
Merge branch 'aquantia-hwmon'

Heiner Kallweit says:

====================
net: phy: aquantia: add hwmon support

This series adds HWMON support for the temperature sensor and the
related alarms on the 107/108/109 chips.

v2:
- remove struct aqr_priv
- rename header file to aquantia.h
v3:
- add conditional compiling of aquantia_hwmon.c
- improve converting sensor register values to/from long
- add helper aqr_hwmon_test_bit
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: aquantia: add hwmon support
Heiner Kallweit [Mon, 25 Feb 2019 18:56:38 +0000 (19:56 +0100)]
net: phy: aquantia: add hwmon support

This adds HWMON support for the temperature sensor and the related
alarms on the 107/108/109 chips. This patch is based on work from
Nikita and Andrew. I added:
- support for changing alarm thresholds via sysfs
- move HWMON code to a separate source file to improve maintainability
- smaller changes like using IS_REACHABLE instead of ifdef
  (avoids problems if PHY driver is built in and HWMON is a module)

v2:
- remove struct aqr_priv
- rename header file to aquantia.h
v3:
- add conditional compiling of aquantia_hwmon.c
- improve converting sensor register values to/from long
- add helper aqr_hwmon_test_bit

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: aquantia: rename aquantia.c to aquantia_main.c
Heiner Kallweit [Mon, 25 Feb 2019 18:53:04 +0000 (19:53 +0100)]
net: phy: aquantia: rename aquantia.c to aquantia_main.c

Rename aquantia.c to aquantia_main.c to be prepared for adding new
functionality to separate source code files.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Mon, 25 Feb 2019 22:14:24 +0000 (14:14 -0800)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-02-22

This series contains updates to the ice driver only.

Bruce adds the __always_unused attribute to a parameter to avoid
compiler warnings when using -Wunused-parameter.  Fixed unnecessary
type-casting and the use of sizeof().  Fix the allocation of structs
that have become memory hogs, so allocate them in heaps and fix all the
associated references.  Fixed the "possible" numeric overflow issues
that were caught with static analysis.

Maciej fixes the maximum MTU calculation by taking into account double
VLAN tagging amd ensure that the operations are done in the correct
order.

Victor fixes the supported node calculation, where we were not taking
into account if there is space to add the new VSI or intermediate node
above that layer, then it is not required to continue the calculation.
Added a check for a leaf node presence for a given VSI, which is needed
before removing a VSI.

Jake fixes an issue where the VSI list is shared, so simply removing a
VSI from the list will cause issues for the other users who reference
the list.  Since we also free the memory, this could lead to
segmentation faults.

Brett fixes an issue where driver unload could cause a system reboot
when intel_iommu=on parameter is set.  The issue is that we are not
clearing the CAUSE_ENA bit for the appropriate control queues register
when freeing the miscellaneous interrupt vector.

Mitch is so kind, he prevented spamming the VF with link messages when
the link status really has not changed.  Updates the driver to use the
absolute vector ID and not the per-PF vector ID for the VF MSIx vector
allocation.

Lukasz fixes the ethtool pause parameter for the ice driver, which was
originally based off the link status but is now based off the PHY
configuration.  This is to resolve an issue where pause parameters could
be set while link was down.

Jesse updates the string that reports statistics so the string does not
get modified at runtime and cause reports of string truncation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: don't release block->lock when dumping chains
Vlad Buslov [Mon, 25 Feb 2019 15:45:44 +0000 (17:45 +0200)]
net: sched: don't release block->lock when dumping chains

Function tc_dump_chain() obtains and releases block->lock on each iteration
of its inner loop that dumps all chains on block. Outputting chain template
info is fast operation so locking/unlocking mutex multiple times is an
overhead when lock is highly contested. Modify tc_dump_chain() to only
obtain block->lock once and dump all chains without releasing it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: set dedicated tcf_walker flag when tp is empty
Vlad Buslov [Mon, 25 Feb 2019 15:38:31 +0000 (17:38 +0200)]
net: sched: set dedicated tcf_walker flag when tp is empty

Using tcf_walker->stop flag to determine when tcf_walker->fn() was called
at least once is unreliable. Some classifiers set 'stop' flag on error
before calling walker callback, other classifiers used to call it with NULL
filter pointer when empty. In order to prevent further regressions, extend
tcf_walker structure with dedicated 'nonempty' flag. Set this flag in
tcf_walker->fn() implementation that is used to check if classifier has
filters configured.

Fixes: 8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: Fix phylink_validate for Topaz family
Marek BehĂșn [Mon, 25 Feb 2019 11:39:55 +0000 (12:39 +0100)]
net: dsa: mv88e6xxx: Fix phylink_validate for Topaz family

The Topaz family should have different phylink_validate method from the
Peridot, since on Topaz the port supporting 2500BaseX mode is port 5,
not 9 and 10.

Signed-off-by: Marek BehĂșn <marek.behun@nic.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: Default CMODE to 1000BaseX only on 6390X
Marek BehĂșn [Mon, 25 Feb 2019 11:39:54 +0000 (12:39 +0100)]
net: dsa: mv88e6xxx: Default CMODE to 1000BaseX only on 6390X

Commit 787799a9d555 sets the SERDES interfaces of 6390 and 6390X to
1000BaseX, but this is only needed on 6390X, since there are SERDES
interfaces which can be used on lower ports on 6390.

This commit fixes this by returning to previous behaviour on 6390.
(Previous behaviour means that CMODE is not set at all if requested mode
is NA).

This is needed on Turris MOX, where the 88e6190 is connected to CPU in
2500BaseX mode.

Fixes: 787799a9d555 ("net: dsa: mv88e6xxx: Default ports 9/10 6390X CMODE to 1000BaseX")
Signed-off-by: Marek BehĂșn <marek.behun@nic.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: clean up SOCK_DEBUG()
Yafang Shao [Mon, 25 Feb 2019 10:33:48 +0000 (18:33 +0800)]
tcp: clean up SOCK_DEBUG()

Per discussion with Daniel[1] and Eric[2], these SOCK_DEBUG() calles in
TCP are not needed now.
We'd better clean up it.

[1] https://patchwork.ozlabs.org/patch/1035573/
[2] https://patchwork.ozlabs.org/patch/1040533/

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: remove unused parameter of tcp_sacktag_bsearch()
Taehee Yoo [Mon, 25 Feb 2019 09:42:33 +0000 (18:42 +0900)]
tcp: remove unused parameter of tcp_sacktag_bsearch()

parameter state in the tcp_sacktag_bsearch() is not used.
So, it can be removed.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoice: fix overlong string, update stats output
Jesse Brandeburg [Fri, 8 Feb 2019 20:50:43 +0000 (12:50 -0800)]
ice: fix overlong string, update stats output

A test started warning on a string truncation. This led to an unfortunate
realization that we are likely not accounting for the stats length
correctly before this patch, so fix the issue by putting "port." in front
of all the PF stats, instead of magically prepending it at runtime.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Fix for FC get rx/tx pause params
Lukasz Czapnik [Fri, 8 Feb 2019 20:50:42 +0000 (12:50 -0800)]
ice: Fix for FC get rx/tx pause params

Ethtool reported pause params based on the currently negotiated
link settings instead of current PHY config. User was not able
to turn off pause params because ethtool was incorrectly reporting
parameters as off when link was down even though PHY was configured
to support pause frames. Now pause params are taken from PHY config
instead of link status.

Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: use absolute vector ID for VFs
Mitch Williams [Fri, 8 Feb 2019 20:50:41 +0000 (12:50 -0800)]
ice: use absolute vector ID for VFs

When the PF driver sets up the VF MSI-X vector allocation, it needs to
use the hardware absolute vector ID, not the per-PF vector ID. Without
this change we see (apparent) TX hangs when using VFs on multiple PFs.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: check for a leaf node presence
Victor Raj [Fri, 8 Feb 2019 20:50:40 +0000 (12:50 -0800)]
ice: check for a leaf node presence

Check for a leaf node presence for a given VSI. This check is required
before removing a VSI since VSIs can't be removed with enabled queues
(with leaf nodes) from the FW scheduler tree unless its a reset.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: flush Tx pipe on disable queue timeout
Victor Raj [Fri, 8 Feb 2019 20:50:39 +0000 (12:50 -0800)]
ice: flush Tx pipe on disable queue timeout

Set the flush Tx pipe flag instead of getting an EAGAIN error when FW
times out in processing the disable Tx queue command.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: clear VF ARQLEN register on reset
Mitch Williams [Fri, 8 Feb 2019 20:50:38 +0000 (12:50 -0800)]
ice: clear VF ARQLEN register on reset

On older devices like X710 and X722, the VF's ARQLEN register is cleared
on reset, so the VF driver uses that register to detect an unannounced
reset. Unfortunately, on devices controlled by ice, this register is NOT
cleared on reset. This causes the VF to miss resets, and even on
properly-announced resets, the VF driver complains that it didn't see
the reset.

To fix this, we'll do it in software. When we handle a VF reset (whether
triggered by software or VFLR), clear this register after the HW reset
is complete.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: don't spam VFs with link messages
Mitch Williams [Fri, 8 Feb 2019 20:50:37 +0000 (12:50 -0800)]
ice: don't spam VFs with link messages

Don't send a link message to the VFs unless link actually changes state.
This avoids a small timing hole in some VF drivers that can cause an
apparent TX hang if they receive a link status message at the wrong time.

Although we have fixed the timing hole in the current VF driver, there
are still lots of drivers in the field that have this timing hole. Let's
not fall into it if we can avoid it.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: only use the VF for ICE_VSI_VF in ice_vsi_release
Brett Creeley [Fri, 8 Feb 2019 20:50:36 +0000 (12:50 -0800)]
ice: only use the VF for ICE_VSI_VF in ice_vsi_release

In ice_vsi_release we are always assigning a value to the local VF
variable. Change this to only be assigned if the VSI is a VF VSI.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: fix numeric overflow warning
Bruce Allan [Fri, 8 Feb 2019 20:50:35 +0000 (12:50 -0800)]
ice: fix numeric overflow warning

When compiling and analyzing the driver on newer kernels, a static
analyzer warns about the following "numeric overflow" issues:

  "The result of expression: 'budget-1' generates 4-byte type while casting
   to a bigger size of 8-byte".

  "The result of expression: '*words-words_read' generates 4-byte type
   while casting to a bigger size of 8-byte".

Fix them both.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: fix issue where host reboots on unload when iommu=on
Brett Creeley [Fri, 8 Feb 2019 20:50:34 +0000 (12:50 -0800)]
ice: fix issue where host reboots on unload when iommu=on

Currently if the kernel has the intel_iommu=on parameter set, on some
platforms removing the driver causes a system reboot. In initialization
we associate the control queue interrupts with the pf->hw_oicr_idx and
enable the interrupts by setting the CAUSE_ENA bit. The problem comes
on teardown because we are not clearing the CAUSE_ENA bit for the
control queues, but the vector at pf->hw_oicr_idx (miscellaneous
interrupt vector) gets disabled.

Fix this by clearing the CAUSE_ENA bit in the appropriate control queue
registers on when freeing the miscellaneous interrupt vector. Also,
move the call to ice_free_irq_msix_misc() to after ice_deinit_sw() in
ice_remove() because ice_deinit_sw() makes an AQ call, but
ice_free_irq_msix_misc() disables the miscellaneous vector and it's
associated interrupts.

Also, create two small helper functions to enable and disable the
control queue interrupts respectively.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: fix ice_remove_rule_internal vsi_list handling
Jacob Keller [Fri, 8 Feb 2019 20:50:33 +0000 (12:50 -0800)]
ice: fix ice_remove_rule_internal vsi_list handling

When adding multiple VLANs to the same VSI, the ice_add_vlan code will
share the VSI list, so as not to create multiple unnecessary VSI lists.

Consider the following flow

  ice_add_vlan(hw, <VSI 0 VID 7, VSI 0 VID 8, VSI 0 VID 9>)

Where we add three VLAN filters for VIDs 7, 8, and 9, all for VSI 0.

The ice_add_vlan will create a single vsi_list and share it among all
the filters.

Later, if we try to remove a VLAN,

  ice_remove_vlan(hw, <VSI 0 VID 7>)

Then the removal code will update the vsi_list and remove VSI 0 from it.
But, since the vsi_list is shared, this breaks the list for the other
users who reference it. We actually even free the VSI list memory, and
may result in segmentation faults.

This is due to the way that VLAN rule share VSI lists with reference
counts, and is caused because we call ice_rem_update_vsi_list even when
the ref_cnt is greater than one.

To fix this, handle the case where ref_cnt is greater than one
separately. In this case, we need to remove the associated rule without
modifying the vsi_list, since it is currently being referenced by
another rule. Instead, we just need to decrement the VSI list ref_cnt.

The case for handling sharing of VSI lists with multiple VSIs is not
currently supported by this code. No such rules will be created today,
and this code will require changes if/when such code is added.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: fix stack hogs from struct ice_vsi_ctx structures
Bruce Allan [Fri, 8 Feb 2019 20:50:32 +0000 (12:50 -0800)]
ice: fix stack hogs from struct ice_vsi_ctx structures

struct ice_vsi_ctx has gotten large enough that function local declarations
of it on the stack are causing stack hogs.  Fix that by allocating the
structs on heap.  Cleanup some formatting issues in the code around these
changes and fix incorrect data type uses of returned functions in a couple
places.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: sizeof(<type>) should be avoided
Bruce Allan [Fri, 8 Feb 2019 20:50:31 +0000 (12:50 -0800)]
ice: sizeof(<type>) should be avoided

With sizeof(), it is preferable to use the variable of type <type> instead
of sizeof(<type>).

There are multiple places where a temporary variable is used to hold a
'size' value which is then used for a subsequent alloc/memset. Get rid
of the temporary variable by calculating size as part of the alloc/memset
statement.

Also remove unnecessary type-cast.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Fix added in VSI supported nodes calc
Victor Raj [Fri, 8 Feb 2019 20:50:30 +0000 (12:50 -0800)]
ice: Fix added in VSI supported nodes calc

VSI supported nodes are calculated in order to add the VSI parent or
intermediate nodes to the scheduler tree. If one of the node in below
layers (from VSI layer) has space to add the new VSI or intermediate node
above that layer then it's not required to continue the calculation further
for below layers.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Fix the calculation of ICE_MAX_MTU
Maciej Fijalkowski [Fri, 8 Feb 2019 20:50:29 +0000 (12:50 -0800)]
ice: Fix the calculation of ICE_MAX_MTU

Currently ICE_MAX_MTU subtracts only ETH_HLEN from max frame size and
adds ETH_FCS_LEN and VLAN_HLEN, which is not what was intended.
The ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN expression should be surrounded
with parentheses.

Wrap mentioned expression and take into account VLAN double tagging.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Mark extack argument as __always_unused
Bruce Allan [Fri, 8 Feb 2019 20:50:28 +0000 (12:50 -0800)]
ice: Mark extack argument as __always_unused

Commit 87b0984ebfab ("net: Add extack argument to ndo_fdb_add()") in
net-next added an extended parameter to the .ndo_fdb_add op and changed
ice_fdb_add() accordingly. Update the function header and add the
__always_unused attribute to the new parameter to avoid -Wunused-parameter
warnings.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoswitchdev: Complete removal of switchdev_port_attr_get()
Florian Fainelli [Mon, 25 Feb 2019 02:39:02 +0000 (18:39 -0800)]
switchdev: Complete removal of switchdev_port_attr_get()

We have no more in tree users of switchdev_port_attr_get() after
d0e698d57a94 ("Merge branch 'net-Get-rid-of-switchdev_port_attr_get'")
so completely remove the function signature and body.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodsa: Remove phydev parameter from disable_port call
Andrew Lunn [Sun, 24 Feb 2019 19:44:43 +0000 (20:44 +0100)]
dsa: Remove phydev parameter from disable_port call

No current DSA driver makes use of the phydev parameter passed to the
disable_port call. Remove it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: fix reading fixed phy status
Heiner Kallweit [Sun, 24 Feb 2019 17:01:18 +0000 (18:01 +0100)]
net: phy: fix reading fixed phy status

With the switch to phy_resolve_aneg_linkmode() we don't read from the
chip any longer what is advertised but use phydev->advertising directly.
For a fixed phy however this bitmap is empty so far, what results in
no common mode being found. This breaks DSA. Fix this by advertising
everything that is supported. For a normal phy this done by phy_probe().

Fixes: 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: improve auto-neg emulation in swphy
Heiner Kallweit [Sun, 24 Feb 2019 16:41:47 +0000 (17:41 +0100)]
net: phy: improve auto-neg emulation in swphy

Auto-neg emulation currently doesn't set bit BMCR_ANENABLE in BMCR,
add this. Users will ignore speed and duplex settings in BMCR because
we're emulating auto-neg, therefore we can remove related code.
See also following discussion [0].

[0] https://marc.info/?t=155041784900002&r=1&w=2

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Mon, 25 Feb 2019 06:27:19 +0000 (22:27 -0800)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
Here's the main bluetooth-next pull request for the 5.1 kernel.

 - Fixes & improvements to mediatek, hci_qca, btrtl, and btmrvl HCI drivers
 - Fixes to parsing invalid L2CAP config option sizes
 - Locking fix to bt_accept_enqueue()
 - Add support for new Marvel sd8977 chipset
 - Various other smaller fixes & cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fix double-free in bpf_lwt_xmit_reroute
Peter Oskolkov [Sun, 24 Feb 2019 02:25:01 +0000 (18:25 -0800)]
net: fix double-free in bpf_lwt_xmit_reroute

dst_output() frees skb when it fails (see, for example,
ip_finish_output2), so it must not be freed in this case.

Fixes: 3bd0b15281af ("bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c")
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoip_tunnel: Add ip tunnel tun_info type dst_cache in ip_tunnel_xmit
wenxu [Sun, 24 Feb 2019 00:24:45 +0000 (08:24 +0800)]
ip_tunnel: Add ip tunnel tun_info type dst_cache in ip_tunnel_xmit

ip l add dev tun type gretap key 1000

Non-tunnel-dst ip tunnel device can send packet through lwtunnel
This patch provide the tun_inf dst cache support for this mode.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'dsa-mv88e6xxx-lockdep'
David S. Miller [Mon, 25 Feb 2019 06:21:23 +0000 (22:21 -0800)]
Merge branch 'dsa-mv88e6xxx-lockdep'

Andrew Lunn says:

====================
mv88e6xxx: Avoid false positive Lockdep splats

When acquiring the GPIO interrupt line for the switch, it is possible
to trigger lockdep splats. These are false positives, the mutex is in
a different IRQ descriptor. But fix it anyway, since it could mask
real locking issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: Release lock while requesting IRQ
Andrew Lunn [Sat, 23 Feb 2019 16:43:57 +0000 (17:43 +0100)]
net: dsa: mv88e6xxx: Release lock while requesting IRQ

There is no need to hold the register lock while requesting the GPIO
interrupt. By not holding it we can also avoid a false positive
lockdep splat.

Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat
Andrew Lunn [Sat, 23 Feb 2019 16:43:56 +0000 (17:43 +0100)]
net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat

The following false positive lockdep splat has been observed.

======================================================
WARNING: possible circular locking dependency detected
4.20.0+ #302 Not tainted
------------------------------------------------------
systemd-udevd/160 is trying to acquire lock:
edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704

but task is already holding lock:
edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&desc->request_mutex){+.+.}:
       mutex_lock_nested+0x1c/0x24
       __setup_irq+0xa0/0x704
       request_threaded_irq+0xd0/0x150
       mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
       mdio_probe+0x2c/0x54
       really_probe+0x200/0x2c4
       driver_probe_device+0x5c/0x174
       __driver_attach+0xd8/0xdc
       bus_for_each_dev+0x58/0x7c
       bus_add_driver+0xe4/0x1f0
       driver_register+0x7c/0x110
       mdio_driver_register+0x24/0x58
       do_one_initcall+0x74/0x2e8
       do_init_module+0x60/0x1d0
       load_module+0x1968/0x1ff4
       sys_finit_module+0x8c/0x98
       ret_fast_syscall+0x0/0x28
       0xbedf2ae8

-> #0 (&chip->reg_lock){+.+.}:
       __mutex_lock+0x50/0x8b8
       mutex_lock_nested+0x1c/0x24
       __setup_irq+0x640/0x704
       request_threaded_irq+0xd0/0x150
       mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
       mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
       mdio_probe+0x2c/0x54
       really_probe+0x200/0x2c4
       driver_probe_device+0x5c/0x174
       __driver_attach+0xd8/0xdc
       bus_for_each_dev+0x58/0x7c
       bus_add_driver+0xe4/0x1f0
       driver_register+0x7c/0x110
       mdio_driver_register+0x24/0x58
       do_one_initcall+0x74/0x2e8
       do_init_module+0x60/0x1d0
       load_module+0x1968/0x1ff4
       sys_finit_module+0x8c/0x98
       ret_fast_syscall+0x0/0x28
       0xbedf2ae8

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&desc->request_mutex);
                               lock(&chip->reg_lock);
                               lock(&desc->request_mutex);
  lock(&chip->reg_lock);

&desc->request_mutex refer to two different mutex. #1 is the GPIO for
the chip interrupt. #2 is the chained interrupt between global 1 and
global 2.

Add lockdep classes to the GPIO interrupt to avoid this.

Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoip_tunnel: Add dst_cache support in lwtunnel_state of ip tunnel
wenxu [Sat, 23 Feb 2019 13:32:54 +0000 (21:32 +0800)]
ip_tunnel: Add dst_cache support in lwtunnel_state of ip tunnel

The lwtunnel_state is not init the dst_cache Which make the
ip_md_tunnel_xmit can't use the dst_cache. It will lookup
route table every packets.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>