Jakub Kicinski [Tue, 7 Jan 2025 02:28:20 +0000 (18:28 -0800)]
netlink: specs: rt_link: decode ip6tnl, vti and vti6 link attrs
Some of our tests load vti and ip6tnl so not being able to decode
the link attrs gets in the way of using Python YNL for testing.
Decode link attributes for ip6tnl, vti and vti6.
ip6tnl uses IFLA_IPTUN_FLAGS as u32, while ipv4 and sit expect
a u16 attribute, so we have a (first?) subset type override...
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250107022820.2087101-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 7 Jan 2025 02:28:19 +0000 (18:28 -0800)]
tools: ynl: print some information about attribute we can't parse
When parsing throws an exception one often has to figure out which
attribute couldn't be parsed from first principles. For families
with large message parsing trees like rtnetlink guessing the
attribute can be hard.
Print a bit of information as the exception travels out, e.g.:
# when dumping rt links
Error decoding 'flags' from 'linkinfo-ip6tnl-attrs'
Error decoding 'data' from 'linkinfo-attrs'
Error decoding 'linkinfo' from 'link-attrs'
Traceback (most recent call last):
File "/home/kicinski/linux/./tools/net/ynl/cli.py", line 119, in <module>
main()
File "/home/kicinski/linux/./tools/net/ynl/cli.py", line 100, in main
reply = ynl.dump(args.dump, attrs)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 1064, in dump
return self._op(method, vals, dump=True)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 1058, in _op
return self._ops(ops)[0]
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 1045, in _ops
rsp_msg = self._decode(decoded.raw_attrs, op.attr_set.name)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 738, in _decode
subdict = self._decode(NlAttrs(attr.raw), attr_spec['nested-attributes'], search_attrs)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 763, in _decode
decoded = self._decode_sub_msg(attr, attr_spec, search_attrs)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 714, in _decode_sub_msg
subdict = self._decode(NlAttrs(attr.raw, offset), msg_format.attr_set)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 749, in _decode
decoded = attr.as_scalar(attr_spec['type'], attr_spec.byte_order)
File "/home/kicinski/linux/tools/net/ynl/lib/ynl.py", line 147, in as_scalar
return format.unpack(self.raw)[0]
struct.error: unpack requires a buffer of 2 bytes
The Traceback is what we would previously see, the "Error..."
messages are new. We print a message per level (in the stack
order). Printing single combined message gets tricky quickly
given sub-messages etc.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250107022820.2087101-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 7 Jan 2025 02:28:18 +0000 (18:28 -0800)]
tools: ynl: correctly handle overrides of fields in subset
We stated in documentation [1] and previous discussions [2]
that the need for overriding fields in members of subsets
is anticipated. Implement it.
Since each attr is now a new object we need to make sure
that the modifications are propagated. Specifically C codegen
wants to annotate which attrs are used in requests and replies
to generate the right validation artifacts.
[1] https://docs.kernel.org/next/userspace-api/netlink/specs.html#subset-of
[2] https://lore.kernel.org/netdev/
20231004171350.
1f59cd1d@kernel.org/
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250107022820.2087101-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 6 Jan 2025 17:46:20 +0000 (09:46 -0800)]
if_vlan: fix kdoc warnings
While merging net to net-next I noticed that the kdoc above
__vlan_get_protocol_offset() has the wrong function name.
Fix that and all the other kdoc warnings in this file.
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250106174620.1855269-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 8 Jan 2025 02:06:19 +0000 (18:06 -0800)]
Merge branch 'net-dsa-cleanup-eee-part-2'
Russell King says:
====================
net: dsa: cleanup EEE (part 2)
This is part 2 of the DSA EEE cleanups, removing what has become dead
code as a result of the EEE management phylib now does.
Patch 1 removes the useless setting of tx_lpi parameters in the
ksz driver.
Patch 2 does the same for mt753x.
Patch 3 removes the DSA core code that calls the get_mac_eee() operation.
This needs to be done before removing the implementations because doing
otherwise would cause dsa_user_get_eee() to return -EOPNOTSUPP.
Patches 4..8 remove the trivial get_mac_eee() implementations from DSA
drivers.
Patch 9 finally removes the get_mac_eee() method from struct
dsa_switch_ops.
====================
Link: https://patch.msgid.link/Z3vDwwsHSxH5D6Pm@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Mon, 6 Jan 2025 11:59:24 +0000 (11:59 +0000)]
net: dsa: remove get_mac_eee() method
The get_mac_eee() is no longer called by the core DSA code, nor are
there any implementations of this method. Remove it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tUllU-007UzL-KV@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Mon, 6 Jan 2025 11:59:19 +0000 (11:59 +0000)]
net: dsa: qca: remove qca8k_get_mac_eee()
qca8k_get_mac_eee() is no longer called by the core DSA code. Remove it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tUllP-007UzF-Gk@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Mon, 6 Jan 2025 11:59:14 +0000 (11:59 +0000)]
net: dsa: mv88e6xxx: remove mv88e6xxx_get_mac_eee()
mv88e6xxx_get_mac_eee() is no longer called by the core DSA code.
Remove it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tUllK-007Uz9-D7@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Mon, 6 Jan 2025 11:59:09 +0000 (11:59 +0000)]
net: dsa: mt753x: remove ksz_get_mac_eee()
mt753x_get_mac_eee() is no longer called by the core DSA code. Remove
it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Chester A. Unal <chester.a.unal@arinc9.com>
Link: https://patch.msgid.link/E1tUllF-007Uz3-95@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Mon, 6 Jan 2025 11:59:04 +0000 (11:59 +0000)]
net: dsa: ksz: remove ksz_get_mac_eee()
ksz_get_mac_eee() is no longer called by the core DSA code. Remove it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tUllA-007Uyx-4o@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Mon, 6 Jan 2025 11:58:59 +0000 (11:58 +0000)]
net: dsa: b53/bcm_sf2: remove b53_get_mac_eee()
b53_get_mac_eee() is no longer called by the core DSA code. Remove it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tUll5-007Uyr-1U@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Mon, 6 Jan 2025 11:58:53 +0000 (11:58 +0000)]
net: dsa: no longer call ds->ops->get_mac_eee()
All implementations of get_mac_eee() now just return zero without doing
anything useful. Remove the call to this method in preparation to
removing the method from each DSA driver.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tUlkz-007Uyl-UA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Mon, 6 Jan 2025 11:58:48 +0000 (11:58 +0000)]
net: dsa: mt753x: remove setting of tx_lpi parameters
dsa_user_get_eee() calls the DSA switch get_mac_eee() method followed
by phylink_ethtool_get_eee(), which goes on to call
phy_ethtool_get_eee(). This overwrites all members of the passed
ethtool_keee, which means anything written by the DSA switch
get_mac_eee() method will be discarded.
Remove setting any members in mt753x_get_mac_eee().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Chester A. Unal <chester.a.unal@arinc9.com>
Link: https://patch.msgid.link/E1tUlku-007Uyc-RP@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Mon, 6 Jan 2025 11:58:43 +0000 (11:58 +0000)]
net: dsa: ksz: remove setting of tx_lpi parameters
dsa_user_get_eee() calls the DSA switch get_mac_eee() method followed
by phylink_ethtool_get_eee(), which goes on to call
phy_ethtool_get_eee(). This overwrites all members of the passed
ethtool_keee, which means anything written by the DSA switch
get_mac_eee() method will be discarded.
Remove setting any members in ksz_get_mac_eee().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tUlkp-007UyW-OR@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 8 Jan 2025 01:49:24 +0000 (17:49 -0800)]
Merge branch 'net-hold-per-netns-rtnl-during-netdev-notifier-registration'
Kuniyuki Iwashima says:
====================
net: Hold per-netns RTNL during netdev notifier registration.
This series adds per-netns RTNL for registration of the global
and per-netns netdev notifiers.
v1: https://lore.kernel.org/netdev/
20250104063735.36945-1-kuniyu@amazon.com/
====================
Link: https://patch.msgid.link/20250106070751.63146-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Mon, 6 Jan 2025 07:07:51 +0000 (16:07 +0900)]
net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().
(un)?register_netdevice_notifier_dev_net() hold RTNL before triggering
the notifier for all netdev in the netns.
Let's convert the RTNL to rtnl_net_lock().
Note that move_netdevice_notifiers_dev_net() is assumed to be (but not
yet) protected by per-netns RTNL of both src and dst netns; we need to
convert wireless and hyperv drivers that call dev_change_net_namespace().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250106070751.63146-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Mon, 6 Jan 2025 07:07:50 +0000 (16:07 +0900)]
net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_net().
(un)?register_netdevice_notifier_net() hold RTNL before triggering the
notifier for all netdev in the netns.
Let's convert the RTNL to rtnl_net_lock().
Note that the per-netns netdev notifier is protected by per-netns RTNL.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250106070751.63146-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Mon, 6 Jan 2025 07:07:49 +0000 (16:07 +0900)]
net: Hold __rtnl_net_lock() in (un)?register_netdevice_notifier().
(un)?register_netdevice_notifier() hold pernet_ops_rwsem and RTNL,
iterate all netns, and trigger the notifier for all netdev.
Let's hold __rtnl_net_lock() before triggering the notifier.
Note that we will need protection for netdev_chain when RTNL is
removed. (e.g. blocking_notifier conversion [0] with a lockdep
annotation [1])
Link: https://lore.kernel.org/netdev/20250104063735.36945-2-kuniyu@amazon.com/
Link: https://lore.kernel.org/netdev/20250105075957.67334-1-kuniyu@amazon.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250106070751.63146-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Sun, 5 Jan 2025 12:28:47 +0000 (12:28 +0000)]
ixgbevf: Remove unused ixgbevf_hv_mbx_ops
The const struct ixgbevf_hv_mbx_ops was added in 2016 as part of
commit
c6d45171d706 ("ixgbevf: Support Windows hosts (Hyper-V)")
but has remained unused.
The functions it references are still referenced elsewhere.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250105122847.27341-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Sun, 5 Jan 2025 09:09:24 +0000 (09:09 +0000)]
net: watchdog: rename __dev_watchdog_up() and dev_watchdog_down()
In commit
d7811e623dd4 ("[NET]: Drop tx lock in dev_watchdog_up")
dev_watchdog_up() became a simple wrapper for __netdev_watchdog_up()
Herbert also said : "In 2.6.19 we can eliminate the unnecessary
__dev_watchdog_up and replace it with dev_watchdog_up."
This patch consolidates things to have only two functions, with
a common prefix.
- netdev_watchdog_up(), exported for the sake of one freescale driver.
This replaces __netdev_watchdog_up() and dev_watchdog_up().
- netdev_watchdog_down(), static to net/sched/sch_generic.c
This replaces dev_watchdog_down().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250105090924.1661822-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 7 Jan 2025 23:39:09 +0000 (15:39 -0800)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2025-01-07
We've added 7 non-merge commits during the last 32 day(s) which contain
a total of 11 files changed, 190 insertions(+), 103 deletions(-).
The main changes are:
1) Migrate the test_xdp_meta.sh BPF selftest into test_progs
framework, from Bastien Curutchet.
2) Add ability to configure head/tailroom for netkit devices,
from Daniel Borkmann.
3) Fixes and improvements to the xdp_hw_metadata selftest,
from Song Yoong Siang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Extend netkit tests to validate set {head,tail}room
netkit: Add add netkit {head,tail}room to rt_link.yaml
netkit: Allow for configuring needed_{head,tail}room
selftests/bpf: Migrate test_xdp_meta.sh into xdp_context_test_run.c
selftests/bpf: test_xdp_meta: Rename BPF sections
selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata
selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata
====================
Link: https://patch.msgid.link/20250107130908.143644-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ted Chen [Sat, 4 Jan 2025 08:38:46 +0000 (16:38 +0800)]
bridge: Make br_is_nd_neigh_msg() accept pointer to "const struct sk_buff"
The skb_buff struct in br_is_nd_neigh_msg() is never modified. Mark it as
const.
Signed-off-by: Ted Chen <znscnchen@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250104083846.71612-1-znscnchen@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 7 Jan 2025 12:45:58 +0000 (13:45 +0100)]
Merge branch 'dev-hold-per-netns-rtnl-in-register-netdev'
Kuniyuki Iwashima says:
====================
dev: Hold per-netns RTNL in register_netdev().
Patch 1 adds rtnl_net_lock_killable() and Patch 2 uses it in
register_netdev() and converts it and unregister_netdev() to
per-netns RTNL.
With this and the netdev notifier series [0], ASSERT_RTNL_NET()
for NETDEV_REGISTER [1] wasn't fired on a simplest QEMU setup
like e1000 + x86_64_defconfig + CONFIG_DEBUG_NET_SMALL_RTNL.
[0]: https://lore.kernel.org/netdev/
20250104063735.36945-1-kuniyu@amazon.com/
[1]:
---8<---
diff --git a/net/core/rtnl_net_debug.c b/net/core/rtnl_net_debug.c
index
f406045cbd0e..
c0c30929002e 100644
--- a/net/core/rtnl_net_debug.c
+++ b/net/core/rtnl_net_debug.c
@@ -21,7 +21,6 @@ static int rtnl_net_debug_event(struct notifier_block *nb,
case NETDEV_DOWN:
case NETDEV_REBOOT:
case NETDEV_CHANGE:
- case NETDEV_REGISTER:
case NETDEV_UNREGISTER:
case NETDEV_CHANGEMTU:
case NETDEV_CHANGEADDR:
@@ -60,19 +59,10 @@ static int rtnl_net_debug_event(struct notifier_block *nb,
ASSERT_RTNL();
break;
- /* Once an event fully supports RTNL_NET, move it here
- * and remove "if (0)" below.
- *
- * case NETDEV_XXX:
- * ASSERT_RTNL_NET(net);
- * break;
- */
- }
-
- /* Just to avoid unused-variable error for dev and net. */
- if (0)
+ case NETDEV_REGISTER:
ASSERT_RTNL_NET(net);
+ break;
+ }
return NOTIFY_DONE;
}
---8<---
====================
Link: https://patch.msgid.link/20250104082149.48493-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuniyuki Iwashima [Sat, 4 Jan 2025 08:21:49 +0000 (17:21 +0900)]
dev: Hold per-netns RTNL in (un)?register_netdev().
Let's hold per-netns RTNL of dev_net(dev) in register_netdev()
and unregister_netdev().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kuniyuki Iwashima [Sat, 4 Jan 2025 08:21:48 +0000 (17:21 +0900)]
rtnetlink: Add rtnl_net_lock_killable().
rtnl_lock_killable() is used only in register_netdev()
and will be converted to per-netns RTNL.
Let's unexport it and add the corresponding helper.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Mohsin Bashir [Sat, 4 Jan 2025 01:53:16 +0000 (17:53 -0800)]
eth: fbnic: update fbnic_poll return value
In cases where the work done is less than the budget, `fbnic_poll` is
returning 0. This affects the tracing of `napi_poll`. Following is a
snippet of before and after result from `napi_poll` tracepoint. Instead,
returning the work done improves the manual tracing.
Before:
@[10]: 1
...
@[64]: 208175
@[0]:
2128008
After:
@[56]: 86
@[48]: 222
...
@[5]:
1885756
@[6]:
1933841
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250104015316.3192946-1-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 7 Jan 2025 11:32:53 +0000 (12:32 +0100)]
Merge branch 'net-airoha-add-qdisc-offload-support'
Lorenzo Bianconi says:
====================
net: airoha: Add Qdisc offload support
Introduce support for ETS and HTB Qdisc offload available on the Airoha
EN7581 ethernet controller.
====================
Link: https://patch.msgid.link/20250103-airoha-en7581-qdisc-offload-v1-0-608a23fa65d5@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 3 Jan 2025 12:17:05 +0000 (13:17 +0100)]
net: airoha: Add sched HTB offload support
Introduce support for HTB Qdisc offload available in the Airoha EN7581
ethernet controller. EN7581 can offload only one level of HTB leafs.
Each HTB leaf represents a QoS channel supported by EN7581 SoC.
The typical use-case is creating a HTB leaf for QoS channel to rate
limit the egress traffic and attach an ETS Qdisc to each HTB leaf in
order to enforce traffic prioritization.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 3 Jan 2025 12:17:04 +0000 (13:17 +0100)]
net: airoha: Add sched ETS offload support
Introduce support for ETS Qdisc offload available on the Airoha EN7581
ethernet controller. In order to be effective, ETS Qdisc must configured
as leaf of a HTB Qdisc (HTB Qdisc offload will be added in the following
patch). ETS Qdisc available on EN7581 ethernet controller supports at
most 8 concurrent bands (QoS queues). We can enable an ETS Qdisc for
each available QoS channel.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 3 Jan 2025 12:17:03 +0000 (13:17 +0100)]
net: airoha: Introduce ndo_select_queue callback
Airoha EN7581 SoC supports 32 Tx DMA rings used to feed packets to QoS
channels. Each channels supports 8 QoS queues where the user can apply
QoS scheduling policies. In a similar way, the user can configure hw
rate shaping for each QoS channel.
Introduce ndo_select_queue callback in order to select the tx queue
based on QoS channel and QoS queue. In particular, for dsa device select
QoS channel according to the dsa user port index, rely on port id
otherwise. Select QoS queue based on the skb priority.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 3 Jan 2025 12:17:02 +0000 (13:17 +0100)]
net: airoha: Enable Tx drop capability for each Tx DMA ring
This is a preliminary patch in order to enable hw Qdisc offloading.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Willem de Bruijn [Fri, 3 Jan 2025 11:31:14 +0000 (06:31 -0500)]
selftests/net: packetdrill: report benign debug flakes as xfail
A few recently added packetdrill tests that are known time sensitive
(e.g., because testing timestamping) occasionally fail in debug mode:
https://netdev.bots.linux.dev/contest.html?executor=vmksft-packetdrill-dbg
These failures are well understood. Correctness of the tests is
verified in non-debug mode. Continue running in debug mode also, to
keep coverage with debug instrumentation.
But, only in debug mode, mark these tests with well understood
timing issues as XFAIL (known failing) rather than FAIL when failing.
Introduce an allow list xfail_list with known cases.
Expand the ktap infrastructure with XFAIL support.
Fixes:
eab35989cc37 ("selftests/net: packetdrill: import tcp/fast_recovery, tcp/nagle, tcp/timestamping")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/
20241218100013.
0c698629@kernel.org/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250103113142.129251-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Furong Xu [Fri, 3 Jan 2025 09:37:33 +0000 (17:37 +0800)]
net: stmmac: Set dma_sync_size to zero for discarded frames
If a frame is going to be discarded by driver, this frame is never touched
by driver and the cache lines never become dirty obviously,
page_pool_recycle_direct() wastes CPU cycles on unnecessary calling of
page_pool_dma_sync_for_device() to sync entire frame.
page_pool_put_page() with sync_size setting to 0 is the proper method.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Link: https://patch.msgid.link/20250103093733.3872939-1-0x1207@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Nihar Chaithanya [Sat, 4 Jan 2025 17:19:15 +0000 (22:49 +0530)]
octeontx2-pf: mcs: Remove dead code and semi-colon from rsrc_name()
Every case in the switch-block ends with return statement, and the
default: branch handles the cases where rsrc_type is invalid and
returns "Unknown", this makes the return statement at the end of the
function unreachable and redundant.
The semi-colon is not required after the switch-block's curly braces.
Remove the semi-colon after the switch-block's curly braces and the
return statement at the end of the function.
This issue was reported by Coverity Scan.
Signed-off-by: Nihar Chaithanya <niharchaithanya@gmail.com>
Link: https://patch.msgid.link/20250104171905.13293-1-niharchaithanya@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Krzysztof Kozlowski [Sat, 4 Jan 2025 14:20:43 +0000 (15:20 +0100)]
nfc: st21nfca: Drop unneeded null check in st21nfca_tx_work()
Variable 'info' is obtained via container_of() of struct work_struct, so
it cannot be NULL. Simplify the code and solve Smatch warning:
drivers/nfc/st21nfca/dep.c:119 st21nfca_tx_work() warn: can 'info' even be NULL?
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250104142043.116045-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 7 Jan 2025 00:33:44 +0000 (16:33 -0800)]
Merge branch 'mlx5-hardware-steering-part-2'
Tariq Toukan says:
====================
mlx5 Hardware Steering part 2
This series contain HWS code cleanups, enhancements, bug fixes, and
additions. Note that some of these patches are fixing bugs in existing
code, but we submit them without 'Fixes' tag to avoid the unnecessary
burden for stable releases, as HWS still couldn't be enabled.
Patches 1-5:
HWS, various code cleanups and enhancements
Patches 6-14:
HWS, various bug fixes and additions
Patch 15:
HWS, setting timeout on polling
====================
Link: https://patch.msgid.link/20250102181415.1477316-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:14 +0000 (20:14 +0200)]
net/mlx5: HWS, set timeout on polling for completion
Consolidate BWC polling for completion into one function
and set a time limit on the loop that polls for completion.
This can happen only if there is some issue with FW/PCI/HW,
such as FW being stuck, PCI issue, etc.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-16-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vlad Dogaru [Thu, 2 Jan 2025 18:14:13 +0000 (20:14 +0200)]
net/mlx5: HWS, support flow sampler destination
Since sampler isn't currently supported via HWS, use a FW island
that forwards any packets to the supplied sampler.
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-15-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:12 +0000 (20:14 +0200)]
net/mlx5: HWS, use the right size when writing arg data
When writing arg data, wrong size was used - fixing this.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-14-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vlad Dogaru [Thu, 2 Jan 2025 18:14:11 +0000 (20:14 +0200)]
net/mlx5: HWS, handle returned error value in pool alloc
Handle all negative return values as errors, not just -1.
The code previously treated -ENOMEM (and potentially other negative
values) as valid segment numbers, leading to incorrect behavior.
This fix ensures that any negative return value is treated as an error.
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-13-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:10 +0000 (20:14 +0200)]
net/mlx5: HWS, fix definer's HWS_SET32 macro for negative offset
When bit offset for HWS_SET32 macro is negative,
UBSAN complains about the shift-out-of-bounds:
UBSAN: shift-out-of-bounds in
drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c:177:2
shift exponent -8 is negative
Fixes:
74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-12-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:09 +0000 (20:14 +0200)]
net/mlx5: HWS, separate SQ that HWS uses from the usual traffic SQs
Mark the HWS SQ as 'non_wire' so that 'Flow Update' flow
won't mix with network traffic.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:08 +0000 (20:14 +0200)]
net/mlx5: HWS, num_of_rules counter on matcher should be atomic
Rule counter in matcher's struct is used in two places:
1. As heuristics to decide when the number of rules have crossed a
certain percentage threshold and the matcher should be resized.
We don't mind here if the number will be off by 1-2 due to concurrency.
2. When destroying matcher, the counter value is checked and the
user is warned if it is not 0. Here we lock all the queues, so the
counter will be correct.
We don't need to always have *exact* number, but we do need this
number to not be corrupted, which is what is happening when the
counter isn't atomic, due to update by different threads.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:07 +0000 (20:14 +0200)]
net/mlx5: HWS, reduce memory consumption of a matcher struct
Instead of having a large array of action templates allocated with
kmalloc, have smaller array and allocate it with kvmalloc.
The size of the array represents the max number of AT attach
operations for the same matcher. This number is not expected
to be very high. In any case, when the limit is reached, the
next attempt to attach new AT will result in creation of a new
matcher and moving all the rules to this matcher.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:06 +0000 (20:14 +0200)]
net/mlx5: HWS, remove wrong deletion of the miss table list
Remove wrong cleanup of the old miss table list and
simplify the error flow in the function.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:05 +0000 (20:14 +0200)]
net/mlx5: HWS, change error flow on matcher disconnect
Currently, when firmware failure occurs during matcher disconnect flow,
the error flow of the function reconnects the matcher back and returns
an error, which continues running the calling function and eventually
frees the matcher that is being disconnected.
This leads to a case where we have a freed matcher on the matchers list,
which in turn leads to use-after-free and eventual crash.
This patch fixes that by not trying to reconnect the matcher back when
some FW command fails during disconnect.
Note that we're dealing here with FW error. We can't overcome this
problem. This might lead to bad steering state (e.g. wrong connection
between matchers), and will also lead to resource leakage, as it is
the case with any other error handling during resource destruction.
However, the goal here is to allow the driver to continue and not crash
the machine with use-after-free error.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:04 +0000 (20:14 +0200)]
net/mlx5: HWS, add error message on failure to move rules
Add error message for failure to move rules from
old matcher to new one during rehash.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:03 +0000 (20:14 +0200)]
net/mlx5: HWS, simplify allocations as we support only FDB
In pools, STCs and actions: no need to allocate array for various
table types, as HWS is used to manage only FDB flow tables.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:02 +0000 (20:14 +0200)]
net/mlx5: HWS, denote how refcounts are protected
Some HWS structs have refcounts that are just u32.
Comment how they are protected and add '__must_hold()'
annotation where applicable.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:01 +0000 (20:14 +0200)]
net/mlx5: HWS, remove implementation of unused FW commands
Remove functions that manage alias objects - they are not used.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yevgeny Kliteynik [Thu, 2 Jan 2025 18:14:00 +0000 (20:14 +0200)]
net/mlx5: HWS, remove the use of duplicated structs
Remove definition in HWS of structs that are already defined
in mlx5_ifc.h, and fix the usage of these structs.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 7 Jan 2025 00:26:16 +0000 (16:26 -0800)]
Merge branch 'net-pcs-add-supported_interfaces-bitmap-for-pcs'
Russell King says:
====================
net: pcs: add supported_interfaces bitmap for PCS
This series adds supported_interfaces for PCS, which gives MAC code
a way to determine the interface modes that the PCS supports without
having to implement functions such as xpcs_get_interfaces(), or
workarounds such as in
https://lore.kernel.org/
20241213090526.71516-3-maxime.chevallier@bootlin.com
Patch 1 adds the new bitmask to struct phylink_pcs, and code within
phylink to validate that the PCS returned by the MAC driver supports
the interface mode - but only if this bitmask is non-empty.
Patch 2 through 4 fills in the interface modes for XPCS, Mediatek LynxI
and Lynx PCS.
Patch 5 adds support to stmmac to make use of this bitmask when filling
in phylink_config.supported_interfaces, eliminating the call to
xpcs_get_interfaces.
As xpcs_get_interfaces() is now unused outside of pcs-xpcs.c, patch 6
makes this function static and removes it from the header file.
====================
Link: https://patch.msgid.link/Z3fG9oTY9F9fCYHv@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Fri, 3 Jan 2025 11:16:56 +0000 (11:16 +0000)]
net: pcs: xpcs: make xpcs_get_interfaces() static
xpcs_get_interfaces() should no longer be used outside of the XPCS
code, so make it static.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1tTffk-007Roi-JM@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Fri, 3 Jan 2025 11:16:51 +0000 (11:16 +0000)]
net: stmmac: use PCS supported_interfaces
Use the PCS' supported_interfaces member to build the MAC level
supported_interfaces bitmap.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tTfff-007Roc-Ff@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Fri, 3 Jan 2025 11:16:46 +0000 (11:16 +0000)]
net: pcs: lynx: fill in PCS supported_interfaces
Fill in the new PCS supported_interfaces member with the interfaces
that Lynx supports.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tTffa-007RoV-Bo@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Fri, 3 Jan 2025 11:16:41 +0000 (11:16 +0000)]
net: pcs: mtk-lynxi: fill in PCS supported_interfaces
Fill in the new PCS supported_interfaces member with the interfaces
that the Mediatek LynxI supports.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1tTffV-007RoP-8D@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Fri, 3 Jan 2025 11:16:36 +0000 (11:16 +0000)]
net: pcs: xpcs: fill in PCS supported_interfaces
Fill in the new PCS supported_interfaces member with the interfaces
that XPCS supports.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tTffQ-007RoJ-4u@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Fri, 3 Jan 2025 11:16:31 +0000 (11:16 +0000)]
net: phylink: add support for PCS supported_interfaces bitmap
Add support for the PCS to specify which interfaces it supports, which
can be used by MAC drivers to build the main supported_interfaces
bitmap. Phylink also validates that the PCS returned by the MAC driver
supports the interface that the MAC was asked for.
An empty supported_interfaces bitmap from the PCS indicates that it
does not provide this information, and we handle that appropriately.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1tTffL-007RoD-1Y@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 3 Jan 2025 10:11:48 +0000 (10:11 +0000)]
net: hsr: remove one synchronize_rcu() from hsr_del_port()
Use kfree_rcu() instead of synchronize_rcu()+kfree().
This might allow syzbot to fuzz HSR a bit faster...
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250103101148.3594545-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 3 Jan 2025 21:05:14 +0000 (21:05 +0000)]
ax25: rcu protect dev->ax25_ptr
syzbot found a lockdep issue [1].
We should remove ax25 RTNL dependency in ax25_setsockopt()
This should also fix a variety of possible UAF in ax25.
[1]
WARNING: possible circular locking dependency detected
6.13.0-rc3-syzkaller-00762-g9268abe611b0 #0 Not tainted
------------------------------------------------------
syz.5.1818/12806 is trying to acquire lock:
ffffffff8fcb3988 (rtnl_mutex){+.+.}-{4:4}, at: ax25_setsockopt+0xa55/0xe90 net/ax25/af_ax25.c:680
but task is already holding lock:
ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1618 [inline]
ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: ax25_setsockopt+0x209/0xe90 net/ax25/af_ax25.c:574
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (sk_lock-AF_AX25){+.+.}-{0:0}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
lock_sock_nested+0x48/0x100 net/core/sock.c:3642
lock_sock include/net/sock.h:1618 [inline]
ax25_kill_by_device net/ax25/af_ax25.c:101 [inline]
ax25_device_event+0x24d/0x580 net/ax25/af_ax25.c:146
notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85
__dev_notify_flags+0x207/0x400
dev_change_flags+0xf0/0x1a0 net/core/dev.c:9026
dev_ifsioc+0x7c8/0xe70 net/core/dev_ioctl.c:563
dev_ioctl+0x719/0x1340 net/core/dev_ioctl.c:820
sock_do_ioctl+0x240/0x460 net/socket.c:1234
sock_ioctl+0x626/0x8e0 net/socket.c:1339
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (rtnl_mutex){+.+.}-{4:4}:
check_prev_add kernel/locking/lockdep.c:3161 [inline]
check_prevs_add kernel/locking/lockdep.c:3280 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
__lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
__mutex_lock_common kernel/locking/mutex.c:585 [inline]
__mutex_lock+0x1ac/0xee0 kernel/locking/mutex.c:735
ax25_setsockopt+0xa55/0xe90 net/ax25/af_ax25.c:680
do_sock_setsockopt+0x3af/0x720 net/socket.c:2324
__sys_setsockopt net/socket.c:2349 [inline]
__do_sys_setsockopt net/socket.c:2355 [inline]
__se_sys_setsockopt net/socket.c:2352 [inline]
__x64_sys_setsockopt+0x1ee/0x280 net/socket.c:2352
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_AX25);
lock(rtnl_mutex);
lock(sk_lock-AF_AX25);
lock(rtnl_mutex);
*** DEADLOCK ***
1 lock held by syz.5.1818/12806:
#0:
ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1618 [inline]
#0:
ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: ax25_setsockopt+0x209/0xe90 net/ax25/af_ax25.c:574
stack backtrace:
CPU: 1 UID: 0 PID: 12806 Comm: syz.5.1818 Not tainted
6.13.0-rc3-syzkaller-00762-g9268abe611b0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206
check_prev_add kernel/locking/lockdep.c:3161 [inline]
check_prevs_add kernel/locking/lockdep.c:3280 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
__lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
__mutex_lock_common kernel/locking/mutex.c:585 [inline]
__mutex_lock+0x1ac/0xee0 kernel/locking/mutex.c:735
ax25_setsockopt+0xa55/0xe90 net/ax25/af_ax25.c:680
do_sock_setsockopt+0x3af/0x720 net/socket.c:2324
__sys_setsockopt net/socket.c:2349 [inline]
__do_sys_setsockopt net/socket.c:2355 [inline]
__se_sys_setsockopt net/socket.c:2352 [inline]
__x64_sys_setsockopt+0x1ee/0x280 net/socket.c:2352
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7b62385d29
Fixes:
c433570458e4 ("ax25: fix a use-after-free in ax25_fillin_cb()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250103210514.87290-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guillaume Nault [Thu, 2 Jan 2025 16:34:18 +0000 (17:34 +0100)]
sctp: Prepare sctp_v4_get_dst() to dscp_t conversion.
Define inet_sk_dscp() to get a dscp_t value from struct inet_sock, so
that sctp_v4_get_dst() can easily set ->flowi4_tos from a dscp_t
variable. For the SCTP_DSCP_SET_MASK case, we can just use
inet_dsfield_to_dscp() to get a dscp_t value.
Then, when converting ->flowi4_tos from __u8 to dscp_t, we'll just have
to drop the inet_dscp_to_dsfield() conversion function.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/1a645f4a0bc60ad18e7c0916642883ce8a43c013.1735835456.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 6 Jan 2025 21:32:46 +0000 (13:32 -0800)]
Merge branch 'igc-deadcoding'
Dr. David Alan Gilbert says:
====================
igc deadcoding
This set removes some functions that are entirely unused
and have been since ~2018.
====================
Link: https://patch.msgid.link/20250102174142.200700-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:41:42 +0000 (17:41 +0000)]
igc: Remove unused igc_read/write_pcie_cap_reg
The last uses of igc_read_pcie_cap_reg() and igc_write_pcie_cap_reg()
were removed in 2019 by
commit
16ecd8d9af26 ("igc: Remove the obsolete workaround")
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102174142.200700-4-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:41:41 +0000 (17:41 +0000)]
igc: Remove unused igc_read/write_pci_cfg wrappers
igc_read_pci_cfg() and igc_write_pci_cfg were added in 2018 as part of
commit
146740f9abc4 ("igc: Add support for PF")
but have remained unused.
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102174142.200700-3-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:41:40 +0000 (17:41 +0000)]
igc: Remove unused igc_acquire/release_nvm
igc_acquire_nvm() and igc_release_nvm() were added in 2018 as part of
commit
ab4056126813 ("igc: Add NVM support")
but never used.
Remove them.
The igc_1225.c has it's own specific implementations.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102174142.200700-2-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 6 Jan 2025 21:31:52 +0000 (13:31 -0800)]
Merge branch 'i40e-deadcoding'
Dr. David Alan Gilbert says:
====================
i40e deadcoding
This is a bunch of deadcoding of functions that
are entirely uncalled in the i40e driver.
Build tested only.
====================
Link: https://patch.msgid.link/20250102173717.200359-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:37:17 +0000 (17:37 +0000)]
i40e: Remove unused i40e_dcb_hw_get_num_tc
The last useof i40e_dcb_hw_get_num_tc() was removed in 2022 by
commit
fe20371578ef ("Revert "i40e: Fix reset bw limit when DCB enabled
with 1 TC"")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102173717.200359-10-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:37:16 +0000 (17:37 +0000)]
i40e: Remove unused i40e_asq_send_command_v2
i40e_asq_send_command_v2() was added in 2022 by
commit
74073848b0d7 ("i40e: Add new versions of send ASQ command
functions")
but hasn't been used.
Remove it.
(The _atomic_v2 version of the function is used, so leave it).
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102173717.200359-9-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:37:15 +0000 (17:37 +0000)]
i40e: Remove unused i40e_commit_partition_bw_setting
i40e_commit_partition_bw_setting() was added in 2017 by
commit
4fc8c6763957 ("i40e: genericize the partition bandwidth control")
but hasn't been used.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102173717.200359-8-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:37:14 +0000 (17:37 +0000)]
i40e: Remove unused i40e_del_filter
The last use of i40e_del_filter() was removed in 2016 by
commit
9569a9a4547d ("i40e: when adding or removing MAC filters, correctly
handle VLANs")
Remove it.
Fix up a comment that referenced it.
Note: The __ version of this function is still used.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102173717.200359-7-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:37:13 +0000 (17:37 +0000)]
i40e: Remove unused i40e_get_cur_guaranteed_fd_count
The last use of i40e_get_cur_guaranteed_fd_count() was removed in 2015 by
commit
04294e38a451 ("i40e: FD filters flush policy changes")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102173717.200359-6-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:37:12 +0000 (17:37 +0000)]
i40e: Deadcode profile code
i40e_add_pinfo_to_list() was added in 2017 by
commit
1d5c960c5ef5 ("i40e: new AQ commands")
i40e_find_section_in_profile() was added in 2019 by
commit
cdc594e00370 ("i40e: Implement DDP support in i40e driver")
Neither have been used.
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102173717.200359-5-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:37:11 +0000 (17:37 +0000)]
i40e: Remove unused i40e_(read|write)_phy_register
i40e_read_phy_register() and i40e_write_phy_register() were added in
2016 by
commit
f62ba91458b5 ("i40e: Add functions which apply correct PHY access
method for read and write operation")
but haven't been used.
Remove them.
(There are more specific _clause* variants of these functions
that are still used.)
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102173717.200359-4-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:37:10 +0000 (17:37 +0000)]
i40e: Remove unused i40e_blink_phy_link_led
i40e_blink_phy_link_led() was added in 2016 by
commit
fd077cd3399b ("i40e: Add functions to blink led on 10GBaseT PHY")
but hasn't been used.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102173717.200359-3-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Thu, 2 Jan 2025 17:37:09 +0000 (17:37 +0000)]
i40e: Deadcode i40e_aq_*
i40e_aq_add_mirrorrule(), i40e_aq_delete_mirrorrule() and
i40e_aq_set_vsi_vlan_promisc() were added in 2016 by
commit
7bd6875bef70 ("i40e: APIs to Add/remove port mirroring rules")
but haven't been used.
They were the last user of i40e_mirrorrule_op().
i40e_aq_rearrange_nvm() was added in 2018 by
commit
f05798b4ff82 ("i40e: Add AQ command for rearrange NVM structure")
but hasn't been used.
i40e_aq_restore_lldp() was added in 2019 by
commit
c65e78f87f81 ("i40e: Further implementation of LLDP")
but hasn't been used.
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250102173717.200359-2-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Fri, 20 Dec 2024 23:46:58 +0000 (00:46 +0100)]
selftests/bpf: Extend netkit tests to validate set {head,tail}room
Extend the netkit selftests to specify and validate the {head,tail}room
on the netdevice:
# ./vmtest.sh -- ./test_progs -t netkit
[...]
./test_progs -t netkit
[ 1.174147] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.174585] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
[ 1.422307] tsc: Refined TSC clocksource calibration: 3407.983 MHz
[ 1.424511] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc3e5084, max_idle_ns:
440795359833 ns
[ 1.428092] clocksource: Switched to clocksource tsc
#363 tc_netkit_basic:OK
#364 tc_netkit_device:OK
#365 tc_netkit_multi_links:OK
#366 tc_netkit_multi_opts:OK
#367 tc_netkit_neigh_links:OK
#368 tc_netkit_pkt_type:OK
#369 tc_netkit_scrub:OK
Summary: 7/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/bpf/20241220234658.490686-3-daniel@iogearbox.net
Daniel Borkmann [Fri, 20 Dec 2024 23:46:57 +0000 (00:46 +0100)]
netkit: Add add netkit {head,tail}room to rt_link.yaml
Add netkit {head,tail}room attribute support to the rt_link.yaml spec file.
Example:
# ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_link.yaml \
--do getlink --json '{"ifname": "nk0"}' --output-json | jq
[...]
"linkinfo": {
"kind": "netkit",
"data": {
"primary": 0,
"policy": "forward",
"mode": "l3",
"scrub": "default",
"headroom": 0,
"tailroom": 0,
"peer-policy": "forward",
"peer-scrub": "default"
}
},
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/bpf/20241220234658.490686-2-daniel@iogearbox.net
Daniel Borkmann [Fri, 20 Dec 2024 23:46:56 +0000 (00:46 +0100)]
netkit: Allow for configuring needed_{head,tail}room
Allow the user to configure needed_{head,tail}room for both netkit
devices. The idea is similar to
163e529200af ("veth: implement
ndo_set_rx_headroom") with the difference that the two parameters
can be specified upon device creation. By default the current behavior
stays as is which is needed_{head,tail}room is 0.
In case of Cilium, for example, the netkit devices are not enslaved
into a bridge or openvswitch device (rather, BPF-based redirection
is used out of tcx), and as such these parameters are not propagated
into the Pod's netns via peer device.
Given Cilium can run in vxlan/geneve tunneling mode (needed_headroom)
and/or be used in combination with WireGuard (needed_{head,tail}room),
allow the Cilium CNI plugin to specify these two upon netkit device
creation.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/bpf/20241220234658.490686-1-daniel@iogearbox.net
Jakub Kicinski [Sun, 5 Jan 2025 01:00:53 +0000 (17:00 -0800)]
Merge tag 'ieee802154-for-net-next-2025-01-03' of git://git./linux/kernel/git/wpan/wpan-next
Stefan Schmidt says:
====================
pull-request: ieee802154-next 2025-01-03
Leo Stone provided a documatation fix to improve the grammar.
David Gilbert spotted a non-used fucntion we can safely remove.
* tag 'ieee802154-for-net-next-2025-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next:
net: mac802154: Remove unused ieee802154_mlme_tx_one
Documentation: ieee802154: fix grammar
====================
Link: https://patch.msgid.link/20250103154605.440478-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 20 Dec 2024 08:07:26 +0000 (16:07 +0800)]
net: stmmac: TSO: Simplify the code flow of DMA descriptor allocations
The TCP Segmentation Offload (TSO) engine is an optional function in
DWMAC cores, it is implemented for dwmac4 and dwxgmac2 only, ancient
dwmac100 and dwmac1000 are not supported by hardware. Current driver
code checks priv->dma_cap.tsoen which is read from MAC_HW_Feature1
register to determine if TSO is enabled in hardware configurations,
if (!priv->dma_cap.tsoen) driver never sets NETIF_F_TSO for net_device.
This patch never affects dwmac100/dwmac1000 and their stmmac_desc_ops:
ndesc_ops/enh_desc_ops, since TSO is never supported by them two.
The DMA AXI address width of DWMAC cores can be configured to
32-bit/40-bit/48-bit, then the format of DMA transmit descriptors
get a little different between 32-bit and 40-bit/48-bit.
Current driver code checks priv->dma_cap.addr64 to use certain format
with certain configuration.
This patch converts the format of DMA transmit descriptors on dwmac4
and dwxgmac2 that the DMA AXI address width is configured to 32-bit (as
described by function comments of stmmac_tso_xmit() in current code) to
a more generic format (see updated function comments after this patch)
which is actually already used on 40-bit/48-bit platforms to provide
better compatibility and make code flow cleaner in TSO TX routine.
Another interesting finding, struct stmmac_desc_ops is a common abstract
interface to maintain descriptors, we should avoid the direct assignment
of descriptor members (e.g. desc->des0), stmmac_set_desc_addr() is the
proper method yet. This patch tries to improve this by the way.
Tested and verified on:
DWMAC CORE 5.00a with 32-bit DMA AXI address width
DWMAC CORE 5.10a with 32-bit DMA AXI address width
DWXGMAC CORE 3.20a with 40-bit DMA AXI address width
Signed-off-by: Furong Xu <0x1207@gmail.com>
Link: https://patch.msgid.link/20241220080726.1733837-1-0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Thu, 2 Jan 2025 12:41:21 +0000 (12:41 +0000)]
net: pcs: pcs-mtk-lynxi: correctly report in-band status capabilities
Neither does the LynxI PCS support QSGMII, nor is in-band-status supported
in 2500Base-X mode. Fix the pcs_inband_caps() method accordingly.
Fixes:
520d29bdda86 ("net: pcs: pcs-mtk-lynxi: implement pcs_inband_caps() method")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/Z3aJccb1vW14aukg@pidgin.makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Dec 2024 19:48:58 +0000 (11:48 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.13-rc6).
No conflicts.
Adjacent changes:
include/linux/if_vlan.h
f91a5b808938 ("af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK")
3f330db30638 ("net: reformat kdoc return statements")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 3 Jan 2025 22:36:54 +0000 (14:36 -0800)]
Merge tag 'net-6.13-rc6' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireles and netfilter.
Nothing major here. Over the last two weeks we gathered only around
two-thirds of our normal weekly fix count, but delaying sending these
until -rc7 seemed like a really bad idea.
AFAIK we have no bugs under investigation. One or two reverts for
stuff for which we haven't gotten a proper fix will likely come in the
next PR.
Current release - fix to a fix:
- netfilter: nft_set_hash: unaligned atomic read on struct
nft_set_ext
- eth: gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup
Previous releases - regressions:
- net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
- mptcp:
- fix sleeping rcvmsg sleeping forever after bad recvbuffer adjust
- fix TCP options overflow
- prevent excessive coalescing on receive, fix throughput
- net: fix memory leak in tcp_conn_request() if map insertion fails
- wifi: cw1200: fix potential NULL dereference after conversion to
GPIO descriptors
- phy: micrel: dynamically control external clock of KSZ PHY, fix
suspend behavior
Previous releases - always broken:
- af_packet: fix VLAN handling with MSG_PEEK
- net: restrict SO_REUSEPORT to inet sockets
- netdev-genl: avoid empty messages in NAPI get
- dsa: microchip: fix set_ageing_time function on KSZ9477 and LAN937X
- eth:
- gve: XDP fixes around transmit, queue wakeup etc.
- ti: icssg-prueth: fix firmware load sequence to prevent time
jump which breaks timesync related operations
Misc:
- netlink: specs: mptcp: add missing attr and improve documentation"
* tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init
net: ti: icssg-prueth: Fix firmware load sequence.
mptcp: prevent excessive coalescing on receive
mptcp: don't always assume copied data in mptcp_cleanup_rbuf()
mptcp: fix recvbuffer adjust on sleeping rcvmsg
ila: serialize calls to nf_register_net_hooks()
af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK
af_packet: fix vlan_get_tci() vs MSG_PEEK
net: wwan: iosm: Properly check for valid exec stage in ipc_mmio_init()
net: restrict SO_REUSEPORT to inet sockets
net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
net: sfc: Correct key_len for efx_tc_ct_zone_ht_params
net: wwan: t7xx: Fix FSM command timeout issue
sky2: Add device ID 11ab:4373 for Marvell
88E8075
mptcp: fix TCP options overflow.
net: mv643xx_eth: fix an OF node reference leak
gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup
eth: bcmsysport: fix call balance of priv->clk handling routines
net: llc: reset skb->transport_header
netlink: specs: mptcp: fix missing doc
...
Linus Torvalds [Fri, 3 Jan 2025 22:16:25 +0000 (14:16 -0800)]
Merge tag 'nios2_update_for_v6.14' of git://git./linux/kernel/git/dinguyen/linux
Pull nios2 fixlet from Dinh Nguyen:
- Use str_yes_no() helper function
* tag 'nios2_update_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: Use str_yes_no() helper in show_cpuinfo()
Linus Torvalds [Fri, 3 Jan 2025 19:09:35 +0000 (11:09 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A lot of fixes accumulated over the holiday break:
- Static tool fixes, value is already proven to be NULL, possible
integer overflow
- Many bnxt_re fixes:
- Crashes due to a mismatch in the maximum SGE list size
- Don't waste memory for user QPs by creating kernel-only
structures
- Fix compatability issues with older HW in some of the new HW
features recently introduced: RTS->RTS feature, work around 9096
- Do not allow destroy_qp to fail
- Validate QP MTU against device limits
- Add missing validation on madatory QP attributes for RTR->RTS
- Report port_num in query_qp as required by the spec
- Fix creation of QPs of the maximum queue size, and in the
variable mode
- Allow all QPs to be used on newer HW by limiting a work around
only to HW it affects
- Use the correct MSN table size for variable mode QPs
- Add missing locking in create_qp() accessing the qp_tbl
- Form WQE buffers correctly when some of the buffers are 0 hop
- Don't crash on QP destroy if the userspace doesn't setup the
dip_ctx
- Add the missing QP flush handler call on the DWQE path to avoid
hanging on error recovery
- Consistently use ENXIO for return codes if the devices is
fatally errored
- Try again to fix VLAN support on iwarp, previous fix was reverted
due to breaking other cards
- Correct error path return code for rdma netlink events
- Remove the seperate net_device pointer in siw and rxe which
syzkaller found a way to UAF
- Fix a UAF of a stack ib_sge in rtrs
- Fix a regression where old mlx5 devices and FW were wrongly
activing new device features and failing"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits)
RDMA/mlx5: Enable multiplane mode only when it is supported
RDMA/bnxt_re: Fix error recovery sequence
RDMA/rtrs: Ensure 'ib_sge list' is accessible
RDMA/rxe: Remove the direct link to net_device
RDMA/hns: Fix missing flush CQE for DWQE
RDMA/hns: Fix warning storm caused by invalid input in IO path
RDMA/hns: Fix accessing invalid dip_ctx during destroying QP
RDMA/hns: Fix mapping error of zero-hop WQE buffer
RDMA/bnxt_re: Fix the locking while accessing the QP table
RDMA/bnxt_re: Fix MSN table size for variable wqe mode
RDMA/bnxt_re: Add send queue size check for variable wqe
RDMA/bnxt_re: Disable use of reserved wqes
RDMA/bnxt_re: Fix max_qp_wrs reported
RDMA/siw: Remove direct link to net_device
RDMA/nldev: Set error code in rdma_nl_notify_event
RDMA/bnxt_re: Fix reporting hw_ver in query_device
RDMA/bnxt_re: Fix to export port num to ib_query_qp
RDMA/bnxt_re: Fix setting mandatory attributes for modify_qp
RDMA/bnxt_re: Add check for path mtu in modify_qp
RDMA/bnxt_re: Fix the check for 9060 condition
...
Linus Torvalds [Fri, 3 Jan 2025 18:57:57 +0000 (10:57 -0800)]
Merge tag 'pinctrl-v6.13-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- A small Kconfig fixup for the i.MX.
In principle this could come in from the SoC tree but the bug was
introduced from the pin control tree so let's fix it from here.
- Fix a sleep in atomic context in the MCP23xxx GPIO expander by
disabling the regmap locking and using explicit mutex locks.
* tag 'pinctrl-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: mcp23s08: Fix sleeping in atomic context due to regmap locking
ARM: imx: Re-introduce the PINCTRL selection
Linus Torvalds [Fri, 3 Jan 2025 18:54:51 +0000 (10:54 -0800)]
Merge tag 'sound-6.13-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The first new year pull request: no surprises, all small fixes,
including:
- Follow-up fixes for the new compress-offload API extension
- A couple of fixes for MIDI 2.0 UMP handling
- A trivial race fix for OSS sequencer emulation ioctls
- USB-audio and HD-audio fixes / quirks"
* tag 'sound-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: seq: Check UMP support for midi_version change
ALSA hda/realtek: Add quirk for Framework F111:000C
Revert "ALSA: ump: Don't enumeration invalid groups for legacy rawmidi"
ALSA: seq: oss: Fix races at processing SysEx messages
ALSA: compress_offload: fix remaining descriptor races in sound/core/compress_offload.c
ALSA: compress_offload: Drop unneeded no_free_ptr()
ALSA: hda/tas2781: Ignore SUBSYS_ID not found for tas2563 projects
ALSA: usb-audio: US16x08: Initialize array before use
Linus Torvalds [Fri, 3 Jan 2025 18:06:44 +0000 (10:06 -0800)]
Merge tag 'drm-fixes-2025-01-03' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Happy New Year.
It was fairly quiet for holidays period, certainly nothing that worth
getting off the couch before I needed to, this is for the past two
weeks, i915, xe and some adv7511, I expect we will see some amdgpu etc
happening next week, but otherwise all quiet.
i915:
- Fix C10 pll programming sequence [cx0_phy]
- Fix power gate sequence. [dg1]
xe:
- uapi: Revert some devcoredump file format changes breaking a mesa
debug tool
- Fixes around waits when moving to system
- Fix a typo when checking for LMEM provisioning
- Fix a fault on fd close after unbind
- A couple of OA fixes squashed for stable backporting
adv7511:
- fix UAF
- drop single lane support
- audio infoframe fix"
* tag 'drm-fixes-2025-01-03' of https://gitlab.freedesktop.org/drm/kernel:
xe/oa: Fix query mode of operation for OAR/OAC
drm/i915/dg1: Fix power gate sequence.
drm/i915/cx0_phy: Fix C10 pll programming sequence
drm/xe: Fix fault on fd close after unbind
drm/xe/pf: Use correct function to check LMEM provisioning
drm/xe: Wait for migration job before unmapping pages
drm/xe: Use non-interruptible wait when moving BO to system
drm/xe: Revert some changes that break a mesa debug tool
drm: adv7511: Drop dsi single lane support
dt-bindings: display: adi,adv7533: Drop single lane support
drm: adv7511: Fix use-after-free in adv7533_attach_dsi()
drm/bridge: adv7511_audio: Update Audio InfoFrame properly
Linus Torvalds [Fri, 3 Jan 2025 18:04:43 +0000 (10:04 -0800)]
Merge tag 'ftrace-v6.13-rc5-2' of git://git./linux/kernel/git/trace/linux-trace
Pull ftrace fixes from Steven Rostedt:
- Add needed READ_ONCE() around access to the fgraph array element
The updates to the fgraph array can happen when callbacks are
registered and unregistered. The __ftrace_return_to_handler() can
handle reading either the old value or the new value. But once it
reads that value it must stay consistent otherwise the check that
looks to see if the value is a stub may show false, but if the
compiler decides to re-read after that check, it can be true which
can cause the code to crash later on.
- Make function profiler use the top level ops for filtering again
When function graph became available for instances, its filter ops
became independent from the top level set_ftrace_filter. In the
process the function profiler received its own filter ops as well.
But the function profiler uses the top level set_ftrace_filter file
and does not have one of its own. In giving it its own filter ops, it
lost any user interface it once had. Make it use the top level
set_ftrace_filter file again. This fixes a regression.
* tag 'ftrace-v6.13-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Fix function profiler's filtering functionality
fgraph: Add READ_ONCE() when accessing fgraph_array[]
Mark Zhang [Thu, 19 Dec 2024 12:23:36 +0000 (14:23 +0200)]
RDMA/mlx5: Enable multiplane mode only when it is supported
Driver queries vport_cxt.num_plane and enables multiplane when it is
greater then 0, but some old FWs (versions from x.40.1000 till x.42.1000),
report vport_cxt.num_plane = 1 unexpectedly.
Fix it by querying num_plane only when HCA_CAP2.multiplane bit is set.
Fixes:
2a5db20fa532 ("RDMA/mlx5: Add support to multi-plane device and port")
Link: https://patch.msgid.link/r/1ef901acdf564716fcf550453cf5e94f343777ec.1734610916.git.leon@kernel.org
Cc: stable@vger.kernel.org
Reported-by: Francesco Poli <invernomuto@paranoici.org>
Closes: https://lore.kernel.org/all/nvs4i2v7o6vn6zhmtq4sgazy2hu5kiulukxcntdelggmznnl7h@so3oul6uwgbl/
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Octavian Purdila [Mon, 30 Dec 2024 20:56:47 +0000 (12:56 -0800)]
team: prevent adding a device which is already a team device lower
Prevent adding a device which is already a team device lower,
e.g. adding veth0 if vlan1 was already added and veth0 is a lower of
vlan1.
This is not useful in practice and can lead to recursive locking:
$ ip link add veth0 type veth peer name veth1
$ ip link set veth0 up
$ ip link set veth1 up
$ ip link add link veth0 name veth0.1 type vlan protocol 802.1Q id 1
$ ip link add team0 type team
$ ip link set veth0.1 down
$ ip link set veth0.1 master team0
team0: Port device veth0.1 added
$ ip link set veth0 down
$ ip link set veth0 master team0
============================================
WARNING: possible recursive locking detected
6.13.0-rc2-virtme-00441-ga14a429069bb #46 Not tainted
--------------------------------------------
ip/7684 is trying to acquire lock:
ffff888016848e00 (team->team_lock_key){+.+.}-{4:4}, at: team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
but task is already holding lock:
ffff888016848e00 (team->team_lock_key){+.+.}-{4:4}, at: team_add_slave (drivers/net/team/team_core.c:1147 drivers/net/team/team_core.c:1977)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(team->team_lock_key);
lock(team->team_lock_key);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by ip/7684:
stack backtrace:
CPU: 3 UID: 0 PID: 7684 Comm: ip Not tainted
6.13.0-rc2-virtme-00441-ga14a429069bb #46
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:122)
print_deadlock_bug.cold (kernel/locking/lockdep.c:3040)
__lock_acquire (kernel/locking/lockdep.c:3893 kernel/locking/lockdep.c:5226)
? netlink_broadcast_filtered (net/netlink/af_netlink.c:1548)
lock_acquire.part.0 (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5851)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? trace_lock_acquire (./include/trace/events/lock.h:24 (discriminator 2))
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? lock_acquire (kernel/locking/lockdep.c:5822)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
__mutex_lock (kernel/locking/mutex.c:587 kernel/locking/mutex.c:735)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? fib_sync_up (net/ipv4/fib_semantics.c:2167)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
notifier_call_chain (kernel/notifier.c:85)
call_netdevice_notifiers_info (net/core/dev.c:1996)
__dev_notify_flags (net/core/dev.c:8993)
? __dev_change_flags (net/core/dev.c:8975)
dev_change_flags (net/core/dev.c:9027)
vlan_device_event (net/8021q/vlan.c:85 net/8021q/vlan.c:470)
? br_device_event (net/bridge/br.c:143)
notifier_call_chain (kernel/notifier.c:85)
call_netdevice_notifiers_info (net/core/dev.c:1996)
dev_open (net/core/dev.c:1519 net/core/dev.c:1505)
team_add_slave (drivers/net/team/team_core.c:1219 drivers/net/team/team_core.c:1977)
? __pfx_team_add_slave (drivers/net/team/team_core.c:1972)
do_set_master (net/core/rtnetlink.c:2917)
do_setlink.isra.0 (net/core/rtnetlink.c:3117)
Reported-by: syzbot+3c47b5843403a45aef57@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
3c47b5843403a45aef57
Fixes:
3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Octavian Purdila <tavip@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Jan 2025 11:54:06 +0000 (11:54 +0000)]
Merge branch 'net-iep-clock-module-fixes'
Meghana Malladi says:
====================
IEP clock module bug fixes
This series has some bug fixes for IEP module needed by PPS and
timesync operations.
Patch 1/2 fixes firmware load sequence to run all the firmwares
when either of the ethernet interfaces is up. Move all the code
common for firmware bringup under common functions.
Patch 2/2 fixes distorted PPS signal when the ethernet interfaces
are brough down and up. This patch also fixes enabling PPS signal
after bringing the interface up, without disabling PPS.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Meghana Malladi [Mon, 23 Dec 2024 15:15:50 +0000 (20:45 +0530)]
net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init
When ICSSG interfaces are brought down and brought up again, the
pru cores are shut down and booted again, flushing out all the memories
and start again in a clean state. Hence it is expected that the
IEP_CMP_CFG register needs to be flushed during iep_init() to ensure
that the existing residual configuration doesn't cause any unusual
behavior. If the register is not cleared, existing IEP_CMP_CFG set for
CMP1 will result in SYNC0_OUT signal based on the SYNC_OUT register values.
After bringing the interface up, calling PPS enable doesn't work as
the driver believes PPS is already enabled, (iep->pps_enabled is not
cleared during interface bring down) and driver will just return true
even though there is no signal. Fix this by disabling pps and perout.
Fixes:
c1e0230eeaab ("net: ti: icss-iep: Add IEP driver")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
MD Danish Anwar [Mon, 23 Dec 2024 15:15:49 +0000 (20:45 +0530)]
net: ti: icssg-prueth: Fix firmware load sequence.
Timesync related operations are ran in PRU0 cores for both ICSSG SLICE0
and SLICE1. Currently whenever any ICSSG interface comes up we load the
respective firmwares to PRU cores and whenever interface goes down, we
stop the resective cores. Due to this, when SLICE0 goes down while
SLICE1 is still active, PRU0 firmwares are unloaded and PRU0 core is
stopped. This results in clock jump for SLICE1 interface as the timesync
related operations are no longer running.
As there are interdependencies between SLICE0 and SLICE1 firmwares,
fix this by running both PRU0 and PRU1 firmwares as long as at least 1
ICSSG interface is up. Add new flag in prueth struct to check if all
firmwares are running and remove the old flag (fw_running).
Use emacs_initialized as reference count to load the firmwares for the
first and last interface up/down. Moving init_emac_mode and fw_offload_mode
API outside of icssg_config to icssg_common_start API as they need
to be called only once per firmware boot.
Change prueth_emac_restart() to return error code and add error prints
inside the caller of this functions in case of any failures.
Move prueth_emac_stop() from common to sr1 driver.
sr1 and sr2 drivers have different logic handling for stopping
the firmwares. While sr1 driver is dependent on emac structure
to stop the corresponding pru cores for that slice, for sr2
all the pru cores of both the slices are stopped and is not
dependent on emac. So the prueth_emac_stop() function is no
longer common and can be moved to sr1 driver.
Fixes:
c1e0230eeaab ("net: ti: icss-iep: Add IEP driver")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Jan 2025 02:44:05 +0000 (18:44 -0800)]
Merge branch 'mptcp-rx-path-fixes'
Matthieu Baerts says:
====================
mptcp: rx path fixes
Here are 3 different fixes, all related to the MPTCP receive buffer:
- Patch 1: fix receive buffer space when recvmsg() blocks after
receiving some data. For a fix introduced in v6.12, backported to
v6.1.
- Patch 2: mptcp_cleanup_rbuf() can be called when no data has been
copied. For 5.11.
- Patch 3: prevent excessive coalescing on receive, which can affect the
throughput badly. It looks better to wait a bit before backporting
this one to stable versions, to get more results. For 5.10.
====================
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-0-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 30 Dec 2024 18:12:32 +0000 (19:12 +0100)]
mptcp: prevent excessive coalescing on receive
Currently the skb size after coalescing is only limited by the skb
layout (the skb must not carry frag_list). A single coalesced skb
covering several MSS can potentially fill completely the receive
buffer. In such a case, the snd win will zero until the receive buffer
will be empty again, affecting tput badly.
Fixes:
8268ed4c9d19 ("mptcp: introduce and use mptcp_try_coalesce()")
Cc: stable@vger.kernel.org # please delay 2 weeks after 6.13-final release
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-3-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 30 Dec 2024 18:12:31 +0000 (19:12 +0100)]
mptcp: don't always assume copied data in mptcp_cleanup_rbuf()
Under some corner cases the MPTCP protocol can end-up invoking
mptcp_cleanup_rbuf() when no data has been copied, but such helper
assumes the opposite condition.
Explicitly drop such assumption and performs the costly call only
when strictly needed - before releasing the msk socket lock.
Fixes:
fd8976790a6c ("mptcp: be careful on MPTCP-level ack.")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-2-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Mon, 30 Dec 2024 18:12:30 +0000 (19:12 +0100)]
mptcp: fix recvbuffer adjust on sleeping rcvmsg
If the recvmsg() blocks after receiving some data - i.e. due to
SO_RCVLOWAT - the MPTCP code will attempt multiple times to
adjust the receive buffer size, wrongly accounting every time the
cumulative of received data - instead of accounting only for the
delta.
Address the issue moving mptcp_rcv_space_adjust just after the
data reception and passing it only the just received bytes.
This also removes an unneeded difference between the TCP and MPTCP
RX code path implementation.
Fixes:
581302298524 ("mptcp: error out earlier on disconnect")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-1-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Mon, 30 Dec 2024 16:28:49 +0000 (16:28 +0000)]
ila: serialize calls to nf_register_net_hooks()
syzbot found a race in ila_add_mapping() [1]
commit
031ae72825ce ("ila: call nf_unregister_net_hooks() sooner")
attempted to fix a similar issue.
Looking at the syzbot repro, we have concurrent ILA_CMD_ADD commands.
Add a mutex to make sure at most one thread is calling nf_register_net_hooks().
[1]
BUG: KASAN: slab-use-after-free in rht_key_hashfn include/linux/rhashtable.h:159 [inline]
BUG: KASAN: slab-use-after-free in __rhashtable_lookup.constprop.0+0x426/0x550 include/linux/rhashtable.h:604
Read of size 4 at addr
ffff888028f40008 by task dhcpcd/5501
CPU: 1 UID: 0 PID: 5501 Comm: dhcpcd Not tainted
6.13.0-rc4-syzkaller-00054-gd6ef8b40d075 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:489
kasan_report+0xd9/0x110 mm/kasan/report.c:602
rht_key_hashfn include/linux/rhashtable.h:159 [inline]
__rhashtable_lookup.constprop.0+0x426/0x550 include/linux/rhashtable.h:604
rhashtable_lookup include/linux/rhashtable.h:646 [inline]
rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline]
ila_lookup_wildcards net/ipv6/ila/ila_xlat.c:127 [inline]
ila_xlat_addr net/ipv6/ila/ila_xlat.c:652 [inline]
ila_nf_input+0x1ee/0x620 net/ipv6/ila/ila_xlat.c:185
nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
nf_hook_slow+0xbb/0x200 net/netfilter/core.c:626
nf_hook.constprop.0+0x42e/0x750 include/linux/netfilter.h:269
NF_HOOK include/linux/netfilter.h:312 [inline]
ipv6_rcv+0xa4/0x680 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x12e/0x1e0 net/core/dev.c:5672
__netif_receive_skb+0x1d/0x160 net/core/dev.c:5785
process_backlog+0x443/0x15f0 net/core/dev.c:6117
__napi_poll.constprop.0+0xb7/0x550 net/core/dev.c:6883
napi_poll net/core/dev.c:6952 [inline]
net_rx_action+0xa94/0x1010 net/core/dev.c:7074
handle_softirqs+0x213/0x8f0 kernel/softirq.c:561
__do_softirq kernel/softirq.c:595 [inline]
invoke_softirq kernel/softirq.c:435 [inline]
__irq_exit_rcu+0x109/0x170 kernel/softirq.c:662
irq_exit_rcu+0x9/0x30 kernel/softirq.c:678
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
sysvec_apic_timer_interrupt+0xa4/0xc0 arch/x86/kernel/apic/apic.c:1049
Fixes:
7f00feaf1076 ("ila: Add generic ILA translation facility")
Reported-by: syzbot+47e761d22ecf745f72b9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
6772c9ae.
050a0220.2f3838.04c7.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Tom Herbert <tom@herbertland.com>
Link: https://patch.msgid.link/20241230162849.2795486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Mon, 30 Dec 2024 16:10:04 +0000 (16:10 +0000)]
af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK
Blamed commit forgot MSG_PEEK case, allowing a crash [1] as found
by syzbot.
Rework vlan_get_protocol_dgram() to not touch skb at all,
so that it can be used from many cpus on the same skb.
Add a const qualifier to skb argument.
[1]
skbuff: skb_under_panic: text:
ffffffff8a8ccd05 len:29 put:14 head:
ffff88807fc8e400 data:
ffff88807fc8e3f4 tail:0x11 end:0x140 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 5892 Comm: syz-executor883 Not tainted
6.13.0-rc4-syzkaller-00054-gd6ef8b40d075 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0b 8d 48 c7 c6 86 d5 25 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 5a 69 79 f7 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:
ffffc900038d7638 EFLAGS:
00010282
RAX:
0000000000000087 RBX:
dffffc0000000000 RCX:
609ffd18ea660600
RDX:
0000000000000000 RSI:
0000000080000000 RDI:
0000000000000000
RBP:
ffff88802483c8d0 R08:
ffffffff817f0a8c R09:
1ffff9200071ae60
R10:
dffffc0000000000 R11:
fffff5200071ae61 R12:
0000000000000140
R13:
ffff88807fc8e400 R14:
ffff88807fc8e3f4 R15:
0000000000000011
FS:
00007fbac5e006c0(0000) GS:
ffff8880b8700000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007fbac5e00d58 CR3:
000000001238e000 CR4:
00000000003526f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
skb_push+0xe5/0x100 net/core/skbuff.c:2636
vlan_get_protocol_dgram+0x165/0x290 net/packet/af_packet.c:585
packet_recvmsg+0x948/0x1ef0 net/packet/af_packet.c:3552
sock_recvmsg_nosec net/socket.c:1033 [inline]
sock_recvmsg+0x22f/0x280 net/socket.c:1055
____sys_recvmsg+0x1c6/0x480 net/socket.c:2803
___sys_recvmsg net/socket.c:2845 [inline]
do_recvmmsg+0x426/0xab0 net/socket.c:2940
__sys_recvmmsg net/socket.c:3014 [inline]
__do_sys_recvmmsg net/socket.c:3037 [inline]
__se_sys_recvmmsg net/socket.c:3030 [inline]
__x64_sys_recvmmsg+0x199/0x250 net/socket.c:3030
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes:
79eecf631c14 ("af_packet: Handle outgoing VLAN packets without hardware offloading")
Reported-by: syzbot+74f70bb1cb968bf09e4f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
6772c485.
050a0220.2f3838.04c5.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Chengen Du <chengen.du@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241230161004.2681892-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>