Shay Drory [Tue, 9 Dec 2025 12:56:12 +0000 (14:56 +0200)]
net/mlx5: fw_tracer, Handle escaped percent properly
The firmware tracer's format string validation and parameter counting
did not properly handle escaped percent signs (%%). This caused
fw_tracer to count more parameters when trace format strings contained
literal percent characters.
To fix it, allow %% to pass string validation and skip %% sequences when
counting parameters since they represent literal percent signs rather
than format specifiers.
Fixes:
70dd6fdb8987 ("net/mlx5: FW tracer, parse traces and kernel tracing support")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reported-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Closes: https://lore.kernel.org/netdev/hanz6rzrb2bqbplryjrakvkbmv4y5jlmtthnvi3thg5slqvelp@t3s3erottr6s/
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shay Drory [Tue, 9 Dec 2025 12:56:11 +0000 (14:56 +0200)]
net/mlx5: fw_tracer, Validate format string parameters
Add validation for format string parameters in the firmware tracer to
prevent potential security vulnerabilities and crashes from malformed
format strings received from firmware.
The firmware tracer receives format strings from the device firmware and
uses them to format trace messages. Without proper validation, bad
firmware could provide format strings with invalid format specifiers
(e.g., %s, %p, %n) that could lead to crashes, or other undefined
behavior.
Add mlx5_tracer_validate_params() to validate that all format specifiers
in trace strings are limited to safe integer/hex formats (%x, %d, %i,
%u, %llx, %lx, etc.). Reject strings containing other format types that
could be used to access arbitrary memory or cause crashes.
Invalid format strings are added to the trace output for visibility with
"BAD_FORMAT: " prefix.
Fixes:
70dd6fdb8987 ("net/mlx5: FW tracer, parse traces and kernel tracing support")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reported-by: Breno Leitao <leitao@debian.org>
Closes: https://lore.kernel.org/netdev/hanz6rzrb2bqbplryjrakvkbmv4y5jlmtthnvi3thg5slqvelp@t3s3erottr6s/
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Moshe Shemesh [Tue, 9 Dec 2025 12:56:10 +0000 (14:56 +0200)]
net/mlx5: Drain firmware reset in shutdown callback
Invoke drain_fw_reset() in the shutdown callback to ensure all
firmware reset handling is completed before shutdown proceeds.
Fixes:
16d42d313350 ("net/mlx5: Drain fw_reset when removing device")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Moshe Shemesh [Tue, 9 Dec 2025 12:56:09 +0000 (14:56 +0200)]
net/mlx5: fw reset, clear reset requested on drain_fw_reset
drain_fw_reset() waits for ongoing firmware reset events and blocks new
event handling, but does not clear the reset requested flag, and may
keep sync reset polling.
To fix it, call mlx5_sync_reset_clear_reset_requested() to clear the
flag, stop sync reset polling, and resume health polling, ensuring
health issues are still detected after the firmware reset drain.
Fixes:
16d42d313350 ("net/mlx5: Drain fw_reset when removing device")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 18 Dec 2025 11:53:23 +0000 (12:53 +0100)]
Merge branch 'net-dsa-lantiq-a-bunch-of-fixes'
Daniel Golle says:
====================
net: dsa: lantiq: a bunch of fixes
This series is the continuation and result of comments received for a fix
for the SGMII restart-an bit not actually being self-clearing, which was
reported by by Rasmus Villemoes.
A closer investigation and testing the .remove and the .shutdown paths
of the mxl-gsw1xx.c and lantiq_gswip.c drivers has revealed a couple of
existing problems, which are also addressed in this series.
====================
Link: https://patch.msgid.link/cover.1765241054.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle [Tue, 9 Dec 2025 01:29:34 +0000 (01:29 +0000)]
net: dsa: mxl-gsw1xx: manually clear RANEG bit
Despite being documented as self-clearing, the RANEG bit sometimes
remains set, preventing auto-negotiation from happening.
Manually clear the RANEG bit after 10ms as advised by MaxLinear.
In order to not hold RTNL during the 10ms of waiting schedule
delayed work to take care of clearing the bit asynchronously, which
is similar to the self-clearing behavior.
Fixes:
22335939ec90 ("net: dsa: add driver for MaxLinear GSW1xx switch family")
Reported-by: Rasmus Villemoes <ravi@prevas.dk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/76745fceb5a3f53088110fb7a96acf88434088ca.1765241054.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle [Tue, 9 Dec 2025 01:29:05 +0000 (01:29 +0000)]
net: dsa: mxl-gsw1xx: fix .shutdown driver operation
The .shutdown operation should call dsa_switch_shutdown() just like
it is done also by the sibling lantiq_gswip driver. Not doing that
results in shutdown or reboot hanging and waiting for the CPU port
becoming free, which introduces a longer delay and a WARNING before
shutdown or reboot in case the driver is built-into the kernel.
Fix this by calling dsa_switch_shutdown() in the driver's shutdown
operation, harmonizing it with what is done in the lantiq_gswip
driver. As a side-effect this now allows to remove the previously
exported gswip_disable_switch() function which no longer got any
users.
Fixes:
22335939ec907 ("net: dsa: add driver for MaxLinear GSW1xx switch family")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/77ed91a5206e5dbf5d3e83d7e364ebfda90d31fd.1765241054.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle [Tue, 9 Dec 2025 01:28:49 +0000 (01:28 +0000)]
net: dsa: mxl-gsw1xx: fix order in .remove operation
The driver's .remove operation was calling gswip_disable_switch() which
clears the GSWIP_MDIO_GLOB_ENABLE bit before calling
dsa_unregister_switch() and thereby violating a Golden Rule of driver
development to always unpublish userspace interfaces before disabling
hardware, as pointed out by Russell King.
Fix this by relying in GSWIP_MDIO_GLOB_ENABLE being cleared by the
.teardown operation introduced by the previous commit
("net: dsa: lantiq_gswip: fix teardown order").
Fixes:
22335939ec907 ("net: dsa: add driver for MaxLinear GSW1xx switch family")
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/63f882eeb910cf24503c35a443b541cc54a930f2.1765241054.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle [Tue, 9 Dec 2025 01:28:20 +0000 (01:28 +0000)]
net: dsa: lantiq_gswip: fix order in .remove operation
Russell King pointed out that disabling the switch by clearing
GSWIP_MDIO_GLOB_ENABLE before calling dsa_unregister_switch() is
problematic, as it violates a Golden Rule of driver development to
always first unpublish userspace interfaces and then disable the
hardware.
Fix this, and also simplify the probe() function, by introducing a
dsa_switch_ops teardown() operation which takes care of clearing the
GSWIP_MDIO_GLOB_ENABLE bit.
Fixes:
14fceff4771e5 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/4ebd72a29edc1e4059b9666a26a0bb5d906a829a.1765241054.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Gal Pressman [Mon, 8 Dec 2025 12:19:01 +0000 (14:19 +0200)]
ethtool: Avoid overflowing userspace buffer on stats query
The ethtool -S command operates across three ioctl calls:
ETHTOOL_GSSET_INFO for the size, ETHTOOL_GSTRINGS for the names, and
ETHTOOL_GSTATS for the values.
If the number of stats changes between these calls (e.g., due to device
reconfiguration), userspace's buffer allocation will be incorrect,
potentially leading to buffer overflow.
Drivers are generally expected to maintain stable stat counts, but some
drivers (e.g., mlx5, bnx2x, bna, ksz884x) use dynamic counters, making
this scenario possible.
Some drivers try to handle this internally:
- bnad_get_ethtool_stats() returns early in case stats.n_stats is not
equal to the driver's stats count.
- micrel/ksz884x also makes sure not to write anything beyond
stats.n_stats and overflow the buffer.
However, both use stats.n_stats which is already assigned with the value
returned from get_sset_count(), hence won't solve the issue described
here.
Change ethtool_get_strings(), ethtool_get_stats(),
ethtool_get_phy_stats() to not return anything in case of a mismatch
between userspace's size and get_sset_size(), to prevent buffer
overflow.
The returned n_stats value will be equal to zero, to reflect that
nothing has been returned.
This could result in one of two cases when using upstream ethtool,
depending on when the size change is detected:
1. When detected in ethtool_get_strings():
# ethtool -S eth2
no stats available
2. When detected in get stats, all stats will be reported as zero.
Both cases are presumably transient, and a subsequent ethtool call
should succeed.
Other than the overflow avoidance, these two cases are very evident (no
output/cleared stats), which is arguably better than presenting
incorrect/shifted stats.
I also considered returning an error instead of a "silent" response, but
that seems more destructive towards userspace apps.
Notes:
- This patch does not claim to fix the inherent race, it only makes sure
that we do not overflow the userspace buffer, and makes for a more
predictable behavior.
- RTNL lock is held during each ioctl, the race window exists between
the separate ioctl calls when the lock is released.
- Userspace ethtool always fills stats.n_stats, but it is likely that
these stats ioctls are implemented in other userspace applications
which might not fill it. The added code checks that it's not zero,
to prevent any regressions.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20251208121901.3203692-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dan Carpenter [Tue, 9 Dec 2025 06:56:39 +0000 (09:56 +0300)]
nfc: pn533: Fix error code in pn533_acr122_poweron_rdr()
Set the error code if "transferred != sizeof(cmd)" instead of
returning success.
Fixes:
dbafc28955fa ("NFC: pn533: don't send USB data off of the stack")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/aTfIJ9tZPmeUF4W1@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira [Mon, 8 Dec 2025 19:01:25 +0000 (16:01 -0300)]
selftests/tc-testing: Create tests to exercise ets classes active list misplacements
Add a test case for a bug fixed by Jamal [1] and for scenario where an
ets drr class is inserted into the active list twice.
- Try to delete ets drr class' qdisc while still keeping it in the
active list
- Try to add ets class to the active list twice
[1] https://lore.kernel.org/netdev/
20251128151919.576920-1-jhs@mojatatu.com/
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251208190125.1868423-2-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira [Mon, 8 Dec 2025 19:01:24 +0000 (16:01 -0300)]
net/sched: ets: Remove drr class from the active list if it changes to strict
Whenever a user issues an ets qdisc change command, transforming a
drr class into a strict one, the ets code isn't checking whether that
class was in the active list and removing it. This means that, if a
user changes a strict class (which was in the active list) back to a drr
one, that class will be added twice to the active list [1].
Doing so with the following commands:
tc qdisc add dev lo root handle 1: ets bands 2 strict 1
tc qdisc add dev lo parent 1:2 handle 20: \
tbf rate 8bit burst 100b latency 1s
tc filter add dev lo parent 1: basic classid 1:2
ping -c1 -W0.01 -s 56 127.0.0.1
tc qdisc change dev lo root handle 1: ets bands 2 strict 2
tc qdisc change dev lo root handle 1: ets bands 2 strict 1
ping -c1 -W0.01 -s 56 127.0.0.1
Will trigger the following splat with list debug turned on:
[ 59.279014][ T365] ------------[ cut here ]------------
[ 59.279452][ T365] list_add double add: new=
ffff88801d60e350, prev=
ffff88801d60e350, next=
ffff88801d60e2c0.
[ 59.280153][ T365] WARNING: CPU: 3 PID: 365 at lib/list_debug.c:35 __list_add_valid_or_report+0x17f/0x220
[ 59.280860][ T365] Modules linked in:
[ 59.281165][ T365] CPU: 3 UID: 0 PID: 365 Comm: tc Not tainted
6.18.0-rc7-00105-g7e9f13163c13-dirty #239 PREEMPT(voluntary)
[ 59.281977][ T365] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 59.282391][ T365] RIP: 0010:__list_add_valid_or_report+0x17f/0x220
[ 59.282842][ T365] Code: 89 c6 e8 d4 b7 0d ff 90 0f 0b 90 90 31 c0 e9 31 ff ff ff 90 48 c7 c7 e0 a0 22 9f 48 89 f2 48 89 c1 4c 89 c6 e8 b2 b7 0d ff 90 <0f> 0b 90 90 31 c0 e9 0f ff ff ff 48 89 f7 48 89 44 24 10 4c 89 44
...
[ 59.288812][ T365] Call Trace:
[ 59.289056][ T365] <TASK>
[ 59.289224][ T365] ? srso_alias_return_thunk+0x5/0xfbef5
[ 59.289546][ T365] ets_qdisc_change+0xd2b/0x1e80
[ 59.289891][ T365] ? __lock_acquire+0x7e7/0x1be0
[ 59.290223][ T365] ? __pfx_ets_qdisc_change+0x10/0x10
[ 59.290546][ T365] ? srso_alias_return_thunk+0x5/0xfbef5
[ 59.290898][ T365] ? __mutex_trylock_common+0xda/0x240
[ 59.291228][ T365] ? __pfx___mutex_trylock_common+0x10/0x10
[ 59.291655][ T365] ? srso_alias_return_thunk+0x5/0xfbef5
[ 59.291993][ T365] ? srso_alias_return_thunk+0x5/0xfbef5
[ 59.292313][ T365] ? trace_contention_end+0xc8/0x110
[ 59.292656][ T365] ? srso_alias_return_thunk+0x5/0xfbef5
[ 59.293022][ T365] ? srso_alias_return_thunk+0x5/0xfbef5
[ 59.293351][ T365] tc_modify_qdisc+0x63a/0x1cf0
Fix this by always checking and removing an ets class from the active list
when changing it to strict.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/net/sched/sch_ets.c?id=
ce052b9402e461a9aded599f5b47e76bc727f7de#n663
Fixes:
cd9b50adc6bb9 ("net/sched: ets: fix crash when flipping from 'strict' to 'quantum'")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251208190125.1868423-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Junrui Luo [Thu, 4 Dec 2025 13:30:47 +0000 (21:30 +0800)]
caif: fix integer underflow in cffrml_receive()
The cffrml_receive() function extracts a length field from the packet
header and, when FCS is disabled, subtracts 2 from this length without
validating that len >= 2.
If an attacker sends a malicious packet with a length field of 0 or 1
to an interface with FCS disabled, the subtraction causes an integer
underflow.
This can lead to memory exhaustion and kernel instability, potential
information disclosure if padding contains uninitialized kernel memory.
Fix this by validating that len >= 2 before performing the subtraction.
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Reported-by: Junrui Luo <moonafterrain@outlook.com>
Fixes:
b482cd2053e3 ("net-caif: add CAIF core protocol stack")
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/SYBPR01MB7881511122BAFEA8212A1608AFA6A@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marcus Hughes [Sun, 7 Dec 2025 21:03:55 +0000 (21:03 +0000)]
net: sfp: extend Potron XGSPON quirk to cover additional EEPROM variant
Some Potron SFP+ XGSPON ONU sticks are shipped with different EEPROM
vendor ID and vendor name strings, but are otherwise functionally
identical to the existing "Potron SFP+ XGSPON ONU Stick" handled by
sfp_quirk_potron().
These modules, including units distributed under the "Better Internet"
branding, use the same UART pin assignment and require the same
TX_FAULT/LOS behaviour and boot delay. Re-use the existing Potron
quirk for this EEPROM variant.
Signed-off-by: Marcus Hughes <marcus.hughes@betterinternet.ltd>
Link: https://patch.msgid.link/20251207210355.333451-1-marcus.hughes@betterinternet.ltd
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 11 Dec 2025 09:02:29 +0000 (01:02 -0800)]
Merge tag 'linux-can-fixes-for-6.19-
20251210' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-12-10
Arnd Bergmann's patch fixes a build dependency with the CAN protocols
and drivers introduced in the current development cycle.
The last patch is by me and fixes the error handling cleanup in the
gs_usb driver.
* tag 'linux-can-fixes-for-6.19-
20251210' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: gs_usb: gs_can_open(): fix error handling
can: fix build dependency
====================
Link: https://patch.msgid.link/20251210083448.2116869-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 11 Dec 2025 08:58:38 +0000 (00:58 -0800)]
Merge tag 'nf-25-12-10' of https://git./linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter: updates for net
1) Fix refcount leaks in nf_conncount, from Fernando Fernandez Mancera.
This addresses a recent regression that came in the last -next
pull request.
2) Fix a null dereference in route error handling in IPVS, from Slavin
Liu. This is an ancient issue dating back to 5.1 days.
3) Always set ifindex in route tuple in the flowtable output path, from
Lorenzo Bianconi. This bug came in with the recent output path refactoring.
4) Prefer 'exit $ksft_xfail' over 'exit $ksft_skip' when we fail to
trigger a nat race condition to exercise the clash resolution path in
selftest infra, $ksft_skip should be reserved for missing tooling,
From myself.
* tag 'nf-25-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: prefer xfail in case race wasn't triggered
netfilter: always set route tuple out ifindex
ipvs: fix ipv4 null-ptr-deref in route error path
netfilter: nf_conncount: fix leaked ct in error paths
====================
Link: https://patch.msgid.link/20251210110754.22620-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 11 Dec 2025 08:53:18 +0000 (00:53 -0800)]
Merge branch 'selftests-forwarding-vxlan_bridge_1q_mc_ul-fix-flakiness'
Petr Machata says:
====================
selftests: forwarding: vxlan_bridge_1q_mc_ul: Fix flakiness
The net/forwarding/vxlan_bridge_1q_mc_ul selftest runs an overlay traffic,
forwarded over a multicast-routed VXLAN underlay. In order to determine
whether packets reach their intended destination, it uses a TC match. For
convenience, it uses a flower match, which however does not allow matching
on the encapsulated packet. So various service traffic ends up being
indistinguishable from the test packets, and ends up confusing the test. To
alleviate the problem, the test uses sleep to allow the necessary service
traffic to run and clear the channel, before running the test traffic. This
worked for a while, but lately we have nevertheless seen flakiness of the
test in the CI.
In this patchset, first generalize tc_rule_stats_get() to support u32 in
patch #1, then in patch #2 convert the test to use u32 to allow parsing
deeper into the packet, and in #3 drop the now-unnecessary sleep.
====================
Link: https://patch.msgid.link/cover.1765289566.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Tue, 9 Dec 2025 15:29:03 +0000 (16:29 +0100)]
selftests: forwarding: vxlan_bridge_1q_mc_ul: Drop useless sleeping
After fixing traffic matching in the previous patch, the test does not need
to use the sleep anymore. So drop vx_wait() altogether, migrate all callers
of vx{10,20}_create_wait() to the corresponding _create(), and drop the now
unused _create_wait() helpers.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/eabfe4fa12ae788cf3b8c5c876a989de81dfc3d3.1765289566.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Tue, 9 Dec 2025 15:29:02 +0000 (16:29 +0100)]
selftests: forwarding: vxlan_bridge_1q_mc_ul: Fix flakiness
This test runs an overlay traffic, forwarded over a multicast-routed VXLAN
underlay. In order to determine whether packets reach their intended
destination, it uses a TC match. For convenience, it uses a flower match,
which however does not allow matching on the encapsulated packet. So
various service traffic ends up being indistinguishable from the test
packets, and ends up confusing the test. To alleviate the problem, the test
uses sleep to allow the necessary service traffic to run and clear the
channel, before running the test traffic. This worked for a while, but
lately we have nevertheless seen flakiness of the test in the CI.
Fix the issue by using u32 to match the encapsulated packet as well. The
confusing packets seem to always be IPv6 multicast listener reports.
Realistically they could be ARP or other ICMP6 traffic as well. Therefore
look for ethertype IPv4 in the IPv4 traffic test, and for IPv6 / UDP
combination in the IPv6 traffic test.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/6438cb1613a2a667d3ff64089eb5994778f247af.1765289566.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Tue, 9 Dec 2025 15:29:01 +0000 (16:29 +0100)]
selftests: net: lib: tc_rule_stats_get(): Don't hard-code array index
Flower is commonly used to match on packets in many bash-based selftests.
A dump of a flower filter including statistics looks something like this:
[
{
"protocol": "all",
"pref": 49152,
"kind": "flower",
"chain": 0
},
{
...
"options": {
...
"actions": [
{
...
"stats": {
"bytes": 0,
"packets": 0,
"drops": 0,
"overlimits": 0,
"requeues": 0,
"backlog": 0,
"qlen": 0
}
}
]
}
}
]
The JQ query in the helper function tc_rule_stats_get() assumes this form
and looks for the second element of the array.
However, a dump of a u32 filter looks like this:
[
{
"protocol": "all",
"pref": 49151,
"kind": "u32",
"chain": 0
},
{
"protocol": "all",
"pref": 49151,
"kind": "u32",
"chain": 0,
"options": {
"fh": "800:",
"ht_divisor": 1
}
},
{
...
"options": {
...
"actions": [
{
...
"stats": {
"bytes": 0,
"packets": 0,
"drops": 0,
"overlimits": 0,
"requeues": 0,
"backlog": 0,
"qlen": 0
}
}
]
}
},
]
There's an extra element which the JQ query ends up choosing.
Instead of hard-coding a particular index, look for the entry on which a
selector .options.actions yields anything.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/12982a44471c834511a0ee6c1e8f57e3a5307105.1765289566.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Mon, 8 Dec 2025 23:03:36 +0000 (00:03 +0100)]
selftests: netfilter: prefer xfail in case race wasn't triggered
Jakub says: "We try to reserve SKIP for tests skipped because tool is
missing in env, something isn't built into the kernel etc."
use xfail, we can't force the race condition to appear at will
so its expected that the test 'fails' occasionally.
Fixes:
78a588363587 ("selftests: netfilter: add conntrack clash resolution test case")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/
20251206175647.
5c32f419@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Lorenzo Bianconi [Mon, 1 Dec 2025 10:22:45 +0000 (11:22 +0100)]
netfilter: always set route tuple out ifindex
Always set nf_flow_route tuple out ifindex even if the indev is not one
of the flowtable configured devices since otherwise the outdev lookup in
nf_flow_offload_ip_hook() or nf_flow_offload_ipv6_hook() for
FLOW_OFFLOAD_XMIT_NEIGH flowtable entries will fail.
The above issue occurs in the following configuration since IP6IP6
tunnel does not support flowtable acceleration yet:
$ip addr show
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:11:22:33:22:55 brd ff:ff:ff:ff:ff:ff link-netns ns1
inet6 2001:db8:1::2/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::211:22ff:fe33:2255/64 scope link tentative proto kernel_ll
valid_lft forever preferred_lft forever
6: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:22:22:33:22:55 brd ff:ff:ff:ff:ff:ff link-netns ns3
inet6 2001:db8:2::1/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::222:22ff:fe33:2255/64 scope link tentative proto kernel_ll
valid_lft forever preferred_lft forever
7: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN group default qlen 1000
link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr a85:e732:2c37::
inet6 2002:db8:1::1/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::885:e7ff:fe32:2c37/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
$ip -6 route show
2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium
2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium
2002:db8:1::/64 dev tun0 proto kernel metric 256 pref medium
default via 2002:db8:1::2 dev tun0 metric 1024 pref medium
$nft list ruleset
table inet filter {
flowtable ft {
hook ingress priority filter
devices = { eth0, eth1 }
}
chain forward {
type filter hook forward priority filter; policy accept;
meta l4proto { tcp, udp } flow add @ft
}
}
Fixes:
b5964aac51e0 ("netfilter: flowtable: consolidate xmit path")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Slavin Liu [Fri, 21 Nov 2025 08:52:13 +0000 (16:52 +0800)]
ipvs: fix ipv4 null-ptr-deref in route error path
The IPv4 code path in __ip_vs_get_out_rt() calls dst_link_failure()
without ensuring skb->dev is set, leading to a NULL pointer dereference
in fib_compute_spec_dst() when ipv4_link_failure() attempts to send
ICMP destination unreachable messages.
The issue emerged after commit
ed0de45a1008 ("ipv4: recompile ip options
in ipv4_link_failure") started calling __ip_options_compile() from
ipv4_link_failure(). This code path eventually calls fib_compute_spec_dst()
which dereferences skb->dev. An attempt was made to fix the NULL skb->dev
dereference in commit
0113d9c9d1cc ("ipv4: fix null-deref in
ipv4_link_failure"), but it only addressed the immediate dev_net(skb->dev)
dereference by using a fallback device. The fix was incomplete because
fib_compute_spec_dst() later in the call chain still accesses skb->dev
directly, which remains NULL when IPVS calls dst_link_failure().
The crash occurs when:
1. IPVS processes a packet in NAT mode with a misconfigured destination
2. Route lookup fails in __ip_vs_get_out_rt() before establishing a route
3. The error path calls dst_link_failure(skb) with skb->dev == NULL
4. ipv4_link_failure() → ipv4_send_dest_unreach() →
__ip_options_compile() → fib_compute_spec_dst()
5. fib_compute_spec_dst() dereferences NULL skb->dev
Apply the same fix used for IPv6 in commit
326bf17ea5d4 ("ipvs: fix
ipv6 route unreach panic"): set skb->dev from skb_dst(skb)->dev before
calling dst_link_failure().
KASAN: null-ptr-deref in range [0x0000000000000328-0x000000000000032f]
CPU: 1 PID: 12732 Comm: syz.1.3469 Not tainted 6.6.114 #2
RIP: 0010:__in_dev_get_rcu include/linux/inetdevice.h:233
RIP: 0010:fib_compute_spec_dst+0x17a/0x9f0 net/ipv4/fib_frontend.c:285
Call Trace:
<TASK>
spec_dst_fill net/ipv4/ip_options.c:232
spec_dst_fill net/ipv4/ip_options.c:229
__ip_options_compile+0x13a1/0x17d0 net/ipv4/ip_options.c:330
ipv4_send_dest_unreach net/ipv4/route.c:1252
ipv4_link_failure+0x702/0xb80 net/ipv4/route.c:1265
dst_link_failure include/net/dst.h:437
__ip_vs_get_out_rt+0x15fd/0x19e0 net/netfilter/ipvs/ip_vs_xmit.c:412
ip_vs_nat_xmit+0x1d8/0xc80 net/netfilter/ipvs/ip_vs_xmit.c:764
Fixes:
ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure")
Signed-off-by: Slavin Liu <slavin452@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Fernando Fernandez Mancera [Fri, 5 Dec 2025 11:58:01 +0000 (12:58 +0100)]
netfilter: nf_conncount: fix leaked ct in error paths
There are some situations where ct might be leaked as error paths are
skipping the refcounted check and return immediately. In order to solve
it make sure that the check is always called.
Fixes:
be102eb6a0e7 ("netfilter: nf_conncount: rework API to use sk_buff directly")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Jakub Kicinski [Wed, 10 Dec 2025 09:15:32 +0000 (01:15 -0800)]
Merge branch 'inet-frags-flush-pending-skbs-in-fqdir_pre_exit'
Jakub Kicinski says:
====================
inet: frags: flush pending skbs in fqdir_pre_exit()
Fix the issue reported by NIPA starting on Sep 18th [1], where
pernet_ops_rwsem is constantly held by a reader, preventing writers
from grabbing it (specifically driver modules from loading).
The fact that reports started around that time seems coincidental.
The issue seems to be skbs queued for defrag preventing conntrack
from exiting.
First patch fixes another theoretical issue, it's mostly a leftover
from an attempt to get rid of the inet_frag_queue refcnt, which
I gave up on (still think it's doable but a bit of a time sink).
Second patch is a minor refactor.
The real fix is in the third patch. It's the simplest fix I can
think of which is to flush the frag queues. Perhaps someone has
a better suggestion?
Last patch adds an explicit warning for conntrack getting stuck,
as this seems like something that can easily happen if bugs sneak in.
The warning will hopefully save us the first 20% of the investigation
effort.
Link: https://lore.kernel.org/20251001082036.0fc51440@kernel.org
====================
Link: https://patch.msgid.link/20251207010942.1672972-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 7 Dec 2025 01:09:42 +0000 (17:09 -0800)]
netfilter: conntrack: warn when cleanup is stuck
nf_conntrack_cleanup_net_list() calls schedule() so it does not
show up as a hung task. Add an explicit check to make debugging
leaked skbs/conntack references more obvious.
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251207010942.1672972-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 7 Dec 2025 01:09:41 +0000 (17:09 -0800)]
inet: frags: flush pending skbs in fqdir_pre_exit()
We have been seeing occasional deadlocks on pernet_ops_rwsem since
September in NIPA. The stuck task was usually modprobe (often loading
a driver like ipvlan), trying to take the lock as a Writer.
lockdep does not track readers for rwsems so the read wasn't obvious
from the reports.
On closer inspection the Reader holding the lock was conntrack looping
forever in nf_conntrack_cleanup_net_list(). Based on past experience
with occasional NIPA crashes I looked thru the tests which run before
the crash and noticed that the crash follows ip_defrag.sh. An immediate
red flag. Scouring thru (de)fragmentation queues reveals skbs sitting
around, holding conntrack references.
The problem is that since conntrack depends on nf_defrag_ipv6,
nf_defrag_ipv6 will load first. Since nf_defrag_ipv6 loads first its
netns exit hooks run _after_ conntrack's netns exit hook.
Flush all fragment queue SKBs during fqdir_pre_exit() to release
conntrack references before conntrack cleanup runs. Also flush
the queues in timer expiry handlers when they discover fqdir->dead
is set, in case packet sneaks in while we're running the pre_exit
flush.
The commit under Fixes is not exactly the culprit, but I think
previously the timer firing would eventually unblock the spinning
conntrack.
Fixes:
d5dd88794a13 ("inet: fix various use-after-free in defrags units")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251207010942.1672972-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 7 Dec 2025 01:09:40 +0000 (17:09 -0800)]
inet: frags: add inet_frag_queue_flush()
Instead of exporting inet_frag_rbtree_purge() which requires that
caller takes care of memory accounting, add a new helper. We will
need to call it from a few places in the next patch.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251207010942.1672972-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 7 Dec 2025 01:09:39 +0000 (17:09 -0800)]
inet: frags: avoid theoretical race in ip_frag_reinit()
In ip_frag_reinit() we want to move the frag timeout timer into
the future. If the timer fires in the meantime we inadvertently
scheduled it again, and since the timer assumes a ref on frag_queue
we need to acquire one to balance things out.
This is technically racy, we should have acquired the reference
_before_ we touch the timer, it may fire again before we take the ref.
Avoid this entire dance by using mod_timer_pending() which only modifies
the timer if its pending (and which exists since Linux v2.6.30)
Note that this was the only place we ever took a ref on frag_queue
since Eric's conversion to RCU. So we could potentially replace
the whole refcnt field with an atomic flag and a bit more RCU.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251207010942.1672972-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 10 Dec 2025 09:08:51 +0000 (01:08 -0800)]
Merge branch 'selftests-fix-build-warnings-and-errors' (part)
Guenter Roeck says:
====================
selftests: Fix build warnings and errors
This series fixes build warnings and errors observed when trying to build
selftests.
====================
Link: https://patch.msgid.link/20251205171010.515236-1-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guenter Roeck [Fri, 5 Dec 2025 17:10:07 +0000 (09:10 -0800)]
selftests: net: tfo: Fix build warning
Fix
tfo.c: In function ‘run_server’:
tfo.c:84:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’
by evaluating the return value from read() and displaying an error message
if it reports an error.
Fixes:
c65b5bb2329e3 ("selftests: net: add passive TFO test binary")
Cc: David Wei <dw@davidwei.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20251205171010.515236-14-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guenter Roeck [Fri, 5 Dec 2025 17:10:04 +0000 (09:10 -0800)]
selftests: net: Fix build warnings
Fix
ksft.h: In function ‘ksft_ready’:
ksft.h:27:9: warning: ignoring return value of ‘write’ declared with attribute ‘warn_unused_result’
ksft.h: In function ‘ksft_wait’:
ksft.h:51:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’
by checking the return value of the affected functions and displaying
an error message if an error is seen.
Fixes:
2b6d490b82668 ("selftests: drv-net: Factor out ksft C helpers")
Cc: Joe Damato <jdamato@fastly.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20251205171010.515236-11-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guenter Roeck [Fri, 5 Dec 2025 17:10:00 +0000 (09:10 -0800)]
selftest: af_unix: Support compilers without flex-array-member-not-at-end support
Fix:
gcc: error: unrecognized command-line option ‘-Wflex-array-member-not-at-end’
by making the compiler option dependent on its support.
Fixes:
1838731f1072c ("selftest: af_unix: Add -Wall and -Wflex-array-member-not-at-end to CFLAGS.")
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20251205171010.515236-7-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ankit Khushwaha [Fri, 5 Dec 2025 16:32:42 +0000 (22:02 +0530)]
selftests: tls: fix warning of uninitialized variable
In 'poll_partial_rec_async' a uninitialized char variable 'token' with
is used for write/read instruction to synchronize between threads
via a pipe.
tls.c:2833:26: warning: variable 'token' is uninitialized
when passed as a const pointer argument
Initialize 'token' to '\0' to silence compiler warning.
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Link: https://patch.msgid.link/20251205163242.14615-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexey Simakov [Fri, 5 Dec 2025 15:58:16 +0000 (18:58 +0300)]
broadcom: b44: prevent uninitialized value usage
On execution path with raised B44_FLAG_EXTERNAL_PHY, b44_readphy()
leaves bmcr value uninitialized and it is used later in the code.
Add check of this flag at the beginning of the b44_nway_reset() and
exit early of the function with restarting autonegotiation if an
external PHY is used.
Fixes:
753f492093da ("[B44]: port to native ssb support")
Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexey Simakov <bigalex934@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251205155815.4348-1-bigalex934@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
caoping [Thu, 4 Dec 2025 09:10:58 +0000 (01:10 -0800)]
net/handshake: restore destructor on submit failure
handshake_req_submit() replaces sk->sk_destruct but never restores it when
submission fails before the request is hashed. handshake_sk_destruct() then
returns early and the original destructor never runs, leaking the socket.
Restore sk_destruct on the error path.
Fixes:
3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: caoping <caoping@cmss.chinamobile.com>
Link: https://patch.msgid.link/20251204091058.1545151-1-caoping@cmss.chinamobile.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arnd Bergmann [Thu, 4 Dec 2025 10:01:28 +0000 (11:01 +0100)]
net: ti: icssg-prueth: add PTP_1588_CLOCK_OPTIONAL dependency
The new icssg-prueth driver needs the same dependency as the other parts
that use the ptp-1588:
WARNING: unmet direct dependencies detected for TI_ICSS_IEP
Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PTP_1588_CLOCK_OPTIONAL [=m] && TI_PRUSS [=y]
Selected by [y]:
- TI_PRUETH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PRU_REMOTEPROC [=y] && NET_SWITCHDEV [=y]
Add the correct dependency on the two drivers missing it, and remove
the pointless 'imply' in the process.
Fixes:
e654b85a693e ("net: ti: icssg-prueth: Add ICSSG Ethernet driver for AM65x SR1.0 platforms")
Fixes:
511f6c1ae093 ("net: ti: icssm-prueth: Adds ICSSM Ethernet driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251204100138.1034175-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ilya Maximets [Thu, 4 Dec 2025 10:53:32 +0000 (11:53 +0100)]
net: openvswitch: fix middle attribute validation in push_nsh() action
The push_nsh() action structure looks like this:
OVS_ACTION_ATTR_PUSH_NSH(OVS_KEY_ATTR_NSH(OVS_NSH_KEY_ATTR_BASE,...))
The outermost OVS_ACTION_ATTR_PUSH_NSH attribute is OK'ed by the
nla_for_each_nested() inside __ovs_nla_copy_actions(). The innermost
OVS_NSH_KEY_ATTR_BASE/MD1/MD2 are OK'ed by the nla_for_each_nested()
inside nsh_key_put_from_nlattr(). But nothing checks if the attribute
in the middle is OK. We don't even check that this attribute is the
OVS_KEY_ATTR_NSH. We just do a double unwrap with a pair of nla_data()
calls - first time directly while calling validate_push_nsh() and the
second time as part of the nla_for_each_nested() macro, which isn't
safe, potentially causing invalid memory access if the size of this
attribute is incorrect. The failure may not be noticed during
validation due to larger netlink buffer, but cause trouble later during
action execution where the buffer is allocated exactly to the size:
BUG: KASAN: slab-out-of-bounds in nsh_hdr_from_nlattr+0x1dd/0x6a0 [openvswitch]
Read of size 184 at addr
ffff88816459a634 by task a.out/22624
CPU: 8 UID: 0 PID: 22624 6.18.0-rc7+ #115 PREEMPT(voluntary)
Call Trace:
<TASK>
dump_stack_lvl+0x51/0x70
print_address_description.constprop.0+0x2c/0x390
kasan_report+0xdd/0x110
kasan_check_range+0x35/0x1b0
__asan_memcpy+0x20/0x60
nsh_hdr_from_nlattr+0x1dd/0x6a0 [openvswitch]
push_nsh+0x82/0x120 [openvswitch]
do_execute_actions+0x1405/0x2840 [openvswitch]
ovs_execute_actions+0xd5/0x3b0 [openvswitch]
ovs_packet_cmd_execute+0x949/0xdb0 [openvswitch]
genl_family_rcv_msg_doit+0x1d6/0x2b0
genl_family_rcv_msg+0x336/0x580
genl_rcv_msg+0x9f/0x130
netlink_rcv_skb+0x11f/0x370
genl_rcv+0x24/0x40
netlink_unicast+0x73e/0xaa0
netlink_sendmsg+0x744/0xbf0
__sys_sendto+0x3d6/0x450
do_syscall_64+0x79/0x2c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
Let's add some checks that the attribute is properly sized and it's
the only one attribute inside the action. Technically, there is no
real reason for OVS_KEY_ATTR_NSH to be there, as we know that we're
pushing an NSH header already, it just creates extra nesting, but
that's how uAPI works today. So, keeping as it is.
Fixes:
b2d0f5d5dc53 ("openvswitch: enable NSH support")
Reported-by: Junvy Yang <zhuque@tencent.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron echaudro@redhat.com
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20251204105334.900379-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde [Mon, 1 Dec 2025 18:26:38 +0000 (19:26 +0100)]
can: gs_usb: gs_can_open(): fix error handling
Commit
2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling")
added missing error handling to the gs_can_open() function.
The driver uses 2 USB anchors to track the allocated URBs: the TX URBs in
struct gs_can::tx_submitted for each netdev and the RX URBs in struct
gs_usb::rx_submitted for the USB device. gs_can_open() allocates the RX
URBs, while TX URBs are allocated during gs_can_start_xmit().
The cleanup in gs_can_open() kills all anchored dev->tx_submitted
URBs (which is not necessary since the netdev is not yet registered), but
misses the parent->rx_submitted URBs.
Fix the problem by killing the rx_submitted instead of the tx_submitted.
Fixes:
2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251210-gs_usb-fix-error-handling-v1-1-d6a5a03f10bb@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Arnd Bergmann [Thu, 4 Dec 2025 10:00:09 +0000 (11:00 +0100)]
can: fix build dependency
A recent bugfix introduced a new problem with Kconfig dependencies:
WARNING: unmet direct dependencies detected for CAN_DEV
Depends on [n]: NETDEVICES [=n] && CAN [=m]
Selected by [m]:
- CAN [=m] && NET [=y]
Since the CAN core code now links into the CAN device code, that
particular function needs to be available, though the rest of it
does not.
Revert the incomplete fix and instead use Makefile logic to avoid
the link failure.
Fixes:
cb2dc6d2869a ("can: Kconfig: select CAN driver infrastructure by default")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202512091523.zty3CLmc-lkp@intel.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20251204100015.1033688-1-arnd@kernel.org
[mkl: removed module option from CAN_DEV help text (thanks Vincent)]
[mkl: removed '&& CAN' from Kconfig dependency (thanks Vincent)]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jakub Kicinski [Tue, 9 Dec 2025 07:54:05 +0000 (23:54 -0800)]
Merge branch 'mptcp-misc-fixes-for-v6-19-rc1'
Matthieu Baerts says:
====================
mptcp: misc fixes for v6.19-rc1
Here are various unrelated fixes:
- Patches 1-2: ignore unknown in-kernel PM endpoint flags instead of
pretending they are supported. A fix for v5.7.
- Patch 3: avoid potential unnecessary rtx timer expiration. A fix for
v5.15.
- Patch 4: avoid a deadlock on fallback in case of SKB creation failure
while re-injecting.
====================
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-0-9e4781a6c1b8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 5 Dec 2025 18:55:17 +0000 (19:55 +0100)]
mptcp: avoid deadlock on fallback while reinjecting
Jakub reported an MPTCP deadlock at fallback time:
WARNING: possible recursive locking detected
6.18.0-rc7-virtme #1 Not tainted
--------------------------------------------
mptcp_connect/20858 is trying to acquire lock:
ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_try_fallback+0xd8/0x280
but task is already holding lock:
ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&msk->fallback_lock);
lock(&msk->fallback_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by mptcp_connect/20858:
#0:
ff1100001da18290 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x114/0x1bc0
#1:
ff1100001db40fd0 (k-sk_lock-AF_INET#2){+.+.}-{0:0}, at: __mptcp_retrans+0x2cb/0xaa0
#2:
ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0
stack backtrace:
CPU: 0 UID: 0 PID: 20858 Comm: mptcp_connect Not tainted 6.18.0-rc7-virtme #1 PREEMPT(full)
Hardware name: Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0x6f/0xa0
print_deadlock_bug.cold+0xc0/0xcd
validate_chain+0x2ff/0x5f0
__lock_acquire+0x34c/0x740
lock_acquire.part.0+0xbc/0x260
_raw_spin_lock_bh+0x38/0x50
__mptcp_try_fallback+0xd8/0x280
mptcp_sendmsg_frag+0x16c2/0x3050
__mptcp_retrans+0x421/0xaa0
mptcp_release_cb+0x5aa/0xa70
release_sock+0xab/0x1d0
mptcp_sendmsg+0xd5b/0x1bc0
sock_write_iter+0x281/0x4d0
new_sync_write+0x3c5/0x6f0
vfs_write+0x65e/0xbb0
ksys_write+0x17e/0x200
do_syscall_64+0xbb/0xfd0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fa5627cbc5e
Code: 4d 89 d8 e8 14 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
RSP: 002b:
00007fff1fe14700 EFLAGS:
00000202 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
0000000000000005 RCX:
00007fa5627cbc5e
RDX:
0000000000001f9c RSI:
00007fff1fe16984 RDI:
0000000000000005
RBP:
00007fff1fe14710 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000202 R12:
00007fff1fe16920
R13:
0000000000002000 R14:
0000000000001f9c R15:
0000000000001f9c
The packet scheduler could attempt a reinjection after receiving an
MP_FAIL and before the infinite map has been transmitted, causing a
deadlock since MPTCP needs to do the reinjection atomically from WRT
fallback.
Address the issue explicitly avoiding the reinjection in the critical
scenario. Note that this is the only fallback critical section that
could potentially send packets and hit the double-lock.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/mptcp-dbg/results/412720/1-mptcp-join-sh/stderr
Fixes:
f8a1d9b18c5e ("mptcp: make fallback action and fallback decision atomic")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-4-9e4781a6c1b8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 5 Dec 2025 18:55:16 +0000 (19:55 +0100)]
mptcp: schedule rtx timer only after pushing data
The MPTCP protocol usually schedule the retransmission timer only
when there is some chances for such retransmissions to happen.
With a notable exception: __mptcp_push_pending() currently schedule
such timer unconditionally, potentially leading to unnecessary rtx
timer expiration.
The issue is present since the blamed commit below but become easily
reproducible after commit
27b0e701d387 ("mptcp: drop bogus optimization
in __mptcp_check_push()")
Fixes:
33d41c9cd74c ("mptcp: more accurate timeout")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-3-9e4781a6c1b8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 5 Dec 2025 18:55:15 +0000 (19:55 +0100)]
selftests: mptcp: pm: ensure unknown flags are ignored
This validates the previous commit: the userspace can set unknown flags
-- the 7th bit is currently unused -- without errors, but only the
supported ones are printed in the endpoints dumps.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes:
01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-2-9e4781a6c1b8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Fri, 5 Dec 2025 18:55:14 +0000 (19:55 +0100)]
mptcp: pm: ignore unknown endpoint flags
Before this patch, the kernel was saving any flags set by the userspace,
even unknown ones. This doesn't cause critical issues because the kernel
is only looking at specific ones. But on the other hand, endpoints dumps
could tell the userspace some recent flags seem to be supported on older
kernel versions.
Instead, ignore all unknown flags when parsing them. By doing that, the
userspace can continue to set unsupported flags, but it has a way to
verify what is supported by the kernel.
Note that it sounds better to continue accepting unsupported flags not
to change the behaviour, but also that eases things on the userspace
side by adding "optional" endpoint types only supported by newer kernel
versions without having to deal with the different kernel versions.
A note for the backports: there will be conflicts in mptcp.h on older
versions not having the mentioned flags, the new line should still be
added last, and the '5' needs to be adapted to have the same value as
the last entry.
Fixes:
01cacb00b35c ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-1-9e4781a6c1b8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 7 Dec 2025 01:38:48 +0000 (17:38 -0800)]
tools: ynl: fix build on systems with old kernel headers
The wireguard YNL conversion was missing the customary .deps entry.
NIPA doesn't catch this but my CentOS 9 system complains:
wireguard-user.c:72:10: error: ‘WGALLOWEDIP_A_FLAGS’ undeclared here
wireguard-user.c:58:67: error: parameter 1 (‘value’) has incomplete type
58 | const char *wireguard_wgallowedip_flags_str(enum wgallowedip_flag value)
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~
And similarly does Ubuntu 22.04.
One extra complication here is that we renamed the header guard,
so we need to compat with both old and new guard define.
Reviewed-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://patch.msgid.link/20251207013848.1692990-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 7 Dec 2025 00:47:40 +0000 (16:47 -0800)]
ynl: add regen hint to new headers
Recent commit
68e83f347266 ("tools: ynl-gen: add regeneration comment")
added a hint how to regenerate the code to the headers. Update
the new headers from this release cycle to also include it.
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251207004740.1657799-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Biggers [Thu, 4 Dec 2025 05:44:17 +0000 (21:44 -0800)]
mptcp: select CRYPTO_LIB_UTILS instead of CRYPTO
Since the only crypto functions used by the mptcp code are the SHA-256
library functions and crypto_memneq(), select only the options needed
for those: CRYPTO_LIB_SHA256 and CRYPTO_LIB_UTILS.
Previously, CRYPTO was selected instead of CRYPTO_LIB_UTILS. That does
pull in CRYPTO_LIB_UTILS as well, but it's unnecessarily broad.
Years ago, the CRYPTO_LIB_* options were visible only when CRYPTO. That
may be another reason why CRYPTO is selected here. However, that was
fixed years ago, and the libraries can now be selected directly.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Link: https://patch.msgid.link/20251204054417.491439-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mateusz Guzik [Wed, 3 Dec 2025 10:01:22 +0000 (11:01 +0100)]
af_unix: annotate unix_gc_lock with __cacheline_aligned_in_smp
Otherwise the lock is susceptible to ever-changing false-sharing due to
unrelated changes. This in particular popped up here where an unrelated
change improved performance:
https://lore.kernel.org/oe-lkp/
202511281306.
51105b46-lkp@intel.com/
Stabilize it with an explicit annotation which also has a side effect
of furher improving scalability:
> in our oiginal report,
284922f4c5 has a 6.1% performance improvement comparing
> to parent
17d85f33a8.
> we applied your patch directly upon
284922f4c5. as below, now by
> "
284922f4c5 + your patch"
> we observe a 12.8% performance improvements (still comparing to
17d85f33a8).
Note nothing was done for the other fields, so some fluctuation is still
possible.
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251203100122.291550-1-mjguzik@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Wed, 3 Dec 2025 00:30:24 +0000 (16:30 -0800)]
bnxt_en: Fix XDP_TX path
For XDP_TX action in bnxt_rx_xdp(), clearing of the event flags is not
correct. __bnxt_poll_work() -> bnxt_rx_pkt() -> bnxt_rx_xdp() may be
looping within NAPI and some event flags may be set in earlier
iterations. In particular, if BNXT_TX_EVENT is set earlier indicating
some XDP_TX packets are ready and pending, it will be cleared if it is
XDP_TX action again. Normally, we will set BNXT_TX_EVENT again when we
successfully call __bnxt_xmit_xdp(). But if the TX ring has no more
room, the flag will not be set. This will cause the TX producer to be
ahead but the driver will not hit the TX doorbell.
For multi-buf XDP_TX, there is no need to clear the event flags and set
BNXT_AGG_EVENT. The BNXT_AGG_EVENT flag should have been set earlier in
bnxt_rx_pkt().
The visible symptom of this is that the RX ring associated with the
TX XDP ring will eventually become empty and all packets will be dropped.
Because this condition will cause the driver to not refill the RX ring
seeing that the TX ring has forever pending XDP_TX packets.
The fix is to only clear BNXT_RX_EVENT when we have successfully
called __bnxt_xmit_xdp().
Fixes:
7f0a168b0441 ("bnxt_en: Add completion ring pointer in TX and RX ring structures")
Reported-by: Pavel Dubovitsky <pdubovitsky@meta.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251203003024.2246699-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tim Hostetler [Tue, 2 Dec 2025 20:02:07 +0000 (20:02 +0000)]
gve: Move gve_init_clock to after AQ CONFIGURE_DEVICE_RESOURCES call
commit
46e7860ef941 ("gve: Move ptp_schedule_worker to gve_init_clock")
moved the first invocation of the AQ command REPORT_NIC_TIMESTAMP to
gve_probe(). However, gve_init_clock() invoking REPORT_NIC_TIMESTAMP is
not valid until after gve_probe() invokes the AQ command
CONFIGURE_DEVICE_RESOURCES.
Failure to do so results in the following error:
gve 0000:00:07.0: failed to read NIC clock -11
This was missed earlier because the driver under test was loaded at
runtime instead of boot-time. The boot-time driver had already
initialized the device, causing the runtime driver to successfully call
gve_init_clock() incorrectly.
Fixes:
46e7860ef941 ("gve: Move ptp_schedule_worker to gve_init_clock")
Reviewed-by: Ankit Garg <nktgrg@google.com>
Signed-off-by: Tim Hostetler <thostet@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251202200207.1434749-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
René Rebe [Tue, 2 Dec 2025 18:41:37 +0000 (19:41 +0100)]
r8169: fix RTL8117 Wake-on-Lan in DASH mode
Wake-on-Lan does currently not work for r8169 in DASH mode, e.g. the
ASUS Pro WS X570-ACE with RTL8168fp/RTL8117.
Fix by not returning early in rtl_prepare_power_down when dash_enabled.
While this fixes WoL, it still kills the OOB RTL8117 remote management
BMC connection. Fix by not calling rtl8168_driver_stop if WoL is enabled.
Fixes:
065c27c184d6 ("r8169: phy power ops")
Signed-off-by: René Rebe <rene@exactco.de>
Cc: stable@vger.kernel.org
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20251202.194137.1647877804487085954.rene@exactco.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 5 Dec 2025 01:53:52 +0000 (17:53 -0800)]
Merge branch 'mlxsw-three-m-router-fixes'
Petr Machata says:
====================
mlxsw: Three (m)router fixes
This patchset contains two fixes in mlxsw Spectrum router code, and one for
the Spectrum multicast router code. Please see the individual patches for
more details.
====================
Link: https://patch.msgid.link/cover.1764695650.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Tue, 2 Dec 2025 17:44:13 +0000 (18:44 +0100)]
mlxsw: spectrum_mr: Fix use-after-free when updating multicast route stats
Cited commit added a dedicated mutex (instead of RTNL) to protect the
multicast route list, so that it will not change while the driver
periodically traverses it in order to update the kernel about multicast
route stats that were queried from the device.
One instance of list entry deletion (during route replace) was missed
and it can result in a use-after-free [1].
Fix by acquiring the mutex before deleting the entry from the list and
releasing it afterwards.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum]
Read of size 8 at addr
ffff8881523c2fa8 by task kworker/2:5/22043
CPU: 2 UID: 0 PID: 22043 Comm: kworker/2:5 Not tainted
6.18.0-rc1-custom-g1a3d6d7cd014 #1 PREEMPT(full)
Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017
Workqueue: mlxsw_core mlxsw_sp_mr_stats_update [mlxsw_spectrum]
Call Trace:
<TASK>
dump_stack_lvl+0xba/0x110
print_report+0x174/0x4f5
kasan_report+0xdf/0x110
mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum]
process_one_work+0x9cc/0x18e0
worker_thread+0x5df/0xe40
kthread+0x3b8/0x730
ret_from_fork+0x3e9/0x560
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 29933:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
mlxsw_sp_mr_route_add+0xd8/0x4770 [mlxsw_spectrum]
mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum]
process_one_work+0x9cc/0x18e0
worker_thread+0x5df/0xe40
kthread+0x3b8/0x730
ret_from_fork+0x3e9/0x560
ret_from_fork_asm+0x1a/0x30
Freed by task 29933:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
__kasan_save_free_info+0x3b/0x70
__kasan_slab_free+0x43/0x70
kfree+0x14e/0x700
mlxsw_sp_mr_route_add+0x2dea/0x4770 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:444 [mlxsw_spectrum]
mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum]
process_one_work+0x9cc/0x18e0
worker_thread+0x5df/0xe40
kthread+0x3b8/0x730
ret_from_fork+0x3e9/0x560
ret_from_fork_asm+0x1a/0x30
Fixes:
f38656d06725 ("mlxsw: spectrum_mr: Protect multicast route list with a lock")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/f996feecfd59fde297964bfc85040b6d83ec6089.1764695650.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Tue, 2 Dec 2025 17:44:12 +0000 (18:44 +0100)]
mlxsw: spectrum_router: Fix neighbour use-after-free
We sometimes observe use-after-free when dereferencing a neighbour [1].
The problem seems to be that the driver stores a pointer to the
neighbour, but without holding a reference on it. A reference is only
taken when the neighbour is used by a nexthop.
Fix by simplifying the reference counting scheme. Always take a
reference when storing a neighbour pointer in a neighbour entry. Avoid
taking a referencing when the neighbour is used by a nexthop as the
neighbour entry associated with the nexthop already holds a reference.
Tested by running the test that uncovered the problem over 300 times.
Without this patch the problem was reproduced after a handful of
iterations.
[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_neigh_entry_update+0x2d4/0x310
Read of size 8 at addr
ffff88817f8e3420 by task ip/3929
CPU: 3 UID: 0 PID: 3929 Comm: ip Not tainted
6.18.0-rc4-virtme-g36b21a067510 #3 PREEMPT(full)
Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023
Call Trace:
<TASK>
dump_stack_lvl+0x6f/0xa0
print_address_description.constprop.0+0x6e/0x300
print_report+0xfc/0x1fb
kasan_report+0xe4/0x110
mlxsw_sp_neigh_entry_update+0x2d4/0x310
mlxsw_sp_router_rif_gone_sync+0x35f/0x510
mlxsw_sp_rif_destroy+0x1ea/0x730
mlxsw_sp_inetaddr_port_vlan_event+0xa1/0x1b0
__mlxsw_sp_inetaddr_lag_event+0xcc/0x130
__mlxsw_sp_inetaddr_event+0xf5/0x3c0
mlxsw_sp_router_netdevice_event+0x1015/0x1580
notifier_call_chain+0xcc/0x150
call_netdevice_notifiers_info+0x7e/0x100
__netdev_upper_dev_unlink+0x10b/0x210
netdev_upper_dev_unlink+0x79/0xa0
vrf_del_slave+0x18/0x50
do_set_master+0x146/0x7d0
do_setlink.isra.0+0x9a0/0x2880
rtnl_newlink+0x637/0xb20
rtnetlink_rcv_msg+0x6fe/0xb90
netlink_rcv_skb+0x123/0x380
netlink_unicast+0x4a3/0x770
netlink_sendmsg+0x75b/0xc90
__sock_sendmsg+0xbe/0x160
____sys_sendmsg+0x5b2/0x7d0
___sys_sendmsg+0xfd/0x180
__sys_sendmsg+0x124/0x1c0
do_syscall_64+0xbb/0xfd0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
[...]
Allocated by task 109:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x7b/0x90
__kmalloc_noprof+0x2c1/0x790
neigh_alloc+0x6af/0x8f0
___neigh_create+0x63/0xe90
mlxsw_sp_nexthop_neigh_init+0x430/0x7e0
mlxsw_sp_nexthop_type_init+0x212/0x960
mlxsw_sp_nexthop6_group_info_init.constprop.0+0x81f/0x1280
mlxsw_sp_nexthop6_group_get+0x392/0x6a0
mlxsw_sp_fib6_entry_create+0x46a/0xfd0
mlxsw_sp_router_fib6_replace+0x1ed/0x5f0
mlxsw_sp_router_fib6_event_work+0x10a/0x2a0
process_one_work+0xd57/0x1390
worker_thread+0x4d6/0xd40
kthread+0x355/0x5b0
ret_from_fork+0x1d4/0x270
ret_from_fork_asm+0x11/0x20
Freed by task 154:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
__kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x43/0x70
kmem_cache_free_bulk.part.0+0x1eb/0x5e0
kvfree_rcu_bulk+0x1f2/0x260
kfree_rcu_work+0x130/0x1b0
process_one_work+0xd57/0x1390
worker_thread+0x4d6/0xd40
kthread+0x355/0x5b0
ret_from_fork+0x1d4/0x270
ret_from_fork_asm+0x11/0x20
Last potentially related work creation:
kasan_save_stack+0x30/0x50
kasan_record_aux_stack+0x8c/0xa0
kvfree_call_rcu+0x93/0x5b0
mlxsw_sp_router_neigh_event_work+0x67d/0x860
process_one_work+0xd57/0x1390
worker_thread+0x4d6/0xd40
kthread+0x355/0x5b0
ret_from_fork+0x1d4/0x270
ret_from_fork_asm+0x11/0x20
Fixes:
6cf3c971dc84 ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/92d75e21d95d163a41b5cea67a15cd33f547cba6.1764695650.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Tue, 2 Dec 2025 17:44:11 +0000 (18:44 +0100)]
mlxsw: spectrum_router: Fix possible neighbour reference count leak
mlxsw_sp_router_schedule_work() takes a reference on a neighbour,
expecting a work item to release it later on. However, we might fail to
schedule the work item, in which case the neighbour reference count will
be leaked.
Fix by taking the reference just before scheduling the work item. Note
that mlxsw_sp_router_schedule_work() can receive a NULL neighbour
pointer, but neigh_clone() handles that correctly.
Spotted during code review, did not actually observe the reference count
leak.
Fixes:
151b89f6025a ("mlxsw: spectrum_router: Reuse work neighbor initialization in work scheduler")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/ec2934ae4aca187a8d8c9329a08ce93cca411378.1764695650.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thorsten Blum [Tue, 2 Dec 2025 17:27:44 +0000 (18:27 +0100)]
net: phy: marvell-88q2xxx: Fix clamped value in mv88q2xxx_hwmon_write
The local variable 'val' was never clamped to -75000 or 180000 because
the return value of clamp_val() was not used. Fix this by assigning the
clamped value back to 'val', and use clamp() instead of clamp_val().
Cc: stable@vger.kernel.org
Fixes:
a557a92e6881 ("net: phy: marvell-88q2xxx: add support for temperature sensor")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251202172743.453055-3-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gerd Bayer [Tue, 2 Dec 2025 11:12:57 +0000 (12:12 +0100)]
net/mlx5: Fix double unregister of HCA_PORTS component
Clear hca_devcom_comp in device's private data after unregistering it in
LAG teardown. Otherwise a slightly lagging second pass through
mlx5_unload_one() might try to unregister it again and trip over
use-after-free.
On s390 almost all PCI level recovery events trigger two passes through
mxl5_unload_one() - one through the poll_health() method and one through
mlx5_pci_err_detected() as callback from generic PCI error recovery.
While testing PCI error recovery paths with more kernel debug features
enabled, this issue reproducibly led to kernel panics with the following
call chain:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address:
6b6b6b6b6b6b6000 TEID:
6b6b6b6b6b6b6803 ESOP-2 FSI
Fault in home space mode while using kernel ASCE.
AS:
00000000705c4007 R3:
0000000000000024
Oops: 0038 ilc:3 [#1]SMP
CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
6.18.0-
20251130.rc7.git0.
16131a59cab1.300.fc43.s390x+debug #1 PREEMPT
Krnl PSW :
0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS:
0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000
0000000000000000 0000000000000000 0000000000000001 0000000000000000
0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100
0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8
Krnl Code:
0000020fc86aa1cc:
e3b003400004 lg %r11,832
0000020fc86aa1d2:
a7840211 brc 8,
0000020fc86aa5f4
*
0000020fc86aa1d6:
c09000df0b25 larl %r9,
0000020fca28b820
>
0000020fc86aa1dc:
d50790002000 clc 0(8,%r9),0(%r2)
0000020fc86aa1e2:
a7840209 brc 8,
0000020fc86aa5f4
0000020fc86aa1e6:
c0e001100401 larl %r14,
0000020fca8aa9e8
0000020fc86aa1ec:
c01000e25a00 larl %r1,
0000020fca2f55ec
0000020fc86aa1f2:
a7eb00e8 aghi %r14,232
Call Trace:
__lock_acquire+0x5c/0x15f0
lock_acquire.part.0+0xf8/0x270
lock_acquire+0xb0/0x1b0
down_write+0x5a/0x250
mlx5_detach_device+0x42/0x110 [mlx5_core]
mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
mlx5_unload_one+0x42/0x60 [mlx5_core]
mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
zpci_event_attempt_error_recovery+0xcc/0x388
Fixes:
5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20251202-fix_lag-v1-1-59e8177ffce0@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dmitry Skorodumov [Tue, 2 Dec 2025 10:39:03 +0000 (13:39 +0300)]
ipvlan: Ignore PACKET_LOOPBACK in handle_mode_l2()
Packets with pkt_type == PACKET_LOOPBACK are captured by
handle_frame() function, but they don't have L2 header.
We should not process them in handle_mode_l2().
This doesn't affect old L2 functionality, since handling
was anyway incorrect.
Handle them the same way as in br_handle_frame():
just pass the skb.
To observe invalid behaviour, just start "ping -b" on bcast address
of port-interface.
Fixes:
2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Dmitry Skorodumov <skorodumov.dmitry@huawei.com>
Link: https://patch.msgid.link/20251202103906.4087675-1-skorodumov.dmitry@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle [Tue, 2 Dec 2025 09:57:21 +0000 (09:57 +0000)]
net: dsa: mxl-gsw1xx: fix SerDes RX polarity
According to MaxLinear engineer Benny Weng the RX lane of the SerDes
port of the GSW1xx switches is inverted in hardware, and the
SGMII_PHY_RX0_CFG2_INVERT bit is set by default in order to compensate
for that. Hence also set the SGMII_PHY_RX0_CFG2_INVERT bit by default in
gsw1xx_pcs_reset().
Fixes:
22335939ec90 ("net: dsa: add driver for MaxLinear GSW1xx switch family")
Reported-by: Rasmus Villemoes <ravi@prevas.dk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/ca10e9f780c0152ecf9ae8cbac5bf975802e8f99.1764668951.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ivan Galkin [Tue, 2 Dec 2025 09:07:42 +0000 (10:07 +0100)]
net: phy: RTL8211FVD: Restore disabling of PHY-mode EEE
When support for RTL8211F(D)(I)-VD-CG was introduced in commit
bb726b753f75 ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG")
the implementation assumed that this PHY model doesn't have the
control register PHYCR2 (Page 0xa43 Address 0x19). This
assumption was based on the differences in CLKOUT configurations
between RTL8211FVD and the remaining RTL8211F PHYs. In the latter
commit
2c67301584f2
("net: phy: realtek: Avoid PHYCR2 access if PHYCR2 not present")
this assumption was expanded to the PHY-mode EEE.
I performed tests on RTL8211FI-VD-CG and confirmed that disabling
PHY-mode EEE works correctly and is uniform with other PHYs
supported by the driver. To validate the correctness,
I contacted Realtek support. Realtek confirmed that PHY-mode EEE on
RTL8211F(D)(I)-VD-CG is configured via Page 0xa43 Address 0x19 bit 5.
Moreover, Realtek informed me that the most recent datasheet
for RTL8211F(D)(I)-VD-CG v1.1 is incomplete and the naming of
control registers is partly inconsistent. The errata I
received from Realtek corrects the naming as follows:
| Register | Datasheet v1.1 | Errata |
|-------------------------|----------------|--------|
| Page 0xa44 Address 0x11 | PHYCR2 | PHYCR3 |
| Page 0xa43 Address 0x19 | N/A | PHYCR2 |
This information confirms that the supposedly missing control register,
PHYCR2, exists in the RTL8211F(D)(I)-VD-CG under the same address and
the same name. It controls widely the same configs as other PHYs from
the RTL8211F series (e.g. PHY-mode EEE). Clock out configuration is an
exception.
Given all this information, restore disabling of the PHY-mode EEE.
Fixes:
2c67301584f2 ("net: phy: realtek: Avoid PHYCR2 access if PHYCR2 not present")
Signed-off-by: Ivan Galkin <ivan.galkin@axis.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251202-phy_eee-v1-1-fe0bf6ab3df0@axis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 4 Dec 2025 10:55:21 +0000 (11:55 +0100)]
Merge branch 'mlx5-misc-fixes-2025-12-01'
Tariq Toukan says:
====================
mlx5 misc fixes 2025-12-01
This small patchset provides misc bug fixes from the team to the mlx5
core and Eth drivers.
====================
Link: https://patch.msgid.link/1764602008-1334866-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cosmin Ratiu [Mon, 1 Dec 2025 15:13:28 +0000 (17:13 +0200)]
net/mlx5e: Avoid unregistering PSP twice
PSP is unregistered twice in:
_mlx5e_remove -> mlx5e_psp_unregister
mlx5e_nic_cleanup -> mlx5e_psp_unregister
This leads to a refcount underflow in some conditions:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 2 PID: 1694 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
[...]
mlx5e_psp_unregister+0x26/0x50 [mlx5_core]
mlx5e_nic_cleanup+0x26/0x90 [mlx5_core]
mlx5e_remove+0xe6/0x1f0 [mlx5_core]
auxiliary_bus_remove+0x18/0x30
device_release_driver_internal+0x194/0x1f0
bus_remove_device+0xc6/0x130
device_del+0x159/0x3c0
mlx5_rescan_drivers_locked+0xbc/0x2a0 [mlx5_core]
[...]
Do not directly remove psp from the _mlx5e_remove path, the PSP cleanup
happens as part of profile cleanup.
Fixes:
89ee2d92f66c ("net/mlx5e: Support PSP offload functionality")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1764602008-1334866-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Moshe Shemesh [Mon, 1 Dec 2025 15:13:27 +0000 (17:13 +0200)]
net/mlx5: make enable_mpesw idempotent
The enable_mpesw() function returns -EINVAL if ldev->mode is not
MLX5_LAG_MODE_NONE. This means attempting to enable MPESW mode when it's
already enabled will fail. In contrast, disable_mpesw() properly checks
if the mode is MLX5_LAG_MODE_MPESW before proceeding, making it
naturally idempotent and safe to call multiple times.
Fix enable_mpesw() to return success if mpesw is already enabled.
Fixes:
a32327a3a02c ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1764602008-1334866-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jamal Hadi Salim [Fri, 28 Nov 2025 15:19:19 +0000 (10:19 -0500)]
net/sched: ets: Always remove class from active list before deleting in ets_qdisc_change
zdi-disclosures@trendmicro.com says:
The vulnerability is a race condition between `ets_qdisc_dequeue` and
`ets_qdisc_change`. It leads to UAF on `struct Qdisc` object.
Attacker requires the capability to create new user and network namespace
in order to trigger the bug.
See my additional commentary at the end of the analysis.
Analysis:
static int ets_qdisc_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
...
// (1) this lock is preventing .change handler (`ets_qdisc_change`)
//to race with .dequeue handler (`ets_qdisc_dequeue`)
sch_tree_lock(sch);
for (i = nbands; i < oldbands; i++) {
if (i >= q->nstrict && q->classes[i].qdisc->q.qlen)
list_del_init(&q->classes[i].alist);
qdisc_purge_queue(q->classes[i].qdisc);
}
WRITE_ONCE(q->nbands, nbands);
for (i = nstrict; i < q->nstrict; i++) {
if (q->classes[i].qdisc->q.qlen) {
// (2) the class is added to the q->active
list_add_tail(&q->classes[i].alist, &q->active);
q->classes[i].deficit = quanta[i];
}
}
WRITE_ONCE(q->nstrict, nstrict);
memcpy(q->prio2band, priomap, sizeof(priomap));
for (i = 0; i < q->nbands; i++)
WRITE_ONCE(q->classes[i].quantum, quanta[i]);
for (i = oldbands; i < q->nbands; i++) {
q->classes[i].qdisc = queues[i];
if (q->classes[i].qdisc != &noop_qdisc)
qdisc_hash_add(q->classes[i].qdisc, true);
}
// (3) the qdisc is unlocked, now dequeue can be called in parallel
// to the rest of .change handler
sch_tree_unlock(sch);
ets_offload_change(sch);
for (i = q->nbands; i < oldbands; i++) {
// (4) we're reducing the refcount for our class's qdisc and
// freeing it
qdisc_put(q->classes[i].qdisc);
// (5) If we call .dequeue between (4) and (5), we will have
// a strong UAF and we can control RIP
q->classes[i].qdisc = NULL;
WRITE_ONCE(q->classes[i].quantum, 0);
q->classes[i].deficit = 0;
gnet_stats_basic_sync_init(&q->classes[i].bstats);
memset(&q->classes[i].qstats, 0, sizeof(q->classes[i].qstats));
}
return 0;
}
Comment:
This happens because some of the classes have their qdiscs assigned to
NULL, but remain in the active list. This commit fixes this issue by always
removing the class from the active list before deleting and freeing its
associated qdisc
Reproducer Steps
(trimmed version of what was sent by zdi-disclosures@trendmicro.com)
```
DEV="${DEV:-lo}"
ROOT_HANDLE="${ROOT_HANDLE:-1:}"
BAND2_HANDLE="${BAND2_HANDLE:-20:}" # child under 1:2
PING_BYTES="${PING_BYTES:-48}"
PING_COUNT="${PING_COUNT:-200000}"
PING_DST="${PING_DST:-127.0.0.1}"
SLOW_TBF_RATE="${SLOW_TBF_RATE:-8bit}"
SLOW_TBF_BURST="${SLOW_TBF_BURST:-100b}"
SLOW_TBF_LAT="${SLOW_TBF_LAT:-1s}"
cleanup() {
tc qdisc del dev "$DEV" root 2>/dev/null
}
trap cleanup EXIT
ip link set "$DEV" up
tc qdisc del dev "$DEV" root 2>/dev/null || true
tc qdisc add dev "$DEV" root handle "$ROOT_HANDLE" ets bands 2 strict 2
tc qdisc add dev "$DEV" parent 1:2 handle "$BAND2_HANDLE" \
tbf rate "$SLOW_TBF_RATE" burst "$SLOW_TBF_BURST" latency "$SLOW_TBF_LAT"
tc filter add dev "$DEV" parent 1: protocol all prio 1 u32 match u32 0 0 flowid 1:2
tc -s qdisc ls dev $DEV
ping -I "$DEV" -f -c "$PING_COUNT" -s "$PING_BYTES" -W 0.001 "$PING_DST" \
>/dev/null 2>&1 &
tc qdisc change dev "$DEV" root handle "$ROOT_HANDLE" ets bands 2 strict 0
tc qdisc change dev "$DEV" root handle "$ROOT_HANDLE" ets bands 2 strict 2
tc -s qdisc ls dev $DEV
tc qdisc del dev "$DEV" parent 1:2 || true
tc -s qdisc ls dev $DEV
tc qdisc change dev "$DEV" root handle "$ROOT_HANDLE" ets bands 1 strict 1
```
KASAN report
```
==================================================================
BUG: KASAN: slab-use-after-free in ets_qdisc_dequeue+0x1071/0x11b0 kernel/net/sched/sch_ets.c:481
Read of size 8 at addr
ffff8880502fc018 by task ping/12308
>
CPU: 0 UID: 0 PID: 12308 Comm: ping Not tainted 6.18.0-rc4-dirty #1 PREEMPT(full)
Hardware name: QEMU Ubuntu 25.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<IRQ>
__dump_stack kernel/lib/dump_stack.c:94
dump_stack_lvl+0x100/0x190 kernel/lib/dump_stack.c:120
print_address_description kernel/mm/kasan/report.c:378
print_report+0x156/0x4c9 kernel/mm/kasan/report.c:482
kasan_report+0xdf/0x110 kernel/mm/kasan/report.c:595
ets_qdisc_dequeue+0x1071/0x11b0 kernel/net/sched/sch_ets.c:481
dequeue_skb kernel/net/sched/sch_generic.c:294
qdisc_restart kernel/net/sched/sch_generic.c:399
__qdisc_run+0x1c9/0x1b00 kernel/net/sched/sch_generic.c:417
__dev_xmit_skb kernel/net/core/dev.c:4221
__dev_queue_xmit+0x2848/0x4410 kernel/net/core/dev.c:4729
dev_queue_xmit kernel/./include/linux/netdevice.h:3365
[...]
Allocated by task 17115:
kasan_save_stack+0x30/0x50 kernel/mm/kasan/common.c:56
kasan_save_track+0x14/0x30 kernel/mm/kasan/common.c:77
poison_kmalloc_redzone kernel/mm/kasan/common.c:400
__kasan_kmalloc+0xaa/0xb0 kernel/mm/kasan/common.c:417
kasan_kmalloc kernel/./include/linux/kasan.h:262
__do_kmalloc_node kernel/mm/slub.c:5642
__kmalloc_node_noprof+0x34e/0x990 kernel/mm/slub.c:5648
kmalloc_node_noprof kernel/./include/linux/slab.h:987
qdisc_alloc+0xb8/0xc30 kernel/net/sched/sch_generic.c:950
qdisc_create_dflt+0x93/0x490 kernel/net/sched/sch_generic.c:1012
ets_class_graft+0x4fd/0x800 kernel/net/sched/sch_ets.c:261
qdisc_graft+0x3e4/0x1780 kernel/net/sched/sch_api.c:1196
[...]
Freed by task 9905:
kasan_save_stack+0x30/0x50 kernel/mm/kasan/common.c:56
kasan_save_track+0x14/0x30 kernel/mm/kasan/common.c:77
__kasan_save_free_info+0x3b/0x70 kernel/mm/kasan/generic.c:587
kasan_save_free_info kernel/mm/kasan/kasan.h:406
poison_slab_object kernel/mm/kasan/common.c:252
__kasan_slab_free+0x5f/0x80 kernel/mm/kasan/common.c:284
kasan_slab_free kernel/./include/linux/kasan.h:234
slab_free_hook kernel/mm/slub.c:2539
slab_free kernel/mm/slub.c:6630
kfree+0x144/0x700 kernel/mm/slub.c:6837
rcu_do_batch kernel/kernel/rcu/tree.c:2605
rcu_core+0x7c0/0x1500 kernel/kernel/rcu/tree.c:2861
handle_softirqs+0x1ea/0x8a0 kernel/kernel/softirq.c:622
__do_softirq kernel/kernel/softirq.c:656
[...]
Commentary:
1. Maher Azzouzi working with Trend Micro Zero Day Initiative was reported as
the person who found the issue. I requested to get a proper email to add to the
reported-by tag but got no response. For this reason i will credit the person
i exchanged emails with i.e zdi-disclosures@trendmicro.com
2. Neither i nor Victor who did a much more thorough testing was able to
reproduce a UAF with the PoC or other approaches we tried. We were both able to
reproduce a null ptr deref. After exchange with zdi-disclosures@trendmicro.com
they sent a small change to be made to the code to add an extra delay which
was able to simulate the UAF. i.e, this:
qdisc_put(q->classes[i].qdisc);
mdelay(90);
q->classes[i].qdisc = NULL;
I was informed by Thomas Gleixner(tglx@linutronix.de) that adding delays was
acceptable approach for demonstrating the bug, quote:
"Adding such delays is common exploit validation practice"
The equivalent delay could happen "by virt scheduling the vCPU out, SMIs,
NMIs, PREEMPT_RT enabled kernel"
3. I asked the OP to test and report back but got no response and after a
few days gave up and proceeded to submit this fix.
Fixes:
de6d25924c2a ("net/sched: sch_ets: don't peek at classes beyond 'nbands'")
Reported-by: zdi-disclosures@trendmicro.com
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/20251128151919.576920-1-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shaurya Rane [Sat, 29 Nov 2025 09:37:18 +0000 (15:07 +0530)]
net/hsr: fix NULL pointer dereference in prp_get_untagged_frame()
prp_get_untagged_frame() calls __pskb_copy() to create frame->skb_std
but doesn't check if the allocation failed. If __pskb_copy() returns
NULL, skb_clone() is called with a NULL pointer, causing a crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f]
CPU: 0 UID: 0 PID: 5625 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:skb_clone+0xd7/0x3a0 net/core/skbuff.c:2041
Code: 03 42 80 3c 20 00 74 08 4c 89 f7 e8 23 29 05 f9 49 83 3e 00 0f 85 a0 01 00 00 e8 94 dd 9d f8 48 8d 6b 7e 49 89 ee 49 c1 ee 03 <43> 0f b6 04 26 84 c0 0f 85 d1 01 00 00 44 0f b6 7d 00 41 83 e7 0c
RSP: 0018:
ffffc9000d00f200 EFLAGS:
00010207
RAX:
ffffffff892235a1 RBX:
0000000000000000 RCX:
ffff88803372a480
RDX:
0000000000000000 RSI:
0000000000000820 RDI:
0000000000000000
RBP:
000000000000007e R08:
ffffffff8f7d0f77 R09:
1ffffffff1efa1ee
R10:
dffffc0000000000 R11:
fffffbfff1efa1ef R12:
dffffc0000000000
R13:
0000000000000820 R14:
000000000000000f R15:
ffff88805144cc00
FS:
0000555557f6d500(0000) GS:
ffff88808d72f000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000555581d35808 CR3:
000000005040e000 CR4:
0000000000352ef0
Call Trace:
<TASK>
hsr_forward_do net/hsr/hsr_forward.c:-1 [inline]
hsr_forward_skb+0x1013/0x2860 net/hsr/hsr_forward.c:741
hsr_handle_frame+0x6ce/0xa70 net/hsr/hsr_slave.c:84
__netif_receive_skb_core+0x10b9/0x4380 net/core/dev.c:5966
__netif_receive_skb_one_core net/core/dev.c:6077 [inline]
__netif_receive_skb+0x72/0x380 net/core/dev.c:6192
netif_receive_skb_internal net/core/dev.c:6278 [inline]
netif_receive_skb+0x1cb/0x790 net/core/dev.c:6337
tun_rx_batched+0x1b9/0x730 drivers/net/tun.c:1485
tun_get_user+0x2b65/0x3e90 drivers/net/tun.c:1953
tun_chr_write_iter+0x113/0x200 drivers/net/tun.c:1999
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x5c9/0xb30 fs/read_write.c:686
ksys_write+0x145/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f0449f8e1ff
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 f9 92 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 4c 93 02 00 48
RSP: 002b:
00007ffd7ad94c90 EFLAGS:
00000293 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
00007f044a1e5fa0 RCX:
00007f0449f8e1ff
RDX:
000000000000003e RSI:
0000200000000500 RDI:
00000000000000c8
RBP:
00007ffd7ad94d20 R08:
0000000000000000 R09:
0000000000000000
R10:
000000000000003e R11:
0000000000000293 R12:
0000000000000001
R13:
00007f044a1e5fa0 R14:
00007f044a1e5fa0 R15:
0000000000000003
</TASK>
Add a NULL check immediately after __pskb_copy() to handle allocation
failures gracefully.
Reported-by: syzbot+2fa344348a579b779e05@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
2fa344348a579b779e05
Fixes:
f266a683a480 ("net/hsr: Better frame dispatch")
Cc: stable@vger.kernel.org
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
Reviewed-by: Felix Maurer <fmaurer@redhat.com>
Tested-by: Felix Maurer <fmaurer@redhat.com>
Link: https://patch.msgid.link/20251129093718.25320-1-ssrane_b23@ee.vjti.ac.in
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wang Liang [Sat, 29 Nov 2025 04:13:15 +0000 (12:13 +0800)]
netrom: Fix memory leak in nr_sendmsg()
syzbot reported a memory leak [1].
When function sock_alloc_send_skb() return NULL in nr_output(), the
original skb is not freed, which was allocated in nr_sendmsg(). Fix this
by freeing it before return.
[1]
BUG: memory leak
unreferenced object 0xffff888129f35500 (size 240):
comm "syz.0.17", pid 6119, jiffies
4294944652
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 10 52 28 81 88 ff ff ..........R(....
backtrace (crc
1456a3e4):
kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
slab_post_alloc_hook mm/slub.c:4983 [inline]
slab_alloc_node mm/slub.c:5288 [inline]
kmem_cache_alloc_node_noprof+0x36f/0x5e0 mm/slub.c:5340
__alloc_skb+0x203/0x240 net/core/skbuff.c:660
alloc_skb include/linux/skbuff.h:1383 [inline]
alloc_skb_with_frags+0x69/0x3f0 net/core/skbuff.c:6671
sock_alloc_send_pskb+0x379/0x3e0 net/core/sock.c:2965
sock_alloc_send_skb include/net/sock.h:1859 [inline]
nr_sendmsg+0x287/0x450 net/netrom/af_netrom.c:1105
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
sock_write_iter+0x293/0x2a0 net/socket.c:1195
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x45d/0x710 fs/read_write.c:686
ksys_write+0x143/0x170 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xa4/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Reported-by: syzbot+d7abc36bbbb6d7d40b58@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
d7abc36bbbb6d7d40b58
Tested-by: syzbot+d7abc36bbbb6d7d40b58@syzkaller.appspotmail.com
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20251129041315.1550766-1-wangliang74@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wei Fang [Fri, 28 Nov 2025 02:59:15 +0000 (10:59 +0800)]
net: fec: ERR007885 Workaround for XDP TX path
The ERR007885 will lead to a TDAR race condition for mutliQ when the
driver sets TDAR and the UDMA clears TDAR simultaneously or in a small
window (2-4 cycles). And it will cause the udma_tx and udma_tx_arbiter
state machines to hang. Therefore, the commit
53bb20d1faba ("net: fec:
add variable reg_desc_active to speed things up") and the commit
a179aad12bad ("net: fec: ERR007885 Workaround for conventional TX") have
added the workaround to fix the potential issue for the conventional TX
path. Similarly, the XDP TX path should also have the potential hang
issue, so add the workaround for XDP TX path.
Fixes:
6d6b39f180b8 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251128025915.2486943-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Thu, 4 Dec 2025 01:24:33 +0000 (17:24 -0800)]
Merge tag 'net-next-6.19' of git://git./linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Replace busylock at the Tx queuing layer with a lockless list.
Resulting in a 300% (4x) improvement on heavy TX workloads, sending
twice the number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue.
Normally we perform queue re-selection when flow comes out of idle,
but under extreme circumstances the flows may be constantly busy.
Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already
did for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is
sadly quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for
packets.
- Make MPTCP use Rx backlog processing. This lowers the lock
pressure, improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor
fit for modern container workloads, where limits are imposed using
cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of
RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
unnecessarily aggressive rcvbuf growth and overshot when the
connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC
5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API:
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
defined by the OPEN Alliance's "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in
zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads
IPsec and performs RSS.
- Add info to devlink params whether the current setting is the
default or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame
duplication for a lot of switches without dedicated HSR offload.
- Add uAPI defines for 1.6Tbps link modes.
Device drivers:
- Add Motorcomm YT921x gigabit Ethernet switch support.
- Add MUCSE driver for N500/N210 1GbE NIC series.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback for
reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which
supports Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as
other drivers) to avoid wasting resources if feature is
unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor
format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211
debugfs interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support"
* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
net: page_pool: sanitise allocation order
net: page pool: xa init with destroy on pp init
net/mlx5e: Support XDP target xmit with dummy program
net/mlx5e: Update XDP features in switch channels
selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
wireguard: netlink: generate netlink code
wireguard: uapi: generate header with ynl-gen
wireguard: uapi: move flag enums
wireguard: uapi: move enum wg_cmd
wireguard: netlink: add YNL specification
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
selftests: drv-net: introduce Iperf3Runner for measurement use cases
selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
Documentation: net: dsa: mention simple HSR offload helpers
Documentation: net: dsa: mention availability of RedBox
...
Linus Torvalds [Thu, 4 Dec 2025 00:54:54 +0000 (16:54 -0800)]
Merge tag 'bpf-next-6.19' of git://git./linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Convert selftests/bpf/test_tc_edt and test_tc_tunnel from .sh to
test_progs runner (Alexis Lothoré)
- Convert selftests/bpf/test_xsk to test_progs runner (Bastien
Curutchet)
- Replace bpf memory allocator with kmalloc_nolock() in
bpf_local_storage (Amery Hung), and in bpf streams and range tree
(Puranjay Mohan)
- Introduce support for indirect jumps in BPF verifier and x86 JIT
(Anton Protopopov) and arm64 JIT (Puranjay Mohan)
- Remove runqslower bpf tool (Hoyeon Lee)
- Fix corner cases in the verifier to close several syzbot reports
(Eduard Zingerman, KaFai Wan)
- Several improvements in deadlock detection in rqspinlock (Kumar
Kartikeya Dwivedi)
- Implement "jmp" mode for BPF trampoline and corresponding
DYNAMIC_FTRACE_WITH_JMP. It improves "fexit" program type performance
from 80 M/s to 136 M/s. With Steven's Ack. (Menglong Dong)
- Add ability to test non-linear skbs in BPF_PROG_TEST_RUN (Paul
Chaignon)
- Do not let BPF_PROG_TEST_RUN emit invalid GSO types to stack (Daniel
Borkmann)
- Generalize buildid reader into bpf_dynptr (Mykyta Yatsenko)
- Optimize bpf_map_update_elem() for map-in-map types (Ritesh
Oedayrajsingh Varma)
- Introduce overwrite mode for BPF ring buffer (Xu Kuohai)
* tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (169 commits)
bpf: optimize bpf_map_update_elem() for map-in-map types
bpf: make kprobe_multi_link_prog_run always_inline
selftests/bpf: do not hardcode target rate in test_tc_edt BPF program
selftests/bpf: remove test_tc_edt.sh
selftests/bpf: integrate test_tc_edt into test_progs
selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
selftests/bpf: Add success stats to rqspinlock stress test
rqspinlock: Precede non-head waiter queueing with AA check
rqspinlock: Disable spinning for trylock fallback
rqspinlock: Use trylock fallback when per-CPU rqnode is busy
rqspinlock: Perform AA checks immediately
rqspinlock: Enclose lock/unlock within lock entry acquisitions
bpf: Remove runqslower tool
selftests/bpf: Remove usage of lsm/file_alloc_security in selftest
bpf: Disable file_alloc_security hook
bpf: check for insn arrays in check_ptr_alignment
bpf: force BPF_F_RDONLY_PROG on insn array creation
bpf: Fix exclusive map memory leak
selftests/bpf: Make CS length configurable for rqspinlock stress test
selftests/bpf: Add lock wait time stats to rqspinlock stress test
...
Linus Torvalds [Wed, 3 Dec 2025 23:50:11 +0000 (15:50 -0800)]
Merge tag 'linux_kselftest-kunit-6.19-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- Make filter parameters configurable via Kconfig
- Add description of kunit.enable parameter to documentation
* tag 'linux_kselftest-kunit-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Make filter parameters configurable via Kconfig
Documentation: kunit: add description of kunit.enable parameter
Linus Torvalds [Wed, 3 Dec 2025 23:08:18 +0000 (15:08 -0800)]
Merge tag 'linux_kselftest-next-6.19-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
- Add basic test for trace_marker_raw file to tracing selftest
- Fix invalid array access in printf dma_map_benchmark selftest
- Add tprobe enable/disable testcase to tracing selftest
- Update fprobe selftest for ftrace based fprobe
* tag 'linux_kselftest-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: tracing: Update fprobe selftest for ftrace based fprobe
selftests: tracing: Add tprobe enable/disable testcase
selftests/run_kselftest.sh: exit with error if tests fail
selftests/dma: fix invalid array access in printf
selftests/tracing: Add basic test for trace_marker_raw file
Linus Torvalds [Wed, 3 Dec 2025 22:42:21 +0000 (14:42 -0800)]
Merge tag 'kbuild-6.19-1' of git://git./linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nicolas Schier:
- Enable -fms-extensions, allowing anonymous use of tagged struct or
union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
conversion patch is added here, too (btrfs).
[ Editor's note: the core of this actually came in early through a
shared branch and a few other trees - Linus ]
- Introduce architecture-specific CC_CAN_LINK and flags for userprogs
- Add new packaging target 'modules-cpio-pkg' for building a initramfs
cpio w/ kmods
- Handle included .c files in gen_compile_commands
- Minor kbuild changes:
- Use objtree for module signing key path, fixing oot kmod signing
- Improve documentation of KBUILD_BUILD_TIMESTAMP
- Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
- Rename scripts/Makefile.extrawarn to Makefile.warn
- Drop obsolete types.h check from headers_check.pl
- Remove outdated config leak ignore entries
* tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: add target to build a cpio containing modules
initramfs: add gen_init_cpio to hostprogs unconditionally
kbuild: allow architectures to override CC_CAN_LINK
init: deduplicate cc-can-link.sh invocations
kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
scripts: headers_install.sh: Remove two outdated config leak ignore entries
scripts/clang-tools: Handle included .c files in gen_compile_commands
kbuild: uapi: Drop types.h check from headers_check.pl
kbuild: Rename Makefile.extrawarn to Makefile.warn
MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
kbuild: uapi: reuse KBUILD_USERCFLAGS
kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
kbuild: Use objtree for module signing key path
btrfs: send: make use of -fms-extensions for defining struct fs_path
Linus Torvalds [Wed, 3 Dec 2025 22:16:49 +0000 (14:16 -0800)]
Merge tag 'rust-6.19' of git://git./linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Add support for 'syn'.
Syn is a parsing library for parsing a stream of Rust tokens into a
syntax tree of Rust source code.
Currently this library is geared toward use in Rust procedural
macros, but contains some APIs that may be useful more generally.
'syn' allows us to greatly simplify writing complex macros such as
'pin-init' (Benno has already prepared the 'syn'-based version). We
will use it in the 'macros' crate too.
'syn' is the most downloaded Rust crate (according to crates.io),
and it is also used by the Rust compiler itself. While the amount
of code is substantial, there should not be many updates needed for
these crates, and even if there are, they should not be too big,
e.g. +7k -3k lines across the 3 crates in the last year.
'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
I only modified their code to remove a third dependency
('unicode-ident') and to add the SPDX identifiers. The code can be
easily verified to exactly match upstream with the provided
scripts.
They are all licensed under "Apache-2.0 OR MIT", like the other
vendored 'alloc' crate we had for a while.
Please see the merge commit with the cover letter for more context.
- Allow 'unreachable_pub' and 'clippy::disallowed_names' for
doctests.
Examples (i.e. doctests) may want to do things like show public
items and use names such as 'foo'.
Nevertheless, we still try to keep examples as close to real code
as possible (this is part of why running Clippy on doctests is
important for us, e.g. for safety comments, which userspace Rust
does not support yet but we are stricter).
'kernel' crate:
- Replace our custom 'CStr' type with 'core::ffi::CStr'.
Using the standard library type reduces our custom code footprint,
and we retain needed custom functionality through an extension
trait and a new 'fmt!' macro which replaces the previous 'core'
import.
This started in 6.17 and continued in 6.18, and we finally land the
replacement now. This required quite some stamina from Tamir, who
split the changes in steps to prepare for the flag day change here.
- Replace 'kernel::c_str!' with C string literals.
C string literals were added in Rust 1.77, which produce '&CStr's
(the 'core' one), so now we can write:
c"hi"
instead of:
c_str!("hi")
- Add 'num' module for numerical features.
It includes the 'Integer' trait, implemented for all primitive
integer types.
It also includes the 'Bounded' integer wrapping type: an integer
value that requires only the 'N' least significant bits of the
wrapped type to be encoded:
// An unsigned 8-bit integer, of which only the 4 LSBs are used.
let v = Bounded::<u8, 4>::new::<15>();
assert_eq!(v.get(), 15);
'Bounded' is useful to e.g. enforce guarantees when working with
bitfields that have an arbitrary number of bits.
Values can also be constructed from simple non-constant expressions
or, for more complex ones, validated at runtime.
'Bounded' also comes with comparison and arithmetic operations
(with both their backing type and other 'Bounded's with a
compatible backing type), casts to change the backing type,
extending/shrinking and infallible/fallible conversions from/to
primitives as applicable.
- 'rbtree' module: add immutable cursor ('Cursor').
It enables to use just an immutable tree reference where
appropriate. The existing fully-featured mutable cursor is renamed
to 'CursorMut'.
kallsyms:
- Fix wrong "big" kernel symbol type read from procfs.
'pin-init' crate:
- A couple minor fixes (Benno asked me to pick these patches up for
him this cycle).
Documentation:
- Quick Start guide: add Debian 13 (Trixie).
Debian Stable is now able to build Linux, since Debian 13 (released
2025-08-09) packages Rust 1.85.0, which is recent enough.
We are planning to propose that the minimum supported Rust version
in Linux follows Debian Stable releases, with Debian 13 being the
first one we upgrade to, i.e. Rust 1.85.
MAINTAINERS:
- Add entry for the new 'num' module.
- Remove Alex as Rust maintainer: he hasn't had the time to
contribute for a few years now, so it is a no-op change in
practice.
And a few other cleanups and improvements"
* tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (53 commits)
rust: macros: support `proc-macro2`, `quote` and `syn`
rust: syn: enable support in kbuild
rust: syn: add `README.md`
rust: syn: remove `unicode-ident` dependency
rust: syn: add SPDX License Identifiers
rust: syn: import crate
rust: quote: enable support in kbuild
rust: quote: add `README.md`
rust: quote: add SPDX License Identifiers
rust: quote: import crate
rust: proc-macro2: enable support in kbuild
rust: proc-macro2: add `README.md`
rust: proc-macro2: remove `unicode_ident` dependency
rust: proc-macro2: add SPDX License Identifiers
rust: proc-macro2: import crate
rust: kbuild: support using libraries in `rustc_procmacro`
rust: kbuild: support skipping flags in `rustc_test_library`
rust: kbuild: add proc macro library support
rust: kbuild: simplify `--cfg` handling
rust: kbuild: introduce `core-flags` and `core-skip_flags`
...
Linus Torvalds [Wed, 3 Dec 2025 21:46:48 +0000 (13:46 -0800)]
Merge tag 'livepatching-for-6.19' of git://git./linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Support both paths where tracefs is typically mounted in selftests
- Make old_sympos 0 and 1 equal. They both are valid when there is only
one symbol with the given name.
* tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests: livepatch: use canonical ftrace path
livepatch: Match old_sympos 0 and 1 in klp_find_func()
Linus Torvalds [Wed, 3 Dec 2025 21:25:39 +0000 (13:25 -0800)]
Merge tag 'sched_ext-for-6.19' of git://git./linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Improve recovery from misbehaving BPF schedulers.
When a scheduler puts many tasks with varying affinity restrictions
on a shared DSQ, CPUs scanning through tasks they cannot run can
overwhelm the system, causing lockups.
Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
and hooks into the hardlockup detector to attempt recovery.
Add scx_cpu0 example scheduler to demonstrate this scenario.
- Add lockless peek operation for DSQs to reduce lock contention for
schedulers that need to query queue state during load balancing.
- Allow scx_bpf_reenqueue_local() to be called from anywhere in
preparation for deprecating cpu_acquire/release() callbacks in favor
of generic BPF hooks.
- Prepare for hierarchical scheduler support: add
scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
structs for future aux__prog parameter.
- Implement cgroup_set_idle() callback to notify BPF schedulers when a
cgroup's idle state changes.
- Fix migration tasks being incorrectly downgraded from
stop_sched_class to rt_sched_class across sched_ext enable/disable.
Applied late as the fix is low risk and the bug subtle but needs
stable backporting.
- Various fixes and cleanups including cgroup exit ordering,
SCX_KICK_WAIT reliability, and backward compatibility improvements.
* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
sched_ext: tools: Removing duplicate targets during non-cross compilation
sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
sched_ext: Update comments replacing breather with aborting mechanism
sched_ext: Implement load balancer for bypass mode
sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
sched_ext: Add scx_cpu0 example scheduler
sched_ext: Hook up hardlockup detector
sched_ext: Make handle_lockup() propagate scx_verror() result
sched_ext: Refactor lockup handlers into handle_lockup()
sched_ext: Make scx_exit() and scx_vexit() return bool
sched_ext: Exit dispatch and move operations immediately when aborting
sched_ext: Simplify breather mechanism with scx_aborting flag
sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
sched_ext: Refactor do_enqueue_task() local and global DSQ paths
sched_ext: Use shorter slice in bypass mode
sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
sched_ext: Minor cleanups to scx_task_iter
...
Linus Torvalds [Wed, 3 Dec 2025 21:04:07 +0000 (13:04 -0800)]
Merge tag 'cgroup-for-6.19' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Defer task cgroup unlink until after the dying task's final context
switch so that controllers see the cgroup properly populated until
the task is truly gone
- cpuset cleanups and simplifications.
Enforce that domain isolated CPUs stay in root or isolated partitions
and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
sched/deadline root domain handling during CPU hot-unplug and race
for tasks in attaching cpusets
- Misc fixes including memory reclaim protection documentation and
selftest KTAP conformance
* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cpuset: Treat cpusets in attaching as populated
sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
docs: cgroup: No special handling of unpopulated memcgs
docs: cgroup: Note about sibling relative reclaim protection
docs: cgroup: Explain reclaim protection target
selftests/cgroup: conform test to KTAP format output
cpuset: remove need_rebuild_sched_domains
cpuset: remove global remote_children list
cpuset: simplify node setting on error
cgroup: include missing header for struct irq_work
cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
cgroup/cpuset: Globally track isolated_cpus update
cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
cgroup: Defer task cgroup unlink until after the task is done switching out
cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
...
Linus Torvalds [Wed, 3 Dec 2025 20:50:10 +0000 (12:50 -0800)]
Merge tag 'wq-for-6.19' of git://git./linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
- Rescuer affinity management: Affinity is now updated only when
detached using wq_unbound_cpumask consistently. DISASSOCIATED workers
also follow unbound cpumask changes to avoid breaking CPU isolation
- Rescuer cleanups preparing for fetching work items one by one from
pool list: Work assignment factored out, optimized to skip pwqs no
longer needing rescue, and shutdown logic simplified
- Unused assert_rcu_or_wq_mutex_or_pool_mutex() removed
* tag 'wq-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Don't rely on wq->rescuer to stop rescuer
workqueue: Only assign rescuer work when really needed
workqueue: Factor out assign_rescuer_work()
workqueue: Init rescuer's affinities as wq_unbound_cpumask
workqueue: Let DISASSOCIATED workers follow unbound wq cpumask changes
workqueue: Update the rescuer's affinity only when it is detached
workqueue: Remove unused assert_rcu_or_wq_mutex_or_pool_mutex
Linus Torvalds [Wed, 3 Dec 2025 20:42:36 +0000 (12:42 -0800)]
Merge tag 'printk-for-6.19' of git://git./linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Allow creaing nbcon console drivers with an unsafe write_atomic()
callback that can only be called by the final nbcon_atomic_flush_unsafe().
Otherwise, the driver would rely on the kthread.
It is going to be used as the-best-effort approach for an
experimental nbcon netconsole driver, see
https://lore.kernel.org/r/
20251121-nbcon-v1-2-
503d17b2b4af@debian.org
Note that a safe .write_atomic() callback is supposed to work in NMI
context. But some networking drivers are not safe even in IRQ
context:
https://lore.kernel.org/r/oc46gdpmmlly5o44obvmoatfqo5bhpgv7pabpvb6sjuqioymcg@gjsma3ghoz35
In an ideal world, all networking drivers would be fixed first and
the atomic flush would be blocked only in NMI context. But it brings
the question how reliable networking drivers are when the system is
in a bad state. They might block flushing more reliable serial
consoles which are more suitable for serious debugging anyway.
- Allow to use the last 4 bytes of the printk ring buffer.
- Prevent queuing IRQ work and block printk kthreads when consoles are
suspended. Otherwise, they create non-necessary churn or even block
the suspend.
- Release console_lock() between each record in the kthread used for
legacy consoles on RT. It might significantly speed up the boot.
- Release nbcon context between each record in the atomic flush. It
prevents stalls of the related printk kthread after it has lost the
ownership in the middle of a record
- Add support for NBCON consoles into KDB
- Add %ptsP modifier for printing struct timespec64 and use it where
possible
- Misc code clean up
* tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (48 commits)
printk: Use console_is_usable on console_unblank
arch: um: kmsg_dump: Use console_is_usable
drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT
lib/vsprintf: Unify FORMAT_STATE_NUM handlers
printk: Avoid irq_work for printk_deferred() on suspend
printk: Avoid scheduling irq_work on suspend
printk: Allow printk_trigger_flush() to flush all types
tracing: Switch to use %ptSp
scsi: snic: Switch to use %ptSp
scsi: fnic: Switch to use %ptSp
s390/dasd: Switch to use %ptSp
ptp: ocp: Switch to use %ptSp
pps: Switch to use %ptSp
PCI: epf-test: Switch to use %ptSp
net: dsa: sja1105: Switch to use %ptSp
mmc: mmc_test: Switch to use %ptSp
media: av7110: Switch to use %ptSp
ipmi: Switch to use %ptSp
igb: Switch to use %ptSp
e1000e: Switch to use %ptSp
...
Linus Torvalds [Wed, 3 Dec 2025 20:41:00 +0000 (12:41 -0800)]
Merge tag 'lkmm.2025.12.01a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull lkmm documentation update from Paul McKenney:
- Sort the memory-barriers.txt file's wait_event* and wait_on_bit* list
alphabetically
* tag 'lkmm.2025.12.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
memory-barriers.txt: Sort wait_event* and wait_on_bit* list alphabetically
Linus Torvalds [Wed, 3 Dec 2025 20:18:07 +0000 (12:18 -0800)]
Merge tag 'rcu.release.v6.19' of git://git./linux/kernel/git/rcu/linux
Pull RCU updates from Frederic Weisbecker:
"SRCU:
- Properly handle SRCU readers within IRQ disabled sections in tiny
SRCU
- Preparation to reimplement RCU Tasks Trace on top of SRCU fast:
- Introduce API to expedite a grace period and test it through
rcutorture
- Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
Both are still targeted toward faster readers (without full
barriers on LOCK and UNLOCK) at the expense of heavier write
side (using full RCU grace period ordering instead of simply
full ordering) as compared to "traditional" non-fast SRCU. But
those srcu-fast flavours are going to be optimized in two
different ways:
- SRCU-fast will become the reimplementation basis for
RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
be NMI safe, SRCU-fast must be as well.
- SRCU-fast-updown will be needed for uretprobes code in order
to get rid of the read-side memory barriers while still
allowing entering the reader at task level while exiting it
in a timer handler. It is considered semaphore-like in that
it can have different owners between LOCK and UNLOCK.
However it is not NMI-safe.
The actual optimizations are work in progress for the next
cycle. Only the new interfaces are added for now, along with
related torture and scalability test code.
- Create/document/debug/torture new proper initializers for RCU fast:
DEFINE_SRCU_FAST() and init_srcu_struct_fast()
This allows for using right away the proper ordering on the write
side (either full ordering or full RCU grace period ordering)
without waiting for the read side to tell which to use.
This also optimizes the read side altogether with moving flavour
debug checks under debug config and with removing a costly RmW
operation on their first call.
- Make some diagnostic functions tracing safe
Refscale:
- Add performance testing for common context synchronizations
(Preemption, IRQ, Softirq) and per-cpu increments. Those are
relevant comparisons against SRCU-fast read side APIs, especially
as they are planned to synchronize further tracing fast-path code
Miscellanous:
- In order to prepare the layout for nohz_full work deferral to user
exit, the context tracking state must shrink the counter of
transitions to/from RCU not watching. The only possible hazard is
to trigger wrap-around more easily, delaying a bit grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent from KCSAN
warnings. On recent discussions, we also concluded that all those
WRITE_ONCE() and READ_ONCE() on list APIs deserve appropriate
comments. Something to be expected for the next cycle
- Provide a script to apply several configs to several commits with
torture
- Allow torture to reuse a build directory in order to save needless
rebuild time
- Various cleanups"
* tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (29 commits)
refscale: Add SRCU-fast-updown readers
refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
rcutorture: Make srcu{,d}_torture_init() announce the SRCU type
srcu: Create an SRCU-fast-updown API
refscale: Do not disable interrupts for tests involving local_bh_enable()
refscale: Add non-atomic per-CPU increment readers
refscale: Add this_cpu_inc() readers
refscale: Add preempt_disable() readers
refscale: Add local_bh_disable() readers
refscale: Add local_irq_disable() and local_irq_save() readers
torture: Permit negative kvm.sh --kconfig numberic arguments
srcu: Add SRCU_READ_FLAVOR_FAST_UPDOWN CPP macro
rcu: Mark diagnostic functions as notrace
rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
rcutorture: Remove redundant rcutorture_one_extend() from rcu_torture_one_read()
rcutorture: Permit kvm-again.sh to re-use the build directory
torture: Add kvm-series.sh to test commit/scenario combination
rcu: use WRITE_ONCE() for ->next and ->pprev of hlist_nulls
locktorture: Fix memory leak in param_set_cpumask()
doc: Update for SRCU-fast definitions and initialization
...
Linus Torvalds [Wed, 3 Dec 2025 19:53:47 +0000 (11:53 -0800)]
Merge tag 'slab-for-6.19' of git://git./linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- mempool_alloc_bulk() support for upcoming users in the block layer
that need to allocate multiple objects at once with the mempool's
guaranteed progress semantics, which is not achievable with an
allocation single objects in a loop. Along with refactoring and
various improvements (Christoph Hellwig)
- Preparations for the upcoming separation of struct slab from struct
page, mostly by removing the struct folio layer, as the purpose of
struct folio has shifted since it became used in slab code (Matthew
Wilcox)
- Modernisation of slab's boot param API usage, which removes some
unexpected parsing corner cases (Petr Tesarik)
- Refactoring of freelist_aba_t (now struct freelist_counters) and
associated functions for double cmpxchg, enabled by -fms-extensions
(Vlastimil Babka)
- Cleanups and improvements related to sheaves caching layer, that were
part of the full conversion to sheaves, which is planned for the next
release (Vlastimil Babka)
* tag 'slab-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (42 commits)
slab: Remove unnecessary call to compound_head() in alloc_from_pcs()
mempool: clarify behavior of mempool_alloc_preallocated()
mempool: drop the file name in the top of file comment
mempool: de-typedef
mempool: remove mempool_{init,create}_kvmalloc_pool
mempool: legitimize the io_schedule_timeout in mempool_alloc_from_pool
mempool: add mempool_{alloc,free}_bulk
mempool: factor out a mempool_alloc_from_pool helper
slab: Remove references to folios from virt_to_slab()
kasan: Remove references to folio in __kasan_mempool_poison_object()
memcg: Convert mem_cgroup_from_obj_folio() to mem_cgroup_from_obj_slab()
mempool: factor out a mempool_adjust_gfp helper
mempool: add error injection support
mempool: improve kerneldoc comments
mm: improve kerneldoc comments for __alloc_pages_bulk
fault-inject: make enum fault_flags available unconditionally
usercopy: Remove folio references from check_heap_object()
slab: Remove folio references from kfree_nolock()
slab: Remove folio references from kfree_rcu_sheaf()
slab: Remove folio references from build_detached_freelist()
...
Linus Torvalds [Wed, 3 Dec 2025 19:34:28 +0000 (11:34 -0800)]
Merge tag 'docs-6.19' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This has been another busy cycle for documentation, with a lot of
build-system thrashing. That work should slow down from here on out.
- The various scripts and tools for documentation were spread out in
several directories; now they are (almost) all coalesced under
tools/docs/. The holdout is the kernel-doc script, which cannot be
easily moved without some further thought.
- As the amount of Python code increases, we are accumulating modules
that are imported by multiple programs. These modules have been
pulled together under tools/lib/python/ -- at least, for
documentation-related programs. There is other Python code in the
tree that might eventually want to move toward this organization.
- The Perl kernel-doc.pl script has been removed. It is no longer
used by default, and nobody has missed it, least of all anybody who
actually had to look at it.
- The docs build was controlled by a complex mess of makefilese that
few dared to touch. Mauro has moved that logic into a new program
(tools/docs/sphinx-build-wrapper) that, with any luck at all, will
be far easier to understand and maintain.
- The get_feat.pl program, used to access information under
Documentation/features/, has been rewritten in Python, bringing an
end to the use of Perl in the docs subsystem.
- The top-level README file has been reorganized into a more
reader-friendly presentation.
- A lot of Chinese translation additions
- Typo fixes and documentation updates as usual"
* tag 'docs-6.19' of git://git.lwn.net/linux: (164 commits)
docs: makefile: move rustdoc check to the build wrapper
README: restructure with role-based documentation and guidelines
docs: kdoc: various fixes for grammar, spelling, punctuation
docs: kdoc_parser: use '@' for Excess enum value
docs: submitting-patches: Clarify that removal of Acks needs explanation too
docs: kdoc_parser: add data/function attributes to ignore
docs: MAINTAINERS: update Mauro's files/paths
docs/zh_CN: Add wd719x.rst translation
docs/zh_CN: Add libsas.rst translation
get_feat.pl: remove it, as it got replaced by get_feat.py
Documentation/sphinx/kernel_feat.py: use class directly
tools/docs/get_feat.py: convert get_feat.pl to Python
Documentation/admin-guide: fix typo and comment in cscope example
docs/zh_CN: Add data-integrity.rst translation
docs/zh_CN: Add blk-mq.rst translation
docs/zh_CN: Add block/index.rst translation
docs/zh_CN: Update the Chinese translation of kbuild.rst
docs: bring some order to our Python module hierarchy
docs: Move the python libraries to tools/lib/python
Documentation/kernel-parameters: Move the kernel build options
...
Linus Torvalds [Wed, 3 Dec 2025 19:28:38 +0000 (11:28 -0800)]
Merge tag 'v6.19-p1' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Rewrite memcpy_sglist from scratch
- Add on-stack AEAD request allocation
- Fix partial block processing in ahash
Algorithms:
- Remove ansi_cprng
- Remove tcrypt tests for poly1305
- Fix EINPROGRESS processing in authenc
- Fix double-free in zstd
Drivers:
- Use drbg ctr helper when reseeding xilinx-trng
- Add support for PCI device 0x115A to ccp
- Add support of paes in caam
- Add support for aes-xts in dthev2
Others:
- Use likely in rhashtable lookup
- Fix lockdep false-positive in padata by removing a helper"
* tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
crypto: zstd - fix double-free in per-CPU stream cleanup
crypto: ahash - Zero positive err value in ahash_update_finish
crypto: ahash - Fix crypto_ahash_import with partial block data
crypto: lib/mpi - use min() instead of min_t()
crypto: ccp - use min() instead of min_t()
hwrng: core - use min3() instead of nested min_t()
crypto: aesni - ctr_crypt() use min() instead of min_t()
crypto: drbg - Delete unused ctx from struct sdesc
crypto: testmgr - Add missing DES weak and semi-weak key tests
Revert "crypto: scatterwalk - Move skcipher walk and use it for memcpy_sglist"
crypto: scatterwalk - Fix memcpy_sglist() to always succeed
crypto: iaa - Request to add Kanchana P Sridhar to Maintainers.
crypto: tcrypt - Remove unused poly1305 support
crypto: ansi_cprng - Remove unused ansi_cprng algorithm
crypto: asymmetric_keys - fix uninitialized pointers with free attribute
KEYS: Avoid -Wflex-array-member-not-at-end warning
crypto: ccree - Correctly handle return of sg_nents_for_len
crypto: starfive - Correctly handle return of sg_nents_for_len
crypto: iaa - Fix incorrect return value in save_iaa_wq()
crypto: zstd - Remove unnecessary size_t cast
...
Linus Torvalds [Wed, 3 Dec 2025 19:19:34 +0000 (11:19 -0800)]
Merge tag 'ipe-pr-
20251202' of git://git./linux/kernel/git/wufan/ipe
Pull IPE udates from Fan Wu:
"The primary change is the addition of support for the AT_EXECVE_CHECK
flag. This allows interpreters to signal the kernel to perform IPE
security checks on script files before execution, extending IPE
enforcement to indirectly executed scripts.
Update documentation for it, and also fix a comment"
* tag 'ipe-pr-
20251202' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
ipe: Update documentation for script enforcement
ipe: Add AT_EXECVE_CHECK support for script enforcement
ipe: Drop a duplicated CONFIG_ prefix in the ifdeffery
Linus Torvalds [Wed, 3 Dec 2025 19:08:03 +0000 (11:08 -0800)]
Merge tag 'integrity-v6.19' of git://git./linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"Bug fixes:
- defer credentials checking from the bprm_check_security hook to the
bprm_creds_from_file security hook
- properly ignore IMA policy rules based on undefined SELinux labels
IMA policy rule extensions:
- extend IMA to limit including file hashes in the audit logs
(dont_audit action)
- define a new filesystem subtype policy option (fs_subtype)
Misc:
- extend IMA to support in-kernel module decompression by deferring
the IMA signature verification in kernel_read_file() to after the
kernel module is decompressed"
* tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Handle error code returned by ima_filter_rule_match()
ima: Access decompressed kernel module to verify appended signature
ima: add fs_subtype condition for distinguishing FUSE instances
ima: add dont_audit action to suppress audit actions
ima: Attach CREDS_CHECK IMA hook to bprm_creds_from_file LSM hook
Linus Torvalds [Wed, 3 Dec 2025 18:58:59 +0000 (10:58 -0800)]
Merge tag 'Smack-for-6.19' of https://github.com/cschaufler/smack-next
Pull smack updates from Casey Schaufler:
- fix several cases where labels were treated inconsistently when
imported from user space
- clean up the assignment of extended attributes
- documentation improvements
* tag 'Smack-for-6.19' of https://github.com/cschaufler/smack-next:
Smack: function parameter 'gfp' not described
smack: fix kernel-doc warnings for smk_import_valid_label()
smack: fix bug: setting task label silently ignores input garbage
smack: fix bug: unprivileged task can create labels
smack: fix bug: invalid label of unix socket file
smack: always "instantiate" inode in smack_inode_init_security()
smack: deduplicate xattr setting in smack_inode_init_security()
smack: fix bug: SMACK64TRANSMUTE set on non-directory
smack: deduplicate "does access rule request transmutation"
Linus Torvalds [Wed, 3 Dec 2025 18:52:01 +0000 (10:52 -0800)]
Merge tag 'audit-pr-
20251201' of git://git./linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
- Consolidate the loops in __audit_inode_child() to improve performance
When logging a child inode in __audit_inode_child(), we first run
through the list of recorded inodes looking for the parent and then
we repeat the search looking for a matching child entry. This pull
request consolidates both searches into one pass through the recorded
inodes, resuling in approximately a 50% reduction in audit overhead.
See the commit description for the testing details.
- Combine kmalloc()/memset() into kzalloc() in audit_krule_to_data()
- Comment fixes
* tag 'audit-pr-
20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: merge loops in __audit_inode_child()
audit: Use kzalloc() instead of kmalloc()/memset() in audit_krule_to_data()
audit: fix comment misindentation in audit.h
Linus Torvalds [Wed, 3 Dec 2025 18:45:47 +0000 (10:45 -0800)]
Merge tag 'selinux-pr-
20251201' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Improve the granularity of SELinux labeling for memfd files
Currently when creating a memfd file, SELinux treats it the same as
any other tmpfs, or hugetlbfs, file. While simple, the drawback is
that it is not possible to differentiate between memfd and tmpfs
files.
This adds a call to the security_inode_init_security_anon() LSM hook
and wires up SELinux to provide a set of memfd specific access
controls, including the ability to control the execution of memfds.
As usual, the commit message has more information.
- Improve the SELinux AVC lookup performance
Adopt MurmurHash3 for the SELinux AVC hash function instead of the
custom hash function currently used. MurmurHash3 is already used for
the SELinux access vector table so the impact to the code is minimal,
and performance tests have shown improvements in both hash
distribution and latency.
See the commit message for the performance measurments.
- Introduce a Kconfig option for the SELinux AVC bucket/slot size
While we have the ability to grow the number of AVC hash buckets
today, the size of the buckets (slot size) is fixed at 512. This pull
request makes that slot size configurable at build time through a new
Kconfig knob, CONFIG_SECURITY_SELINUX_AVC_HASH_BITS.
* tag 'selinux-pr-
20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: improve bucket distribution uniformity of avc_hash()
selinux: Move avtab_hash() to a shared location for future reuse
selinux: Introduce a new config to make avc cache slot size adjustable
memfd,selinux: call security_inode_init_security_anon()
Linus Torvalds [Wed, 3 Dec 2025 17:53:48 +0000 (09:53 -0800)]
Merge tag 'lsm-pr-
20251201' of git://git./linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
- Rework the LSM initialization code
What started as a "quick" patch to enable a notification event once
all of the individual LSMs were initialized, snowballed a bit into a
30+ patch patchset when everything was done. Most of the patches, and
diffstat, is due to splitting out the initialization code into
security/lsm_init.c and cleaning up some of the mess that was there.
While not strictly necessary, it does cleanup the code signficantly,
and hopefully makes the upkeep a bit easier in the future.
Aside from the new LSM_STARTED_ALL notification, these changes also
ensure that individual LSM initcalls are only called when the LSM is
enabled at boot time. There should be a minor reduction in boot times
for those who build multiple LSMs into their kernels, but only enable
a subset at boot.
It is worth mentioning that nothing at present makes use of the
LSM_STARTED_ALL notification, but there is work in progress which is
dependent upon LSM_STARTED_ALL.
- Make better use of the seq_put*() helpers in device_cgroup
* tag 'lsm-pr-
20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (36 commits)
lsm: use unrcu_pointer() for current->cred in security_init()
device_cgroup: Refactor devcgroup_seq_show to use seq_put* helpers
lsm: add a LSM_STARTED_ALL notification event
lsm: consolidate all of the LSM framework initcalls
selinux: move initcalls to the LSM framework
ima,evm: move initcalls to the LSM framework
lockdown: move initcalls to the LSM framework
apparmor: move initcalls to the LSM framework
safesetid: move initcalls to the LSM framework
tomoyo: move initcalls to the LSM framework
smack: move initcalls to the LSM framework
ipe: move initcalls to the LSM framework
loadpin: move initcalls to the LSM framework
lsm: introduce an initcall mechanism into the LSM framework
lsm: group lsm_order_parse() with the other lsm_order_*() functions
lsm: output available LSMs when debugging
lsm: cleanup the debug and console output in lsm_init.c
lsm: add/tweak function header comment blocks in lsm_init.c
lsm: fold lsm_init_ordered() into security_init()
lsm: cleanup initialize_lsm() and rename to lsm_init_single()
...
Linus Torvalds [Wed, 3 Dec 2025 17:45:23 +0000 (09:45 -0800)]
Merge tag 'keys-trusted-next-rc1' of git://git./linux/kernel/git/jarkko/linux-tpmdd
Pull trusted key updates from Jarkko Sakkinen:
- Remove duplicate 'tpm2_hash_map' in favor of 'tpm2_find_hash_alg()'
- Fix a memory leak on failure paths of 'tpm2_load_cmd'
* tag 'keys-trusted-next-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: Fix a memory leak in tpm2_load_cmd
KEYS: trusted: Replace a redundant instance of tpm2_hash_map
Linus Torvalds [Wed, 3 Dec 2025 17:41:04 +0000 (09:41 -0800)]
Merge tag 'keys-next-6.19-rc1' of git://git./linux/kernel/git/jarkko/linux-tpmdd
Pull keys update from Jarkko Sakkinen:
"This contains only three fixes"
* tag 'keys-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
keys: Fix grammar and formatting in 'struct key_type' comments
keys: Replace deprecated strncpy in ecryptfs_fill_auth_tok
keys: Remove redundant less-than-zero checks
Linus Torvalds [Wed, 3 Dec 2025 17:23:25 +0000 (09:23 -0800)]
Merge tag 'nolibc-
20251130-for-6.19-1' of git://git./linux/kernel/git/nolibc/linux-nolibc
Pull nolibc updates from Thomas Weißschuh:
- Preparations to the use of nolibc in UML:
- Cleanup of sparse warnings
- Library mode without _start()
- More consistency when disabling errno
- Unconditional installation of all architecture support files
- Always 64-bit wide ino_t and off_t
- Various cleanups and bug fixes
* tag 'nolibc-
20251130-for-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (25 commits)
selftests/nolibc: error out on linker warnings
selftests/nolibc: use lld to link loongarch binaries
tools/nolibc: remove more __nolibc_enosys() fallbacks
tools/nolibc: remove now superfluous overflow check in llseek
tools/nolibc: use 64-bit off_t
tools/nolibc: prefer the llseek syscall
tools/nolibc: handle 64-bit off_t for llseek
tools/nolibc: use 64-bit ino_t
tools/nolibc: avoid using plain integer as NULL pointer
tools/nolibc: add support for fchdir()
tools/nolibc: clean up outdated comments in generic arch.h
tools/nolibc: make the "headers" target install all supported archs
tools/nolibc: add the more portable inttypes.h
tools/nolibc: provide the portable sys/select.h
tools/nolibc: add missing memchr() to string.h
tools/nolibc: fix misleading help message regarding installation path
tools/nolibc: add uio.h with readv and writev
tools/nolibc: add option to disable runtime
tools/nolibc: use __fallthrough__ rather than fallthrough
tools/nolibc: implement %m if errno is not defined
...
Rob Herring (Arm) [Wed, 3 Dec 2025 15:24:36 +0000 (09:24 -0600)]
dt-bindings: thermal: qcom-tsens: Remove invalid tab character
Commit
1ee90870ce79 ("dt-bindings: thermal: tsens: Add QCS8300
compatible") uses a tab character which is illegal in YAML (at the
beginning of a line). The original patch was correct, so this got
corrupted when applied.
Fixes:
1ee90870ce79 ("dt-bindings: thermal: tsens: Add QCS8300 compatible")
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yanzhu Huang [Wed, 5 Nov 2025 23:26:15 +0000 (23:26 +0000)]
ipe: Update documentation for script enforcement
This patch adds explanation of script enforcement mechanism in admin
guide documentation. Describes how IPE supports integrity enforcement
for indirectly executed scripts through the AT_EXECVE_CHECK flag, and
how this differs from kernel enforcement for compiled executables.
Signed-off-by: Yanzhu Huang <yanzhuhuang@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
Yanzhu Huang [Wed, 5 Nov 2025 23:26:14 +0000 (23:26 +0000)]
ipe: Add AT_EXECVE_CHECK support for script enforcement
This patch adds a new ipe_bprm_creds_for_exec() hook that integrates
with the AT_EXECVE_CHECK mechanism. To enable script enforcement,
interpreters need to incorporate the AT_EXECVE_CHECK flag when
calling execveat() on script files before execution.
When a userspace interpreter calls execveat() with the AT_EXECVE_CHECK
flag, this hook triggers IPE policy evaluation on the script file. The
hook only triggers IPE when bprm->is_check is true, ensuring it's
being called from an AT_EXECVE_CHECK context. It then builds an
evaluation context for an IPE_OP_EXEC operation and invokes IPE policy.
The kernel returns the policy decision to the interpreter, which can
then decide whether to proceed with script execution.
This extends IPE enforcement to indirectly executed scripts, permitting
trusted scripts to execute while denying untrusted ones.
Signed-off-by: Yanzhu Huang <yanzhuhuang@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
Borislav Petkov (AMD) [Thu, 9 Oct 2025 15:45:25 +0000 (17:45 +0200)]
ipe: Drop a duplicated CONFIG_ prefix in the ifdeffery
Looks like it got added by mistake, perhaps editor auto-completion
artifact. Drop it.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Fan Wu <wufan@kernel.org>
Linus Torvalds [Wed, 3 Dec 2025 03:00:26 +0000 (19:00 -0800)]
Merge tag 'random-6.19-rc1-for-linus' of git://git./linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Dynamically allocate cpumasks off of the stack if the kernel is
configured for a lot of CPUs, to handle a -Wframe-larger-than case
- The removal of next_pseudo_random32() after the last user was
switched over to the prandom interface
- The removal of get_random_u{8,16,32,64}_wait() functions, as there
were no users of those at all
- Some house keeping changes - a few grammar cleanups in the
comments, system_unbound_wq was renamed to system_dfl_wq, and
static_key_initialized no longer needs to be checked
* tag 'random-6.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: complete sentence of comment
random: drop check for static_key_initialized
random: remove unused get_random_var_wait functions
random: replace use of system_unbound_wq with system_dfl_wq
random: use offstack cpumask when necessary
prandom: remove next_pseudo_random32
media: vivid: use prandom
random: add missing words in function comments
Linus Torvalds [Wed, 3 Dec 2025 02:53:50 +0000 (18:53 -0800)]
Merge tag 'fpsimd-on-stack-for-linus' of git://git./linux/kernel/git/ebiggers/linux
Pull arm64 FPSIMD on-stack buffer updates from Eric Biggers:
"This is a core arm64 change. However, I was asked to take this because
most uses of kernel-mode FPSIMD are in crypto or CRC code.
In v6.8, the size of task_struct on arm64 increased by 528 bytes due
to the new 'kernel_fpsimd_state' field. This field was added to allow
kernel-mode FPSIMD code to be preempted.
Unfortunately, 528 bytes is kind of a lot for task_struct. This
regression in the task_struct size was noticed and reported.
Recover that space by making this state be allocated on the stack at
the beginning of each kernel-mode FPSIMD section.
To make it easier for all the users of kernel-mode FPSIMD to do that
correctly, introduce and use a 'scoped_ksimd' abstraction"
* tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (23 commits)
lib/crypto: arm64: Move remaining algorithms to scoped ksimd API
lib/crypto: arm/blake2b: Move to scoped ksimd API
arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
arm64/fpu: Enforce task-context only for generic kernel mode FPU
net/mlx5: Switch to more abstract scoped ksimd guard API on arm64
arm64/xorblocks: Switch to 'ksimd' scoped guard API
crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API
crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API
crypto/arm64: polyval - Switch to 'ksimd' scoped guard API
crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API
crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API
crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API
crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API
raid6: Move to more abstract 'ksimd' guard API
crypto: aegis128-neon - Move to more abstract 'ksimd' guard API
crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
...