Vladimir Oltean [Tue, 3 May 2022 12:14:28 +0000 (15:14 +0300)]
selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
As discussed here with Ido Schimmel:
https://patchwork.kernel.org/project/netdevbpf/patch/
20220224102908.5255-2-jianbol@nvidia.com/
the default conform-exceed action is "reclassify", for a reason we don't
really understand.
The point is that hardware can't offload that police action, so not
specifying "conform-exceed" was always wrong, even though the command
used to work in hardware (but not in software) until the kernel started
adding validation for it.
Fix the command used by the selftest by making the policer drop on
exceed, and pass the packet to the next action (goto) on conform.
Fixes:
8cd6b020b644 ("selftests: ocelot: add some example VCAP IS1, IS2 and ES0 tc offloads")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220503121428.842906-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 May 2022 02:22:34 +0000 (19:22 -0700)]
Merge branch 'insufficient-tcp-source-port-randomness'
Willy Tarreau says:
====================
insufficient TCP source port randomness
In a not-yet published paper, Moshe Kol, Amit Klein, and Yossi Gilad
report being able to accurately identify a client by forcing it to emit
only 40 times more connections than the number of entries in the
table_perturb[] table, which is indexed by hashing the connection tuple.
The current 2^8 setting allows them to perform that attack with only 10k
connections, which is not hard to achieve in a few seconds.
Eric, Amit and I have been working on this for a few weeks now imagining,
testing and eliminating a number of approaches that Amit and his team were
still able to break or that were found to be too risky or too expensive,
and ended up with the simple improvements in this series that resists to
the attack, doesn't degrade the performance, and preserves a reliable port
selection algorithm to avoid connection failures, including the odd/even
port selection preference that allows bind() to always find a port quickly
even under strong connect() stress.
The approach relies on several factors:
- resalting the hash secret that's used to choose the table_perturb[]
entry every 10 seconds to eliminate slow attacks and force the
attacker to forget everything that was learned after this delay.
This already eliminates most of the problem because if a client
stays silent for more than 10 seconds there's no link between the
previous and the next patterns, and 10s isn't yet frequent enough
to cause too frequent repetition of a same port that may induce a
connection failure ;
- adding small random increments to the source port. Previously, a
random 0 or 1 was added every 16 ports. Now a random 0 to 7 is
added after each port. This means that with the default 32768-60999
range, a worst case rollover happens after 1764 connections, and
an average of 3137. This doesn't stop statistical attacks but
requires significantly more iterations of the same attack to
confirm a guess.
- increasing the table_perturb[] size from 2^8 to 2^16, which Amit
says will require 2.6 million connections to be attacked with the
changes above, making it pointless to get a fingerprint that will
only last 10 seconds. Due to the size, the table was made dynamic.
- a few minor improvements on the bits used from the hash, to eliminate
some unfortunate correlations that may possibly have been exploited
to design future attack models.
These changes were tested under the most extreme conditions, up to
1.1 million connections per second to one and a few targets, showing no
performance regression, and only 2 connection failures within 13 billion,
which is less than 2^-32 and perfectly within usual values.
The series is split into small reviewable changes and was already reviewed
by Amit and Eric.
====================
Link: https://lore.kernel.org/r/20220502084614.24123-1-w@1wt.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willy Tarreau [Mon, 2 May 2022 08:46:14 +0000 (10:46 +0200)]
tcp: drop the hash_32() part from the index calculation
In commit
190cc82489f4 ("tcp: change source port randomizarion at
connect() time"), the table_perturb[] array was introduced and an
index was taken from the port_offset via hash_32(). But it turns
out that hash_32() performs a multiplication while the input here
comes from the output of SipHash in secure_seq, that is well
distributed enough to avoid the need for yet another hash.
Suggested-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willy Tarreau [Mon, 2 May 2022 08:46:13 +0000 (10:46 +0200)]
tcp: increase source port perturb table to 2^16
Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately
identify a client by forcing it to emit only 40 times more connections
than there are entries in the table_perturb[] table. The previous two
improvements consisting in resalting the secret every 10s and adding
randomness to each port selection only slightly improved the situation,
and the current value of 2^8 was too small as it's not very difficult
to make a client emit 10k connections in less than 10 seconds.
Thus we're increasing the perturb table from 2^8 to 2^16 so that the
same precision now requires 2.6M connections, which is more difficult in
this time frame and harder to hide as a background activity. The impact
is that the table now uses 256 kB instead of 1 kB, which could mostly
affect devices making frequent outgoing connections. However such
components usually target a small set of destinations (load balancers,
database clients, perf assessment tools), and in practice only a few
entries will be visited, like before.
A live test at 1 million connections per second showed no performance
difference from the previous value.
Reported-by: Moshe Kol <moshe.kol@mail.huji.ac.il>
Reported-by: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willy Tarreau [Mon, 2 May 2022 08:46:12 +0000 (10:46 +0200)]
tcp: dynamically allocate the perturb table used by source ports
We'll need to further increase the size of this table and it's likely
that at some point its size will not be suitable anymore for a static
table. Let's allocate it on boot from inet_hashinfo2_init(), which is
called from tcp_init().
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willy Tarreau [Mon, 2 May 2022 08:46:11 +0000 (10:46 +0200)]
tcp: add small random increments to the source port
Here we're randomly adding between 0 and 7 random increments to the
selected source port in order to add some noise in the source port
selection that will make the next port less predictable.
With the default port range of 32768-60999 this means a worst case
reuse scenario of 14116/8=1764 connections between two consecutive
uses of the same port, with an average of 14116/4.5=3137. This code
was stressed at more than 800000 connections per second to a fixed
target with all connections closed by the client using RSTs (worst
condition) and only 2 connections failed among 13 billion, despite
the hash being reseeded every 10 seconds, indicating a perfectly
safe situation.
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Mon, 2 May 2022 08:46:10 +0000 (10:46 +0200)]
tcp: resalt the secret every 10 seconds
In order to limit the ability for an observer to recognize the source
ports sequence used to contact a set of destinations, we should
periodically shuffle the secret. 10 seconds looks effective enough
without causing particular issues.
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willy Tarreau [Mon, 2 May 2022 08:46:09 +0000 (10:46 +0200)]
tcp: use different parts of the port_offset for index and offset
Amit Klein suggests that we use different parts of port_offset for the
table's index and the port offset so that there is no direct relation
between them.
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willy Tarreau [Mon, 2 May 2022 08:46:08 +0000 (10:46 +0200)]
secure_seq: use the 64 bits of the siphash for port offset calculation
SipHash replaced MD5 in secure_ipv{4,6}_port_ephemeral() via commit
7cd23e5300c1 ("secure_seq: use SipHash in place of MD5"), but the output
remained truncated to 32-bit only. In order to exploit more bits from the
hash, let's make the functions return the full 64-bit of siphash_3u32().
We also make sure the port offset calculation in __inet_hash_connect()
remains done on 32-bit to avoid the need for div_u64_rem() and an extra
cost on 32-bit systems.
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 May 2022 00:50:00 +0000 (17:50 -0700)]
Merge branch 'wireguard-patches-for-5-18-rc6'
Jason A. Donenfeld says:
====================
wireguard patches for 5.18-rc6
In working on some other problems, I wound up leaning on the WireGuard
CI more than usual and uncovered a few small issues with reliability.
These are fairly low key changes, since they don't impact kernel code
itself.
One change does stick out in particular, though, which is the "make
routing loop test non-fatal" commit. I'm not thrilled about doing this,
but currently [1] remains unsolved, and I'm still working on a real
solution to that (hopefully for 5.19 or 5.20 if I can come up with a
good idea...), so for now that test just prints a big red warning
instead.
[1] https://lore.kernel.org/netdev/YmszSXueTxYOC41G@zx2c4.com/
====================
Link: https://lore.kernel.org/r/20220504202920.72908-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Wed, 4 May 2022 20:29:20 +0000 (22:29 +0200)]
wireguard: selftests: set panic_on_warn=1 from cmdline
Rather than setting this once init is running, set panic_on_warn from
the kernel command line, so that it catches splats from WireGuard
initialization code and the various crypto selftests.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Wed, 4 May 2022 20:29:19 +0000 (22:29 +0200)]
wireguard: selftests: bump package deps
Use newer, more reliable package dependencies. These should hopefully
reduce flakes. However, we keep the old iputils package, as it
accumulated bugs after resulting in flakes on slow machines.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Wed, 4 May 2022 20:29:18 +0000 (22:29 +0200)]
wireguard: selftests: restore support for ccache
When moving to non-system toolchains, we inadvertantly killed the
ability to use ccache. So instead, build ccache support into the test
harness directly.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Wed, 4 May 2022 20:29:17 +0000 (22:29 +0200)]
wireguard: selftests: use newer toolchains to fill out architectures
Rather than relying on the system to have cross toolchains available,
simply download musl.cc's ones and use that libc.so, and then we use it
to fill in a few missing platforms, such as riscv64, riscv64, powerpc64,
and s390x.
Since riscv doesn't have a second serial port in its device description,
we have to use virtio's vport. This is actually the same situation on
ARM, but we were previously hacking QEMU up to work around this, which
required a custom QEMU. Instead just do the vport trick on ARM too.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Wed, 4 May 2022 20:29:16 +0000 (22:29 +0200)]
wireguard: selftests: limit parallelism to $(nproc) tests at once
The parallel tests were added to catch queueing issues from multiple
cores. But what happens in reality when testing tons of processes is
that these separate threads wind up fighting with the scheduler, and we
wind up with contention in places we don't care about that decrease the
chances of hitting a bug. So just do a test with the number of CPU
cores, rather than trying to scale up arbitrarily.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason A. Donenfeld [Wed, 4 May 2022 20:29:15 +0000 (22:29 +0200)]
wireguard: selftests: make routing loop test non-fatal
I hate to do this, but I still do not have a good solution to actually
fix this bug across architectures. So just disable it for now, so that
the CI can still deliver actionable results. This commit adds a large
red warning, so that at least the failure isn't lost forever, and
hopefully this can be revisited down the line.
Link: https://lore.kernel.org/netdev/CAHmME9pv1x6C4TNdL6648HydD8r+txpV4hTUXOBVkrapBXH4QQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/YmszSXueTxYOC41G@zx2c4.com/
Link: https://lore.kernel.org/wireguard/CAHmME9rNnBiNvBstb7MPwK-7AmAN0sOfnhdR=eeLrowWcKxaaQ@mail.gmail.com/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Wed, 4 May 2022 09:51:05 +0000 (10:51 +0100)]
Merge tag 'mlx5-fixes-2022-05-03' of git://git./linux/kernel/g
it/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2022-05-03
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Bloch [Sun, 10 Apr 2022 11:58:05 +0000 (11:58 +0000)]
net/mlx5: Fix matching on inner TTC
The cited commits didn't use proper matching on inner TTC
as a result distribution of encapsulated packets wasn't symmetric
between the physical ports.
Fixes:
4c71ce50d2fe ("net/mlx5: Support partial TTC rules")
Fixes:
8e25a2bc6687 ("net/mlx5: Lag, add support to create TTC tables for LAG port selection")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Moshe Shemesh [Mon, 11 Apr 2022 17:38:44 +0000 (20:38 +0300)]
net/mlx5: Avoid double clear or set of sync reset requested
Double clear of reset requested state can lead to NULL pointer as it
will try to delete the timer twice. This can happen for example on a
race between abort from FW and pci error or reset. Avoid such case using
test_and_clear_bit() to verify only one time reset requested state clear
flow. Similarly use test_and_set_bit() to verify only one time reset
requested state set flow.
Fixes:
7dd6df329d4c ("net/mlx5: Handle sync reset abort event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Moshe Shemesh [Mon, 11 Apr 2022 18:31:06 +0000 (21:31 +0300)]
net/mlx5: Fix deadlock in sync reset flow
The sync reset flow can lead to the following deadlock when
poll_sync_reset() is called by timer softirq and waiting on
del_timer_sync() for the same timer. Fix that by moving the part of the
flow that waits for the timer to reset_reload_work.
It fixes the following kernel Trace:
RIP: 0010:del_timer_sync+0x32/0x40
...
Call Trace:
<IRQ>
mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core]
poll_sync_reset.cold+0x36/0x52 [mlx5_core]
call_timer_fn+0x32/0x130
__run_timers.part.0+0x180/0x280
? tick_sched_handle+0x33/0x60
? tick_sched_timer+0x3d/0x80
? ktime_get+0x3e/0xa0
run_timer_softirq+0x2a/0x50
__do_softirq+0xe1/0x2d6
? hrtimer_interrupt+0x136/0x220
irq_exit+0xae/0xb0
smp_apic_timer_interrupt+0x7b/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
Fixes:
3c5193a87b0f ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Moshe Tal [Wed, 9 Feb 2022 17:23:56 +0000 (19:23 +0200)]
net/mlx5e: Fix trust state reset in reload
Setting dscp2prio during the driver reload can cause dcb ieee app list to
be not empty after the reload finish and as a result to a conflict between
the priority trust state reported by the app and the state in the device
register.
Reset the dcb ieee app list on initialization in case this is
conflicting with the register status.
Fixes:
2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support")
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Ariel Levkovich [Mon, 28 Mar 2022 15:29:13 +0000 (18:29 +0300)]
net/mlx5e: Avoid checking offload capability in post_parse action
During TC action parsing, the can_offload callback is called
before calling the action's main parsing callback.
Later on, the can_offload callback is called again before handling
the action's post_parse callback if exists.
Since the main parsing callback might have changed and set parsing
params for the rule, following can_offload checks might fail because
some parsing params were already set.
Specifically, the ct action main parsing sets the ct param in the
parsing status structure and when the second can_offload for ct action
is called, before handling the ct post parsing, it will return an error
since it checks this ct param to indicate multiple ct actions which are
not supported.
Therefore, the can_offload call is removed from the post parsing
handling to prevent such cases.
This is allowed since the first can_offload call will ensure that the
action can be offloaded and the fact the code reached the post parsing
handling already means that the action can be offloaded.
Fixes:
8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Paul Blakey [Tue, 29 Mar 2022 14:42:46 +0000 (17:42 +0300)]
net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release
__mlx5_tc_ct_entry_put() queues release of tuple related to some ct FT,
if that is the last reference to that tuple, the actual deletion of
the tuple can happen after the FT is already destroyed and freed.
Flush the used workqueue before destroying the ct FT.
Fixes:
a2173131526d ("net/mlx5e: CT: manage the lifetime of the ct entry object")
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Ariel Levkovich [Mon, 25 Apr 2022 14:12:12 +0000 (17:12 +0300)]
net/mlx5e: TC, fix decap fallback to uplink when int port not supported
When resolving the decap route device for a tunnel decap rule,
the result may be an OVS internal port device.
Prior to adding the support for internal port offload, such case
would result in using the uplink as the default decap route device
which allowed devices that can't support internal port offload
to offload this decap rule.
This behavior got broken by adding the internal port offload which
will fail in case the device can't support internal port offload.
To restore the old behavior, use the uplink device as the decap
route as before when internal port offload is not supported.
Fixes:
b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Ariel Levkovich [Wed, 23 Feb 2022 19:29:17 +0000 (21:29 +0200)]
net/mlx5e: TC, Fix ct_clear overwriting ct action metadata
ct_clear action is translated to clearing reg_c metadata
which holds ct state and zone information using mod header
actions.
These actions are allocated during the actions parsing, as
part of the flow attributes main mod header action list.
If ct action exists in the rule, the flow's main mod header
is used only in the post action table rule, after the ct tables
which set the ct info in the reg_c as part of the ct actions.
Therefore, if the original rule has a ct_clear action followed
by a ct action, the ct action reg_c setting will be done first and
will be followed by the ct_clear resetting reg_c and overwriting
the ct info.
Fix this by moving the ct_clear mod header actions allocation from
the ct action parsing stage to the ct action post parsing stage where
it is already known if ct_clear is followed by a ct action.
In such case, we skip the mod header actions allocation for the ct
clear since the ct action will write to reg_c anyway after clearing it.
Fixes:
806401c20a0f ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Vlad Buslov [Mon, 18 Apr 2022 14:40:37 +0000 (17:40 +0300)]
net/mlx5e: Lag, Don't skip fib events on current dst
Referenced change added check to skip updating fib when new fib instance
has same or lower priority. However, new fib instance can be an update on
same dst address as existing one even though the structure is another
instance that has different address. Ignoring events on such instances
causes multipath LAG state to not be correctly updated.
Track 'dst' and 'dst_len' fields of fib event fib_entry_notifier_info
structure and don't skip events that have the same value of that fields.
Fixes:
ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Vlad Buslov [Mon, 18 Apr 2022 14:32:54 +0000 (17:32 +0300)]
net/mlx5e: Lag, Fix fib_info pointer assignment
Referenced change incorrectly sets single path fib_info even when LAG is
not active. Fix it by moving call to mlx5_lag_fib_set() into conditional
that verifies LAG state.
Fixes:
ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Vlad Buslov [Mon, 18 Apr 2022 14:32:19 +0000 (17:32 +0300)]
net/mlx5e: Lag, Fix use-after-free in fib event handler
Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.
Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.
[0]:
[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr
ffff888144df2050 by task kworker/u20:4/138
[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>
[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3
[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 203.666952] The buggy address belongs to the object at
ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [
ffff888144df2000,
ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:
00000000bf6c9314 refcount:1 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:
00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw:
002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw:
0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected
[ 203.681455] Memory state around the buggy address:
[ 203.682421]
ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863]
ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >
ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820]
ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226]
ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================
Fixes:
ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Mark Zhang [Wed, 6 Apr 2022 07:30:21 +0000 (10:30 +0300)]
net/mlx5e: Fix the calling of update_buffer_lossy() API
The arguments of update_buffer_lossy() is in a wrong order. Fix it.
Fixes:
88b3d5c90e96 ("net/mlx5e: Fix port buffers cell size value")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Vlad Buslov [Mon, 28 Mar 2022 12:54:52 +0000 (15:54 +0300)]
net/mlx5e: Don't match double-vlan packets if cvlan is not set
Currently, match VLAN rule also matches packets that have multiple VLAN
headers. This behavior is similar to buggy flower classifier behavior that
has recently been fixed. Fix the issue by matching on
outer_second_cvlan_tag with value 0 which will cause the HW to verify the
packet doesn't contain second vlan header.
Fixes:
699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Thu, 3 Mar 2022 17:02:03 +0000 (19:02 +0200)]
net/mlx5: Fix slab-out-of-bounds while reading resource dump menu
Resource dump menu may span over more than a single page, support it.
Otherwise, menu read may result in a memory access violation: reading
outside of the allocated page.
Note that page format of the first menu page contains menu headers while
the proceeding menu pages contain only records.
The KASAN logs are as follows:
BUG: KASAN: slab-out-of-bounds in strcmp+0x9b/0xb0
Read of size 1 at addr
ffff88812b2e1fd0 by task systemd-udevd/496
CPU: 5 PID: 496 Comm: systemd-udevd Tainted: G B 5.16.0_for_upstream_debug_2022_01_10_23_12 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
print_address_description.constprop.0+0x1f/0x140
? strcmp+0x9b/0xb0
? strcmp+0x9b/0xb0
kasan_report.cold+0x83/0xdf
? strcmp+0x9b/0xb0
strcmp+0x9b/0xb0
mlx5_rsc_dump_init+0x4ab/0x780 [mlx5_core]
? mlx5_rsc_dump_destroy+0x80/0x80 [mlx5_core]
? lockdep_hardirqs_on_prepare+0x286/0x400
? raw_spin_unlock_irqrestore+0x47/0x50
? aomic_notifier_chain_register+0x32/0x40
mlx5_load+0x104/0x2e0 [mlx5_core]
mlx5_init_one+0x41b/0x610 [mlx5_core]
....
The buggy address belongs to the object at
ffff88812b2e0000
which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 4048 bytes to the right of
4096-byte region [
ffff88812b2e0000,
ffff88812b2e1000)
The buggy address belongs to the page:
page:
000000009d69807a refcount:1 mapcount:0 mapping:
0000000000000000 index:0xffff88812b2e6000 pfn:0x12b2e0
head:
000000009d69807a order:3 compound_mapcount:0 compound_pincount:0
flags: 0x8000000000010200(slab|head|zone=2)
raw:
8000000000010200 0000000000000000 dead000000000001 ffff888100043040
raw:
ffff88812b2e6000 0000000080040000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88812b2e1e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88812b2e1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
ffff88812b2e1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88812b2e2000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88812b2e2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Fixes:
12206b17235a ("net/mlx5: Add support for resource dump")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Ariel Levkovich [Tue, 15 Mar 2022 16:20:48 +0000 (18:20 +0200)]
net/mlx5e: Fix wrong source vport matching on tunnel rule
When OVS internal port is the vtep device, the first decap
rule is matching on the internal port's vport metadata value
and then changes the metadata to be the uplink's value.
Therefore, following rules on the tunnel, in chain > 0, should
avoid matching on internal port metadata and use the uplink
vport metadata instead.
Select the uplink's metadata value for the source vport match
in case the rule is in chain greater than zero, even if the tunnel
route device is internal port.
Fixes:
166f431ec6be ("net/mlx5e: Add indirect tc offload of ovs internal port")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Jakub Kicinski [Wed, 4 May 2022 00:41:35 +0000 (17:41 -0700)]
Merge branch 'bnxt_en-bug-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes
This patch series includes 3 fixes:
- Fix an occasional VF open failure.
- Fix a PTP spinlock usage before initialization
- Fix unnecesary RX packet drops under high TX traffic load.
====================
Link: https://lore.kernel.org/r/1651540392-2260-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Tue, 3 May 2022 01:13:12 +0000 (21:13 -0400)]
bnxt_en: Fix unnecessary dropping of RX packets
In bnxt_poll_p5(), we first check cpr->has_more_work. If it is true,
we are in NAPI polling mode and we will call __bnxt_poll_cqs() to
continue polling. It is possible to exhanust the budget again when
__bnxt_poll_cqs() returns.
We then enter the main while loop to check for new entries in the NQ.
If we had previously exhausted the NAPI budget, we may call
__bnxt_poll_work() to process an RX entry with zero budget. This will
cause packets to be dropped unnecessarily, thinking that we are in the
netpoll path. Fix it by breaking out of the while loop if we need
to process an RX NQ entry with no budget left. We will then exit
NAPI and stay in polling mode.
Fixes:
389a877a3b20 ("bnxt_en: Process the NQ under NAPI continuous polling.")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Tue, 3 May 2022 01:13:11 +0000 (21:13 -0400)]
bnxt_en: Initiallize bp->ptp_lock first before using it
bnxt_ptp_init() calls bnxt_ptp_init_rtc() which will acquire the ptp_lock
spinlock. The spinlock is not initialized until later. Move the
bnxt_ptp_init_rtc() call after the spinlock is initialized.
Fixes:
24ac1ecd5240 ("bnxt_en: Add driver support to use Real Time Counter for PTP")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Somnath Kotur [Tue, 3 May 2022 01:13:10 +0000 (21:13 -0400)]
bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag
bnxt_open() can fail in this code path, especially on a VF when
it fails to reserve default rings:
bnxt_open()
__bnxt_open_nic()
bnxt_clear_int_mode()
bnxt_init_dflt_ring_mode()
RX rings would be set to 0 when we hit this error path.
It is possible for a subsequent bnxt_open() call to potentially succeed
with a code path like this:
bnxt_open()
bnxt_hwrm_if_change()
bnxt_fw_init_one()
bnxt_fw_init_one_p3()
bnxt_set_dflt_rfs()
bnxt_rfs_capable()
bnxt_hwrm_reserve_rings()
On older chips, RFS is capable if we can reserve the number of vnics that
is equal to RX rings + 1. But since RX rings is still set to 0 in this
code path, we may mistakenly think that RFS is supported for 0 RX rings.
Later, when the default RX rings are reserved and we try to enable
RFS, it would fail and cause bnxt_open() to fail unnecessarily.
We fix this in 2 places. bnxt_rfs_capable() will always return false if
RX rings is not yet set. bnxt_init_dflt_ring_mode() will call
bnxt_set_dflt_rfs() which will always clear the RFS flags if RFS is not
supported.
Fixes:
20d7d1c5c9b1 ("bnxt_en: reliably allocate IRQ table on reset to avoid crash")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sergey Shtylyov [Mon, 2 May 2022 20:14:09 +0000 (23:14 +0300)]
smsc911x: allow using IRQ0
The AlphaProject AP-SH4A-3A/AP-SH4AD-0A SH boards use IRQ0 for their SMSC
LAN911x Ethernet chip, so the networking on them must have been broken by
commit
965b2aa78fbc ("net/smsc911x: fix irq resource allocation failure")
which filtered out 0 as well as the negative error codes -- it was kinda
correct at the time, as platform_get_irq() could return 0 on of_irq_get()
failure and on the actual 0 in an IRQ resource. This issue was fixed by
me (back in 2016!), so we should be able to fix this driver to allow IRQ0
usage again...
When merging this to the stable kernels, make sure you also merge commit
e330b9a6bb35 ("platform: don't return 0 from platform_get_irq[_byname]()
on error") -- that's my fix to platform_get_irq() for the DT platforms...
Fixes:
965b2aa78fbc ("net/smsc911x: fix irq resource allocation failure")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/656036e4-6387-38df-b8a7-6ba683b16e63@omp.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthew Hagan [Mon, 2 May 2022 22:33:15 +0000 (23:33 +0100)]
net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT
As noted elsewhere, various GPON SFP modules exhibit non-standard
TX-fault behaviour. In the tested case, the Huawei MA5671A, when used
in combination with a Marvell mv88e6085 switch, was found to
persistently assert TX-fault, resulting in the module being disabled.
This patch adds a quirk to ignore the SFP_F_TX_FAULT state, allowing the
module to function.
Change from v1: removal of erroneous return statment (Andrew Lunn)
Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220502223315.1973376-1-mnhagan88@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tetsuo Handa [Mon, 2 May 2022 01:40:18 +0000 (10:40 +0900)]
net: rds: acquire refcount on TCP sockets
syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
for TCP socket used by RDS is accessing sock_net() without acquiring a
refcount on net namespace. Since TCP's retransmission can happen after
a process which created net namespace terminated, we need to explicitly
acquire a refcount.
Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed
Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
Fixes:
26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Fixes:
8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
Link: https://lore.kernel.org/r/a5fb1fc4-2284-3359-f6a0-e4e390239d7b@I-love.SAKURA.ne.jp
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Marc Kleine-Budde [Mon, 2 May 2022 09:46:38 +0000 (11:46 +0200)]
selftests/net: so_txtime: usage(): fix documentation of default clock
The program uses CLOCK_TAI as default clock since it was added to the
Linux repo. In commit:
|
040806343bb4 ("selftests/net: so_txtime multi-host support")
a help text stating the wrong default clock was added.
This patch fixes the help text.
Fixes:
040806343bb4 ("selftests/net: so_txtime multi-host support")
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20220502094638.1921702-3-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Marc Kleine-Budde [Mon, 2 May 2022 09:46:37 +0000 (11:46 +0200)]
selftests/net: so_txtime: fix parsing of start time stamp on 32 bit systems
This patch fixes the parsing of the cmd line supplied start time on 32
bit systems. A "long" on 32 bit systems is only 32 bit wide and cannot
hold a timestamp in nano second resolution.
Fixes:
040806343bb4 ("selftests/net: so_txtime multi-host support")
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20220502094638.1921702-2-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ido Schimmel [Mon, 2 May 2022 08:45:07 +0000 (11:45 +0300)]
selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is operational
In emulated environments, the bridge ports enslaved to br1 get a carrier
before changing br1's PVID. This means that by the time the PVID is
changed, br1 is already operational and configured with an IPv6
link-local address.
When the test is run with netdevs registered by mlxsw, changing the PVID
is vetoed, as changing the VID associated with an existing L3 interface
is forbidden. This restriction is similar to the 8021q driver's
restriction of changing the VID of an existing interface.
Fix this by taking br1 down and bringing it back up when it is fully
configured.
With this fix, the test reliably passes on top of both the SW and HW
data paths (emulated or not).
Fixes:
239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20220502084507.364774-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 3 May 2022 09:07:37 +0000 (11:07 +0200)]
Merge branch 'emaclite-improve-error-handling-and-minor-cleanup'
Radhey Shyam Pandey says:
====================
emaclite: improve error handling and minor cleanup
This patchset does error handling for of_address_to_resource() and also
removes "Don't advertise 1000BASE-T" and auto negotiation.
Changes for v3:
- Resolve git apply conflicts for 2/2 patch.
Changes for v2:
- Added Andrew's reviewed by tag in 1/2 patch.
- Move ret to down to align with reverse xmas tree style in 2/2 patch.
- Also add fixes tag in 2/2 patch.
- Specify tree name in subject prefix.
====================
Link: https://lore.kernel.org/r/1651476470-23904-1-git-send-email-radhey.shyam.pandey@xilinx.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shravya Kumbham [Mon, 2 May 2022 07:27:50 +0000 (12:57 +0530)]
net: emaclite: Add error handling for of_address_to_resource()
check the return value of of_address_to_resource() and also add
missing of_node_put() for np and npp nodes.
Fixes:
e0a3bc65448c ("net: emaclite: Support multiple phys connected to one MDIO bus")
Addresses-Coverity: Event check_return value.
Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shravya Kumbham [Mon, 2 May 2022 07:27:49 +0000 (12:57 +0530)]
net: emaclite: Don't advertise 1000BASE-T and do auto negotiation
In xemaclite_open() function we are setting the max speed of
emaclite to 100Mb using phy_set_max_speed() function so,
there is no need to write the advertising registers to stop
giga-bit speed and the phy_start() function starts the
auto-negotiation so, there is no need to handle it separately
using advertising registers. Remove the phy_read and phy_write
of advertising registers in xemaclite_open() function.
Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Russell King (Oracle) [Fri, 29 Apr 2022 16:43:03 +0000 (09:43 -0700)]
net: dsa: b53: convert to phylink_pcs
Convert B53 to use phylink_pcs for the serdes rather than hooking it
into the MAC-layer callbacks.
Fixes:
81c1681cbb9f ("net: dsa: b53: mark as non-legacy")
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Gleixner [Fri, 29 Apr 2022 13:54:24 +0000 (15:54 +0200)]
pci_irq_vector() can't be used in atomic context any longer. This conflicts
with the usage of this function in nic_mbx_intr_handler().
Cache the Linux interrupt numbers in struct nicpf and use that cache in the
interrupt handler to select the mailbox.
Fixes:
495c66aca3da ("genirq/msi: Convert to new functions")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sunil Goutham <sgoutham@marvell.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2041772
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 1 May 2022 12:26:05 +0000 (13:26 +0100)]
Merge branch 'nfc-fixes'
Duoming Zhou says:
====================
Replace improper checks and fix bugs in nfc subsystem
The first patch is used to replace improper checks in netlink related
functions of nfc core, the second patch is used to fix bugs in
nfcmrvl driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Duoming Zhou [Fri, 29 Apr 2022 12:45:51 +0000 (20:45 +0800)]
nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
There are destructive operations such as nfcmrvl_fw_dnld_abort and
gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
gpio and so on could be destructed while the upper layer functions such as
nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
to double-free, use-after-free and null-ptr-deref bugs.
There are three situations that could lead to double-free bugs.
The first situation is shown below:
(Thread 1) | (Thread 2)
nfcmrvl_fw_dnld_start |
... | nfcmrvl_nci_unregister_dev
release_firmware() | nfcmrvl_fw_dnld_abort
kfree(fw) //(1) | fw_dnld_over
| release_firmware
... | kfree(fw) //(2)
| ...
The second situation is shown below:
(Thread 1) | (Thread 2)
nfcmrvl_fw_dnld_start |
... |
mod_timer |
(wait a time) |
fw_dnld_timeout | nfcmrvl_nci_unregister_dev
fw_dnld_over | nfcmrvl_fw_dnld_abort
release_firmware | fw_dnld_over
kfree(fw) //(1) | release_firmware
... | kfree(fw) //(2)
The third situation is shown below:
(Thread 1) | (Thread 2)
nfcmrvl_nci_recv_frame |
if(..->fw_download_in_progress)|
nfcmrvl_fw_dnld_recv_frame |
queue_work |
|
fw_dnld_rx_work | nfcmrvl_nci_unregister_dev
fw_dnld_over | nfcmrvl_fw_dnld_abort
release_firmware | fw_dnld_over
kfree(fw) //(1) | release_firmware
| kfree(fw) //(2)
The firmware struct is deallocated in position (1) and deallocated
in position (2) again.
The crash trace triggered by POC is like below:
BUG: KASAN: double-free or invalid-free in fw_dnld_over
Call Trace:
kfree
fw_dnld_over
nfcmrvl_nci_unregister_dev
nci_uart_tty_close
tty_ldisc_kill
tty_ldisc_hangup
__tty_hangup.part.0
tty_release
...
What's more, there are also use-after-free and null-ptr-deref bugs
in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
then, we dereference firmware, gpio or the members of priv->fw_dnld in
nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
This patch reorders destructive operations after nci_unregister_device
in order to synchronize between cleanup routine and firmware download
routine.
The nci_unregister_device is well synchronized. If the device is
detaching, the firmware download routine will goto error. If firmware
download routine is executing, nci_unregister_device will wait until
firmware download routine is finished.
Fixes:
3194c6870158 ("NFC: nfcmrvl: add firmware download support")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Duoming Zhou [Fri, 29 Apr 2022 12:45:50 +0000 (20:45 +0800)]
nfc: replace improper check device_is_registered() in netlink related functions
The device_is_registered() in nfc core is used to check whether
nfc device is registered in netlink related functions such as
nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
is protected by device_lock, there is still a race condition between
device_del() and device_is_registered(). The root cause is that
kobject_del() in device_del() is not protected by device_lock.
(cleanup task) | (netlink task)
|
nfc_unregister_device | nfc_fw_download
device_del | device_lock
... | if (!device_is_registered)//(1)
kobject_del//(2) | ...
... | device_unlock
The device_is_registered() returns the value of state_in_sysfs and
the state_in_sysfs is set to zero in kobject_del(). If we pass check in
position (1), then set zero in position (2). As a result, the check
in position (1) is useless.
This patch uses bool variable instead of device_is_registered() to judge
whether the nfc device is registered, which is well synchronized.
Fixes:
3e256b8f8dfa ("NFC: add nfc subsystem core")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tan Tee Min [Fri, 29 Apr 2022 11:58:07 +0000 (19:58 +0800)]
net: stmmac: disable Split Header (SPH) for Intel platforms
Based on DesignWare Ethernet QoS datasheet, we are seeing the limitation
of Split Header (SPH) feature is not supported for Ipv4 fragmented packet.
This SPH limitation will cause ping failure when the packets size exceed
the MTU size. For example, the issue happens once the basic ping packet
size is larger than the configured MTU size and the data is lost inside
the fragmented packet, replaced by zeros/corrupted values, and leads to
ping fail.
So, disable the Split Header for Intel platforms.
v2: Add fixes tag in commit message.
Fixes:
67afd6d1cfdf("net: stmmac: Add Split Header support and enable it in XGMAC cores")
Cc: <stable@vger.kernel.org> # 5.10.x
Suggested-by: Ong, Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 29 Apr 2022 16:20:36 +0000 (09:20 -0700)]
mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter()
Whenever RCU protected list replaces an object,
the pointer to the new object needs to be updated
_before_ the call to kfree_rcu() or call_rcu()
Also ip6_mc_msfilter() needs to update the pointer
before releasing the mc_lock mutex.
Note that linux-5.13 was supporting kfree_rcu(NULL, rcu),
so this fix does not need the conditional test I was
forced to use in the equivalent patch for IPv4.
Fixes:
882ba1f73c06 ("mld: convert ipv6_mc_socklist->sflist to RCU")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 29 Apr 2022 15:42:57 +0000 (08:42 -0700)]
net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()
syzbot reported an UAF in ip_mc_sf_allow() [1]
Whenever RCU protected list replaces an object,
the pointer to the new object needs to be updated
_before_ the call to kfree_rcu() or call_rcu()
Because kfree_rcu(ptr, rcu) got support for NULL ptr
only recently in commit
12edff045bc6 ("rcu: Make kfree_rcu()
ignore NULL pointers"), I chose to use the conditional
to make sure stable backports won't miss this detail.
if (psl)
kfree_rcu(psl, rcu);
net/ipv6/mcast.c has similar issues, addressed in a separate patch.
[1]
BUG: KASAN: use-after-free in ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655
Read of size 4 at addr
ffff88807d37b904 by task syz-executor.5/908
CPU: 0 PID: 908 Comm: syz-executor.5 Not tainted
5.18.0-rc4-syzkaller-00064-g8f4dd16603ce #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313
print_report mm/kasan/report.c:429 [inline]
kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655
raw_v4_input net/ipv4/raw.c:190 [inline]
raw_local_deliver+0x4d1/0xbe0 net/ipv4/raw.c:218
ip_protocol_deliver_rcu+0xcf/0xb30 net/ipv4/ip_input.c:193
ip_local_deliver_finish+0x2ee/0x4c0 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:461 [inline]
ip_rcv_finish+0x1cb/0x2f0 net/ipv4/ip_input.c:437
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ip_rcv+0xaa/0xd0 net/ipv4/ip_input.c:556
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405
__netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519
netif_receive_skb_internal net/core/dev.c:5605 [inline]
netif_receive_skb+0x13e/0x8e0 net/core/dev.c:5664
tun_rx_batched.isra.0+0x460/0x720 drivers/net/tun.c:1534
tun_get_user+0x28b7/0x3e30 drivers/net/tun.c:1985
tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2015
call_write_iter include/linux/fs.h:2050 [inline]
new_sync_write+0x38a/0x560 fs/read_write.c:504
vfs_write+0x7c0/0xac0 fs/read_write.c:591
ksys_write+0x127/0x250 fs/read_write.c:644
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f3f12c3bbff
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 cc fd ff ff 48
RSP: 002b:
00007f3f13ea9130 EFLAGS:
00000293 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
00007f3f12d9bf60 RCX:
00007f3f12c3bbff
RDX:
0000000000000036 RSI:
0000000020002ac0 RDI:
00000000000000c8
RBP:
00007f3f12ce308d R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000036 R11:
0000000000000293 R12:
0000000000000000
R13:
00007fffb68dd79f R14:
00007f3f13ea9300 R15:
0000000000022000
</TASK>
Allocated by task 908:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:436 [inline]
____kasan_kmalloc mm/kasan/common.c:515 [inline]
____kasan_kmalloc mm/kasan/common.c:474 [inline]
__kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:524
kasan_kmalloc include/linux/kasan.h:234 [inline]
__do_kmalloc mm/slab.c:3710 [inline]
__kmalloc+0x209/0x4d0 mm/slab.c:3719
kmalloc include/linux/slab.h:586 [inline]
sock_kmalloc net/core/sock.c:2501 [inline]
sock_kmalloc+0xb5/0x100 net/core/sock.c:2492
ip_mc_source+0xba2/0x1100 net/ipv4/igmp.c:2392
do_ip_setsockopt net/ipv4/ip_sockglue.c:1296 [inline]
ip_setsockopt+0x2312/0x3ab0 net/ipv4/ip_sockglue.c:1432
raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861
__sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 753:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:366 [inline]
____kasan_slab_free+0x13d/0x180 mm/kasan/common.c:328
kasan_slab_free include/linux/kasan.h:200 [inline]
__cache_free mm/slab.c:3439 [inline]
kmem_cache_free_bulk+0x69/0x460 mm/slab.c:3774
kfree_bulk include/linux/slab.h:437 [inline]
kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3318
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
Last potentially related work creation:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
__kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348
kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3595
ip_mc_msfilter+0x712/0xb60 net/ipv4/igmp.c:2510
do_ip_setsockopt net/ipv4/ip_sockglue.c:1257 [inline]
ip_setsockopt+0x32e1/0x3ab0 net/ipv4/ip_sockglue.c:1432
raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861
__sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Second to last potentially related work creation:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
__kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348
call_rcu+0x99/0x790 kernel/rcu/tree.c:3074
mpls_dev_notify+0x552/0x8a0 net/mpls/af_mpls.c:1656
notifier_call_chain+0xb5/0x200 kernel/notifier.c:84
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1938
call_netdevice_notifiers_extack net/core/dev.c:1976 [inline]
call_netdevice_notifiers net/core/dev.c:1990 [inline]
unregister_netdevice_many+0x92e/0x1890 net/core/dev.c:10751
default_device_exit_batch+0x449/0x590 net/core/dev.c:11245
ops_exit_list+0x125/0x170 net/core/net_namespace.c:167
cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
The buggy address belongs to the object at
ffff88807d37b900
which belongs to the cache kmalloc-64 of size 64
The buggy address is located 4 bytes inside of
64-byte region [
ffff88807d37b900,
ffff88807d37b940)
The buggy address belongs to the physical page:
page:
ffffea0001f4dec0 refcount:1 mapcount:0 mapping:
0000000000000000 index:0xffff88807d37b180 pfn:0x7d37b
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw:
00fff00000000200 ffff888010c41340 ffffea0001c795c8 ffff888010c40200
raw:
ffff88807d37b180 ffff88807d37b000 000000010000001f 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x342040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_HARDWALL|__GFP_THISNODE), pid 2963, tgid 2963 (udevd), ts
139732238007, free_ts
139730893262
prep_new_page mm/page_alloc.c:2441 [inline]
get_page_from_freelist+0xba2/0x3e00 mm/page_alloc.c:4182
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5408
__alloc_pages_node include/linux/gfp.h:587 [inline]
kmem_getpages mm/slab.c:1378 [inline]
cache_grow_begin+0x75/0x350 mm/slab.c:2584
cache_alloc_refill+0x27f/0x380 mm/slab.c:2957
____cache_alloc mm/slab.c:3040 [inline]
____cache_alloc mm/slab.c:3023 [inline]
__do_cache_alloc mm/slab.c:3267 [inline]
slab_alloc mm/slab.c:3309 [inline]
__do_kmalloc mm/slab.c:3708 [inline]
__kmalloc+0x3b3/0x4d0 mm/slab.c:3719
kmalloc include/linux/slab.h:586 [inline]
kzalloc include/linux/slab.h:714 [inline]
tomoyo_encode2.part.0+0xe9/0x3a0 security/tomoyo/realpath.c:45
tomoyo_encode2 security/tomoyo/realpath.c:31 [inline]
tomoyo_encode+0x28/0x50 security/tomoyo/realpath.c:80
tomoyo_realpath_from_path+0x186/0x620 security/tomoyo/realpath.c:288
tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
tomoyo_path_perm+0x21b/0x400 security/tomoyo/file.c:822
security_inode_getattr+0xcf/0x140 security/security.c:1350
vfs_getattr fs/stat.c:157 [inline]
vfs_statx+0x16a/0x390 fs/stat.c:232
vfs_fstatat+0x8c/0xb0 fs/stat.c:255
__do_sys_newfstatat+0x91/0x110 fs/stat.c:425
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1356 [inline]
free_pcp_prepare+0x549/0xd20 mm/page_alloc.c:1406
free_unref_page_prepare mm/page_alloc.c:3328 [inline]
free_unref_page+0x19/0x6a0 mm/page_alloc.c:3423
__vunmap+0x85d/0xd30 mm/vmalloc.c:2667
__vfree+0x3c/0xd0 mm/vmalloc.c:2715
vfree+0x5a/0x90 mm/vmalloc.c:2746
__do_replace+0x16b/0x890 net/ipv6/netfilter/ip6_tables.c:1117
do_replace net/ipv6/netfilter/ip6_tables.c:1157 [inline]
do_ip6t_set_ctl+0x90d/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1026
tcp_setsockopt+0x136/0x2520 net/ipv4/tcp.c:3696
__sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Memory state around the buggy address:
ffff88807d37b800: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
ffff88807d37b880: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
>
ffff88807d37b900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^
ffff88807d37b980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff88807d37ba00: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
Fixes:
c85bb41e9318 ("igmp: fix ip_mc_sf_allow race [v5]")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Fri, 29 Apr 2022 20:05:16 +0000 (21:05 +0100)]
rxrpc: Enable IPv6 checksums on transport socket
AF_RXRPC doesn't currently enable IPv6 UDP Tx checksums on the transport
socket it opens and the checksums in the packets it generates end up 0.
It probably should also enable IPv6 UDP Rx checksums and IPv4 UDP
checksums. The latter only seem to be applied if the socket family is
AF_INET and don't seem to apply if it's AF_INET6. IPv4 packets from an
IPv6 socket seem to have checksums anyway.
What seems to have happened is that the inet_inv_convert_csum() call didn't
get converted to the appropriate udp_port_cfg parameters - and
udp_sock_create() disables checksums unless explicitly told not too.
Fix this by enabling the three udp_port_cfg checksum options.
Fixes:
1a9b86c9fd95 ("rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: Vadim Fedorenko <vfedorenko@novek.ru>
cc: David S. Miller <davem@davemloft.net>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Fri, 29 Apr 2022 01:53:37 +0000 (09:53 +0800)]
net: cpsw: add missing of_node_put() in cpsw_probe_dt()
'tmp_node' need be put before returning from cpsw_probe_dt(),
so add missing of_node_put() in error path.
Fixes:
ed3525eda4c4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Thu, 28 Apr 2022 09:57:16 +0000 (17:57 +0800)]
net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux()
The node pointer returned by of_get_child_by_name() with refcount incremented,
so add of_node_put() after using it.
Fixes:
634db83b8265 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220428095716.540452-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yang Yingliang [Thu, 28 Apr 2022 09:53:17 +0000 (17:53 +0800)]
net: dsa: mt7530: add missing of_node_put() in mt7530_setup()
Add of_node_put() if of_get_phy_mode() fails in mt7530_setup()
Fixes:
0c65b2b90d13 ("net: of_get_phy_mode: Change API to solve int/unit warnings")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220428095317.538829-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arun Ramadoss [Thu, 28 Apr 2022 07:07:09 +0000 (12:37 +0530)]
net: dsa: ksz9477: port mirror sniffing limited to one port
This patch limits the sniffing to only one port during the mirror add.
And during the mirror_del it checks for all the ports using the sniff,
if and only if no other ports are referring, sniffing is disabled.
The code is updated based on the review comments of LAN937x port mirror
patch.
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210422094257.1641396-8-prasanna.vengateshan@microchip.com/
Fixes:
b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477")
Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://lore.kernel.org/r/20220428070709.7094-1-arun.ramadoss@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Qiao Ma [Thu, 28 Apr 2022 12:30:16 +0000 (20:30 +0800)]
hinic: fix bug of wq out of bound access
If wq has only one page, we need to check wqe rolling over page by
compare end_idx and curr_idx, and then copy wqe to shadow wqe to
avoid out of bound access.
This work has been done in hinic_get_wqe, but missed for hinic_read_wqe.
This patch fixes it, and removes unnecessary MASKED_WQE_IDX().
Fixes:
7dd29ee12865 ("hinic: add sriov feature support")
Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
Link: https://lore.kernel.org/r/282817b0e1ae2e28fdf3ed8271a04e77f57bf42e.1651148587.git.mqaio@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Niels Dossche [Thu, 28 Apr 2022 21:19:32 +0000 (23:19 +0200)]
net: mdio: Fix ENOMEM return value in BCM6368 mux bus controller
Error values inside the probe function must be < 0. The ENOMEM return
value has the wrong sign: it is positive instead of negative.
Add a minus sign.
Fixes:
e239756717b5 ("net: mdio: Add BCM6368 MDIO mux bus controller")
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220428211931.8130-1-dossche.niels@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yang Yingliang [Thu, 28 Apr 2022 06:25:43 +0000 (14:25 +0800)]
net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init()
The node pointer returned by of_parse_phandle() with refcount incremented,
so add of_node_put() after using it in mtk_sgmii_init().
Fixes:
9ffee4a8276c ("net: ethernet: mediatek: Extend SGMII related functions")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220428062543.64883-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 30 Apr 2022 00:51:37 +0000 (17:51 -0700)]
Merge branch 'selftests-net-add-missing-tests-to-makefile'
Hangbin Liu says:
====================
selftests: net: add missing tests to Makefile
When generating the selftests to another folder, the fixed tests are
missing as they are not in Makefile. The missing tests are generated
by command:
$ for f in $(ls *.sh); do grep -q $f Makefile || echo $f; done
====================
Link: https://lore.kernel.org/r/20220428044511.227416-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Thu, 28 Apr 2022 04:45:11 +0000 (12:45 +0800)]
selftests/net/forwarding: add missing tests to Makefile
When generating the selftests to another folder, the fixed tests are
missing as they are not in Makefile, e.g.
make -C tools/testing/selftests/ install \
TARGETS="net/forwarding" INSTALL_PATH=/tmp/kselftests
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Thu, 28 Apr 2022 04:45:10 +0000 (12:45 +0800)]
selftests/net: add missing tests to Makefile
When generating the selftests to another folder, the fixed tests are
missing as they are not in Makefile, e.g.
make -C tools/testing/selftests/ install \
TARGETS="net" INSTALL_PATH=/tmp/kselftests
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 29 Apr 2022 19:33:54 +0000 (12:33 -0700)]
Merge tag 'linux-can-fixes-for-5.18-
20220429' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2022-04-29
The first patch is by Oliver Hartkopp and removes the ability to
re-binding bounds sockets from the ISOTP. It turned out to be not
needed and brings unnecessary complexity.
The last 4 patches all target the grcan driver. Duoming Zhou's patch
fixes a potential dead lock in the grcan_close() function. Daniel
Hellstrom's patch fixes the dma_alloc_coherent() to use the correct
device. Andreas Larsson's 1st patch fixes a broken system id check,
the 2nd patch fixes the NAPI poll budget usage.
* tag 'linux-can-fixes-for-5.18-
20220429' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: grcan: only use the NAPI poll budget for RX
can: grcan: grcan_probe(): fix broken system id check for errata workaround needs
can: grcan: use ofdev->dev when allocating DMA memory
can: grcan: grcan_close(): fix deadlock
can: isotp: remove re-binding of bound socket
====================
Link: https://lore.kernel.org/r/20220429125612.1792561-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andreas Larsson [Fri, 29 Apr 2022 08:46:56 +0000 (10:46 +0200)]
can: grcan: only use the NAPI poll budget for RX
The previous split budget between TX and RX made it return not using
the entire budget but at the same time not having calling called
napi_complete. This sometimes led to the poll to not be called, and at
the same time having TX and RX interrupts disabled resulting in the
driver getting stuck.
Fixes:
6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores")
Link: https://lore.kernel.org/all/20220429084656.29788-4-andreas@gaisler.com
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Andreas Larsson [Fri, 29 Apr 2022 08:46:55 +0000 (10:46 +0200)]
can: grcan: grcan_probe(): fix broken system id check for errata workaround needs
The systemid property was checked for in the wrong place of the device
tree and compared to the wrong value.
Fixes:
6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores")
Link: https://lore.kernel.org/all/20220429084656.29788-3-andreas@gaisler.com
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Daniel Hellstrom [Fri, 29 Apr 2022 08:46:54 +0000 (10:46 +0200)]
can: grcan: use ofdev->dev when allocating DMA memory
Use the device of the device tree node should be rather than the
device of the struct net_device when allocating DMA buffers.
The driver got away with it on sparc32 until commit
53b7670e5735
("sparc: factor the dma coherent mapping into helper") after which the
driver oopses.
Fixes:
6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores")
Link: https://lore.kernel.org/all/20220429084656.29788-2-andreas@gaisler.com
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Duoming Zhou [Mon, 25 Apr 2022 04:24:00 +0000 (12:24 +0800)]
can: grcan: grcan_close(): fix deadlock
There are deadlocks caused by del_timer_sync(&priv->hang_timer) and
del_timer_sync(&priv->rr_timer) in grcan_close(), one of the deadlocks
are shown below:
(Thread 1) | (Thread 2)
| grcan_reset_timer()
grcan_close() | mod_timer()
spin_lock_irqsave() //(1) | (wait a time)
... | grcan_initiate_running_reset()
del_timer_sync() | spin_lock_irqsave() //(2)
(wait timer to stop) | ...
We hold priv->lock in position (1) of thread 1 and use
del_timer_sync() to wait timer to stop, but timer handler also need
priv->lock in position (2) of thread 2. As a result, grcan_close()
will block forever.
This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain the
needed lock.
Link: https://lore.kernel.org/all/20220425042400.66517-1-duoming@zju.edu.cn
Fixes:
6cec9b07fe6a ("can: grcan: Add device driver for GRCAN and GRHCAN cores")
Cc: stable@vger.kernel.org
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Fri, 22 Apr 2022 08:23:37 +0000 (10:23 +0200)]
can: isotp: remove re-binding of bound socket
As a carry over from the CAN_RAW socket (which allows to change the CAN
interface while mantaining the filter setup) the re-binding of the
CAN_ISOTP socket needs to take care about CAN ID address information and
subscriptions. It turned out that this feature is so limited (e.g. the
sockopts remain fix) that it finally has never been needed/used.
In opposite to the stateless CAN_RAW socket the switching of the CAN ID
subscriptions might additionally lead to an interrupted ongoing PDU
reception. So better remove this unneeded complexity.
Fixes:
e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/all/20220422082337.1676-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Linus Torvalds [Thu, 28 Apr 2022 19:34:50 +0000 (12:34 -0700)]
Merge tag 'net-5.18-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and netfilter.
Current release - new code bugs:
- bridge: switchdev: check br_vlan_group() return value
- use this_cpu_inc() to increment net->core_stats, fix preempt-rt
Previous releases - regressions:
- eth: stmmac: fix write to sgmii_adapter_base
Previous releases - always broken:
- netfilter: nf_conntrack_tcp: re-init for syn packets only,
resolving issues with TCP fastopen
- tcp: md5: fix incorrect tcp_header_len for incoming connections
- tcp: fix F-RTO may not work correctly when receiving DSACK
- tcp: ensure use of most recently sent skb when filling rate samples
- tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
- virtio_net: fix wrong buf address calculation when using xdp
- xsk: fix forwarding when combining copy mode with busy poll
- xsk: fix possible crash when multiple sockets are created
- bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
bpf_xmit lwt hook
- sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event
- wireguard: device: check for metadata_dst with skb_valid_dst()
- netfilter: update ip6_route_me_harder to consider L3 domain
- gre: make o_seqno start from 0 in native mode
- gre: switch o_seqno to atomic to prevent races in collect_md mode
Misc:
- add Eric Dumazet to networking maintainers
- dt: dsa: realtek: remove realtek,rtl8367s string
- netfilter: flowtable: Remove the empty file"
* tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
tcp: fix F-RTO may not work correctly when receiving DSACK
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
ixgbe: ensure IPsec VF<->PF compatibility
MAINTAINERS: Update BNXT entry with firmware files
netfilter: nft_socket: only do sk lookups when indev is available
net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
bnx2x: fix napi API usage sequence
tls: Skip tls_append_frag on zero copy size
Add Eric Dumazet to networking maintainers
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
net: Use this_cpu_inc() to increment net->core_stats
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
ice: fix use-after-free when deinitializing mailbox snapshot
ice: wait 5 s for EMP reset after firmware flash
ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
...
Linus Torvalds [Thu, 28 Apr 2022 18:57:00 +0000 (11:57 -0700)]
Merge tag 'thermal-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These take back recent chages that started to confuse users and fix up
an attr.show callback prototype in a driver.
Specifics:
- Stop warning about deprecation of the userspace thermal governor
and cooling device status interface, because there are cases in
which user space has to drive thermal management with the help of
them (Daniel Lezcano)
- Fix attr.show callback prototype in the int340x thermal driver
(Kees Cook)"
* tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/governor: Remove deprecated information
Revert "thermal/core: Deprecate changing cooling device state from userspace"
thermal: int340x: Fix attr.show callback prototype
Linus Torvalds [Thu, 28 Apr 2022 18:50:21 +0000 (11:50 -0700)]
Merge tag 'pm-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix up recent intel_idle driver changes and fix some ARM cpufreq
driver issues.
Specifics:
- Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov,
Vladimir Zapolskiy).
- Fix memory leak with the Sun501 driver (Xiaobing Luo).
- Make intel_idle enable C1E promotion on all CPUs when C1E is
preferred to C1 (Artem Bityutskiy).
- Make C6 optimization on Sapphire Rapids added recently work as
expected if both C1E and C1 are "preferred" (Artem Bityutskiy)"
* tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: Fix SPR C6 optimization
intel_idle: Fix the 'preferred_cstates' module parameter
cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
cpufreq: qcom-hw: provide online/offline operations
cpufreq: qcom-hw: fix the opp entries refcounting
cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
Linus Torvalds [Thu, 28 Apr 2022 18:37:20 +0000 (11:37 -0700)]
Merge tag 'acpi-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael WysockiL
"These fix up the ACPI processor driver after a change made during the
5.16 cycle that inadvertently broke falling back to shallower C-states
when C3 cannot be used.
Specifics:
- Make the ACPI processor driver avoid falling back to C3 type of
C-states when C3 cannot be requested (Ville Syrjälä)
- Revert a quirk that is not necessary any more after fixing the
underlying issue properly (Ville Syrjälä)"
* tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40"
ACPI: processor: idle: Avoid falling back to C3 type C-states
Linus Torvalds [Thu, 28 Apr 2022 18:13:00 +0000 (11:13 -0700)]
Merge tag 'platform-drivers-x86-v5.18-3' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Highlights:
- asus-wmi bug-fixes
- intel-sdsu bug-fixes
- build (warning) fixes
- couple of hw-id additions"
* tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel: pmc/core: change pmc_lpm_modes to static
platform/x86/intel/sdsi: Fix bug in multi packet reads
platform/x86/intel/sdsi: Poll on ready bit for writes
platform/x86/intel/sdsi: Handle leaky bucket
platform/x86: intel-uncore-freq: Prevent driver loading in guests
platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboard
platform/x86: dell-laptop: Add quirk entry for Latitude 7520
platform/x86: asus-wmi: Fix driver not binding when fan curve control probe fails
platform/x86: asus-wmi: Potential buffer overflow in asus_wmi_evaluate_method_buf()
tools/power/x86/intel-speed-select: fix build failure when using -Wl,--as-needed
Linus Torvalds [Thu, 28 Apr 2022 18:07:49 +0000 (11:07 -0700)]
Merge tag 'regulator-fix-v5.18-rc4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A minor fix for the DT binding documentation of the rt5190a driver"
* tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: Revise the rt5190a buck/ldo description
Pengcheng Yang [Tue, 26 Apr 2022 10:03:39 +0000 (18:03 +0800)]
tcp: fix F-RTO may not work correctly when receiving DSACK
Currently DSACK is regarded as a dupack, which may cause
F-RTO to incorrectly enter "loss was real" when receiving
DSACK.
Packetdrill to demonstrate:
// Enable F-RTO and TLP
0 `sysctl -q net.ipv4.tcp_frto=2`
0 `sysctl -q net.ipv4.tcp_early_retrans=3`
0 `sysctl -q net.ipv4.tcp_congestion_control=cubic`
// Establish a connection
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
// RTT 10ms, RTO 210ms
+.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.01 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
// Send 2 data segments
+0 write(4, ..., 2000) = 2000
+0 > P. 1:2001(2000) ack 1
// TLP
+.022 > P. 1001:2001(1000) ack 1
// Continue to send 8 data segments
+0 write(4, ..., 10000) = 10000
+0 > P. 2001:10001(8000) ack 1
// RTO
+.188 > . 1:1001(1000) ack 1
// The original data is acked and new data is sent(F-RTO step 2.b)
+0 < . 1:1(0) ack 2001 win 257
+0 > P. 10001:12001(2000) ack 1
// D-SACK caused by TLP is regarded as a dupack, this results in
// the incorrect judgment of "loss was real"(F-RTO step 3.a)
+.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop>
// Never-retransmitted data(3001:4001) are acked and
// expect to switch to open state(F-RTO step 3.b)
+0 < . 1:1(0) ack 4001 win 257
+0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }%
Fixes:
e33099f96d99 ("tcp: implement RFC5682 F-RTO")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1650967419-2150-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 28 Apr 2022 16:55:59 +0000 (09:55 -0700)]
Merge git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix incorrect TCP connection tracking window reset for non-syn
packets, from Florian Westphal.
2) Incorrect dependency on CONFIG_NFT_FLOW_OFFLOAD, from Volodymyr Mytnyk.
3) Fix nft_socket from the output path, from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_socket: only do sk lookups when indev is available
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
====================
Link: https://lore.kernel.org/r/20220428142109.38726-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Apr 2022 16:50:29 +0000 (09:50 -0700)]
Merge tag 'gfs2-v5.18-rc4-fix2' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
- No short reads or writes upon glock contention
* tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: No short reads or writes upon glock contention
Dany Madden [Wed, 27 Apr 2022 23:51:46 +0000 (18:51 -0500)]
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
This reverts commit
723ad916134784b317b72f3f6cf0f7ba774e5dae
When client requests channel or ring size larger than what the server
can support the server will cap the request to the supported max. So,
the client would not be able to successfully request resources that
exceed the server limit.
Fixes:
723ad9161347 ("ibmvnic: Add ethtool private flag for driver-defined queue limits")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 27 Apr 2022 20:30:17 +0000 (23:30 +0300)]
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
The Time-Specified Departure feature is indeed mutually exclusive with
TX IP checksumming in ENETC, but TX checksumming in itself is broken and
was removed from this driver in commit
82728b91f124 ("enetc: Remove Tx
checksumming offload code").
The blamed commit declared NETIF_F_HW_CSUM in dev->features to comply
with software TSO's expectations, and still did the checksumming in
software by calling skb_checksum_help(). So there isn't any restriction
for the Time-Specified Departure feature.
However, enetc_setup_tc_txtime() doesn't understand that, and blindly
looks for NETIF_F_CSUM_MASK.
Instead of checking for things which can literally never happen in the
current code base, just remove the check and let the driver offload
tc-etf qdiscs.
Fixes:
acede3c5dad5 ("net: enetc: declare NETIF_F_HW_CSUM and do it in software")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220427203017.1291634-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Wed, 27 Apr 2022 17:31:52 +0000 (10:31 -0700)]
ixgbe: ensure IPsec VF<->PF compatibility
The VF driver can forward any IPsec flags and such makes the function
is not extendable and prone to backward/forward incompatibility.
If new software runs on VF, it won't know that PF configured something
completely different as it "knows" only XFRM_OFFLOAD_INBOUND flag.
Fixes:
eda0333ac293 ("ixgbe: add VF IPsec management")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shannon Nelson <snelson@pensando.io>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220427173152.443102-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Apr 2022 16:37:56 +0000 (09:37 -0700)]
Merge tag 'xfs-5.18-fixes-1' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Dave Chinner:
- define buffer bit flags as unsigned to fix gcc-5 + c11 warnings
- remove redundant XFS fields from MAINTAINERS
- fix inode buffer locking order regression
* tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reorder iunlink remove operation in xfs_ifree
MAINTAINERS: update IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEM
xfs: convert buffer flags to unsigned.
Florian Fainelli [Wed, 27 Apr 2022 16:36:06 +0000 (09:36 -0700)]
MAINTAINERS: Update BNXT entry with firmware files
There appears to be a maintainer gap for BNXT TEE firmware files which
causes some patches to be missed. Update the entry for the BNXT Ethernet
controller with its companion firmware files.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220427163606.126154-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafael J. Wysocki [Thu, 28 Apr 2022 14:51:24 +0000 (16:51 +0200)]
Merge branch 'thermal-int340x'
Merge a fix for the attr.show callback prototype in the int340x thermal
driver (Kees Cook).
* thermal-int340x:
thermal: int340x: Fix attr.show callback prototype
Florian Westphal [Thu, 28 Apr 2022 07:39:21 +0000 (09:39 +0200)]
netfilter: nft_socket: only do sk lookups when indev is available
Check if the incoming interface is available and NFT_BREAK
in case neither skb->sk nor input device are set.
Because nf_sk_lookup_slow*() assume packet headers are in the
'in' direction, use in postrouting is not going to yield a meaningful
result. Same is true for the forward chain, so restrict the use
to prerouting, input and output.
Use in output work if a socket is already attached to the skb.
Fixes:
554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching")
Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Rafael J. Wysocki [Thu, 28 Apr 2022 14:09:50 +0000 (16:09 +0200)]
Merge branch 'pm-cpuidle'
Merge cpuidle fixes for 5.18-rc5:
- Make intel_idle enable C1E promotion on all CPUs when C1E is
preferred to C1 (Artem Bityutskiy).
- Make C6 optimization on Sapphire Rapids added recently work as
expected if both C1E and C1 are "preferred" (Artem Bityutskiy).
* pm-cpuidle:
intel_idle: Fix SPR C6 optimization
intel_idle: Fix the 'preferred_cstates' module parameter
Andreas Gruenbacher [Thu, 28 Apr 2022 12:51:33 +0000 (14:51 +0200)]
gfs2: No short reads or writes upon glock contention
Commit
00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered
I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to
allow dropping the inode glock while faulting in user buffers. When the
lock was dropped, a short result was returned to indicate that the
operation was interrupted.
As pointed out by Linus (see the link below), this behavior is broken
and the operations should always re-acquire the inode glock and resume
the operation instead.
Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/
Fixes:
00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Paolo Abeni [Thu, 28 Apr 2022 08:18:51 +0000 (10:18 +0200)]
Merge tag 'for-net-2022-04-27' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix regression causing some HCI events to be discarded when they
shouldn't.
* tag 'for-net-2022-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
====================
Link: https://lore.kernel.org/r/20220427234031.1257281-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yang Yingliang [Tue, 26 Apr 2022 12:52:31 +0000 (20:52 +0800)]
net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
Put device node in error path in fec_enet_init_stop_mode().
Fixes:
8a448bf832af ("net: ethernet: fec: move GPR register offset and bit into DT")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220426125231.375688-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Manish Chopra [Tue, 26 Apr 2022 15:39:13 +0000 (08:39 -0700)]
bnx2x: fix napi API usage sequence
While handling PCI errors (AER flow) driver tries to
disable NAPI [napi_disable()] after NAPI is deleted
[__netif_napi_del()] which causes unexpected system
hang/crash.
System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports:
'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports:
'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor:
168e14e4 [ 3222.583557] EEH: PCI cmd/status register:
00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00:
00020010 012c8da2 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10:
10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20:
00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00:
13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10:
00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20:
00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30:
00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor:
168e14e4 [ 3222.584491] EEH: PCI cmd/status register:
00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00:
00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10:
10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20:
00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00:
13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10:
00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20:
00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30:
00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset
--> driver unload
Uninterruptible tasks
=====================
crash> ps | grep UN
213 2 11
c000000004c89e00 UN 0.0 0 0 [eehd]
215 2 0
c000000004c80000 UN 0.0 0 0
[kworker/0:2]
2196 1 28
c000000004504f00 UN 0.1 15936 11136 wickedd
4287 1 9
c00000020d076800 UN 0.0 4032 3008 agetty
4289 1 20
c00000020d056680 UN 0.0 7232 3840 agetty
32423 2 26
c00000020038c580 UN 0.0 0 0
[kworker/26:3]
32871 4241 27
c0000002609ddd00 UN 0.1 18624 11648 sshd
32920 10130 16
c00000027284a100 UN 0.1 48512 12608 sendmail
33092 32987 0
c000000205218b00 UN 0.1 48512 12608 sendmail
33154 4567 16
c000000260e51780 UN 0.1 48832 12864 pickup
33209 4241 36
c000000270cb6500 UN 0.1 18624 11712 sshd
33473 33283 0
c000000205211480 UN 0.1 48512 12672 sendmail
33531 4241 37
c00000023c902780 UN 0.1 18624 11648 sshd
EEH handler hung while bnx2x sleeping and holding RTNL lock
===========================================================
crash> bt 213
PID: 213 TASK:
c000000004c89e00 CPU: 11 COMMAND: "eehd"
#0 [
c000000004d477e0] __schedule at
c000000000c70808
#1 [
c000000004d478b0] schedule at
c000000000c70ee0
#2 [
c000000004d478e0] schedule_timeout at
c000000000c76dec
#3 [
c000000004d479c0] msleep at
c0000000002120cc
#4 [
c000000004d479f0] napi_disable at
c000000000a06448
^^^^^^^^^^^^^^^^
#5 [
c000000004d47a30] bnx2x_netif_stop at
c0080000018dba94 [bnx2x]
#6 [
c000000004d47a60] bnx2x_io_slot_reset at
c0080000018a551c [bnx2x]
#7 [
c000000004d47b20] eeh_report_reset at
c00000000004c9bc
#8 [
c000000004d47b90] eeh_pe_report at
c00000000004d1a8
#9 [
c000000004d47c40] eeh_handle_normal_event at
c00000000004da64
And the sleeping source code
============================
crash> dis -ls
c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702
6697 {
6698 might_sleep();
6699 set_bit(NAPI_STATE_DISABLE, &n->state);
6700
6701 while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702 msleep(1);
6703 while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
6704 msleep(1);
6705
6706 hrtimer_cancel(&n->timer);
6707
6708 clear_bit(NAPI_STATE_DISABLE, &n->state);
6709 }
EEH calls into bnx2x twice based on the system log above, first through
bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes
the following call chains:
bnx2x_io_error_detected()
+-> bnx2x_eeh_nic_unload()
+-> bnx2x_del_all_napi()
+-> __netif_napi_del()
bnx2x_io_slot_reset()
+-> bnx2x_netif_stop()
+-> bnx2x_napi_disable()
+->napi_disable()
Fix this by correcting the sequence of NAPI APIs usage,
that is delete the NAPI after disabling it.
Fixes:
7fa6f34081f1 ("bnx2x: AER revised")
Reported-by: David Christensen <drc@linux.vnet.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 26 Apr 2022 15:49:49 +0000 (18:49 +0300)]
tls: Skip tls_append_frag on zero copy size
Calling tls_append_frag when max_open_record_len == record->len might
add an empty fragment to the TLS record if the call happens to be on the
page boundary. Normally tls_append_frag coalesces the zero-sized
fragment to the previous one, but not if it's on page boundary.
If a resync happens then, the mlx5 driver posts dump WQEs in
tx_post_resync_dump, and the empty fragment may become a data segment
with byte_count == 0, which will confuse the NIC and lead to a CQE
error.
This commit fixes the described issue by skipping tls_append_frag on
zero size to avoid adding empty fragments. The fix is not in the driver,
because an empty fragment is hardly the desired behavior.
Fixes:
e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220426154949.159055-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 27 Apr 2022 22:18:39 +0000 (15:18 -0700)]
Merge https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2022-04-27
We've added 5 non-merge commits during the last 20 day(s) which contain
a total of 6 files changed, 34 insertions(+), 12 deletions(-).
The main changes are:
1) Fix xsk sockets when rx and tx are separately bound to the same umem, also
fix xsk copy mode combined with busy poll, from Maciej Fijalkowski.
2) Fix BPF tunnel/collect_md helpers with bpf_xmit lwt hook usage which triggered
a crash due to invalid metadata_dst access, from Eyal Birger.
3) Fix release of page pool in XDP live packet mode, from Toke Høiland-Jørgensen.
4) Fix potential NULL pointer dereference in kretprobes, from Adam Zabrocki.
(Masami & Steven preferred this small fix to be routed via bpf tree given it's
follow-up fix to Masami's rethook work that went via bpf earlier, too.)
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xsk: Fix possible crash when multiple sockets are created
kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set
bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook
bpf: Fix release of page_pool in BPF_PROG_RUN in test runner
xsk: Fix l2fwd for copy mode + busy poll combo
====================
Link: https://lore.kernel.org/r/20220427212748.9576-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 27 Apr 2022 20:44:37 +0000 (13:44 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"Two patches.
Subsystems affected by this patch series: mm/kasan and mm/debug"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
docs: vm/page_owner: use literal blocks for param description
kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time
Akira Yokosawa [Wed, 27 Apr 2022 19:41:59 +0000 (12:41 -0700)]
docs: vm/page_owner: use literal blocks for param description
Sphinx generates hard-to-read lists of parameters at the bottom of the
page. Fix them by putting literal-block markers of "::" in front of
them.
Link: https://lkml.kernel.org/r/cfd3bcc0-b51d-0c68-c065-ca1c4c202447@gmail.com
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Fixes:
57f2b54a9379 ("Documentation/vm/page_owner.rst: update the documentation")
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Alex Shi <seakeel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zqiang [Wed, 27 Apr 2022 19:41:56 +0000 (12:41 -0700)]
kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time
kasan_quarantine_remove_cache() is called in kmem_cache_shrink()/
destroy(). The kasan_quarantine_remove_cache() call is protected by
cpuslock in kmem_cache_destroy() to ensure serialization with
kasan_cpu_offline().
However the kasan_quarantine_remove_cache() call is not protected by
cpuslock in kmem_cache_shrink(). When a CPU is going offline and cache
shrink occurs at same time, the cpu_quarantine may be corrupted by
interrupt (per_cpu_remove_cache operation).
So add a cpu_quarantine offline flags check in per_cpu_remove_cache().
[akpm@linux-foundation.org: add comment, per Zqiang]
Link: https://lkml.kernel.org/r/20220414025925.2423818-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Artem Bityutskiy [Wed, 27 Apr 2022 06:08:53 +0000 (09:08 +0300)]
intel_idle: Fix SPR C6 optimization
The Sapphire Rapids (SPR) C6 optimization was added to the end of the
'spr_idle_state_table_update()' function. However, the function has a
'return' which may happen before the optimization has a chance to run.
And this may prevent the optimization from happening.
This is an unlikely scenario, but possible if user boots with, say,
the 'intel_idle.preferred_cstates=6' kernel boot option.
This patch fixes the issue by eliminating the problematic 'return'
statement.
Fixes:
3a9cf77b60dc ("intel_idle: add core C6 optimization for SPR")
Suggested-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Artem Bityutskiy [Wed, 27 Apr 2022 06:08:52 +0000 (09:08 +0300)]
intel_idle: Fix the 'preferred_cstates' module parameter
Problem description.
When user boots kernel up with the 'intel_idle.preferred_cstates=4' option,
we enable C1E and disable C1 states on Sapphire Rapids Xeon (SPR). In order
for C1E to work on SPR, we have to enable the C1E promotion bit on all
CPUs. However, we enable it only on one CPU.
Fix description.
The 'intel_idle' driver already has the infrastructure for disabling C1E
promotion on every CPU. This patch uses the same infrastructure for
enabling C1E promotion on every CPU. It changes the boolean
'disable_promotion_to_c1e' variable to a tri-state 'c1e_promotion'
variable.
Tested on a 2-socket SPR system. I verified the following combinations:
* C1E promotion enabled and disabled in BIOS.
* Booted with and without the 'intel_idle.preferred_cstates=4' kernel
argument.
In all 4 cases C1E promotion was correctly set on all CPUs.
Also tested on an old Broadwell system, just to make sure it does not cause
a regression. C1E promotion was correctly disabled on that system, both C1
and C1E were exposed (as expected).
Fixes:
da0e58c038e6 ("intel_idle: add 'preferred_cstates' module argument")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 27 Apr 2022 18:20:03 +0000 (20:20 +0200)]
Merge tag 'cpufreq-arm-fixes-5.18-rc5' of git://git./linux/kernel/git/vireshk/pm
Pull ARM cpufreq fixes for 5.18-rc5 from Viresh Kumar:
"- Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov and
Vladimir Zapolskiy).
- Fix memory leak with the Sun501 driver (Xiaobing Luo)."
* tag 'cpufreq-arm-fixes-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
cpufreq: qcom-hw: provide online/offline operations
cpufreq: qcom-hw: fix the opp entries refcounting
cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
Mikulas Patocka [Wed, 27 Apr 2022 15:26:40 +0000 (11:26 -0400)]
hex2bin: fix access beyond string end
If we pass too short string to "hex2bin" (and the string size without
the terminating NUL character is even), "hex2bin" reads one byte after
the terminating NUL character. This patch fixes it.
Note that hex_to_bin returns -1 on error and hex2bin return -EINVAL on
error - so we can't just return the variable "hi" or "lo" on error.
This inconsistency may be fixed in the next merge window, but for the
purpose of fixing this bug, we just preserve the existing behavior and
return -1 and -EINVAL.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Fixes:
b78049831ffe ("lib: add error checking to hex2bin")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>