Alexandru Tachici [Fri, 29 Apr 2022 15:34:32 +0000 (18:34 +0300)]
net: phy: Add 10-BaseT1L registers
The 802.3gc specification defines the 10-BaseT1L link
mode for ethernet trafic on twisted wire pair.
PMA status register can be used to detect if the phy supports
2.4 V TX level and PCS control register can be used to
enable/disable PCS level loopback.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexandru Tachici [Fri, 29 Apr 2022 15:34:31 +0000 (18:34 +0300)]
ethtool: Add 10base-T1L link mode entry
Add entry for the 10base-T1L full duplex mode.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Fri, 29 Apr 2022 14:32:14 +0000 (16:32 +0200)]
net: dsa: mv88e6xxx: Cosmetic change spaces to tabs in dsa_switch_ops
All but 5 methods in dsa_swith_ops use tabs for indentation.
Change the 5 methods that break this rule.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Fri, 29 Apr 2022 05:17:35 +0000 (08:17 +0300)]
net: enable memcg accounting for veth queues
veth netdevice defines own rx queues and allocates array containing
up to 4095 ~750-bytes-long 'struct veth_rq' elements. Such allocation
is quite huge and should be accounted to memcg.
Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 1 May 2022 11:19:01 +0000 (12:19 +0100)]
Merge branch 'UDP-sock_wfree-opts'
Pavel Begunkov says:
====================
UDP sock_wfree optimisations
The series is not UDP specific but that the main beneficiary. 2/3 saves one
atomic in sock_wfree() and on top 3/3 removes an extra barrier.
Tested with UDP over dummy netdev,
2038491 ->
2099071 req/s (or around +3%).
note: in regards to 1/3, there is a "Should agree with poll..." comment
that I don't completely get, and there is no git history to explain it.
Though I can't see how it could rely on having the second check without
racing with tasks woken by wake_up*().
The series was split from a larger patchset, see
https://lore.kernel.org/netdev/cover.
1648981570.git.asml.silence@gmail.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Thu, 28 Apr 2022 10:58:19 +0000 (11:58 +0100)]
sock: optimise sock_def_write_space barriers
Now we have a separate path for sock_def_write_space() and can go one
step further. When it's called from sock_wfree() we know that there is a
preceding atomic for putting down ->sk_wmem_alloc. We can use it to
replace to replace smb_mb() with a less expensive
smp_mb__after_atomic(). It also removes an extra RCU read lock/unlock as
a small bonus.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Thu, 28 Apr 2022 10:58:18 +0000 (11:58 +0100)]
sock: optimise UDP sock_wfree() refcounting
For non SOCK_USE_WRITE_QUEUE sockets, sock_wfree() (atomically) puts
->sk_wmem_alloc twice. It's needed to keep the socket alive while
calling ->sk_write_space() after the first put.
However, some sockets, such as UDP, are freed by RCU
(i.e. SOCK_RCU_FREE) and use already RCU-safe sock_def_write_space().
Carve a fast path for such sockets, put down all refs in one go before
calling sock_def_write_space() but guard the socket from being freed
by an RCU read section.
note: because TCP sockets are marked with SOCK_USE_WRITE_QUEUE it
doesn't add extra checks in its path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Thu, 28 Apr 2022 10:58:17 +0000 (11:58 +0100)]
sock: dedup sock_def_write_space wmem_alloc checks
Except for minor rounding differences the first ->sk_wmem_alloc test in
sock_def_write_space() is a hand coded version of sock_writeable().
Replace it with the helper, and also kill the following if duplicating
the check.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Wed, 27 Apr 2022 19:39:28 +0000 (13:39 -0600)]
net: phy: marvell: update abilities and advertising when switching to SGMII
With some SFP modules, such as Finisar FCLF8522P2BTL, the PHY hardware
strapping defaults to 1000BaseX mode, but the kernel prefers to set them
for SGMII mode. When this happens and the PHY is soft reset, the BMSR
status register is updated, but this happens after the kernel has already
read the PHY abilities during probing. This results in support not being
detected for, and the PHY not advertising support for, 10 and 100 Mbps
modes, preventing the link from working with a non-gigabit link partner.
When the PHY is being configured for SGMII mode, call genphy_read_abilities
again in order to re-read the capabilities, and update the advertising
field accordingly.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Fri, 29 Apr 2022 07:19:53 +0000 (09:19 +0200)]
net: lan966x: Fix compilation error
Starting from the blamed commit, the lan966x build fails with the
following compilation error:
drivers/net/ethernet/microchip/lan966x/lan966x_ptp.c:342:9: error: implicit declaration of function ‘ptp_find_pin_unlocked’ [-Werror=implicit-function-declaration]
342 | pin = ptp_find_pin_unlocked(phc->clock, PTP_PF_EXTTS, 0);
The issue is that there is no stub function for ptp_find_pin_unlocked
in case CONFIG_PTP_1588_CLOCK is not selected. Therefore add one.
Reported-by: kernel test robot <lkp@intel.com>
Fixes:
f3d8e0a9c28ba0 ("net: lan966x: Add support for PTP_PF_EXTTS")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yu Zhe [Fri, 29 Apr 2022 02:14:04 +0000 (19:14 -0700)]
ipv4: remove unnecessary type castings
remove unnecessary void* type castings.
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 29 Apr 2022 17:43:30 +0000 (10:43 -0700)]
eth: remove remaining copies of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
This patch covers three more drivers which I missed in
commit
5f012b40ef63 ("eth: remove copies of the NAPI_POLL_WEIGHT define").
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pengcheng Yang [Fri, 29 Apr 2022 10:32:56 +0000 (18:32 +0800)]
tcp: use tcp_skb_sent_after() instead in RACK
This patch doesn't change any functionality.
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minghao Chi [Fri, 29 Apr 2022 09:01:04 +0000 (09:01 +0000)]
net/funeth: simplify the return expression of fun_dl_info_get()
Simplify the return expression.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prabhakar Kushwaha [Sat, 30 Apr 2022 01:05:13 +0000 (04:05 +0300)]
qede: Reduce verbosity of ptp tx timestamp
Reduce verbosity of ptp tx timestamp error to reduce excessive log
messages.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Foster [Fri, 29 Apr 2022 21:30:36 +0000 (14:30 -0700)]
net: ethernet: ocelot: remove the need for num_stats initializer
There is a desire to share the oclot_stats_layout struct outside of the
current vsc7514 driver. In order to do so, the length of the array needs to
be known at compile time, and defined in the struct ocelot and struct
felix_info.
Since the array is defined in a .c file and would be declared in the header
file via:
extern struct ocelot_stat_layout[];
the size of the array will not be known at compile time to outside modules.
To fix this, remove the need for defining the number of stats at compile
time and allow this number to be determined at initialization.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 30 Apr 2022 01:15:23 +0000 (18:15 -0700)]
tcp: drop skb dst in tcp_rcv_established()
In commit
f84af32cbca7 ("net: ip_queue_rcv_skb() helper")
I dropped the skb dst in tcp_data_queue().
This only dealt with so-called TCP input slow path.
When fast path is taken, tcp_rcv_established() calls
tcp_queue_rcv() while skb still has a dst.
This was mostly fine, because most dsts at this point
are not refcounted (thanks to early demux)
However, TCP packets sent over loopback have refcounted dst.
Then commit
68822bdf76f1 ("net: generalize skb freeing
deferral to per-cpu lists") came and had the effect
of delaying skb freeing for an arbitrary time.
If during this time the involved netns is dismantled, cleanup_net()
frees the struct net with embedded net->ipv6.ip6_dst_ops.
Then when eventually dst_destroy_rcu() is called,
if (dst->ops->destroy) ... triggers an use-after-free.
It is not clear if ip6_route_net_exit() lacks a rcu_barrier()
as syzbot reported similar issues before the blamed commit.
( https://groups.google.com/g/syzkaller-bugs/c/CofzW4eeA9A/m/009WjumTAAAJ )
Fixes:
68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 30 Apr 2022 12:09:26 +0000 (13:09 +0100)]
Merge branch 'lan966x-phy-reset-remove'
Michael Walle says:
====================
net: lan966x: remove PHY reset support
Remove the unneeded PHY reset node as well as the driver support for it.
This was already discussed [1] and I expect Microchip to Ack on this
removal. Since there is no user, no breakage is expected.
I'm not sure it this should go through net or net-next and if the patches
should have a Fixes: tag or not. In upstream linux there was never any user
of it, so there is no bug to be fixed. But OTOH if the schema fix isn't
backported, then there might be an older schema version still containing
the reset node. Thoughts?
The patches needed for the GPIO part are just waiting to be picked up by
Linus [2,3]. This patch and the GPIO parts are the last pieces of the
puzzle to get ethernet working on the LAN9668 on upstream linux.
[1] https://lore.kernel.org/netdev/
20220330110210.
3374165-1-michael@walle.cc/
[2] https://lore.kernel.org/linux-gpio/CACRpkdbxmN+SWt95aGHjA2ZGnN61aWaA7c5S4PaG+WePAj=htg@mail.gmail.com/
[3] https://lore.kernel.org/linux-gpio/
20220420191926.
3411830-1-michael@walle.cc/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Thu, 28 Apr 2022 11:40:49 +0000 (13:40 +0200)]
net: lan966x: remove PHY reset support
The PHY subsystem as well as the MIIM mdio driver (in case of the
integrated PHYs) will take care of the resets. A separate reset driver
isn't needed. There is no in-tree user of this feature. Remove the
support.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Thu, 28 Apr 2022 11:40:48 +0000 (13:40 +0200)]
dt-bindings: net: lan966x: remove PHY reset
The PHY reset was intended to be a phandle for a special PHY reset
driver for the integrated PHYs as well as any external PHYs. It turns
out, that the culprit is how the reset of the switch device is done.
In particular, the switch reset also affects other subsystems like
the GPIO and the SGPIO block and it happens to be the case that the
reset lines of the external PHYs are connected to a common GPIO line.
Thus as soon as the switch issues a reset during probe time, all the
external PHYs will go into reset because all the GPIO lines will
switch to input and the pull-down on that signal will take effect.
So even if there was a special PHY reset driver, it (1) won't fix
the root cause of the problem and (2) it won't fix all the other
consumers of GPIO lines which will also be reset.
It turns out, the Ocelot SoC has the same weird behavior (or the
lack of a dedicated switch reset) and there the problem is already
solved and all the bits and pieces are already there and this PHY
reset property isn't not needed at all.
There are no users of this binding. Just remove it.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 30 Apr 2022 11:58:45 +0000 (12:58 +0100)]
Merge branch 'ipv6-net-opts'
Pavel Begunkov says:
====================
generic net and ipv6 minor optimisations
1-3 inline simple functions that only reshuffle arguments possibly adding
extra zero args, and call another function. It was benchmarked before with
a bunch of extra patches, see for details
https://lore.kernel.org/netdev/cover.
1648981570.git.asml.silence@gmail.com/
It may increase the binary size, but it's the right thing to do and at least
without modules it actually sheds some bytes for some standard-ish config.
text data bss dec hex filename
9627200 0 0
9627200 92e640 ./arch/x86_64/boot/bzImage
text data bss dec hex filename
9627104 0 0
9627104 92e5e0 ./arch/x86_64/boot/bzImage
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Thu, 28 Apr 2022 10:58:48 +0000 (11:58 +0100)]
ipv6: refactor ip6_finish_output2()
Throw neigh checks in ip6_finish_output2() under a single slow path if,
so we don't have the overhead in the hot path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Thu, 28 Apr 2022 10:58:47 +0000 (11:58 +0100)]
ipv6: help __ip6_finish_output() inlining
There are two callers of __ip6_finish_output(), both are in
ip6_finish_output(). We can combine the call sites into one and handle
return code after, that will inline __ip6_finish_output().
Note, error handling under NET_XMIT_CN will only return 0 if
__ip6_finish_output() succeded, and in this case it return 0.
Considering that NET_XMIT_SUCCESS is 0, it'll be returning exactly the
same result for it as before.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Thu, 28 Apr 2022 10:58:46 +0000 (11:58 +0100)]
net: inline dev_queue_xmit()
Inline dev_queue_xmit() and dev_queue_xmit_accel(), they both are small
proxy functions doing nothing but redirecting the control flow to
__dev_queue_xmit().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Thu, 28 Apr 2022 10:58:45 +0000 (11:58 +0100)]
net: inline skb_zerocopy_iter_dgram
skb_zerocopy_iter_dgram() is a small proxy function, inline it. For
that, move __zerocopy_sg_from_iter into linux/skbuff.h
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Begunkov [Thu, 28 Apr 2022 10:58:44 +0000 (11:58 +0100)]
net: inline sock_alloc_send_skb
sock_alloc_send_skb() is simple and just proxying to another function,
so we can inline it and cut associated overhead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 30 Apr 2022 02:12:05 +0000 (19:12 -0700)]
Merge branch 'tcp-pass-back-data-left-in-socket-after-receive' of git://git./linux/kernel/git/kuba/linux
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jens Axboe [Fri, 29 Apr 2022 00:45:06 +0000 (18:45 -0600)]
tcp: pass back data left in socket after receive
This is currently done for CMSG_INQ, add an ability to do so via struct
msghdr as well and have CMSG_INQ use that too. If the caller sets
msghdr->msg_get_inq, then we'll pass back the hint in msghdr->msg_inq.
Rearrange struct msghdr a bit so we can add this member while shrinking
it at the same time. On a 64-bit build, it was 96 bytes before this
change and 88 bytes afterwards.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/650c22ca-cffc-0255-9a05-2413a1e20826@kernel.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yinjun Zhang [Fri, 29 Apr 2022 07:51:24 +0000 (09:51 +0200)]
nfp: flower: utilize the tuple iifidx in offloading ct flows
The device info from which conntrack originates is stored in metadata
field of the ct flow to offload now, driver can utilize it to reduce
the number of offloaded flows.
v2: Drop inline keyword from get_netdev_from_rule() signature.
The compiler can decide.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220429075124.128589-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pieter Jansen van Vuuren [Thu, 28 Apr 2022 11:39:33 +0000 (12:39 +0100)]
sfc: add EF100 VF support via a write to sriov_numvfs
This patch extends the EF100 PF driver by adding .sriov_configure()
which would allow users to enable and disable virtual functions
using the sriov sysfs.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/75e74d9e-14ce-0524-9668-5ab735a7cf62@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 30 Apr 2022 00:25:25 +0000 (17:25 -0700)]
Merge branch 'mptcp-path-manager-mode-selection'
Mat Martineau says:
====================
mptcp: Path manager mode selection
MPTCP already has an in-kernel path manager (PM) to add and remove TCP
subflows associated with a given MPTCP connection. This in-kernel PM has
been designed to handle typical server-side use cases, but is not very
flexible or configurable for client devices that may have more
complicated policies to implement.
This patch series from the MPTCP tree is the first step toward adding a
generic-netlink-based API for MPTCP path management, which a privileged
userspace daemon will be able to use to control subflow
establishment. These patches add a per-namespace sysctl to select the
default PM type (in-kernel or userspace) for new MPTCP sockets. New
self-tests confirm expected behavior when userspace PM is selected but
there is no daemon available to handle existing MPTCP PM events.
Subsequent patch series (already staged in the MPTCP tree) will add the
generic netlink path management API.
====================
Link: https://lore.kernel.org/r/20220427225002.231996-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau [Wed, 27 Apr 2022 22:50:02 +0000 (15:50 -0700)]
selftests: mptcp: Add tests for userspace PM type
These tests ensure that the in-kernel path manager is bypassed when
the userspace path manager is configured. Kernel code is still
responsible for ADD_ADDR echo, so also make sure that's working.
Tested-by: Geliang Tang <geliang.tang@suse.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau [Wed, 27 Apr 2022 22:50:01 +0000 (15:50 -0700)]
mptcp: Add a per-namespace sysctl to set the default path manager type
The new net.mptcp.pm_type sysctl determines which path manager will be
used by each newly-created MPTCP socket.
v2: Handle builds without CONFIG_SYSCTL
v3: Clarify logic for type-specific PM init (Geliang Tang and Paolo Abeni)
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau [Wed, 27 Apr 2022 22:50:00 +0000 (15:50 -0700)]
mptcp: Make kernel path manager check for userspace-managed sockets
Userspace-managed sockets should not have their subflows or
advertisements changed by the kernel path manager.
v3: Use helper function for PM mode (Paolo Abeni)
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau [Wed, 27 Apr 2022 22:49:59 +0000 (15:49 -0700)]
mptcp: Bypass kernel PM when userspace PM is enabled
When a MPTCP connection is managed by a userspace PM, bypass the kernel
PM for incoming advertisements and subflow events. Netlink events are
still sent to userspace.
v2: Remove unneeded check in mptcp_pm_rm_addr_received() (Kishen Maloor)
v3: Add and use helper function for PM mode (Paolo Abeni)
Acked-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau [Wed, 27 Apr 2022 22:49:58 +0000 (15:49 -0700)]
mptcp: Add a member to mptcp_pm_data to track kernel vs userspace mode
When adding support for netlink path management commands, the kernel
needs to know whether paths are being controlled by the in-kernel path
manager or a userspace PM.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau [Wed, 27 Apr 2022 22:49:57 +0000 (15:49 -0700)]
mptcp: Remove redundant assignments in path manager init
A few members of the mptcp_pm_data struct were assigned to hard-coded
values in mptcp_pm_data_reset(), and then immediately changed in
mptcp_pm_nl_data_init().
Instead, flatten all the assignments in to mptcp_pm_data_reset().
v2: Resolve conflicts due to rename of mptcp_pm_data_reset()
v4: Resolve conflict in mptcp_pm_data_reset()
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 29 Apr 2022 23:37:56 +0000 (16:37 -0700)]
Merge branch 'net-phy-micrel-add-coma-mode-support'
Michael Walle says:
====================
net: phy: micrel: add coma mode support
Add support to disable coma mode by a GPIO line.
====================
Link: https://lore.kernel.org/r/20220427214406.1348872-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Walle [Wed, 27 Apr 2022 21:44:06 +0000 (23:44 +0200)]
net: phy: micrel: add coma mode GPIO
The LAN8814 has a coma mode pin which puts the PHY into isolate and
power-dowm mode. Unfortunately, the mode cannot be disabled by a
register. Usually, the input pin has a pull-up and connected to a GPIO
which can then be used to disable the mode. Try to get the GPIO and
deassert it.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Walle [Wed, 27 Apr 2022 21:44:05 +0000 (23:44 +0200)]
net: phy: micrel: move the PHY timestamping check
Both lan8814_ptp_init() and lan8814_ptp_probe_once() are only used if
PTP and PHY timestamping is enabed. Up until now the probe function just
returns early, if they are not needed. But we need the
phy_package_init_once() functionality for the coma mode GPIO setup. Move
the check into the functions itself.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Walle [Wed, 27 Apr 2022 21:44:04 +0000 (23:44 +0200)]
dt-bindings: net: micrel: add coma-mode-gpios property
The LAN8814 has a coma mode pin which is used to put the PHY into
isolate and power-down mode. Usually strapped to be asserted by default.
A GPIO is then used to take the PHY out of this mode.
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Fri, 29 Apr 2022 10:57:01 +0000 (11:57 +0100)]
Merge branch 'remove-NAPI_POLL_WEIGHT-copies'
Merge branch 'remove-NAPI_POLL_WEIGHT-copies'
Jakub Kicinski says:
====================
remove copies of the NAPI_POLL_WEIGHT define
netif_napi_add() takes weight as the last argument. The value of
that parameter is hard to come up with and depends on many factors,
so driver authors are encouraged to use NAPI_POLL_WEIGHT.
We should probably move weight to an "advanced" version of the API
(__netif_napi_add()?) and simplify the life of most driver authors.
In preparation for such API changes this series removes local
defines equivalent to NAPI_POLL_WEIGHT from drivers, so that a simple
coccinelle / spatch script does not get thrown off by them.
v2:
- drop staging bits (patch 2)
- fix subject (patch 8)
- add qeth change (patch 15)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:23 +0000 (14:23 -0700)]
qeth: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:22 +0000 (14:23 -0700)]
eth: velocity: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:21 +0000 (14:23 -0700)]
eth: spider: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:20 +0000 (14:23 -0700)]
eth: vxge: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:19 +0000 (14:23 -0700)]
eth: gfar: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:18 +0000 (14:23 -0700)]
eth: benet: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:17 +0000 (14:23 -0700)]
eth: atlantic: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:16 +0000 (14:23 -0700)]
net: bgmac: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:15 +0000 (14:23 -0700)]
slic: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:14 +0000 (14:23 -0700)]
usb: lan78xx: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:13 +0000 (14:23 -0700)]
eth: mtk_eth_soc: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:12 +0000 (14:23 -0700)]
eth: pch_gbe: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:11 +0000 (14:23 -0700)]
eth: cpsw: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:10 +0000 (14:23 -0700)]
eth: smsc: remove a copy of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Apr 2022 21:23:09 +0000 (14:23 -0700)]
eth: remove copies of the NAPI_POLL_WEIGHT define
Defining local versions of NAPI_POLL_WEIGHT with the same
values in the drivers just makes refactoring harder.
Drop the special defines in a bunch of drivers where the
removal is relatively simple so grouping into one patch
does not impact reviewability.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Rossi [Wed, 27 Apr 2022 13:09:28 +0000 (13:09 +0000)]
net: dsa: mv88e6xxx: Single chip mode detection for MV88E6*41
The mv88e6xxx driver expects switches that are configured in single chip
addressing mode to have the MDIO address configured as 0. This is due to
the switch ADDR pins representing the single chip addressing mode as 0.
However depending on the device (e.g. MV88E6*41) the switch does not
respond on address 0 or any other address below 16 (the first port
address) in single chip addressing mode. This allows for other devices
to be on the same shared MDIO bus despite the switch being in single
chip addressing mode.
When using a switch that works this way it is not possible to configure
switch driver as single chip addressing via device tree, along with
another MDIO device on the same bus with address 0, as both devices
would have the same address of 0 resulting in mdiobus_register_device
-EBUSY errors for one of the devices with address 0.
In order to support this configuration the switch node can have its MDIO
address configured as 16 (the first address that the device responds
to). During initialization the driver will treat this address similar to
how address 0 is, however because this address is also a valid
multi-chip address (in certain switch models, but not all) the driver
will configure the SMI in single chip addressing mode and attempt to
detect the switch model. If the device is configured in single chip
addressing mode this will succeed and the initialization process can
continue. If it fails to detect a valid model this is because the switch
model register is not a valid register when in multi-chip mode, it will
then fall back to the existing SMI initialization process using the MDIO
address as the multi-chip mode address.
This detection method is safe if the device is in either mode because
the single chip addressing mode read is a direct SMI/MDIO read operation
and has no side effects compared to the SMI writes required for the
multi-chip addressing mode.
In order to implement this change, the reset gpio configuration is moved
to occur before any SMI initialization. This ensures that the device has
the same/correct reset gpio state for both mv88e6xxx_smi_init calls.
Signed-off-by: Nathan Rossi <nathan@nathanrossi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220427130928.540007-1-nathan@nathanrossi.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Volodymyr Mytnyk [Wed, 27 Apr 2022 12:05:48 +0000 (15:05 +0300)]
net: prestera: add police action support
- Add HW api to configure policer:
- SR TCM policer mode is only supported for now.
- Policer ingress/egress direction support.
- Add police action support into flower
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Link: https://lore.kernel.org/r/1651061148-21321-1-git-send-email-volodymyr.mytnyk@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Wunner [Wed, 27 Apr 2022 06:30:51 +0000 (08:30 +0200)]
net: phy: Deduplicate interrupt disablement on PHY attach
phy_attach_direct() first calls phy_init_hw() (which restores interrupt
settings through ->config_intr()), then calls phy_disable_interrupts().
So if phydev->interrupts was previously set to 1, interrupts are briefly
enabled, then disabled, which seems nonsensical.
If it was previously set to 0, interrupts are disabled twice, which is
equally nonsensical.
Deduplicate interrupt disablement.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/805ccdc606bd8898d59931bd4c7c68537ed6e550.1651040826.git.lukas@wunner.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Erin MacNeil [Wed, 27 Apr 2022 20:02:37 +0000 (16:02 -0400)]
net: SO_RCVMARK socket option for SO_MARK with recvmsg()
Adding a new socket option, SO_RCVMARK, to indicate that SO_MARK
should be included in the ancillary data returned by recvmsg().
Renamed the sock_recv_ts_and_drops() function to sock_recv_cmsgs().
Signed-off-by: Erin MacNeil <lnx.erin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/r/20220427200259.2564-1-lnx.erin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 28 Apr 2022 20:01:50 +0000 (13:01 -0700)]
Merge git://git./linux/kernel/git/netdev/net
include/linux/netdevice.h
net/core/dev.c
6510ea973d8d ("net: Use this_cpu_inc() to increment net->core_stats")
794c24e9921f ("net-core: rx_otherhost_dropped to core_stats")
https://lore.kernel.org/all/
20220428111903.
5f4304e0@canb.auug.org.au/
drivers/net/wan/cosa.c
d48fea8401cf ("net: cosa: fix error check return value of register_chrdev()")
89fbca3307d4 ("net: wan: remove support for COSA and SRP synchronous serial boards")
https://lore.kernel.org/all/
20220428112130.
1f689e5e@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Apr 2022 19:34:50 +0000 (12:34 -0700)]
Merge tag 'net-5.18-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and netfilter.
Current release - new code bugs:
- bridge: switchdev: check br_vlan_group() return value
- use this_cpu_inc() to increment net->core_stats, fix preempt-rt
Previous releases - regressions:
- eth: stmmac: fix write to sgmii_adapter_base
Previous releases - always broken:
- netfilter: nf_conntrack_tcp: re-init for syn packets only,
resolving issues with TCP fastopen
- tcp: md5: fix incorrect tcp_header_len for incoming connections
- tcp: fix F-RTO may not work correctly when receiving DSACK
- tcp: ensure use of most recently sent skb when filling rate samples
- tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
- virtio_net: fix wrong buf address calculation when using xdp
- xsk: fix forwarding when combining copy mode with busy poll
- xsk: fix possible crash when multiple sockets are created
- bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
bpf_xmit lwt hook
- sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event
- wireguard: device: check for metadata_dst with skb_valid_dst()
- netfilter: update ip6_route_me_harder to consider L3 domain
- gre: make o_seqno start from 0 in native mode
- gre: switch o_seqno to atomic to prevent races in collect_md mode
Misc:
- add Eric Dumazet to networking maintainers
- dt: dsa: realtek: remove realtek,rtl8367s string
- netfilter: flowtable: Remove the empty file"
* tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
tcp: fix F-RTO may not work correctly when receiving DSACK
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
ixgbe: ensure IPsec VF<->PF compatibility
MAINTAINERS: Update BNXT entry with firmware files
netfilter: nft_socket: only do sk lookups when indev is available
net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
bnx2x: fix napi API usage sequence
tls: Skip tls_append_frag on zero copy size
Add Eric Dumazet to networking maintainers
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
net: Use this_cpu_inc() to increment net->core_stats
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
ice: fix use-after-free when deinitializing mailbox snapshot
ice: wait 5 s for EMP reset after firmware flash
ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
...
Linus Torvalds [Thu, 28 Apr 2022 18:57:00 +0000 (11:57 -0700)]
Merge tag 'thermal-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These take back recent chages that started to confuse users and fix up
an attr.show callback prototype in a driver.
Specifics:
- Stop warning about deprecation of the userspace thermal governor
and cooling device status interface, because there are cases in
which user space has to drive thermal management with the help of
them (Daniel Lezcano)
- Fix attr.show callback prototype in the int340x thermal driver
(Kees Cook)"
* tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/governor: Remove deprecated information
Revert "thermal/core: Deprecate changing cooling device state from userspace"
thermal: int340x: Fix attr.show callback prototype
Linus Torvalds [Thu, 28 Apr 2022 18:50:21 +0000 (11:50 -0700)]
Merge tag 'pm-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix up recent intel_idle driver changes and fix some ARM cpufreq
driver issues.
Specifics:
- Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov,
Vladimir Zapolskiy).
- Fix memory leak with the Sun501 driver (Xiaobing Luo).
- Make intel_idle enable C1E promotion on all CPUs when C1E is
preferred to C1 (Artem Bityutskiy).
- Make C6 optimization on Sapphire Rapids added recently work as
expected if both C1E and C1 are "preferred" (Artem Bityutskiy)"
* tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: Fix SPR C6 optimization
intel_idle: Fix the 'preferred_cstates' module parameter
cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
cpufreq: qcom-hw: provide online/offline operations
cpufreq: qcom-hw: fix the opp entries refcounting
cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
Linus Torvalds [Thu, 28 Apr 2022 18:37:20 +0000 (11:37 -0700)]
Merge tag 'acpi-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael WysockiL
"These fix up the ACPI processor driver after a change made during the
5.16 cycle that inadvertently broke falling back to shallower C-states
when C3 cannot be used.
Specifics:
- Make the ACPI processor driver avoid falling back to C3 type of
C-states when C3 cannot be requested (Ville Syrjälä)
- Revert a quirk that is not necessary any more after fixing the
underlying issue properly (Ville Syrjälä)"
* tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40"
ACPI: processor: idle: Avoid falling back to C3 type C-states
Linus Torvalds [Thu, 28 Apr 2022 18:13:00 +0000 (11:13 -0700)]
Merge tag 'platform-drivers-x86-v5.18-3' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Highlights:
- asus-wmi bug-fixes
- intel-sdsu bug-fixes
- build (warning) fixes
- couple of hw-id additions"
* tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel: pmc/core: change pmc_lpm_modes to static
platform/x86/intel/sdsi: Fix bug in multi packet reads
platform/x86/intel/sdsi: Poll on ready bit for writes
platform/x86/intel/sdsi: Handle leaky bucket
platform/x86: intel-uncore-freq: Prevent driver loading in guests
platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboard
platform/x86: dell-laptop: Add quirk entry for Latitude 7520
platform/x86: asus-wmi: Fix driver not binding when fan curve control probe fails
platform/x86: asus-wmi: Potential buffer overflow in asus_wmi_evaluate_method_buf()
tools/power/x86/intel-speed-select: fix build failure when using -Wl,--as-needed
Linus Torvalds [Thu, 28 Apr 2022 18:07:49 +0000 (11:07 -0700)]
Merge tag 'regulator-fix-v5.18-rc4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A minor fix for the DT binding documentation of the rt5190a driver"
* tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: Revise the rt5190a buck/ldo description
Pengcheng Yang [Tue, 26 Apr 2022 10:03:39 +0000 (18:03 +0800)]
tcp: fix F-RTO may not work correctly when receiving DSACK
Currently DSACK is regarded as a dupack, which may cause
F-RTO to incorrectly enter "loss was real" when receiving
DSACK.
Packetdrill to demonstrate:
// Enable F-RTO and TLP
0 `sysctl -q net.ipv4.tcp_frto=2`
0 `sysctl -q net.ipv4.tcp_early_retrans=3`
0 `sysctl -q net.ipv4.tcp_congestion_control=cubic`
// Establish a connection
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
// RTT 10ms, RTO 210ms
+.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.01 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
// Send 2 data segments
+0 write(4, ..., 2000) = 2000
+0 > P. 1:2001(2000) ack 1
// TLP
+.022 > P. 1001:2001(1000) ack 1
// Continue to send 8 data segments
+0 write(4, ..., 10000) = 10000
+0 > P. 2001:10001(8000) ack 1
// RTO
+.188 > . 1:1001(1000) ack 1
// The original data is acked and new data is sent(F-RTO step 2.b)
+0 < . 1:1(0) ack 2001 win 257
+0 > P. 10001:12001(2000) ack 1
// D-SACK caused by TLP is regarded as a dupack, this results in
// the incorrect judgment of "loss was real"(F-RTO step 3.a)
+.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop>
// Never-retransmitted data(3001:4001) are acked and
// expect to switch to open state(F-RTO step 3.b)
+0 < . 1:1(0) ack 4001 win 257
+0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }%
Fixes:
e33099f96d99 ("tcp: implement RFC5682 F-RTO")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1650967419-2150-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 28 Apr 2022 16:55:59 +0000 (09:55 -0700)]
Merge git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix incorrect TCP connection tracking window reset for non-syn
packets, from Florian Westphal.
2) Incorrect dependency on CONFIG_NFT_FLOW_OFFLOAD, from Volodymyr Mytnyk.
3) Fix nft_socket from the output path, from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_socket: only do sk lookups when indev is available
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
====================
Link: https://lore.kernel.org/r/20220428142109.38726-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Apr 2022 16:50:29 +0000 (09:50 -0700)]
Merge tag 'gfs2-v5.18-rc4-fix2' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
- No short reads or writes upon glock contention
* tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: No short reads or writes upon glock contention
Dany Madden [Wed, 27 Apr 2022 23:51:46 +0000 (18:51 -0500)]
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
This reverts commit
723ad916134784b317b72f3f6cf0f7ba774e5dae
When client requests channel or ring size larger than what the server
can support the server will cap the request to the supported max. So,
the client would not be able to successfully request resources that
exceed the server limit.
Fixes:
723ad9161347 ("ibmvnic: Add ethtool private flag for driver-defined queue limits")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 27 Apr 2022 20:30:17 +0000 (23:30 +0300)]
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
The Time-Specified Departure feature is indeed mutually exclusive with
TX IP checksumming in ENETC, but TX checksumming in itself is broken and
was removed from this driver in commit
82728b91f124 ("enetc: Remove Tx
checksumming offload code").
The blamed commit declared NETIF_F_HW_CSUM in dev->features to comply
with software TSO's expectations, and still did the checksumming in
software by calling skb_checksum_help(). So there isn't any restriction
for the Time-Specified Departure feature.
However, enetc_setup_tc_txtime() doesn't understand that, and blindly
looks for NETIF_F_CSUM_MASK.
Instead of checking for things which can literally never happen in the
current code base, just remove the check and let the driver offload
tc-etf qdiscs.
Fixes:
acede3c5dad5 ("net: enetc: declare NETIF_F_HW_CSUM and do it in software")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220427203017.1291634-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Wed, 27 Apr 2022 17:31:52 +0000 (10:31 -0700)]
ixgbe: ensure IPsec VF<->PF compatibility
The VF driver can forward any IPsec flags and such makes the function
is not extendable and prone to backward/forward incompatibility.
If new software runs on VF, it won't know that PF configured something
completely different as it "knows" only XFRM_OFFLOAD_INBOUND flag.
Fixes:
eda0333ac293 ("ixgbe: add VF IPsec management")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shannon Nelson <snelson@pensando.io>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220427173152.443102-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Apr 2022 16:37:56 +0000 (09:37 -0700)]
Merge tag 'xfs-5.18-fixes-1' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Dave Chinner:
- define buffer bit flags as unsigned to fix gcc-5 + c11 warnings
- remove redundant XFS fields from MAINTAINERS
- fix inode buffer locking order regression
* tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reorder iunlink remove operation in xfs_ifree
MAINTAINERS: update IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEM
xfs: convert buffer flags to unsigned.
Florian Fainelli [Wed, 27 Apr 2022 16:36:06 +0000 (09:36 -0700)]
MAINTAINERS: Update BNXT entry with firmware files
There appears to be a maintainer gap for BNXT TEE firmware files which
causes some patches to be missed. Update the entry for the BNXT Ethernet
controller with its companion firmware files.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220427163606.126154-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafael J. Wysocki [Thu, 28 Apr 2022 14:51:24 +0000 (16:51 +0200)]
Merge branch 'thermal-int340x'
Merge a fix for the attr.show callback prototype in the int340x thermal
driver (Kees Cook).
* thermal-int340x:
thermal: int340x: Fix attr.show callback prototype
Florian Westphal [Thu, 28 Apr 2022 07:39:21 +0000 (09:39 +0200)]
netfilter: nft_socket: only do sk lookups when indev is available
Check if the incoming interface is available and NFT_BREAK
in case neither skb->sk nor input device are set.
Because nf_sk_lookup_slow*() assume packet headers are in the
'in' direction, use in postrouting is not going to yield a meaningful
result. Same is true for the forward chain, so restrict the use
to prerouting, input and output.
Use in output work if a socket is already attached to the skb.
Fixes:
554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching")
Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Rafael J. Wysocki [Thu, 28 Apr 2022 14:09:50 +0000 (16:09 +0200)]
Merge branch 'pm-cpuidle'
Merge cpuidle fixes for 5.18-rc5:
- Make intel_idle enable C1E promotion on all CPUs when C1E is
preferred to C1 (Artem Bityutskiy).
- Make C6 optimization on Sapphire Rapids added recently work as
expected if both C1E and C1 are "preferred" (Artem Bityutskiy).
* pm-cpuidle:
intel_idle: Fix SPR C6 optimization
intel_idle: Fix the 'preferred_cstates' module parameter
Andreas Gruenbacher [Thu, 28 Apr 2022 12:51:33 +0000 (14:51 +0200)]
gfs2: No short reads or writes upon glock contention
Commit
00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered
I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to
allow dropping the inode glock while faulting in user buffers. When the
lock was dropped, a short result was returned to indicate that the
operation was interrupted.
As pointed out by Linus (see the link below), this behavior is broken
and the operations should always re-acquire the inode glock and resume
the operation instead.
Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/
Fixes:
00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Eric Dumazet [Wed, 27 Apr 2022 20:41:47 +0000 (13:41 -0700)]
net: make sure net_rx_action() calls skb_defer_free_flush()
I missed a stray return; in net_rx_action(), which very well
is taken whenever trigger_rx_softirq() has been called on
a cpu that is no longer receiving network packets,
or receiving too few of them.
Fixes:
68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220427204147.1310161-1-eric.dumazet@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 28 Apr 2022 08:18:51 +0000 (10:18 +0200)]
Merge tag 'for-net-2022-04-27' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix regression causing some HCI events to be discarded when they
shouldn't.
* tag 'for-net-2022-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
====================
Link: https://lore.kernel.org/r/20220427234031.1257281-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 28 Apr 2022 07:39:34 +0000 (09:39 +0200)]
Merge branch 'add-reset-deassertion-for-aspeed-mdio'
Dylan Hung says:
====================
Add reset deassertion for Aspeed MDIO
Add missing reset deassertion for Aspeed MDIO bus controller. The reset
is asserted by the hardware when power-on so the driver only needs to
deassert it. To be able to work with the old DT blobs, the reset is
optional since it may be deasserted by the bootloader or the previous
kernel.
V6:
- fix merge conflict for net-next
V5:
- fix error of dt_binding_check
V4:
- use ASPEED_RESET_MII instead of hardcoding in dt-binding example
V3:
- remove reset property from the required list of the device tree
bindings
- remove "Cc: stable@vger.kernel.org" from the commit messages
- add more description in the commit message of the dt-binding
V2:
- add reset property in the device tree bindings
- add reset assertion in the error path and driver remove
====================
Link: https://lore.kernel.org/r/20220427035501.17500-1-dylan_hung@aspeedtech.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dylan Hung [Wed, 27 Apr 2022 03:55:01 +0000 (11:55 +0800)]
ARM: dts: aspeed: add reset properties into MDIO nodes
Add reset control properties into MDIO nodes. The 4 MDIO controllers in
AST2600 SOC share one reset control bit SCU50[3].
Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dylan Hung [Wed, 27 Apr 2022 03:55:00 +0000 (11:55 +0800)]
net: mdio: add reset control for Aspeed MDIO
Add reset assertion/deassertion for Aspeed MDIO. There are 4 MDIO
controllers embedded in Aspeed AST2600 SOC and share one reset control
register SCU50[3]. To work with old DT blobs which don't have the reset
property, devm_reset_control_get_optional_shared is used in this change.
Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dylan Hung [Wed, 27 Apr 2022 03:54:59 +0000 (11:54 +0800)]
dt-bindings: net: add reset property for aspeed, ast2600-mdio binding
The AST2600 MDIO bus controller has a reset control bit and must be
deasserted before manipulating the MDIO controller. By default, the
hardware asserts the reset so the driver only need to deassert it.
Regarding to the old DT blobs which don't have reset property in them,
the reset deassertion is usually done by the bootloader so the reset
property is optional to work with them.
Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yang Yingliang [Tue, 26 Apr 2022 12:52:31 +0000 (20:52 +0800)]
net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
Put device node in error path in fec_enet_init_stop_mode().
Fixes:
8a448bf832af ("net: ethernet: fec: move GPR register offset and bit into DT")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220426125231.375688-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Manish Chopra [Tue, 26 Apr 2022 15:39:13 +0000 (08:39 -0700)]
bnx2x: fix napi API usage sequence
While handling PCI errors (AER flow) driver tries to
disable NAPI [napi_disable()] after NAPI is deleted
[__netif_napi_del()] which causes unexpected system
hang/crash.
System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports:
'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports:
'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor:
168e14e4 [ 3222.583557] EEH: PCI cmd/status register:
00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00:
00020010 012c8da2 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10:
10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20:
00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00:
13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10:
00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20:
00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30:
00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor:
168e14e4 [ 3222.584491] EEH: PCI cmd/status register:
00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00:
00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10:
10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20:
00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00:
13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10:
00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20:
00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30:
00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset
--> driver unload
Uninterruptible tasks
=====================
crash> ps | grep UN
213 2 11
c000000004c89e00 UN 0.0 0 0 [eehd]
215 2 0
c000000004c80000 UN 0.0 0 0
[kworker/0:2]
2196 1 28
c000000004504f00 UN 0.1 15936 11136 wickedd
4287 1 9
c00000020d076800 UN 0.0 4032 3008 agetty
4289 1 20
c00000020d056680 UN 0.0 7232 3840 agetty
32423 2 26
c00000020038c580 UN 0.0 0 0
[kworker/26:3]
32871 4241 27
c0000002609ddd00 UN 0.1 18624 11648 sshd
32920 10130 16
c00000027284a100 UN 0.1 48512 12608 sendmail
33092 32987 0
c000000205218b00 UN 0.1 48512 12608 sendmail
33154 4567 16
c000000260e51780 UN 0.1 48832 12864 pickup
33209 4241 36
c000000270cb6500 UN 0.1 18624 11712 sshd
33473 33283 0
c000000205211480 UN 0.1 48512 12672 sendmail
33531 4241 37
c00000023c902780 UN 0.1 18624 11648 sshd
EEH handler hung while bnx2x sleeping and holding RTNL lock
===========================================================
crash> bt 213
PID: 213 TASK:
c000000004c89e00 CPU: 11 COMMAND: "eehd"
#0 [
c000000004d477e0] __schedule at
c000000000c70808
#1 [
c000000004d478b0] schedule at
c000000000c70ee0
#2 [
c000000004d478e0] schedule_timeout at
c000000000c76dec
#3 [
c000000004d479c0] msleep at
c0000000002120cc
#4 [
c000000004d479f0] napi_disable at
c000000000a06448
^^^^^^^^^^^^^^^^
#5 [
c000000004d47a30] bnx2x_netif_stop at
c0080000018dba94 [bnx2x]
#6 [
c000000004d47a60] bnx2x_io_slot_reset at
c0080000018a551c [bnx2x]
#7 [
c000000004d47b20] eeh_report_reset at
c00000000004c9bc
#8 [
c000000004d47b90] eeh_pe_report at
c00000000004d1a8
#9 [
c000000004d47c40] eeh_handle_normal_event at
c00000000004da64
And the sleeping source code
============================
crash> dis -ls
c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702
6697 {
6698 might_sleep();
6699 set_bit(NAPI_STATE_DISABLE, &n->state);
6700
6701 while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702 msleep(1);
6703 while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
6704 msleep(1);
6705
6706 hrtimer_cancel(&n->timer);
6707
6708 clear_bit(NAPI_STATE_DISABLE, &n->state);
6709 }
EEH calls into bnx2x twice based on the system log above, first through
bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes
the following call chains:
bnx2x_io_error_detected()
+-> bnx2x_eeh_nic_unload()
+-> bnx2x_del_all_napi()
+-> __netif_napi_del()
bnx2x_io_slot_reset()
+-> bnx2x_netif_stop()
+-> bnx2x_napi_disable()
+->napi_disable()
Fix this by correcting the sequence of NAPI APIs usage,
that is delete the NAPI after disabling it.
Fixes:
7fa6f34081f1 ("bnx2x: AER revised")
Reported-by: David Christensen <drc@linux.vnet.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 28 Apr 2022 00:09:31 +0000 (17:09 -0700)]
Merge https://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-04-27
We've added 85 non-merge commits during the last 18 day(s) which contain
a total of 163 files changed, 4499 insertions(+), 1521 deletions(-).
The main changes are:
1) Teach libbpf to enhance BPF verifier log with human-readable and relevant
information about failed CO-RE relocations, from Andrii Nakryiko.
2) Add typed pointer support in BPF maps and enable it for unreferenced pointers
(via probe read) and referenced ones that can be passed to in-kernel helpers,
from Kumar Kartikeya Dwivedi.
3) Improve xsk to break NAPI loop when rx queue gets full to allow for forward
progress to consume descriptors, from Maciej Fijalkowski & Björn Töpel.
4) Fix a small RCU read-side race in BPF_PROG_RUN routines which dereferenced
the effective prog array before the rcu_read_lock, from Stanislav Fomichev.
5) Implement BPF atomic operations for RV64 JIT, and add libbpf parsing logic
for USDT arguments under riscv{32,64}, from Pu Lehui.
6) Implement libbpf parsing of USDT arguments under aarch64, from Alan Maguire.
7) Enable bpftool build for musl and remove nftw with FTW_ACTIONRETVAL usage
so it can be shipped under Alpine which is musl-based, from Dominique Martinet.
8) Clean up {sk,task,inode} local storage trace RCU handling as they do not
need to use call_rcu_tasks_trace() barrier, from KP Singh.
9) Improve libbpf API documentation and fix error return handling of various
API functions, from Grant Seltzer.
10) Enlarge offset check for bpf_skb_{load,store}_bytes() helpers given data
length of frags + frag_list may surpass old offset limit, from Liu Jian.
11) Various improvements to prog_tests in area of logging, test execution
and by-name subtest selection, from Mykola Lysenko.
12) Simplify map_btf_id generation for all map types by moving this process
to build time with help of resolve_btfids infra, from Menglong Dong.
13) Fix a libbpf bug in probing when falling back to legacy bpf_probe_read*()
helpers; the probing caused always to use old helpers, from Runqing Yang.
14) Add support for ARCompact and ARCv2 platforms for libbpf's PT_REGS
tracing macros, from Vladimir Isaev.
15) Cleanup BPF selftests to remove old & unneeded rlimit code given kernel
switched to memcg-based memory accouting a while ago, from Yafang Shao.
16) Refactor of BPF sysctl handlers to move them to BPF core, from Yan Zhu.
17) Fix BPF selftests in two occasions to work around regressions caused by latest
LLVM to unblock CI until their fixes are worked out, from Yonghong Song.
18) Misc cleanups all over the place, from various others.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add libbpf's log fixup logic selftests
libbpf: Fix up verifier log for unguarded failed CO-RE relos
libbpf: Simplify bpf_core_parse_spec() signature
libbpf: Refactor CO-RE relo human description formatting routine
libbpf: Record subprog-resolved CO-RE relocations unconditionally
selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests
libbpf: Avoid joining .BTF.ext data with BPF programs by section name
libbpf: Fix logic for finding matching program for CO-RE relocation
libbpf: Drop unhelpful "program too large" guess
libbpf: Fix anonymous type check in CO-RE logic
bpf: Compute map_btf_id during build time
selftests/bpf: Add test for strict BTF type check
selftests/bpf: Add verifier tests for kptr
selftests/bpf: Add C tests for kptr
libbpf: Add kptr type tag macros to bpf_helpers.h
bpf: Make BTF type match stricter for release arguments
bpf: Teach verifier about kptr_get kfunc helpers
bpf: Wire up freeing of referenced kptr
bpf: Populate pairs of btf_id and destructor kfunc in btf
bpf: Adapt copy_map_value for multiple offset case
...
====================
Link: https://lore.kernel.org/r/20220427224758.20976-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Arun Ramadoss [Tue, 26 Apr 2022 09:10:48 +0000 (14:40 +0530)]
net: dsa: ksz9477: move get_stats64 to ksz_common.c
The mib counters for the ksz9477 is same for the ksz9477 switch and
LAN937x switch. Hence moving it to ksz_common.c file in order to have it
generic function. The DSA hook get_stats64 now can call ksz_get_stats64.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220426091048.9311-1-arun.ramadoss@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Mikityanskiy [Tue, 26 Apr 2022 15:49:49 +0000 (18:49 +0300)]
tls: Skip tls_append_frag on zero copy size
Calling tls_append_frag when max_open_record_len == record->len might
add an empty fragment to the TLS record if the call happens to be on the
page boundary. Normally tls_append_frag coalesces the zero-sized
fragment to the previous one, but not if it's on page boundary.
If a resync happens then, the mlx5 driver posts dump WQEs in
tx_post_resync_dump, and the empty fragment may become a data segment
with byte_count == 0, which will confuse the NIC and lead to a CQE
error.
This commit fixes the described issue by skipping tls_append_frag on
zero size to avoid adding empty fragments. The fix is not in the driver,
because an empty fragment is hardly the desired behavior.
Fixes:
e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220426154949.159055-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 27 Apr 2022 22:18:39 +0000 (15:18 -0700)]
Merge https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2022-04-27
We've added 5 non-merge commits during the last 20 day(s) which contain
a total of 6 files changed, 34 insertions(+), 12 deletions(-).
The main changes are:
1) Fix xsk sockets when rx and tx are separately bound to the same umem, also
fix xsk copy mode combined with busy poll, from Maciej Fijalkowski.
2) Fix BPF tunnel/collect_md helpers with bpf_xmit lwt hook usage which triggered
a crash due to invalid metadata_dst access, from Eyal Birger.
3) Fix release of page pool in XDP live packet mode, from Toke Høiland-Jørgensen.
4) Fix potential NULL pointer dereference in kretprobes, from Adam Zabrocki.
(Masami & Steven preferred this small fix to be routed via bpf tree given it's
follow-up fix to Masami's rethook work that went via bpf earlier, too.)
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xsk: Fix possible crash when multiple sockets are created
kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set
bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook
bpf: Fix release of page_pool in BPF_PROG_RUN in test runner
xsk: Fix l2fwd for copy mode + busy poll combo
====================
Link: https://lore.kernel.org/r/20220427212748.9576-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 27 Apr 2022 20:44:37 +0000 (13:44 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"Two patches.
Subsystems affected by this patch series: mm/kasan and mm/debug"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
docs: vm/page_owner: use literal blocks for param description
kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time
Akira Yokosawa [Wed, 27 Apr 2022 19:41:59 +0000 (12:41 -0700)]
docs: vm/page_owner: use literal blocks for param description
Sphinx generates hard-to-read lists of parameters at the bottom of the
page. Fix them by putting literal-block markers of "::" in front of
them.
Link: https://lkml.kernel.org/r/cfd3bcc0-b51d-0c68-c065-ca1c4c202447@gmail.com
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Fixes:
57f2b54a9379 ("Documentation/vm/page_owner.rst: update the documentation")
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Alex Shi <seakeel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zqiang [Wed, 27 Apr 2022 19:41:56 +0000 (12:41 -0700)]
kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time
kasan_quarantine_remove_cache() is called in kmem_cache_shrink()/
destroy(). The kasan_quarantine_remove_cache() call is protected by
cpuslock in kmem_cache_destroy() to ensure serialization with
kasan_cpu_offline().
However the kasan_quarantine_remove_cache() call is not protected by
cpuslock in kmem_cache_shrink(). When a CPU is going offline and cache
shrink occurs at same time, the cpu_quarantine may be corrupted by
interrupt (per_cpu_remove_cache operation).
So add a cpu_quarantine offline flags check in per_cpu_remove_cache().
[akpm@linux-foundation.org: add comment, per Zqiang]
Link: https://lkml.kernel.org/r/20220414025925.2423818-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Artem Bityutskiy [Wed, 27 Apr 2022 06:08:53 +0000 (09:08 +0300)]
intel_idle: Fix SPR C6 optimization
The Sapphire Rapids (SPR) C6 optimization was added to the end of the
'spr_idle_state_table_update()' function. However, the function has a
'return' which may happen before the optimization has a chance to run.
And this may prevent the optimization from happening.
This is an unlikely scenario, but possible if user boots with, say,
the 'intel_idle.preferred_cstates=6' kernel boot option.
This patch fixes the issue by eliminating the problematic 'return'
statement.
Fixes:
3a9cf77b60dc ("intel_idle: add core C6 optimization for SPR")
Suggested-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Artem Bityutskiy [Wed, 27 Apr 2022 06:08:52 +0000 (09:08 +0300)]
intel_idle: Fix the 'preferred_cstates' module parameter
Problem description.
When user boots kernel up with the 'intel_idle.preferred_cstates=4' option,
we enable C1E and disable C1 states on Sapphire Rapids Xeon (SPR). In order
for C1E to work on SPR, we have to enable the C1E promotion bit on all
CPUs. However, we enable it only on one CPU.
Fix description.
The 'intel_idle' driver already has the infrastructure for disabling C1E
promotion on every CPU. This patch uses the same infrastructure for
enabling C1E promotion on every CPU. It changes the boolean
'disable_promotion_to_c1e' variable to a tri-state 'c1e_promotion'
variable.
Tested on a 2-socket SPR system. I verified the following combinations:
* C1E promotion enabled and disabled in BIOS.
* Booted with and without the 'intel_idle.preferred_cstates=4' kernel
argument.
In all 4 cases C1E promotion was correctly set on all CPUs.
Also tested on an old Broadwell system, just to make sure it does not cause
a regression. C1E promotion was correctly disabled on that system, both C1
and C1E were exposed (as expected).
Fixes:
da0e58c038e6 ("intel_idle: add 'preferred_cstates' module argument")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 27 Apr 2022 18:20:03 +0000 (20:20 +0200)]
Merge tag 'cpufreq-arm-fixes-5.18-rc5' of git://git./linux/kernel/git/vireshk/pm
Pull ARM cpufreq fixes for 5.18-rc5 from Viresh Kumar:
"- Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov and
Vladimir Zapolskiy).
- Fix memory leak with the Sun501 driver (Xiaobing Luo)."
* tag 'cpufreq-arm-fixes-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
cpufreq: qcom-hw: provide online/offline operations
cpufreq: qcom-hw: fix the opp entries refcounting
cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
Mikulas Patocka [Wed, 27 Apr 2022 15:26:40 +0000 (11:26 -0400)]
hex2bin: fix access beyond string end
If we pass too short string to "hex2bin" (and the string size without
the terminating NUL character is even), "hex2bin" reads one byte after
the terminating NUL character. This patch fixes it.
Note that hex_to_bin returns -1 on error and hex2bin return -EINVAL on
error - so we can't just return the variable "hi" or "lo" on error.
This inconsistency may be fixed in the next merge window, but for the
purpose of fixing this bug, we just preserve the existing behavior and
return -1 and -EINVAL.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Fixes:
b78049831ffe ("lib: add error checking to hex2bin")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikulas Patocka [Mon, 25 Apr 2022 12:07:48 +0000 (08:07 -0400)]
hex2bin: make the function hex_to_bin constant-time
The function hex2bin is used to load cryptographic keys into device
mapper targets dm-crypt and dm-integrity. It should take constant time
independent on the processed data, so that concurrently running
unprivileged code can't infer any information about the keys via
microarchitectural convert channels.
This patch changes the function hex_to_bin so that it contains no
branches and no memory accesses.
Note that this shouldn't cause performance degradation because the size
of the new function is the same as the size of the old function (on
x86-64) - and the new function causes no branch misprediction penalties.
I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64
i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32
sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64
powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are
no branches in the generated code.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>