Kuniyuki Iwashima [Wed, 24 Nov 2021 02:14:19 +0000 (11:14 +0900)]
af_unix: Use offsetof() instead of sizeof().
The length of the AF_UNIX socket address contains an offset to the member
sun_path of struct sockaddr_un.
Currently, the preceding member is just sun_family, and its type is
sa_family_t and resolved to short. Therefore, the offset is represented by
sizeof(short). However, it is not clear and fragile to changes in struct
sockaddr_storage or sockaddr_un.
This commit makes it clear and robust by rewriting sizeof() with
offsetof().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Wed, 24 Nov 2021 19:39:33 +0000 (14:39 -0500)]
bridge: use __set_bit in __br_vlan_set_default_pvid
The same optimization as the one in commit
cc0be1ad686f ("net:
bridge: Slightly optimize 'find_portno()'") is needed for the
'changed' bitmap in __br_vlan_set_default_pvid().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/4e35f415226765e79c2a11d2c96fbf3061c486e2.1637782773.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tonghao Zhang [Thu, 25 Nov 2021 16:30:49 +0000 (00:30 +0800)]
net: ethtool: set a default driver name
The netdev (e.g. ifb, bareudp), which not support ethtool ops
(e.g. .get_drvinfo), we can use the rtnl kind as a default name.
ifb netdev may be created by others prefix, not ifbX.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Hao Chen <chenhao288@hisilicon.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Danielle Ratson <danieller@nvidia.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20211125163049.84970-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 27 Nov 2021 00:43:20 +0000 (16:43 -0800)]
Merge branch 'selftests-net-bridge-vlan-multicast-tests'
Nikolay Aleksandrov says:
====================
selftests: net: bridge: vlan multicast tests
This patch-set adds selftests for the new vlan multicast options that
were recently added. Most of the tests check for default values,
changing options and try to verify that the changes actually take
effect. The last test checks if the dependency between vlan_filtering
and mcast_vlan_snooping holds. The rest are pretty self-explanatory.
TEST: Vlan multicast snooping enable [ OK ]
TEST: Vlan global options existence [ OK ]
TEST: Vlan mcast_snooping global option default value [ OK ]
TEST: Vlan 10 multicast snooping control [ OK ]
TEST: Vlan mcast_querier global option default value [ OK ]
TEST: Vlan 10 multicast querier enable [ OK ]
TEST: Vlan 10 tagged IGMPv2 general query sent [ OK ]
TEST: Vlan 10 tagged MLD general query sent [ OK ]
TEST: Vlan mcast_igmp_version global option default value [ OK ]
TEST: Vlan mcast_mld_version global option default value [ OK ]
TEST: Vlan 10 mcast_igmp_version option changed to 3 [ OK ]
TEST: Vlan 10 tagged IGMPv3 general query sent [ OK ]
TEST: Vlan 10 mcast_mld_version option changed to 2 [ OK ]
TEST: Vlan 10 tagged MLDv2 general query sent [ OK ]
TEST: Vlan mcast_last_member_count global option default value [ OK ]
TEST: Vlan mcast_last_member_interval global option default value [ OK ]
TEST: Vlan 10 mcast_last_member_count option changed to 3 [ OK ]
TEST: Vlan 10 mcast_last_member_interval option changed to 200 [ OK ]
TEST: Vlan mcast_startup_query_interval global option default value [ OK ]
TEST: Vlan mcast_startup_query_count global option default value [ OK ]
TEST: Vlan 10 mcast_startup_query_interval option changed to 100 [ OK ]
TEST: Vlan 10 mcast_startup_query_count option changed to 3 [ OK ]
TEST: Vlan mcast_membership_interval global option default value [ OK ]
TEST: Vlan 10 mcast_membership_interval option changed to 200 [ OK ]
TEST: Vlan 10 mcast_membership_interval mdb entry expire [ OK ]
TEST: Vlan mcast_querier_interval global option default value [ OK ]
TEST: Vlan 10 mcast_querier_interval option changed to 100 [ OK ]
TEST: Vlan 10 mcast_querier_interval expire after outside query [ OK ]
TEST: Vlan mcast_query_interval global option default value [ OK ]
TEST: Vlan 10 mcast_query_interval option changed to 200 [ OK ]
TEST: Vlan mcast_query_response_interval global option default value [ OK ]
TEST: Vlan 10 mcast_query_response_interval option changed to 200 [ OK ]
TEST: Port vlan 10 option mcast_router default value [ OK ]
TEST: Port vlan 10 mcast_router option changed to 2 [ OK ]
TEST: Flood unknown vlan multicast packets to router port only [ OK ]
TEST: Disable multicast vlan snooping when vlan filtering is disabled [ OK ]
====================
Link: https://lore.kernel.org/r/20211125140858.3639139-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov [Thu, 25 Nov 2021 14:08:58 +0000 (16:08 +0200)]
selftests: net: bridge: add test for vlan_filtering dependency
Add a test for dependency of mcast_vlan_snooping on vlan_filtering. If
vlan_filtering gets disabled, then mcast_vlan_snooping must be
automatically disabled as well.
TEST: Disable multicast vlan snooping when vlan filtering is disabled [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov [Thu, 25 Nov 2021 14:08:57 +0000 (16:08 +0200)]
selftests: net: bridge: add vlan mcast_router tests
Add tests for the new per-port/vlan mcast_router option, verify that
unknown multicast packets are flooded only to router ports.
TEST: Port vlan 10 option mcast_router default value [ OK ]
TEST: Port vlan 10 mcast_router option changed to 2 [ OK ]
TEST: Flood unknown vlan multicast packets to router port only [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov [Thu, 25 Nov 2021 14:08:56 +0000 (16:08 +0200)]
selftests: net: bridge: add vlan mcast query and query response interval tests
Add tests which change the new per-vlan mcast_query_interval and verify
the new value is in effect, also add a test to change
mcast_query_response_interval's value.
TEST: Vlan mcast_query_interval global option default value [ OK ]
TEST: Vlan 10 mcast_query_interval option changed to 200 [ OK ]
TEST: Vlan mcast_query_response_interval global option default value [ OK ]
TEST: Vlan 10 mcast_query_response_interval option changed to 200 [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov [Thu, 25 Nov 2021 14:08:55 +0000 (16:08 +0200)]
selftests: net: bridge: add vlan mcast_querier_interval tests
Add tests which change the new per-vlan mcast_querier_interval and
verify that it is taken into account when an outside querier is present.
TEST: Vlan mcast_querier_interval global option default value [ OK ]
TEST: Vlan 10 mcast_querier_interval option changed to 100 [ OK ]
TEST: Vlan 10 mcast_querier_interval expire after outside query [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov [Thu, 25 Nov 2021 14:08:54 +0000 (16:08 +0200)]
selftests: net: bridge: add vlan mcast_membership_interval test
Add a test which changes the new per-vlan mcast_membership_interval and
verifies that a newly learned mdb entry would expire in that interval.
TEST: Vlan mcast_membership_interval global option default value [ OK ]
TEST: Vlan 10 mcast_membership_interval option changed to 200 [ OK ]
TEST: Vlan 10 mcast_membership_interval mdb entry expire [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov [Thu, 25 Nov 2021 14:08:53 +0000 (16:08 +0200)]
selftests: net: bridge: add vlan mcast_startup_query_count/interval tests
Add tests which change the new per-vlan startup query count/interval
options and verify the proper number of queries are sent in the expected
interval.
TEST: Vlan mcast_startup_query_interval global option default value [ OK ]
TEST: Vlan mcast_startup_query_count global option default value [ OK ]
TEST: Vlan 10 mcast_startup_query_interval option changed to 100 [ OK ]
TEST: Vlan 10 mcast_startup_query_count option changed to 3 [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov [Thu, 25 Nov 2021 14:08:52 +0000 (16:08 +0200)]
selftests: net: bridge: add vlan mcast_last_member_count/interval tests
Add tests which verify the default values of mcast_last_member_count
mcast_last_member_count and also try to change them.
TEST: Vlan mcast_last_member_count global option default value [ OK ]
TEST: Vlan mcast_last_member_interval global option default value [ OK ]
TEST: Vlan 10 mcast_last_member_count option changed to 3 [ OK ]
TEST: Vlan 10 mcast_last_member_interval option changed to 200 [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov [Thu, 25 Nov 2021 14:08:51 +0000 (16:08 +0200)]
selftests: net: bridge: add vlan mcast igmp/mld version tests
Add tests which change the new per-vlan IGMP/MLD versions and verify
that proper tagged general query packets are sent.
TEST: Vlan mcast_igmp_version global option default value [ OK ]
TEST: Vlan mcast_mld_version global option default value [ OK ]
TEST: Vlan 10 mcast_igmp_version option changed to 3 [ OK ]
TEST: Vlan 10 tagged IGMPv3 general query sent [ OK ]
TEST: Vlan 10 mcast_mld_version option changed to 2 [ OK ]
TEST: Vlan 10 tagged MLDv2 general query sent [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov [Thu, 25 Nov 2021 14:08:50 +0000 (16:08 +0200)]
selftests: net: bridge: add vlan mcast querier test
Add a test to try the new global vlan mcast_querier control and also
verify that tagged general query packets are properly generated when
querier is enabled for a single vlan.
TEST: Vlan mcast_querier global option default value [ OK ]
TEST: Vlan 10 multicast querier enable [ OK ]
TEST: Vlan 10 tagged IGMPv2 general query sent [ OK ]
TEST: Vlan 10 tagged MLD general query sent [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikolay Aleksandrov [Thu, 25 Nov 2021 14:08:49 +0000 (16:08 +0200)]
selftests: net: bridge: add vlan mcast snooping control test
Add the first test for bridge per-vlan multicast snooping which checks
if control of the global and per-vlan options work as expected, joins
and leaves are tested at each option value.
TEST: Vlan multicast snooping enable [ OK ]
TEST: Vlan global options existence [ OK ]
TEST: Vlan mcast_snooping global option default value [ OK ]
TEST: Vlan 10 multicast snooping control [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Nov 2021 21:26:29 +0000 (13:26 -0800)]
Merge git://git./linux/kernel/git/netdev/net
drivers/net/ipa/ipa_main.c
8afc7e471ad3 ("net: ipa: separate disabling setup from modem stop")
76b5fbcd6b47 ("net: ipa: kill ipa_modem_init()")
Duplicated include, drop one.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 26 Nov 2021 20:58:53 +0000 (12:58 -0800)]
Merge tag 'net-5.16-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter.
Current release - regressions:
- r8169: fix incorrect mac address assignment
- vlan: fix underflow for the real_dev refcnt when vlan creation
fails
- smc: avoid warning of possible recursive locking
Current release - new code bugs:
- vsock/virtio: suppress used length validation
- neigh: fix crash in v6 module initialization error path
Previous releases - regressions:
- af_unix: fix change in behavior in read after shutdown
- igb: fix netpoll exit with traffic, avoid warning
- tls: fix splice_read() when starting mid-record
- lan743x: fix deadlock in lan743x_phy_link_status_change()
- marvell: prestera: fix bridge port operation
Previous releases - always broken:
- tcp_cubic: fix spurious Hystart ACK train detections for
not-cwnd-limited flows
- nexthop: fix refcount issues when replacing IPv6 groups
- nexthop: fix null pointer dereference when IPv6 is not enabled
- phylink: force link down and retrigger resolve on interface change
- mptcp: fix delack timer length calculation and incorrect early
clearing
- ieee802154: handle iftypes as u32, prevent shift-out-of-bounds
- nfc: virtual_ncidev: change default device permissions
- netfilter: ctnetlink: fix error codes and flags used for kernel
side filtering of dumps
- netfilter: flowtable: fix IPv6 tunnel addr match
- ncsi: align payload to 32-bit to fix dropped packets
- iavf: fix deadlock and loss of config during VF interface reset
- ice: avoid bpf_prog refcount underflow
- ocelot: fix broken PTP over IP and PTP API violations
Misc:
- marvell: mvpp2: increase MTU limit when XDP enabled"
* tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
net: dsa: microchip: implement multi-bridge support
net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
net: mscc: ocelot: set up traps for PTP packets
net: ptp: add a definition for the UDP port for IEEE 1588 general messages
net: mscc: ocelot: create a function that replaces an existing VCAP filter
net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
net: hns3: fix incorrect components info of ethtool --reset command
net: hns3: fix one incorrect value of page pool info when queried by debugfs
net: hns3: add check NULL address for page pool
net: hns3: fix VF RSS failed problem after PF enable multi-TCs
net: qed: fix the array may be out of bound
net/smc: Don't call clcsock shutdown twice when smc shutdown
net: vlan: fix underflow for the real_dev refcnt
ptp: fix filter names in the documentation
ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
nfc: virtual_ncidev: change default device permissions
net/sched: sch_ets: don't peek at classes beyond 'nbands'
net: stmmac: Disable Tx queues when reconfiguring the interface
selftests: tls: test for correct proto_ops
tls: fix replacing proto_ops
...
Oleksij Rempel [Fri, 26 Nov 2021 12:39:26 +0000 (13:39 +0100)]
net: dsa: microchip: implement multi-bridge support
Current driver version is able to handle only one bridge at time.
Configuring two bridges on two different ports would end up shorting this
bridges by HW. To reproduce it:
ip l a name br0 type bridge
ip l a name br1 type bridge
ip l s dev br0 up
ip l s dev br1 up
ip l s lan1 master br0
ip l s dev lan1 up
ip l s lan2 master br1
ip l s dev lan2 up
Ping on lan1 and get response on lan2, which should not happen.
This happened, because current driver version is storing one global "Port VLAN
Membership" and applying it to all ports which are members of any
bridge.
To solve this issue, we need to handle each port separately.
This patch is dropping the global port member storage and calculating
membership dynamically depending on STP state and bridge participation.
Note: STP support was broken before this patch and should be fixed
separately.
Fixes:
c2e866911e25 ("net: dsa: microchip: break KSZ9477 DSA driver into two files")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20211126123926.2981028-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 26 Nov 2021 20:19:13 +0000 (12:19 -0800)]
Merge tag 'acpi-5.16-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a NULL pointer dereference in the CPPC library code and a
locking issue related to printing the names of ACPI device nodes in
the device properties framework.
Specifics:
- Fix NULL pointer dereference in the CPPC library code occuring on
hybrid systems without CPPC support (Rafael Wysocki).
- Avoid attempts to acquire a semaphore with interrupts off when
printing the names of ACPI device nodes and clean up code on top of
that fix (Sakari Ailus)"
* tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Add NULL pointer check to cppc_get_perf()
ACPI: Make acpi_node_get_parent() local
ACPI: Get acpi_device's parent from the parent field
Linus Torvalds [Fri, 26 Nov 2021 20:14:50 +0000 (12:14 -0800)]
Merge tag 'pm-5.16-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These address three issues in the intel_pstate driver and fix two
problems related to hibernation.
Specifics:
- Make intel_pstate work correctly on Ice Lake server systems with
out-of-band performance control enabled (Adamos Ttofari).
- Fix EPP handling in intel_pstate during CPU offline and online in
the active mode (Rafael Wysocki).
- Make intel_pstate support ITMT on asymmetric systems with
overclocking enabled (Srinivas Pandruvada).
- Fix hibernation image saving when using the user space interface
based on the snapshot special device file (Evan Green).
- Make the hibernation code release the snapshot block device using
the same mode that was used when acquiring it (Thomas Zeitlhofer)"
* tag 'pm-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: Fix snapshot partial write lengths
PM: hibernate: use correct mode for swsusp_close()
cpufreq: intel_pstate: ITMT support for overclocked system
cpufreq: intel_pstate: Fix active mode offline/online EPP handling
cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
Linus Torvalds [Fri, 26 Nov 2021 20:01:31 +0000 (12:01 -0800)]
Merge tag 'fuse-fixes-5.16-rc3' of git://git./linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
"Fix a regression caused by a bugfix in the previous release. The
symptom is a VM_BUG_ON triggered from splice to the fuse device.
Unfortunately the original bugfix was already backported to a number
of stable releases, so this fix-fix will need to be backported as
well"
* tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: release pipe buf after last use
Jakub Kicinski [Fri, 26 Nov 2021 19:38:23 +0000 (11:38 -0800)]
Merge branch 'fix-broken-ptp-over-ip-on-ocelot-switches'
Vladimir Oltean says:
====================
Fix broken PTP over IP on Ocelot switches
Changes in v2: added patch 5, added Richard's ack for the whole series
sans patch 5 which is new.
Po Liu reported recently that timestamping PTP over IPv4 is broken using
the felix driver on NXP LS1028A. This has been known for a while, of
course, since it has always been broken. The reason is because IP PTP
packets are currently treated as unknown IP multicast, which is not
flooded to the CPU port in the ocelot driver design, so packets don't
reach the ptp4l program.
The series solves the problem by installing packet traps per port when
the timestamping ioctl is called, depending on the RX filter selected
(L2, L4 or both).
====================
Link: https://lore.kernel.org/r/20211126172845.3149260-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 26 Nov 2021 17:28:45 +0000 (19:28 +0200)]
net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
The driver doesn't support RX timestamping for non-PTP packets, but it
declares that it does. Restrict the reported RX filters to PTP v2 over
L2 and over L4.
Fixes:
4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 26 Nov 2021 17:28:44 +0000 (19:28 +0200)]
net: mscc: ocelot: set up traps for PTP packets
IEEE 1588 support was declared too soon for the Ocelot switch. Out of
reset, this switch does not apply any special treatment for PTP packets,
i.e. when an event message is received, the natural tendency is to
forward it by MAC DA/VLAN ID. This poses a problem when the ingress port
is under a bridge, since user space application stacks (written
primarily for endpoint ports, not switches) like ptp4l expect that PTP
messages are always received on AF_PACKET / AF_INET sockets (depending
on the PTP transport being used), and never being autonomously
forwarded. Any forwarding, if necessary (for example in Transparent
Clock mode) is handled in software by ptp4l. Having the hardware forward
these packets too will cause duplicates which will confuse endpoints
connected to these switches.
So PTP over L2 barely works, in the sense that PTP packets reach the CPU
port, but they reach it via flooding, and therefore reach lots of other
unwanted destinations too. But PTP over IPv4/IPv6 does not work at all.
This is because the Ocelot switch have a separate destination port mask
for unknown IP multicast (which PTP over IP is) flooding compared to
unknown non-IP multicast (which PTP over L2 is) flooding. Specifically,
the driver allows the CPU port to be in the PGID_MC port group, but not
in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from
Allan Nielsen which explain that the embedded MIPS CPU on Ocelot
switches is not very powerful at all, so every penny they could save by
not allowing flooding to the CPU port module matters. Unknown IP
multicast did not make it.
The de facto consensus is that when a switch is PTP-aware and an
application stack for PTP is running, switches should have some sort of
trapping mechanism for PTP packets, to extract them from the hardware
data path. This avoids both problems:
(a) PTP packets are no longer flooded to unwanted destinations
(b) PTP over IP packets are no longer denied from reaching the CPU since
they arrive there via a trap and not via flooding
It is not the first time when this change is attempted. Last time, the
feedback from Allan Nielsen and Andrew Lunn was that the traps should
not be installed by default, and that PTP-unaware switching may be
desired for some use cases:
https://patchwork.ozlabs.org/project/netdev/patch/
20190813025214.18601-5-yangbo.lu@nxp.com/
To address that feedback, the present patch adds the necessary packet
traps according to the RX filter configuration transmitted by user space
through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we
keep 5 filters, which are amended each time RX timestamping is enabled
or disabled on a port:
- 1 for PTP over L2
- 2 for PTP over IPv4 (UDP ports 319 and 320)
- 2 for PTP over IPv6 (UDP ports 319 and 320)
The cookie by which these filters (invisible to tc) are identified is
strategically chosen such that it does not collide with the filters used
for the ocelot-8021q tagging protocol by the Felix driver, or with the
MRP traps set up by the Ocelot library.
Other alternatives were considered, like patching user space to do
something, but there are so many ways in which PTP packets could be made
to reach the CPU, generically speaking, that "do what?" is a very valid
question. The ptp4l program from the linuxptp stack already attempts to
do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and
PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into
a dev_mc_add() on the interface, in the kernel:
https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73
https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c
Reality shows that this is not sufficient in case the interface belongs
to a switchdev driver, as dev_mc_add() does not show the intention to
trap a packet to the CPU, but rather the intention to not drop it (it is
strictly for RX filtering, same as promiscuous does not mean to send all
traffic to the CPU, but to not drop traffic with unknown MAC DA). This
topic is a can of worms in itself, and it would be great if user space
could just stay out of it.
On the other hand, setting up PTP traps privately within the driver is
not new by any stretch of the imagination:
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050
https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21
So this is the approach taken here as well. The difference here being
that we prepare and destroy the traps per port, dynamically at runtime,
as opposed to driver init time, because apparently, PTP-unaware
forwarding is a use case.
Fixes:
4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Reported-by: Po Liu <po.liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 26 Nov 2021 17:28:43 +0000 (19:28 +0200)]
net: ptp: add a definition for the UDP port for IEEE 1588 general messages
As opposed to event messages (Sync, PdelayReq etc) which require
timestamping, general messages (Announce, FollowUp etc) do not.
In PTP they are part of different streams of data.
IEEE 1588-2008 Annex D.2 "UDP port numbers" states that the UDP
destination port assigned by IANA is 319 for event messages, and 320 for
general messages. Yet the kernel seems to be missing the definition for
general messages. This patch adds it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 26 Nov 2021 17:28:42 +0000 (19:28 +0200)]
net: mscc: ocelot: create a function that replaces an existing VCAP filter
VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind
tc flower offload on ocelot, among other things. The ingress port mask
on which VCAP rules match is present as a bit field in the actual key of
the rule. This means that it is possible for a rule to be shared among
multiple source ports. When the rule is added one by one on each desired
port, that the ingress port mask of the key must be edited and rewritten
to hardware.
But the API in ocelot_vcap.c does not allow for this. For one thing,
ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric,
because ocelot_vcap_filter_add() works with a preallocated and
prepopulated filter and programs it to hardware, and
ocelot_vcap_filter_del() does both the job of removing the specified
filter from hardware, as well as kfreeing it. That is to say, the only
option of editing a filter in place, which is to delete it, modify the
structure and add it back, does not work because it results in
use-after-free.
This patch introduces ocelot_vcap_filter_replace, which trivially
reprograms a VCAP entry to hardware, at the exact same index at which it
existed before, without modifying any list or allocating any memory.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 26 Nov 2021 17:28:41 +0000 (19:28 +0200)]
net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
The ocelot driver, when asked to timestamp all receiving packets, 1588
v1 or NTP, says "nah, here's 1588 v2 for you".
According to this discussion:
https://patchwork.kernel.org/project/netdevbpf/patch/
20211104133204.19757-8-martin.kaistra@linutronix.de/#
24577647
drivers that downgrade from a wider request to a narrower response (or
even a response where the intersection with the request is empty) are
buggy, and should return -ERANGE instead. This patch fixes that.
Fixes:
4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Nov 2021 19:36:31 +0000 (11:36 -0800)]
Merge branch 'net-hns3-add-some-fixes-for-net'
Guangbin Huang says:
====================
net: hns3: add some fixes for -net
This series adds some fixes for the HNS3 ethernet driver.
====================
Link: https://lore.kernel.org/r/20211126120318.33921-1-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jie Wang [Fri, 26 Nov 2021 12:03:18 +0000 (20:03 +0800)]
net: hns3: fix incorrect components info of ethtool --reset command
Currently, HNS3 driver doesn't clear the reset flags of components after
successfully executing reset, it causes userspace info of
"Components reset" and "Components not reset" is incorrect.
So fix this problem by clear corresponding reset flag after reset process.
Fixes:
ddccc5e368a3 ("net: hns3: add support for triggering reset by ethtool")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hao Chen [Fri, 26 Nov 2021 12:03:17 +0000 (20:03 +0800)]
net: hns3: fix one incorrect value of page pool info when queried by debugfs
Currently, when user queries page pool info by debugfs command
"cat page_pool_info", the cnt of allocated page for page pool may be
incorrect because of memory inconsistency problem caused by compiler
optimization.
So this patch uses READ_ONCE() to read value of pages_state_hold_cnt to
fix this problem.
Fixes:
850bfb912a6d ("net: hns3: debugfs add support dumping page pool info")
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hao Chen [Fri, 26 Nov 2021 12:03:16 +0000 (20:03 +0800)]
net: hns3: add check NULL address for page pool
When page pool is not enabled, its address value is still NULL and page
pool should not be accessed, so add a check for it.
Fixes:
850bfb912a6d ("net: hns3: debugfs add support dumping page pool info")
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guangbin Huang [Fri, 26 Nov 2021 12:03:15 +0000 (20:03 +0800)]
net: hns3: fix VF RSS failed problem after PF enable multi-TCs
When PF is set to multi-TCs and configured mapping relationship between
priorities and TCs, the hardware will active these settings for this PF
and its VFs.
In this case when VF just uses one TC and its rx packets contain priority,
and if the priority is not mapped to TC0, as other TCs of VF is not valid,
hardware always put this kind of packets to the queue 0. It cause this kind
of packets of VF can not be used RSS function.
To fix this problem, set tc mode of all unused TCs of VF to the setting of
TC0, then rx packet with priority which map to unused TC will be direct to
TC0.
Fixes:
e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
zhangyue [Thu, 25 Nov 2021 11:36:10 +0000 (19:36 +0800)]
net: qed: fix the array may be out of bound
If the variable 'p_bit->flags' is always 0,
the loop condition is always 0.
The variable 'j' may be greater than or equal to 32.
At this time, the array 'p_aeu->bits[32]' may be out
of bound.
Signed-off-by: zhangyue <zhangyue1@kylinos.cn>
Link: https://lore.kernel.org/r/20211125113610.273841-1-zhangyue1@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 26 Nov 2021 19:24:32 +0000 (11:24 -0800)]
Merge tag 'for-5.16-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more fix to the lzo code, a missing put_page causing memory leaks
when some error branches are taken"
* tag 'for-5.16-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix the memory leak caused in lzo_compress_pages()
Tony Lu [Fri, 26 Nov 2021 02:41:35 +0000 (10:41 +0800)]
net/smc: Don't call clcsock shutdown twice when smc shutdown
When applications call shutdown() with SHUT_RDWR in userspace,
smc_close_active() calls kernel_sock_shutdown(), and it is called
twice in smc_shutdown().
This fixes this by checking sk_state before do clcsock shutdown, and
avoids missing the application's call of smc_shutdown().
Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linux.ibm.com/
Fixes:
606a63c9783a ("net/smc: Ensure the active closing peer first closes clcsock")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20211126024134.45693-1-tonylu@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
wengjianfeng [Fri, 26 Nov 2021 01:31:30 +0000 (09:31 +0800)]
nfc: fdp: Merge the same judgment
Combine two judgments that return the same value
Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20211126013130.27112-1-samirweng1979@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ziyang Xuan [Fri, 26 Nov 2021 01:59:42 +0000 (09:59 +0800)]
net: vlan: fix underflow for the real_dev refcnt
Inject error before dev_hold(real_dev) in register_vlan_dev(),
and execute the following testcase:
ip link add dev dummy1 type dummy
ip link add name dummy1.100 link dummy1 type vlan id 100
ip link del dev dummy1
When the dummy netdevice is removed, we will get a WARNING as following:
=======================================================================
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 2 PID: 0 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0
and an endless loop of:
=======================================================================
unregister_netdevice: waiting for dummy1 to become free. Usage count = -
1073741824
That is because dev_put(real_dev) in vlan_dev_free() be called without
dev_hold(real_dev) in register_vlan_dev(). It makes the refcnt of real_dev
underflow.
Move the dev_hold(real_dev) to vlan_dev_init() which is the call-back of
ndo_init(). That makes dev_hold() and dev_put() for vlan's real_dev
symmetrical.
Fixes:
563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()")
Reported-by: Petr Machata <petrm@nvidia.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Link: https://lore.kernel.org/r/20211126015942.2918542-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Nov 2021 03:19:21 +0000 (19:19 -0800)]
ptp: fix filter names in the documentation
All the filter names are missing _PTP in them.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20211126031921.2466944-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian Wiedmann [Fri, 26 Nov 2021 17:55:43 +0000 (18:55 +0100)]
ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
ethtool_set_coalesce() now uses both the .get_coalesce() and
.set_coalesce() callbacks. But the check for their availability is
buggy, so changing the coalesce settings on a device where the driver
provides only _one_ of the callbacks results in a NULL pointer
dereference instead of an -EOPNOTSUPP.
Fix the condition so that the availability of both callbacks is
ensured. This also matches the netlink code.
Note that reproducing this requires some effort - it only affects the
legacy ioctl path, and needs a specific combination of driver options:
- have .get_coalesce() and .coalesce_supported but no
.set_coalesce(), or
- have .set_coalesce() but no .get_coalesce(). Here eg. ethtool doesn't
cause the crash as it first attempts to call ethtool_get_coalesce()
and bails out on error.
Fixes:
f3ccfda19319 ("ethtool: extend coalesce setting uAPI with CQE mode")
Cc: Yufeng Mo <moyufeng@huawei.com>
Cc: Huazhong Tan <tanhuazhong@huawei.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Link: https://lore.kernel.org/r/20211126175543.28000-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thadeu Lima de Souza Cascardo [Thu, 25 Nov 2021 14:14:57 +0000 (11:14 -0300)]
nfc: virtual_ncidev: change default device permissions
Device permissions is S_IALLUGO, with many unnecessary bits. Remove them
and also remove read and write permissions from group and others.
Before the change:
crwsrwsrwt 1 0 0 10, 125 Nov 25 13:59 /dev/virtual_nci
After the change:
crw------- 1 0 0 10, 125 Nov 25 14:05 /dev/virtual_nci
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Reviewed-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Link: https://lore.kernel.org/r/20211125141457.716921-1-cascardo@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Davide Caratti [Wed, 24 Nov 2021 16:14:40 +0000 (17:14 +0100)]
net/sched: sch_ets: don't peek at classes beyond 'nbands'
when the number of DRR classes decreases, the round-robin active list can
contain elements that have already been freed in ets_qdisc_change(). As a
consequence, it's possible to see a NULL dereference crash, caused by the
attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL:
BUG: kernel NULL pointer dereference, address:
0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 910 Comm: mausezahn Not tainted 5.16.0-rc1+ #475
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+
0f1aadab 04/01/2014
RIP: 0010:ets_qdisc_dequeue+0x129/0x2c0 [sch_ets]
Code: c5 01 41 39 ad e4 02 00 00 0f 87 18 ff ff ff 49 8b 85 c0 02 00 00 49 39 c4 0f 84 ba 00 00 00 49 8b ad c0 02 00 00 48 8b 7d 10 <48> 8b 47 18 48 8b 40 38 0f ae e8 ff d0 48 89 c3 48 85 c0 0f 84 9d
RSP: 0000:
ffffbb36c0b5fdd8 EFLAGS:
00010287
RAX:
ffff956678efed30 RBX:
0000000000000000 RCX:
0000000000000000
RDX:
0000000000000002 RSI:
ffffffff9b938dc9 RDI:
0000000000000000
RBP:
ffff956678efed30 R08:
e2f3207fe360129c R09:
0000000000000000
R10:
0000000000000001 R11:
0000000000000001 R12:
ffff956678efeac0
R13:
ffff956678efe800 R14:
ffff956611545000 R15:
ffff95667ac8f100
FS:
00007f2aa9120740(0000) GS:
ffff95667b800000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000018 CR3:
000000011070c000 CR4:
0000000000350ee0
Call Trace:
<TASK>
qdisc_peek_dequeued+0x29/0x70 [sch_ets]
tbf_dequeue+0x22/0x260 [sch_tbf]
__qdisc_run+0x7f/0x630
net_tx_action+0x290/0x4c0
__do_softirq+0xee/0x4f8
irq_exit_rcu+0xf4/0x130
sysvec_apic_timer_interrupt+0x52/0xc0
asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0033:0x7f2aa7fc9ad4
Code: b9 ff ff 48 8b 54 24 18 48 83 c4 08 48 89 ee 48 89 df 5b 5d e9 ed fc ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <53> 48 83 ec 10 48 8b 05 10 64 33 00 48 8b 00 48 85 c0 0f 85 84 00
RSP: 002b:
00007ffe5d33fab8 EFLAGS:
00000202
RAX:
0000000000000002 RBX:
0000561f72c31460 RCX:
0000561f72c31720
RDX:
0000000000000002 RSI:
0000561f72c31722 RDI:
0000561f72c31720
RBP:
000000000000002a R08:
00007ffe5d33fa40 R09:
0000000000000014
R10:
0000000000000000 R11:
0000000000000246 R12:
0000561f7187e380
R13:
0000000000000000 R14:
0000000000000000 R15:
0000561f72c31460
</TASK>
Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt intel_rapl_msr iTCO_vendor_support intel_rapl_common joydev virtio_balloon lpc_ich i2c_i801 i2c_smbus pcspkr ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel serio_raw libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
CR2:
0000000000000018
Ensuring that 'alist' was never zeroed [1] was not sufficient, we need to
remove from the active list those elements that are no more SP nor DRR.
[1] https://lore.kernel.org/netdev/
60d274838bf09777f0371253416e8af71360bc08.
1633609148.git.dcaratti@redhat.com/
v3: fix race between ets_qdisc_change() and ets_qdisc_dequeue() delisting
DRR classes beyond 'nbands' in ets_qdisc_change() with the qdisc lock
acquired, thanks to Cong Wang.
v2: when a NULL qdisc is found in the DRR active list, try to dequeue skb
from the next list item.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes:
dcc68b4d8084 ("net: sch_ets: Add a new Qdisc")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/7a5c496eed2d62241620bdbb83eb03fb9d571c99.1637762721.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafael J. Wysocki [Fri, 26 Nov 2021 18:45:31 +0000 (19:45 +0100)]
Merge branch 'acpi-properties'
Merge fix and cleanup related to the management of ACPI device
properties for 5.16-rc3.
* acpi-properties:
ACPI: Make acpi_node_get_parent() local
ACPI: Get acpi_device's parent from the parent field
Rafael J. Wysocki [Fri, 26 Nov 2021 18:44:40 +0000 (19:44 +0100)]
Merge branch 'pm-sleep'
Merge hibernation-related fixes for 5.16-rc3.
* pm-sleep:
PM: hibernate: Fix snapshot partial write lengths
PM: hibernate: use correct mode for swsusp_close()
Yannick Vignon [Wed, 24 Nov 2021 15:47:31 +0000 (16:47 +0100)]
net: stmmac: Disable Tx queues when reconfiguring the interface
The Tx queues were not disabled in situations where the driver needed to
stop the interface to apply a new configuration. This could result in a
kernel panic when doing any of the 3 following actions:
* reconfiguring the number of queues (ethtool -L)
* reconfiguring the size of the ring buffers (ethtool -G)
* installing/removing an XDP program (ip l set dev ethX xdp)
Prevent the panic by making sure netif_tx_disable is called when stopping
an interface.
Without this patch, the following kernel panic can be observed when doing
any of the actions above:
Unable to handle kernel paging request at virtual address
ffff80001238d040
[....]
Call trace:
dwmac4_set_addr+0x8/0x10
dev_hard_start_xmit+0xe4/0x1ac
sch_direct_xmit+0xe8/0x39c
__dev_queue_xmit+0x3ec/0xaf0
dev_queue_xmit+0x14/0x20
[...]
[ end trace
0000000000000002 ]---
Fixes:
5fabb01207a2d ("net: stmmac: Add initial XDP support")
Fixes:
aa042f60e4961 ("net: stmmac: Add support to Ethtool get/set ring parameters")
Fixes:
0366f7e06a6be ("net: stmmac: add ethtool support for get/set channels")
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Link: https://lore.kernel.org/r/20211124154731.1676949-1-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 26 Nov 2021 18:33:17 +0000 (10:33 -0800)]
Merge tag 'char-misc-5.16-rc3' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fix from Greg KH:
"Here is a single binder driver fix for 5.16-rc3.
It resolves a problem reported in the set of binder fixes that went
into 5.16-rc1. It has been in linux-next for a while with no reported
problems"
* tag 'char-misc-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binder: fix test regression due to sender_euid change
Linus Torvalds [Fri, 26 Nov 2021 18:27:43 +0000 (10:27 -0800)]
Merge tag 'staging-5.16-rc3' of git://git./linux/kernel/git/gregkh/staging
Pull staging fixes from Greg KH:
"Here are some small staging driver fixes and one driver removal for
5.16-rc3.
The fixes resolve a number of small issues found in 5.16-rc1, nothing
huge at all. The driver removal was due to a platform being removed in
5.16-rc1, but this driver was forgotten about. It wasn't being built
anymore so it's safe to delete.
All have been in linux-next for a while with no reported problems"
* tag 'staging-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()
staging: greybus: Add missing rwsem around snd_ctl_remove() calls
staging: Remove Netlogic XLP network driver
staging: r8188eu: fix a memory leak in rtw_wx_read32()
staging: r8188eu: use GFP_ATOMIC under spinlock
staging: r8188eu: Use kzalloc() with GFP_ATOMIC in atomic context
staging/fbtft: Fix backlight
staging: r8188eu: Fix breakage introduced when 5G code was removed
Linus Torvalds [Fri, 26 Nov 2021 18:22:47 +0000 (10:22 -0800)]
Merge tag 'usb-5.16-rc1' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB fixes for reported problems for
5.16-rc3
They include:
- typec driver fixes
- new usb-serial driver ids
- usb hub enumeration issues that were much reported
- gadget driver fixes
- dwc3 driver fix
- chipidea driver fixe
All of these have been in linux-next with no reported problems"
* tag 'usb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: option: add Fibocom FM101-GL variants
usb: typec: tipd: Fix initialization sequence for cd321x
usb: typec: tipd: Fix typo in cd321x_switch_power_state
usb: hub: Fix locking issues with address0_mutex
USB: serial: pl2303: fix GC type detection
USB: serial: option: add Telit LE910S1 0x9200 composition
usb: chipidea: ci_hdrc_imx: fix potential error pointer dereference in probe
usb: hub: Fix usb enumeration issue due to address0 race
usb: typec: fusb302: Fix masking of comparator and bc_lvl interrupts
usb: dwc3: leave default DMA for PCI devices
usb: dwc2: hcd_queue: Fix use of floating point literal
usb: dwc3: gadget: Fix null pointer exception
usb: gadget: udc-xilinx: Fix an error handling path in 'xudc_probe()'
usb: xhci: tegra: Check padctrl interrupt presence in device tree
usb: dwc2: gadget: Fix ISOC flow for elapsed frames
usb: dwc3: gadget: Check for L1/L2/U3 for Start Transfer
usb: dwc3: gadget: Ignore NoStream after End Transfer
usb: dwc3: core: Revise GHWPARAMS9 offset
Linus Torvalds [Fri, 26 Nov 2021 18:10:19 +0000 (10:10 -0800)]
Merge tag 'mmc-v5.16-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- mmc_spi: Add SPI IDs to silence warning
- sdhci: Fix ADMA for PAGE_SIZE >= 64KiB
- sdhci-esdhc-imx: Disable broken CMDQ for imx8qm/imx8qxp/imx8mm
* tag 'mmc-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: spi: Add device-tree SPI IDs
mmc: sdhci: Fix ADMA for PAGE_SIZE >= 64KiB
mmc: sdhci-esdhc-imx: disable CMDQ support
Linus Torvalds [Fri, 26 Nov 2021 17:59:55 +0000 (09:59 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has an interrupt storm fix for the i801, better timeout handling
for the new virtio driver, and some documentation fixes this time"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
docs: i2c: smbus-protocol: mention the repeated start condition
i2c: virtio: disable timeout handling
i2c: i801: Fix interrupt storm from SMB_ALERT signal
i2c: i801: Restore INTREN on unload
dt-bindings: i2c: imx-lpi2c: Fix i.MX 8QM compatible matching
Linus Torvalds [Fri, 26 Nov 2021 17:54:13 +0000 (09:54 -0800)]
Merge tag 'for-linus-5.16c-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- Kconfig fix to make it possible to control building of the privcmd
driver
- three fixes for issues identified by the kernel test robot
- a five-patch series to simplify timeout handling for Xen PV driver
initialization
- two patches to fix error paths in xenstore/xenbus driver
initialization
* tag 'for-linus-5.16c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: make HYPERVISOR_set_debugreg() always_inline
xen: make HYPERVISOR_get_debugreg() always_inline
xen: detect uninitialized xenbus in xenbus_init
xen: flag xen_snd_front to be not essential for system boot
xen: flag pvcalls-front to be not essential for system boot
xen: flag hvc_xen to be not essential for system boot
xen: flag xen_drm_front to be not essential for system boot
xen: add "not_essential" flag to struct xenbus_driver
xen/pvh: add missing prototype to header
xen: don't continue xenstore initialization in case of errors
xen/privcmd: make option visible in Kconfig
Linus Torvalds [Fri, 26 Nov 2021 17:30:24 +0000 (09:30 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Three arm64 fixes.
The main one is a fix to the way in which we evaluate the macro
arguments to our uaccess routines, which we _think_ might be the root
cause behind some unkillable tasks we've seen in the Android arm64 CI
farm (testing is ongoing). In any case, it's worth fixing.
Other than that, we've toned down an over-zealous VM_BUG_ON() and
fixed ftrace stack unwinding in a bunch of cases.
Summary:
- Evaluate uaccess macro arguments outside of the critical section
- Tighten up VM_BUG_ON() in pmd_populate_kernel() to avoid false positive
- Fix ftrace stack unwinding using HAVE_FUNCTION_GRAPH_RET_ADDR_PTR"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: uaccess: avoid blocking within critical sections
arm64: mm: Fix VM_BUG_ON(mm != &init_mm) for trans_pgd
arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
Qu Wenruo [Sat, 20 Nov 2021 08:34:11 +0000 (16:34 +0800)]
btrfs: fix the memory leak caused in lzo_compress_pages()
[BUG]
Fstests generic/027 is pretty easy to trigger a slow but steady memory
leak if run with "-o compress=lzo" mount option.
Normally one single run of generic/027 is enough to eat up at least 4G ram.
[CAUSE]
In commit
d4088803f511 ("btrfs: subpage: make lzo_compress_pages()
compatible") we changed how @page_in is released.
But that refactoring makes @page_in only released after all pages being
compressed.
This leaves error path not releasing @page_in. And by "error path"
things like incompressible data will also be treated as an error
(-E2BIG).
Thus it can cause a memory leak if even nothing wrong happened.
[FIX]
Add check under @out label to release @page_in when needed, so when we
hit any error, the input page is properly released.
Reported-by: Josef Bacik <josef@toxicpanda.com>
Fixes:
d4088803f511 ("btrfs: subpage: make lzo_compress_pages() compatible")
Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jakub Kicinski [Fri, 26 Nov 2021 05:03:33 +0000 (21:03 -0800)]
Merge branch 'net-small-csum-optimizations'
Eric Dumazet says:
====================
net: small csum optimizations
After recent x86 csum_partial() optimizations, we can more easily
see in kernel profiles costs of add/adc operations that could
be avoided, by feeding a non zero third argument to csum_partial()
====================
Link: https://lore.kernel.org/r/20211124202446.2917972-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 24 Nov 2021 20:24:46 +0000 (12:24 -0800)]
net: optimize skb_postpull_rcsum()
Remove one pair of add/adc instructions and their dependency
against carry flag.
We can leverage third argument to csum_partial():
X = csum_block_sub(X, csum_partial(start, len, 0), 0);
-->
X = csum_block_add(X, ~csum_partial(start, len, 0), 0);
-->
X = ~csum_partial(start, len, ~X);
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 24 Nov 2021 20:24:45 +0000 (12:24 -0800)]
gro: optimize skb_gro_postpull_rcsum()
We can leverage third argument to csum_partial():
X = csum_sub(X, csum_partial(start, len, 0));
-->
X = csum_add(X, ~csum_partial(start, len, 0));
-->
X = ~csum_partial(start, len, ~X);
This removes one add/adc pair and its dependency against the carry flag.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Wed, 24 Nov 2021 19:26:14 +0000 (14:26 -0500)]
sctp: make the raise timer more simple and accurate
Currently, the probe timer is reused as the raise timer when PLPMTUD is in
the Search Complete state. raise_count was introduced to count how many
times the probe timer has timed out. When raise_count reaches to 30, the
raise timer handler will be triggered.
During the whole processing above, the timer keeps timing out every probe_
interval. It is a waste for the Search Complete state, as the raise timer
only needs to time out after 30 * probe_interval.
Since the raise timer and probe timer are never used at the same time, it
is no need to keep probe timer 'alive' in the Search Complete state. This
patch to introduce sctp_transport_reset_raise_timer() to start the timer
as the raise timer when entering the Search Complete state. When entering
the other states, sctp_transport_reset_probe_timer() will still be called
to reset the timer to the probe timer.
raise_count can be removed from sctp_transport as no need to count probe
timer timeout for raise timer timeout. last_rtx_chunks can be removed as
sctp_transport_reset_probe_timer() can be called in the place where asoc
rtx_data_chunks is changed.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/edb0e48988ea85997488478b705b11ddc1ba724a.1637781974.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Wed, 24 Nov 2021 17:11:12 +0000 (12:11 -0500)]
tipc: delete the unlikely branch in tipc_aead_encrypt
When a skb comes to tipc_aead_encrypt(), it's always linear. The
unlikely check 'skb_cloned(skb) && tailen <= skb_tailroom(skb)'
can completely be taken care of in skb_cow_data() by the code
in branch "if (!skb_has_frag_list())".
Also, remove the 'TODO:' annotation, as the pages in skbs are not
writable, see more on commit
3cf4375a0904 ("tipc: do not write
skb_shinfo frags when doing decrytion").
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Link: https://lore.kernel.org/r/47a478da0b6095b76e3cbe7a75cbd25d9da1df9a.1637773872.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Nov 2021 04:05:34 +0000 (20:05 -0800)]
Merge branch 'net-ipa-gsi-channel-flow-control'
Alex Elder says:
====================
net: ipa: GSI channel flow control
Starting with IPA v4.2, endpoint DELAY mode (which prevents data
transfer on TX endpoints) does not work properly. To address this,
changes were made to allow underlying GSI channels to be put into
a "flow controlled" state, which achieves a similar objective.
The first patch in this series implements the flow controlled
channel state and the commands used to control it. It arranges
to use the new mechanism--instead of DELAY mode--for IPA v4.2+.
In IPA v4.11, the notion of GSI channel flow control was enhanced,
and implemented in a slightly different way. For the most part this
doesn't affect the way the IPA driver uses flow control, but the
second patch adds support for the newer mechanism.
====================
Link: https://lore.kernel.org/r/20211124194416.707007-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 24 Nov 2021 19:44:16 +0000 (13:44 -0600)]
net: ipa: support enhanced channel flow control
IPA v4.2 introduced GSI channel flow control, used instead of IPA
endpoint DELAY mode to prevent a TX channel from injecting packets
into the IPA core. It used a new FLOW_CONTROLLED channel state
which could be entered using GSI generic commands.
IPA v4.11 extended the channel flow control model. Rather than
having a distinct FLOW_CONTROLLED channel state, each channel has a
"flow control" property that can be enabled or not--independent of
the channel state. The AP (or modem) can modify this property using
the same GSI generic commands as before.
The AP only uses channel flow control on modem TX channels, and only
when recovering from a modem crash. The AP has no way to discover
the state of a modem channel, so the fact that (starting with IPA
v4.11) flow control no longer uses a distinct channel state is
invisible to the AP. So enhanced flow control generally does not
change the way AP uses flow control.
There are a few small differences, however:
- There is a notion of "primary" or "secondary" flow control, and
when enabling or disabling flow control that must be specified
in a new field in the GSI generic command register. For now, we
always specify 0 (meaning "primary").
- When disabling flow control, it's possible a request will need
to be retried. We retry up to 5 times in this case.
- Another new generic command allows the current flow control
state to be queried. We do not use this.
Other than the need for retries, the code essentially works the same
way as before.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 24 Nov 2021 19:44:15 +0000 (13:44 -0600)]
net: ipa: introduce channel flow control
One quirk for certain versions of IPA is that endpoint DELAY mode
does not work properly. IPA DELAY mode prevents any packets from
being delivered to the IPA core for processing on a TX endpoint.
The AP uses DELAY mode when the modem crashes, to prevent modem TX
endpoints from generating traffic during crash recovery. Without
this, there is a chance the hardware will stall during recovery from
a modem crash.
To achieve a similar effect, a GSI FLOW_CONTROLLED channel state
was created. A STARTED TX channel can be placed in FLOW_CONTROLLED
state, which prevents the transfer of any more packets. A channel
in FLOW_CONTROLLED state can be either returned to STARTED state, or
can be transitioned to STOPPED state.
Because this operates on GSI channels, two generic commands were
added to allow the AP to control this state for modem channels
(similar to the ALLOCATE and HALT channel commands).
Previously the code assumed this quirk only applied to IPA v4.2.
In fact, channel flow control (rather than endpoint DELAY mode)
should be used for all versions *starting* with IPA v4.2.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Nov 2021 03:40:41 +0000 (19:40 -0800)]
Merge branch 'mctp-serial-minor-fixes'
Jeremy Kerr says:
====================
mctp serial minor fixes
We had a few minor fixes queued for a v4 of the original series, so
they're sent here as separate changes.
v2:
- fix ordering of cancel_work vs. unregister_netdev.
====================
Link: https://lore.kernel.org/r/20211125060739.3023442-1-jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jeremy Kerr [Thu, 25 Nov 2021 06:07:39 +0000 (14:07 +0800)]
mctp: serial: remove unnecessary ldisc data check
Jiri assures me that a ldisc->open with tty->disc_data set should never
happen, so this check doesn't do anything.
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jeremy Kerr [Thu, 25 Nov 2021 06:07:38 +0000 (14:07 +0800)]
mctp: serial: enforce fixed MTU
The current serial driver requires a maximum MTU of 68, and it doesn't
make sense to set a MTU below the MCTP-required baseline (of 68) either.
This change sets the min_mtu & max_mtu of the mctp netdev, essentially
disallowing changes. By using these instead of a ndo_change_mtu op, we
get the netlink extacks reported too.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jeremy Kerr [Thu, 25 Nov 2021 06:07:37 +0000 (14:07 +0800)]
mctp: serial: cancel tx work on ldisc close
We want to ensure that the tx work has finished before returning from
the ldisc close op, so do a synchronous cancel.
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Nov 2021 03:37:37 +0000 (19:37 -0800)]
Merge branch 'net-ipa-small-collected-improvements'
Alex Elder says:
====================
net: ipa: small collected improvements
This series contains a somewhat unrelated set of changes, some
inspired by some recent work posted for back-port. For the most
part they're meant to improve the code without changing it's
functionality. Each basically stands on its own.
====================
Link: https://lore.kernel.org/r/20211124202511.862588-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 24 Nov 2021 20:25:11 +0000 (14:25 -0600)]
net: ipa: rearrange GSI structure fields
The dummy net_device is a large field in the GSI structure, but it
is not at all interesting from the perspective of debugging. Move
it to the end of the GSI structure so the other fields are easier to
find in memory.
The channel and event ring arrays are also very large, so move them
near the end of the structure as well.
Swap the position of the result and completion fields to improve
structure packing.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 24 Nov 2021 20:25:10 +0000 (14:25 -0600)]
net: ipa: GSI only needs one completion
A mutex ensures we never submit more than one GSI command of any
kind at once. This means the per-channel and per-event ring
completion structures provide no benefit. Instead, just use the
single (existing) GSI completion to signal the completion of GSI
commands of all types.
This makes gsi_evt_ring_init() a trivial function with no inverse,
so open-code it in its sole caller and get rid of the function.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 24 Nov 2021 20:25:09 +0000 (14:25 -0600)]
net: ipa: skip SKB copy if no netdev
In ipa_endpoint_skb_copy(), a new socket buffer structure is
allocated so that some data can be copied into it. However, after
doing this, if the endpoint has a null netdev pointer, we just drop
free the socket buffer.
Instead, check endpoint->netdev pointer first, and just return early
if it's null. Also return early if the SKB allocation fails, to
avoid the deeper indentation in the normal path.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 24 Nov 2021 20:25:08 +0000 (14:25 -0600)]
net: ipa: explicitly disable HOLB drop during setup
During setup, ipa_endpoint_program() programs each endpoint with
various configuration parameters. One of those registers defines
whether to drop packets when a head-of-line blocking condition is
detected on an RX endpoint. We currently assume this is disabled;
instead, explicitly set it to be disabled.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 24 Nov 2021 20:25:07 +0000 (14:25 -0600)]
net: ipa: rework how HOL_BLOCK handling is specified
The head-of-line block (HOLB) drop timer is only meaningful when
dropping packets due to blocking is enabled. Given that, redefine
the interface so the timer is specified when enabling HOLB drop, and
use a different function when disabling.
To enable and disable HOLB drop, these functions will now be used:
ipa_endpoint_init_hol_block_enable(endpoint, milliseconds)
ipa_endpoint_init_hol_block_disable(endpoint)
The existing ipa_endpoint_init_hol_block_enable() becomes a helper
function, renamed ipa_endpoint_init_hol_block_en(), and used with
ipa_endpoint_init_hol_block_timer() to enable HOLB block on an
endpoint.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 24 Nov 2021 20:25:06 +0000 (14:25 -0600)]
net: ipa: zero unused portions of filter table memory
Not all filter table entries are used. Only certain endpoints
support filtering, and the table begins with a bitmap indicating
which endpoints use the "slots" that follow for filter rules.
Currently, unused filter table entries are not initialized.
Instead, zero-fill the entire unused portion of the filter table
memory regions, to make it more obvious that memory is unused (and
not subsequently modified).
This is not strictly necessary, but the result is reassuring when
looking at filter table memory.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Wed, 24 Nov 2021 20:25:05 +0000 (14:25 -0600)]
net: ipa: kill ipa_modem_init()
A recent commit made disabling the SMP2P "setup ready" interrupt
unrelated to ipa_modem_stop(). Given that, it seems fitting to get
rid of ipa_modem_init() and ipa_modem_exit() (which are trivial
wrapper functions), and call ipa_smp2p_init() and ipa_smp2p_exit()
directly instead.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 25 Nov 2021 12:58:08 +0000 (14:58 +0200)]
net: dsa: felix: enable cut-through forwarding between ports by default
The VSC9959 switch embedded within NXP LS1028A (and that version of
Ocelot switches only) supports cut-through forwarding - meaning it can
start the process of looking up the destination ports for a packet, and
forward towards those ports, before the entire packet has been received
(as opposed to the store-and-forward mode).
The up side is having lower forwarding latency for large packets. The
down side is that frames with FCS errors are forwarded instead of being
dropped. However, erroneous frames do not result in incorrect updates of
the FDB or incorrect policer updates, since these processes are deferred
inside the switch to the end of frame. Since the switch starts the
cut-through forwarding process after all packet headers (including IP,
if any) have been processed, packets with large headers and small
payload do not see the benefit of lower forwarding latency.
There are two cases that need special attention.
The first is when a packet is multicast (or flooded) to multiple
destinations, one of which doesn't have cut-through forwarding enabled.
The switch deals with this automatically by disabling cut-through
forwarding for the frame towards all destination ports.
The second is when a packet is forwarded from a port of lower link speed
towards a port of higher link speed. This is not handled by the hardware
and needs software intervention.
Since we practically need to update the cut-through forwarding domain
from paths that aren't serialized by the rtnl_mutex (phylink
mac_link_down/mac_link_up ops), this means we need to serialize physical
link events with user space updates of bonding/bridging domains.
Enabling cut-through forwarding is done per {egress port, traffic class}.
I don't see any reason why this would be a configurable option as long
as it works without issues, and there doesn't appear to be any user
space configuration tool to toggle this on/off, so this patch enables
cut-through forwarding on all eligible ports and traffic classes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 25 Nov 2021 12:58:07 +0000 (14:58 +0200)]
net: ocelot: remove "bridge" argument from ocelot_get_bridge_fwd_mask
The only called takes ocelot_port->bridge and passes it as the "bridge"
argument to this function, which then compares it with
ocelot_port->bridge. This is not useful.
Instead, we would like this function to return 0 if ocelot_port->bridge
is not present, which is what this patch does.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211125125808.2383984-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Thu, 25 Nov 2021 00:29:32 +0000 (00:29 +0000)]
net: dsa: qca8k: Fix spelling mistake "Mismateched" -> "Mismatched"
There is a spelling mistake in a netdev_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20211125002932.49217-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 26 Nov 2021 03:28:20 +0000 (19:28 -0800)]
Merge branch 'tls-splice_read-fixes'
Jakub Kicinski says:
====================
tls: splice_read fixes
As I work my way to unlocked and zero-copy TLS Rx the obvious bugs
in the splice_read implementation get harder and harder to ignore.
This is to say the fixes here are discovered by code inspection,
I'm not aware of anyone actually using splice_read.
====================
Link: https://lore.kernel.org/r/20211124232557.2039757-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:57 +0000 (15:25 -0800)]
selftests: tls: test for correct proto_ops
Previous patch fixes overriding callbacks incorrectly. Triggering
the crash in sendpage_locked would be more spectacular but it's
hard to get to, so take the easier path of proving this is broken
and call getname. We're currently getting IPv4 socket info on an
IPv6 socket.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:56 +0000 (15:25 -0800)]
tls: fix replacing proto_ops
We replace proto_ops whenever TLS is configured for RX. But our
replacement also overrides sendpage_locked, which will crash
unless TX is also configured. Similarly we plug both of those
in for TLS_HW (NIC crypto offload) even tho TLS_HW has a completely
different implementation for TX.
Last but not least we always plug in something based on inet_stream_ops
even though a few of the callbacks differ for IPv6 (getname, release,
bind).
Use a callback building method similar to what we do for struct proto.
Fixes:
c46234ebb4d1 ("tls: RX path for ktls")
Fixes:
d4ffb02dee2f ("net/tls: enable sk_msg redirect to tls socket egress")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:55 +0000 (15:25 -0800)]
selftests: tls: test splicing decrypted records
Add tests for half-received and peeked records.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:54 +0000 (15:25 -0800)]
tls: splice_read: fix accessing pre-processed records
recvmsg() will put peek()ed and partially read records onto the rx_list.
splice_read() needs to consult that list otherwise it may miss data.
Align with recvmsg() and also put partially-read records onto rx_list.
tls_sw_advance_skb() is pretty pointless now and will be removed in
net-next.
Fixes:
692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:53 +0000 (15:25 -0800)]
selftests: tls: test splicing cmsgs
Make sure we correctly reject splicing non-data records.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:52 +0000 (15:25 -0800)]
tls: splice_read: fix record type check
We don't support splicing control records. TLS 1.3 changes moved
the record type check into the decrypt if(). The skb may already
be decrypted and still be an alert.
Note that decrypt_skb_update() is idempotent and updates ctx->decrypted
so the if() is pointless.
Reorder the check for decryption errors with the content type check
while touching them. This part is not really a bug, because if
decryption failed in TLS 1.3 content type will be DATA, and for
TLS 1.2 it will be correct. Nevertheless its strange to touch output
before checking if the function has failed.
Fixes:
fedf201e1296 ("net: tls: Refactor control message handling on recv")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:51 +0000 (15:25 -0800)]
selftests: tls: add tests for handling of bad records
Test broken records.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:50 +0000 (15:25 -0800)]
selftests: tls: factor out cmsg send/receive
Add helpers for sending and receiving special record types.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 24 Nov 2021 23:25:49 +0000 (15:25 -0800)]
selftests: tls: add helper for creating sock pairs
We have the same code 3 times, about to add a fourth copy.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ong Boon Leong [Wed, 24 Nov 2021 11:40:19 +0000 (19:40 +0800)]
net: stmmac: perserve TX and RX coalesce value during XDP setup
When XDP program is loaded, it is desirable that the previous TX and RX
coalesce values are not re-inited to its default value. This prevents
unnecessary re-configurig the coalesce values that were working fine
before.
Fixes:
ac746c8520d9 ("net: stmmac: enhance XDP ZC driver level switching performance")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20211124114019.3949125-1-boon.leong.ong@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yang Yingliang [Wed, 24 Nov 2021 08:40:48 +0000 (16:40 +0800)]
tsnep: Add missing of_node_put() in tsnep_mdio_init()
The node pointer is returned by of_get_child_by_name() with
refcount incremented in tsnep_mdio_init(). Calling of_node_put()
to aovid the refcount leak in tsnep_mdio_init().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20211124084048.175456-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tonghao Zhang [Thu, 25 Nov 2021 02:54:44 +0000 (10:54 +0800)]
veth: use ethtool_sprintf instead of snprintf
use ethtools api ethtool_sprintf to instead of snprintf.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Link: https://lore.kernel.org/r/20211125025444.13115-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 24 Nov 2021 15:44:43 +0000 (15:44 +0000)]
net: macb: convert to phylink_generic_validate()
Populate the supported interfaces bitmap and MAC capabilities mask for
the macb driver and remove the old validate implementation.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1mpuRv-00D4rb-Lz@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 24 Nov 2021 20:44:40 +0000 (21:44 +0100)]
r8169: disable detection of chip version 60
It seems only XID 609 made it to the mass market. Therefore let's
disable detection of the other RTL8125a XID's. If nobody complains
we can remove support for RTL_GIGA_MAC_VER_60 later.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/2cd3df01-5f8b-08dd-6def-3f31a3014bde@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 26 Nov 2021 02:21:20 +0000 (18:21 -0800)]
Merge tag 'drm-fixes-2021-11-26' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"No idea if turkey comes before pull request processing, but here's the
regular week's fixes. A bunch for amdgpu, nouveau adds support for a
new GPU (like a PCI ID addition), and a scattering of fixes across
i915/hyperv/aspeed/vc4.
Specifics:
amdgpu:
- SRIOV fixes
- dma-buf double free fix
- Display fixes for GPU resets
- Fix DSC powergating regression
- GPU TSC fixes
- Interrupt handler overflow fixes
- Endian fix in IP discovery table handling
- Aldebaran ASPM fix
- Fix overclocking regression on older asics
- Backlight/ACPI fix
amdkfd:
- SVM fixes
- VMA removal race fix
hyperv:
- removal fix
aspeed:
- vga_pw sysfs file fix
vc4:
- error checking fix
nouveau:
- support GA106
- fix a few error checks
i915:
- fix wakeref handling around PXP suspend"
* tag 'drm-fixes-2021-11-26' of git://anongit.freedesktop.org/drm/drm: (25 commits)
drm/amd/display: update bios scratch when setting backlight
drm/amdgpu/pm: fix powerplay OD interface
drm/amdgpu: Skip ASPM programming on aldebaran
drm/amdgpu: fix byteorder error in amdgpu discovery
drm/amdgpu: enable Navi retry fault wptr overflow
drm/amdgpu: enable Navi 48-bit IH timestamp counter
drm/amdkfd: simplify drain retry fault
drm/amdkfd: handle VMA remove race
drm/amdkfd: process exit and retry fault race
drm/amdgpu: IH process reset count when restart
drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
drm/amdgpu/gfx10: add wraparound gpu counter check for APUs as well
drm/amdgpu: move kfd post_reset out of reset_sriov function
drm/amd/display: Fixed DSC would not PG after removing DSC stream
drm/amd/display: Reset link encoder assignments for GPU reset
drm/amd/display: Set plane update flags for all planes in reset
drm/amd/display: Fix DPIA outbox timeout after GPU reset
drm/amdgpu: Fix double free of dmabuf
drm/amdgpu: Fix MMIO HDP flush on SRIOV
drm/i915/gt: Hold RPM wakelock during PXP suspend
...
Dave Airlie [Fri, 26 Nov 2021 00:36:06 +0000 (10:36 +1000)]
Merge tag 'drm-intel-fixes-2021-11-24' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Fix wakeref handling of PXP suspend.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YZ65bsPOK+6JLv0d@intel.com
Dave Airlie [Fri, 26 Nov 2021 00:08:10 +0000 (10:08 +1000)]
Merge tag 'drm-misc-fixes-2021-11-25' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
One removal fix for hyperv, one fix in aspeed for the vga_pw sysfs file
content, one error-checking fix for vc4 and two fixes for nouveau, one
to support a new device and another one to properly check for errors.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20211125101819.ynu7zgbs7yfwedri@houat
Dave Airlie [Thu, 25 Nov 2021 23:24:15 +0000 (09:24 +1000)]
Merge tag 'amd-drm-fixes-5.16-2021-11-24' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.16-2021-11-24:
amdgpu:
- SRIOV fixes
- dma-buf double free fix
- Display fixes for GPU resets
- Fix DSC powergating regression
- GPU TSC fixes
- Interrupt handler overflow fixes
- Endian fix in IP discovery table handling
- Aldebaran ASPM fix
- Fix overclocking regression on older asics
- Backlight/ACPI fix
amdkfd:
- SVM fixes
- VMA removal race fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211124212056.6327-1-alexander.deucher@amd.com
Linus Torvalds [Thu, 25 Nov 2021 19:06:05 +0000 (11:06 -0800)]
Merge tag 'block-5.16-2021-11-25' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- Add a NO APST quirk for a Kioxia device (Enzo Matsumiya)
- Fix write zeroes pi (Klaus Jensen)
- Various TCP transport fixes (Maurizio Lombardi and Varun
Prakash)
- Ignore invalid fast_io_fail_tmo values (Maurizio Lombardi)
- Use IOCB_NOWAIT only if the filesystem supports it (Maurizio
Lombardi)
- Module loading fix (Ming)
- Kerneldoc warning fix (Yang)
* tag 'block-5.16-2021-11-25' of git://git.kernel.dk/linux-block:
block: fix parameter not described warning
nvmet: use IOCB_NOWAIT only if the filesystem supports it
nvme: fix write zeroes pi
nvme-fabrics: ignore invalid fast_io_fail_tmo values
nvme-pci: add NO APST quirk for Kioxia device
nvme-tcp: fix memory leak when freeing a queue
nvme-tcp: validate R2T PDU in nvme_tcp_handle_r2t()
nvmet-tcp: fix incomplete data digest send
nvmet-tcp: fix memory leak when performing a controller reset
nvmet-tcp: add an helper to free the cmd buffers
nvmet-tcp: fix a race condition between release_queue and io_work
block: avoid to touch unloaded module instance when opening bdev
Linus Torvalds [Thu, 25 Nov 2021 18:57:45 +0000 (10:57 -0800)]
Merge tag 'io_uring-5.16-2021-11-25' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A locking fix for link traversal, and fixing up an outdated function
name in a comment"
* tag 'io_uring-5.16-2021-11-25' of git://git.kernel.dk/linux-block:
io_uring: correct link-list traversal locking
io_uring: fix missed comment from *task_file rename
Linus Torvalds [Thu, 25 Nov 2021 18:48:14 +0000 (10:48 -0800)]
Merge tag '5.16-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four small cifs/smb3 fixes:
- two multichannel fixes
- fix problem noted by kernel test robot
- update internal version number"
* tag '5.16-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal version number
smb2: clarify rc initialization in smb2_reconnect
cifs: populate server_hostname for extra channels
cifs: nosharesock should be set on new server
Linus Torvalds [Thu, 25 Nov 2021 18:41:28 +0000 (10:41 -0800)]
Merge tag 'asm-generic-5.16-2' of git://git./linux/kernel/git/arnd/asm-generic
Pull asm-generic syscall table update from Arnd Bergmann:
"André Almeida sends an update for the newly added futex_waitv syscall
that was initially only added to a few architectures.
Some additional ones have since made it through architecture
maintainer trees, this finishes the remaining ones"
* tag 'asm-generic-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
futex: Wireup futex_waitv syscall
Linus Torvalds [Thu, 25 Nov 2021 18:31:37 +0000 (10:31 -0800)]
Merge tag 'arm-fixes-5.16-2' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There are only a few devicetree fixes this time:
- one outdated devicetree property that slipped into the newly added
ExynosAutov9 support
- three changes to Broadcom SoCs that had incorrect number values for
interrupts or irqchips.
In the MAINTAINERS file, Nishanth Menon gets listed for TI K3 SoCs,
while Taichi Sugaya and Takao Orito take ownership of the Socionext
Milbeaut platform.
All other changes are for SoC specific drivers, fixing:
- A missing NULL pointer check in the mediatek memory driver
- An integer overflow issue in the Arm smccc firwmare interface
- A false-positive fortify-source check
- Error handling fixes for optee and smci
- Incorrect message format in one SCMI call"
* tag 'arm-fixes-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
memory: mtk-smi: Fix a null dereference for the ostd
arm64: dts: exynos: drop samsung,ufs-shareability-reg-offset in ExynosAutov9
MAINTAINERS: Update maintainer entry for keystone platforms
MAINTAINERS: Add entry to MAINTAINERS for Milbeaut
firmware: smccc: Fix check for ARCH_SOC_ID not implemented
ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE
firmware: arm_scmi: Fix type error assignment in voltage protocol
firmware: arm_scmi: Fix type error in sensor protocol
firmware: arm_scmi: pm: Propagate return value to caller
firmware: arm_scmi: Fix base agent discover response
optee: fix kfree NULL pointer
ARM: dts: bcm2711: Fix PCIe interrupts
ARM: dts: BCM5301X: Add interrupt properties to GPIO node
ARM: dts: BCM5301X: Fix I2C controller interrupt
firmware: arm_scmi: Fix null de-reference on error path
Linus Torvalds [Thu, 25 Nov 2021 18:13:56 +0000 (10:13 -0800)]
Merge tag 'folio-5.16b' of git://git.infradead.org/users/willy/pagecache
Pull folio fixes from Matthew Wilcox:
"In the course of preparing the folio changes for iomap for next merge
window, we discovered some problems that would be nice to address now:
- Renaming multi-page folios to large folios.
mapping_multi_page_folio_support() is just a little too long, so we
settled on mapping_large_folio_support(). That meant renaming, eg
folio_test_multi() to folio_test_large().
Rename AS_THP_SUPPORT to match
- I hadn't included folio wrappers for zero_user_segments(), etc.
Also, multi-page^W^W large folio support is now independent of
CONFIG_TRANSPARENT_HUGEPAGE, so machines with HIGHMEM always need
to fall back to the out-of-line zero_user_segments().
Remove FS_THP_SUPPORT to match
- The build bots finally got round to telling me that I missed a
couple of architectures when adding flush_dcache_folio(). Christoph
suggested that we just add linux/cacheflush.h and not rely on
asm-generic/cacheflush.h"
* tag 'folio-5.16b' of git://git.infradead.org/users/willy/pagecache:
mm: Add functions to zero portions of a folio
fs: Rename AS_THP_SUPPORT and mapping_thp_support
fs: Remove FS_THP_SUPPORT
mm: Remove folio_test_single
mm: Rename folio_test_multi to folio_test_large
Add linux/cacheflush.h
Yang Guang [Thu, 25 Nov 2021 16:20:55 +0000 (00:20 +0800)]
block: fix parameter not described warning
The build warning:
block/blk-core.c:968: warning: Function parameter or member 'iob'
not described in 'bio_poll'.
Fixes:
5a72e899ceb4 ("block: add a struct io_comp_batch argument to fops->iopoll()")
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>