linux-2.6-microblaze.git
2 years agoMerge branch 'dm9051'
David S. Miller [Mon, 14 Feb 2022 11:18:47 +0000 (11:18 +0000)]
Merge branch 'dm9051'

Joseph CHAMG says:

====================
ADD DM9051 ETHERNET DRIVER

DM9051 is a spi interface chip,
need cs/mosi/miso/clock with an interrupt gpio pin
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Add dm9051 driver
Joseph CHAMG [Fri, 11 Feb 2022 09:27:56 +0000 (17:27 +0800)]
net: Add dm9051 driver

Add davicom dm9051 spi ethernet driver, The driver work for the
device platform which has the spi master

Signed-off-by: Joseph CHAMG <josright123@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: Add Davicom dm9051 SPI ethernet controller
Joseph CHAMG [Fri, 11 Feb 2022 09:27:55 +0000 (17:27 +0800)]
dt-bindings: net: Add Davicom dm9051 SPI ethernet controller

This is a new yaml base data file for configure davicom dm9051 with
device tree

Signed-off-by: Joseph CHAMG <josright123@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/smc: Add comment for smc_tx_pending
Tony Lu [Fri, 11 Feb 2022 06:52:21 +0000 (14:52 +0800)]
net/smc: Add comment for smc_tx_pending

The previous patch introduces a lock-free version of smc_tx_work() to
solve unnecessary lock contention, which is expected to be held lock.
So this adds comment to remind people to keep an eye out for locks.

Suggested-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoGenerate netlink notification when default IPv6 route preference changes
Kalash Nainwal [Thu, 10 Feb 2022 22:09:35 +0000 (14:09 -0800)]
Generate netlink notification when default IPv6 route preference changes

Generate RTM_NEWROUTE netlink notification when the route preference
 changes on an existing kernel generated default route in response to
 RA messages. Currently netlink notifications are generated only when
 this route is added or deleted but not when the route preference
 changes, which can cause userspace routing application state to go
 out of sync with kernel.

Signed-off-by: Kalash Nainwal <kalash@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/sched: act_police: more accurate MTU policing
Davide Caratti [Thu, 10 Feb 2022 17:56:08 +0000 (18:56 +0100)]
net/sched: act_police: more accurate MTU policing

in current Linux, MTU policing does not take into account that packets at
the TC ingress have the L2 header pulled. Thus, the same TC police action
(with the same value of tcfp_mtu) behaves differently for ingress/egress.
In addition, the full GSO size is compared to tcfp_mtu: as a consequence,
the policer drops GSO packets even when individual segments have the L2 +
L3 + L4 + payload length below the configured valued of tcfp_mtu.

Improve the accuracy of MTU policing as follows:
 - account for mac_len for non-GSO packets at TC ingress.
 - compare MTU threshold with the segmented size for GSO packets.
Also, add a kselftest that verifies the correct behavior.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoetherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead
Kees Cook [Sat, 12 Feb 2022 17:14:49 +0000 (09:14 -0800)]
etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead

With GCC 12, -Wstringop-overread was warning about an implicit cast from
char[6] to char[8]. However, the extra 2 bytes are always thrown away,
alignment doesn't matter, and the risk of hitting the edge of unallocated
memory has been accepted, so this prototype can just be converted to a
regular char *. Silences:

net/core/dev.c: In function ‘bpf_prog_run_generic_xdp’: net/core/dev.c:4618:21: warning: ‘ether_addr_equal_64bits’ reading 8 bytes from a region of size 6 [-Wstringop-overread]
 4618 |         orig_host = ether_addr_equal_64bits(eth->h_dest, > skb->dev->dev_addr);
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/dev.c:4618:21: note: referencing argument 1 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’}
net/core/dev.c:4618:21: note: referencing argument 2 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’}
In file included from net/core/dev.c:91: include/linux/etherdevice.h:375:20: note: in a call to function ‘ether_addr_equal_64bits’
  375 | static inline bool ether_addr_equal_64bits(const u8 addr1[6+2],
      |                    ^~~~~~~~~~~~~~~~~~~~~~~

Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/netdev/20220212090811.uuzk6d76agw2vv73@pengutronix.de
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan966x: Fix when CONFIG_IPV6 is not set
Horatiu Vultur [Sat, 12 Feb 2022 20:03:43 +0000 (21:03 +0100)]
net: lan966x: Fix when CONFIG_IPV6 is not set

When CONFIG_IPV6 is not set, then the linking of the lan966x driver
fails with the following error:

drivers/net/ethernet/microchip/lan966x/lan966x_main.c:444: undefined
reference to `ipv6_mc_check_mld'

The fix consists in adding a check also for IS_ENABLED(CONFIG_IPV6)

Fixes: 47aeea0d57e80c ("net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan966x: Fix when CONFIG_PTP_1588_CLOCK is compiled as module
Horatiu Vultur [Sat, 12 Feb 2022 20:45:44 +0000 (21:45 +0100)]
net: lan966x: Fix when CONFIG_PTP_1588_CLOCK is compiled as module

When CONFIG_PTP_1588_CLOCK is compiled as a module, then the linking of
the lan966x fails because it can't find references to the following
functions 'ptp_clock_index', 'ptp_clock_register' and
'ptp_clock_unregister'

The fix consists in adding CONFIG_PTP_1588_CLOCK_OPTIONAL as a
dependency for the driver.

Fixes: d096459494a887 ("net: lan966x: Add support for ptp clocks")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'lan743x-enhancements'
David S. Miller [Sun, 13 Feb 2022 12:07:26 +0000 (12:07 +0000)]
Merge branch 'lan743x-enhancements'

Raju Lakkaraju says:

====================
net: lan743x: PCI11010 / PCI11414 devices Enhancements

This patch series adds support of the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver.
The PCI1xxxx family of devices consists of a PCIe switch with a variety of embedded PCI endpoints on its downstream ports.
The PCI11010 / PCI11414 devices include an Ethernet 10/100/1000/2500 function as one of those embedded endpoints.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan743x: Add support for Clause-45 MDIO PHY management
Raju Lakkaraju [Sat, 12 Feb 2022 15:53:15 +0000 (21:23 +0530)]
net: lan743x: Add support for Clause-45 MDIO PHY management

Add support for Clause-45 MDIO PHY management

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan743x: Add support for SGMII interface
Raju Lakkaraju [Sat, 12 Feb 2022 15:53:14 +0000 (21:23 +0530)]
net: lan743x: Add support for SGMII interface

This change facilitates the selection between SGMII and (R)GIII
interfaces

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan743x: Increase MSI(x) vectors to 16 and Int de-assertion timers to 10
Raju Lakkaraju [Sat, 12 Feb 2022 15:53:13 +0000 (21:23 +0530)]
net: lan743x: Increase MSI(x) vectors to 16 and Int de-assertion timers to 10

Increase MSI / MSI-X vectors supported from 8 to 16 and
Interrupt De-assertion timers from 8 to 10

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan743x: Add support for 4 Tx queues
Raju Lakkaraju [Sat, 12 Feb 2022 15:53:12 +0000 (21:23 +0530)]
net: lan743x: Add support for 4 Tx queues

Add support for 4 Tx queues

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan743x: Add PCI11010 / PCI11414 device IDs
Raju Lakkaraju [Sat, 12 Feb 2022 15:53:11 +0000 (21:23 +0530)]
net: lan743x: Add PCI11010 / PCI11414 device IDs

PCI11010/PCI11414 devices are enhancement of Ethernet LAN743x chip family.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: iosm: Enable M.2 7360 WWAN card support
M Chetan Kumar [Thu, 10 Feb 2022 15:34:45 +0000 (21:04 +0530)]
net: wwan: iosm: Enable M.2 7360 WWAN card support

This patch enables Intel M.2 7360 WWAN card support on
IOSM Driver.

Control path implementation is a reuse whereas data path
implementation it uses a different protocol called as MUX
Aggregation. The major portion of this patch covers the MUX
Aggregation protocol implementation used for IP traffic
communication.

For M.2 7360 WWAN card, driver exposes 2 wwan AT ports for
control communication.  The user space application or the
modem manager to use wwan AT port for data path establishment.

During probe, driver reads the mux protocol device capability
register to know the mux protocol version supported by device.
Base on which the right mux protocol is initialized for data
path communication.

An overview of an Aggregation Protocol
1>  An IP packet is encapsulated with 16 octet padding header
    to form a Datagram & the start offset of the Datagram is
    indexed into Datagram Header (DH).
2>  Multiple such Datagrams are composed & the start offset of
    each DH is indexed into Datagram Table Header (DTH).
3>  The Datagram Table (DT) is IP session specific & table_length
    item in DTH holds the number of composed datagram pertaining
    to that particular IP session.
4>  And finally the offset of first DTH is indexed into DBH (Datagram
    Block Header).

So in TX/RX flow Datagram Block (Datagram Block Header + Payload)is
exchanged between driver & device.

Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoRevert "net: ethernet: cavium: use div64_u64() instead of do_div()"
Jakub Kicinski [Fri, 11 Feb 2022 02:05:44 +0000 (18:05 -0800)]
Revert "net: ethernet: cavium: use div64_u64() instead of do_div()"

This reverts commit 038fcdaf0470de89619bc4cc199e329391e6566c.

Christophe points out div64_u64() and do_div() have different
calling conventions. One updates the param, the other returns
the result.

Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/all/056a7276-c6f0-cd7e-9e46-1d8507a0b6b1@wanadoo.fr/
Fixes: 038fcdaf0470 ("net: ethernet: cavium: use div64_u64() instead of do_div()")
Link: https://lore.kernel.org/r/20220211020544.3262694-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: moxa: use GFP_KERNEL
Julia Lawall [Thu, 10 Feb 2022 20:42:15 +0000 (21:42 +0100)]
net: moxa: use GFP_KERNEL

Platform_driver probe functions aren't called with locks
held and thus don't need GFP_ATOMIC. Use GFP_KERNEL instead.

Problem found with Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20220210204223.104181-1-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoocteontx2-af: fix array bound error
Hariprasad Kelam [Fri, 11 Feb 2022 15:55:39 +0000 (21:25 +0530)]
octeontx2-af: fix array bound error

This patch fixes below error by using proper data type.

drivers/net/ethernet/marvell/octeontx2/af/rpm.c: In function
'rpm_cfg_pfc_quanta_thresh':
include/linux/find.h:40:23: error: array subscript 'long unsigned
int[0]' is partly outside array bounds of 'u16[1]' {aka 'short unsigned
int[1]'} [-Werror=array-bounds]
   40 |                 val = *addr & GENMASK(size - 1, offset);

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20220211155539.13931-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'wireless-next-2022-02-11' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Fri, 11 Feb 2022 14:19:23 +0000 (14:19 +0000)]
Merge tag 'wireless-next-2022-02-11' of git://git./linux/kernel/git/wireless/wireless-next

wireless-next patches for v5.18

First set of patches for v5.18, with both wireless and stack patches.
rtw89 now has AP mode support and wcn36xx has survey support. But
otherwise pretty normal.

Major changes:

ath11k

* add LDPC FEC type in 802.11 radiotap header

* enable RX PPDU stats in monitor co-exist mode

wcn36xx

* implement survey reporting

brcmfmac

* add CYW43570 PCIE device

rtw88

* rtw8821c: enable RFE 6 devices

rtw89

* AP mode support

mt76

* mt7916 support

* background radar detection support

2 years agoMerge branch 'ipv6-loopback'
David S. Miller [Fri, 11 Feb 2022 11:44:27 +0000 (11:44 +0000)]
Merge branch 'ipv6-loopback'

Eric Dumazet says:

====================
ipv6: remove addrconf reliance on loopback

Second patch in this series removes IPv6 requirement about the netns
loopback device being the last device being dismantled.

This was needed because rt6_uncached_list_flush_dev()
and ip6_dst_ifdown() had to switch dst dev to a known
device (loopback).

Instead of loopback, we can use the (hidden) blackhole_netdev
which is also always there.

This will allow future simplfications of netdev_run_to()
and other parts of the stack like default_device_exit_batch().

Last two patches are optimizations for both IP families.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv4: add (struct uncached_list)->quarantine list
Eric Dumazet [Thu, 10 Feb 2022 21:42:31 +0000 (13:42 -0800)]
ipv4: add (struct uncached_list)->quarantine list

This is an optimization to keep the per-cpu lists as short as possible:

Whenever rt_flush_dev() changes one rtable dst.dev
matching the disappearing device, it can can transfer the object
to a quarantine list, waiting for a final rt_del_uncached_list().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6: add (struct uncached_list)->quarantine list
Eric Dumazet [Thu, 10 Feb 2022 21:42:30 +0000 (13:42 -0800)]
ipv6: add (struct uncached_list)->quarantine list

This is an optimization to keep the per-cpu lists as short as possible:

Whenever rt6_uncached_list_flush_dev() changes one rt6_info
matching the disappearing device, it can can transfer the object
to a quarantine list, waiting for a final rt6_uncached_list_del().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6: give an IPv6 dev to blackhole_netdev
Eric Dumazet [Thu, 10 Feb 2022 21:42:29 +0000 (13:42 -0800)]
ipv6: give an IPv6 dev to blackhole_netdev

IPv6 addrconf notifiers wants the loopback device to
be the last device being dismantled at netns deletion.

This caused many limitations and work arounds.

Back in linux-5.3, Mahesh added a per host blackhole_netdev
that can be used whenever we need to make sure objects no longer
refer to a disappearing device.

If we attach to blackhole_netdev an ip6_ptr (allocate an idev),
then we can use this special device (which is never freed)
in place of the loopback_dev (which can be freed).

This will permit improvements in netdev_run_todo() and other parts
of the stack where had steps to make sure loopback_dev was
the last device to disappear.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6: get rid of net->ipv6.rt6_stats->fib_rt_uncache
Eric Dumazet [Thu, 10 Feb 2022 21:42:28 +0000 (13:42 -0800)]
ipv6: get rid of net->ipv6.rt6_stats->fib_rt_uncache

This counter has never been visible, there is little point
trying to maintain it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodsa: mv88e6xxx: make serdes SGMII/Fiber tx amplitude configurable
Holger Brunck [Thu, 10 Feb 2022 17:48:23 +0000 (18:48 +0100)]
dsa: mv88e6xxx: make serdes SGMII/Fiber tx amplitude configurable

The mv88e6352, mv88e6240 and mv88e6176  have a serdes interface. This patch
allows to configure the output swing to a desired value in the
phy-handle of the port. The value which is peak to peak has to be
specified in microvolts. As the chips only supports eight dedicated
values we return EINVAL if the value in the DTS does not match one of
these values.

Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: phy: Add `tx-p2p-microvolt` property binding
Marek Behún [Thu, 10 Feb 2022 17:48:22 +0000 (18:48 +0100)]
dt-bindings: phy: Add `tx-p2p-microvolt` property binding

Common PHYs and network PCSes often have the possibility to specify
peak-to-peak voltage on the differential pair - the default voltage
sometimes needs to be changed for a particular board.

Add properties `tx-p2p-microvolt` and `tx-p2p-microvolt-names` for this
purpose. The second property is needed to specify the mode for the
corresponding voltage in the `tx-p2p-microvolt` property, if the voltage
is to be used only for speficic mode. More voltage-mode pairs can be
specified.

Example usage with only one voltage (it will be used for all supported
PHY modes, the `tx-p2p-microvolt-names` property is not needed in this
case):

  tx-p2p-microvolt = <915000>;

Example usage with voltages for multiple modes:

  tx-p2p-microvolt = <915000>, <1100000>, <1200000>;
  tx-p2p-microvolt-names = "2500base-x", "usb", "pcie";

Add these properties into a separate file phy/transmit-amplitude.yaml,
which should be referenced by any binding that uses it.

Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv6: Reject routes configurations that specify dsfield (tos)
Guillaume Nault [Thu, 10 Feb 2022 15:08:08 +0000 (16:08 +0100)]
ipv6: Reject routes configurations that specify dsfield (tos)

The ->rtm_tos option is normally used to route packets based on both
the destination address and the DS field. However it's ignored for
IPv6 routes. Setting ->rtm_tos for IPv6 is thus invalid as the route
is going to work only on the destination address anyway, so it won't
behave as specified.

Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'dsa-cleanup'
David S. Miller [Fri, 11 Feb 2022 11:17:33 +0000 (11:17 +0000)]
Merge branch 'dsa-cleanup'

Vladimir Oltean says:

====================
More aggressive DSA cleanup

This series deletes some code which is apparently not needed.

I've had these patches in my tree for a while, and testing on my boards
didn't reveal any issues.

Compared to the RFC v1 series, the only change is the addition of patch 3.
https://patchwork.kernel.org/project/netdevbpf/cover/20220107184842.550334-1-vladimir.oltean@nxp.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: remove lockdep class for DSA slave address list
Vladimir Oltean [Thu, 10 Feb 2022 13:45:00 +0000 (15:45 +0200)]
net: dsa: remove lockdep class for DSA slave address list

Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA
master to get rid of lockdep warnings"), suggested by Cong Wang, the
DSA interfaces and their master have different dev->nested_level, which
makes netif_addr_lock() stop complaining about potentially recursive
locking on the same lock class.

So we no longer need DSA slave interfaces to have their own lockdep
class.

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: remove lockdep class for DSA master address list
Vladimir Oltean [Thu, 10 Feb 2022 13:44:59 +0000 (15:44 +0200)]
net: dsa: remove lockdep class for DSA master address list

Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA
master to get rid of lockdep warnings"), suggested by Cong Wang, the
DSA interfaces and their master have different dev->nested_level, which
makes netif_addr_lock() stop complaining about potentially recursive
locking on the same lock class.

So we no longer need DSA masters to have their own lockdep class.

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: remove ndo_get_phys_port_name and ndo_get_port_parent_id
Vladimir Oltean [Thu, 10 Feb 2022 13:44:58 +0000 (15:44 +0200)]
net: dsa: remove ndo_get_phys_port_name and ndo_get_port_parent_id

There are no legacy ports, DSA registers a devlink instance with ports
unconditionally for all switch drivers. Therefore, delete the old-style
ndo operations used for determining bridge forwarding domains.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'smc-optimizations'
David S. Miller [Fri, 11 Feb 2022 11:14:58 +0000 (11:14 +0000)]
Merge branch 'smc-optimizations'

D. Wythe says:

====================
net/smc: Optimizing performance in short-lived scenarios

This patch set aims to optimizing performance of SMC in short-lived
links scenarios, which is quite unsatisfactory right now.

In our benchmark, we test it with follow scripts:

./wrk -c 10000 -t 4 -H 'Connection: Close' -d 20 http://smc-server

Current performance figures like that:

Running 20s test @ http://11.213.45.6
  4 threads and 10000 connections
  4956 requests in 20.06s, 3.24MB read
  Socket errors: connect 0, read 0, write 672, timeout 0
Requests/sec:    247.07
Transfer/sec:    165.28KB

There are many reasons for this phenomenon, this patch set doesn't
solve it all though, but it can be well alleviated with it in.

Patch 1/5  (Make smc_tcp_listen_work() independent) :

Separate smc_tcp_listen_work() from smc_listen_work(), make them
independent of each other, the busy SMC handshake can not affect new TCP
connections visit any more. Avoid discarding a large number of TCP
connections after being overstock, which is undoubtedly raise the
connection establishment time.

Patch 2/5 (Limit SMC backlog connections):

Since patch 1 has separated smc_tcp_listen_work() from
smc_listen_work(), an unrestricted TCP accept have come into being. This
patch try to put a limit on SMC backlog connections refers to
implementation of TCP.

Patch 3/5 (Limit SMC visits when handshake workqueue congested):

Considering the complexity of SMC handshake right now, in short-lived
links scenarios, this may not be the main scenario of SMC though, it's
performance is still quite poor. This patch try to provide constraint on
SMC handshake when handshake workqueue congested, which is the sign of
SMC handshake stacking in our opinion.

Patch 4/5 (Dynamic control handshake limitation by socket options)

This patch allow applications dynamically control the ability of SMC
handshake limitation. Since SMC don't support set SMC socket option
before,
this patch also have to support SMC's owns socket options.

Patch 5/5 (Add global configure for handshake limitation by netlink)

This patch provides a way to get benefit of handshake limitation
without
modifying any code for applications, which is quite useful for most
existing applications.

After this patch set, performance figures like that:

Running 20s test @ http://11.213.45.6
  4 threads and 10000 connections
  693253 requests in 20.10s, 452.88MB read
Requests/sec:  34488.13
Transfer/sec:     22.53MB

That's a quite well performance improvement, about to 6 to 7 times in my
environment.
---
changelog:
v1 -> v2:
- fix compile warning
- fix invalid dependencies in kconfig
v2 -> v3:
- correct spelling mistakes
- fix useless variable declare
v3 -> v4
- make smc_tcp_ls_wq be static
v4 -> v5
- add dynamic control for SMC auto fallback by socket options
- add global configure for SMC auto fallback through netlink
v5 -> v6
- move auto fallback to net namespace scope
- remove auto fallback attribute in SMC_GEN_SYS_INFO
- add independent attributes for auto fallback
v6 -> v7
- fix wording and the naming issues, rename 'auto fallback' to handshake
  limitation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/smc: Add global configure for handshake limitation by netlink
D. Wythe [Thu, 10 Feb 2022 09:11:38 +0000 (17:11 +0800)]
net/smc: Add global configure for handshake limitation by netlink

Although we can control SMC handshake limitation through socket options,
which means that applications who need it must modify their code. It's
quite troublesome for many existing applications. This patch modifies
the global default value of SMC handshake limitation through netlink,
providing a way to put constraint on handshake without modifies any code
for applications.

Suggested-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/smc: Dynamic control handshake limitation by socket options
D. Wythe [Thu, 10 Feb 2022 09:11:37 +0000 (17:11 +0800)]
net/smc: Dynamic control handshake limitation by socket options

This patch aims to add dynamic control for SMC handshake limitation for
every smc sockets, in production environment, it is possible for the
same applications to handle different service types, and may have
different opinion on SMC handshake limitation.

This patch try socket options to complete it, since we don't have socket
option level for SMC yet, which requires us to implement it at the same
time.

This patch does the following:

- add new socket option level: SOL_SMC.
- add new SMC socket option: SMC_LIMIT_HS.
- provide getter/setter for SMC socket options.

Link: https://lore.kernel.org/all/20f504f961e1a803f85d64229ad84260434203bd.1644323503.git.alibuda@linux.alibaba.com/
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/smc: Limit SMC visits when handshake workqueue congested
D. Wythe [Thu, 10 Feb 2022 09:11:36 +0000 (17:11 +0800)]
net/smc: Limit SMC visits when handshake workqueue congested

This patch intends to provide a mechanism to put constraint on SMC
connections visit according to the pressure of SMC handshake process.
At present, frequent visits will cause the incoming connections to be
backlogged in SMC handshake queue, raise the connections established
time. Which is quite unacceptable for those applications who base on
short lived connections.

There are two ways to implement this mechanism:

1. Put limitation after TCP established.
2. Put limitation before TCP established.

In the first way, we need to wait and receive CLC messages that the
client will potentially send, and then actively reply with a decline
message, in a sense, which is also a sort of SMC handshake, affect the
connections established time on its way.

In the second way, the only problem is that we need to inject SMC logic
into TCP when it is about to reply the incoming SYN, since we already do
that, it's seems not a problem anymore. And advantage is obvious, few
additional processes are required to complete the constraint.

This patch use the second way. After this patch, connections who beyond
constraint will not informed any SMC indication, and SMC will not be
involved in any of its subsequent processes.

Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/smc: Limit backlog connections
D. Wythe [Thu, 10 Feb 2022 09:11:35 +0000 (17:11 +0800)]
net/smc: Limit backlog connections

Current implementation does not handling backlog semantics, one
potential risk is that server will be flooded by infinite amount
connections, even if client was SMC-incapable.

This patch works to put a limit on backlog connections, referring to the
TCP implementation, we divides SMC connections into two categories:

1. Half SMC connection, which includes all TCP established while SMC not
connections.

2. Full SMC connection, which includes all SMC established connections.

For half SMC connection, since all half SMC connections starts with TCP
established, we can achieve our goal by put a limit before TCP
established. Refer to the implementation of TCP, this limits will based
on not only the half SMC connections but also the full connections,
which is also a constraint on full SMC connections.

For full SMC connections, although we know exactly where it starts, it's
quite hard to put a limit before it. The easiest way is to block wait
before receive SMC confirm CLC message, while it's under protection by
smc_server_lgr_pending, a global lock, which leads this limit to the
entire host instead of a single listen socket. Another way is to drop
the full connections, but considering the cast of SMC connections, we
prefer to keep full SMC connections.

Even so, the limits of full SMC connections still exists, see commits
about half SMC connection below.

After this patch, the limits of backend connection shows like:

For SMC:

1. Client with SMC-capability can makes 2 * backlog full SMC connections
   or 1 * backlog half SMC connections and 1 * backlog full SMC
   connections at most.

2. Client without SMC-capability can only makes 1 * backlog half TCP
   connections and 1 * backlog full TCP connections.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/smc: Make smc_tcp_listen_work() independent
D. Wythe [Thu, 10 Feb 2022 09:11:34 +0000 (17:11 +0800)]
net/smc: Make smc_tcp_listen_work() independent

In multithread and 10K connections benchmark, the backend TCP connection
established very slowly, and lots of TCP connections stay in SYN_SENT
state.

Client: smc_run wrk -c 10000 -t 4 http://server

the netstate of server host shows like:
    145042 times the listen queue of a socket overflowed
    145042 SYNs to LISTEN sockets dropped

One reason of this issue is that, since the smc_tcp_listen_work() shared
the same workqueue (smc_hs_wq) with smc_listen_work(), while the
smc_listen_work() do blocking wait for smc connection established. Once
the workqueue became congested, it's will block the accept() from TCP
listen.

This patch creates a independent workqueue(smc_tcp_ls_wq) for
smc_tcp_listen_work(), separate it from smc_listen_work(), which is
quite acceptable considering that smc_tcp_listen_work() runs very fast.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: dsa: realtek: convert to YAML schema, add MDIO
Luiz Angelo Daros de Luca [Wed, 9 Feb 2022 18:41:16 +0000 (15:41 -0300)]
dt-bindings: net: dsa: realtek: convert to YAML schema, add MDIO

Schema changes:

- support for mdio-connected switches (mdio driver), recognized by
  checking the presence of property "reg"
- new compatible strings for rtl8367s and rtl8367rb
- "interrupt-controller" was not added as a required property. It might
  still work polling the ports when missing.

Examples changes:

- renamed "switch_intc" to make it unique between examples
- removed "dsa-mdio" from mdio compatible property
- renamed phy@0 to ethernet-phy@0 (not tested with real HW)
  phy@ requires #phy-cells

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Fri, 11 Feb 2022 01:29:56 +0000 (17:29 -0800)]
Merge git://git./linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Fri, 11 Feb 2022 00:01:22 +0000 (16:01 -0800)]
Merge tag 'net-5.17-rc4' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and can.

Current release - new code bugs:

   - sparx5: fix get_stat64 out-of-bound access and crash

   - smc: fix netdev ref tracker misuse

  Previous releases - regressions:

   - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid
     overflows

   - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
     over IP

   - bonding: fix rare link activation misses in 802.3ad mode

  Previous releases - always broken:

   - tcp: fix tcp sock mem accounting in zero-copy corner cases

   - remove the cached dst when uncloning an skb dst and its metadata,
     since we only have one ref it'd lead to an UaF

   - netfilter:
      - conntrack: don't refresh sctp entries in closed state
      - conntrack: re-init state for retransmitted syn-ack, avoid
        connection establishment getting stuck with strange stacks
      - ctnetlink: disable helper autoassign, avoid it getting lost
      - nft_payload: don't allow transport header access for fragments

   - dsa: fix use of devres for mdio throughout drivers

   - eth: amd-xgbe: disable interrupts during pci removal

   - eth: dpaa2-eth: unregister netdev before disconnecting the PHY

   - eth: ice: fix IPIP and SIT TSO offload"

* tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
  net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
  net: mscc: ocelot: fix mutex lock error during ethtool stats read
  ice: Avoid RTNL lock when re-creating auxiliary device
  ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
  ice: fix IPIP and SIT TSO offload
  ice: fix an error code in ice_cfg_phy_fec()
  net: mpls: Fix GCC 12 warning
  dpaa2-eth: unregister the netdev before disconnecting from the PHY
  skbuff: cleanup double word in comment
  net: macb: Align the dma and coherent dma masks
  mptcp: netlink: process IPv6 addrs in creating listening sockets
  selftests: mptcp: add missing join check
  net: usb: qmi_wwan: Add support for Dell DW5829e
  vlan: move dev_put into vlan_dev_uninit
  vlan: introduce vlan_dev_free_egress_priority
  ax25: fix UAF bugs of net_device caused by rebinding operation
  net: dsa: fix panic when DSA master device unbinds on shutdown
  net: amd-xgbe: disable interrupts during pci removal
  tipc: rate limit warning for received illegal binding update
  net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
  ...

2 years agoMerge tag 'linux-kselftest-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 10 Feb 2022 23:42:48 +0000 (15:42 -0800)]
Merge tag 'linux-kselftest-fixes-5.17-rc4' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:
 "Build and run-time fixes to pidfd, clone3, and ir tests"

* tag 'linux-kselftest-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ir: fix build with ancient kernel headers
  selftests: fixup build warnings in pidfd / clone3 tests
  pidfd: fix test failure due to stack overflow on some arches

2 years agoMerge tag 'linux-kselftest-kunit-fixes-5.17-rc4' of git://git.kernel.org/pub/scm...
Linus Torvalds [Thu, 10 Feb 2022 23:39:59 +0000 (15:39 -0800)]
Merge tag 'linux-kselftest-kunit-fixes-5.17-rc4' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull KUnit fixes from Shuah Khan:
 "Fixes to the test and usage documentation"

* tag 'linux-kselftest-kunit-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  Documentation: KUnit: Fix usage bug
  kunit: fix missing f in f-string in run_checks.py

2 years agonet: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
Vladimir Oltean [Thu, 10 Feb 2022 17:40:17 +0000 (19:40 +0200)]
net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister

Since struct mv88e6xxx_mdio_bus *mdio_bus is the bus->priv of something
allocated with mdiobus_alloc_size(), this means that mdiobus_free(bus)
will free the memory backing the mdio_bus as well. Therefore, the
mdio_bus->list element is freed memory, but we continue to iterate
through the list of MDIO buses using that list element.

To fix this, use the proper list iterator that handles element deletion
by keeping a copy of the list element next pointer.

Fixes: f53a2ce893b2 ("net: dsa: mv88e6xxx: don't use devres for mdiobus")
Reported-by: Rafael Richter <rafael.richter@gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220210174017.3271099-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Thu, 10 Feb 2022 19:45:35 +0000 (11:45 -0800)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-02-10

Dan Carpenter propagates an error in FEC configuration.

Jesse fixes TSO offloads of IPIP and SIT frames.

Dave adds a dedicated LAG unregister function to resolve a KASAN error
and moves auxiliary device re-creation after LAG removal to the service
task to avoid issues with RTNL lock.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: Avoid RTNL lock when re-creating auxiliary device
  ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
  ice: fix IPIP and SIT TSO offload
  ice: fix an error code in ice_cfg_phy_fec()
====================

Link: https://lore.kernel.org/r/20220210170515.2609656-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: fix mutex lock error during ethtool stats read
Colin Foster [Thu, 10 Feb 2022 15:04:51 +0000 (07:04 -0800)]
net: mscc: ocelot: fix mutex lock error during ethtool stats read

An ongoing workqueue populates the stats buffer. At the same time, a user
might query the statistics. While writing to the buffer is mutex-locked,
reading from the buffer wasn't. This could lead to buggy reads by ethtool.

This patch fixes the former blamed commit, but the bug was introduced in
the latter.

Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Fixes: 1e1caa9735f90 ("ocelot: Clean up stats update deferred work")
Fixes: a556c76adc052 ("net: mscc: Add initial Ocelot switch support")
Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/all/20220210150451.416845-2-colin.foster@in-advantage.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: qca8k: fix noderef.cocci warnings
kernel test robot [Wed, 9 Feb 2022 22:13:04 +0000 (06:13 +0800)]
net: dsa: qca8k: fix noderef.cocci warnings

drivers/net/dsa/qca8k.c:422:37-43: ERROR: application of sizeof to pointer

 sizeof when applied to a pointer typed expression gives the size of
 the pointer

Generated by: scripts/coccinelle/misc/noderef.cocci

Fixes: 90386223f44e ("net: dsa: qca8k: add support for larger read/write size with mgmt Ethernet")
CC: Ansuel Smith <ansuelsmth@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220209221304.GA17529@d2214a582157
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoice: Avoid RTNL lock when re-creating auxiliary device
Dave Ertman [Fri, 21 Jan 2022 00:27:56 +0000 (16:27 -0800)]
ice: Avoid RTNL lock when re-creating auxiliary device

If a call to re-create the auxiliary device happens in a context that has
already taken the RTNL lock, then the call flow that recreates auxiliary
device can hang if there is another attempt to claim the RTNL lock by the
auxiliary driver.

To avoid this, any call to re-create auxiliary devices that comes from
an source that is holding the RTNL lock (e.g. netdev notifier when
interface exits a bond) should execute in a separate thread.  To
accomplish this, add a flag to the PF that will be evaluated in the
service task and dealt with there.

Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
Dave Ertman [Tue, 18 Jan 2022 21:08:20 +0000 (13:08 -0800)]
ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler

Currently, the same handler is called for both a NETDEV_BONDING_INFO
LAG unlink notification as for a NETDEV_UNREGISTER call.  This is
causing a problem though, since the netdev_notifier_info passed has
a different structure depending on which event is passed.  The problem
manifests as a call trace from a BUG: KASAN stack-out-of-bounds error.

Fix this by creating a handler specific to NETDEV_UNREGISTER that only
is passed valid elements in the netdev_notifier_info struct for the
NETDEV_UNREGISTER event.

Also included is the removal of an unbalanced dev_put on the peer_netdev
and related braces.

Fixes: 6a8b357278f5 ("ice: Respond to a NETDEV_UNREGISTER event for LAG")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: fix IPIP and SIT TSO offload
Jesse Brandeburg [Fri, 14 Jan 2022 23:38:39 +0000 (15:38 -0800)]
ice: fix IPIP and SIT TSO offload

The driver was avoiding offload for IPIP (at least) frames due to
parsing the inner header offsets incorrectly when trying to check
lengths.

This length check works for VXLAN frames but fails on IPIP frames
because skb_transport_offset points to the inner header in IPIP
frames, which meant the subtraction of transport_header from
inner_network_header returns a negative value (-20).

With the code before this patch, everything continued to work, but GSO
was being used to segment, causing throughputs of 1.5Gb/s per thread.
After this patch, throughput is more like 10Gb/s per thread for IPIP
traffic.

Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: fix an error code in ice_cfg_phy_fec()
Dan Carpenter [Fri, 7 Jan 2022 08:02:06 +0000 (11:02 +0300)]
ice: fix an error code in ice_cfg_phy_fec()

Propagate the error code from ice_get_link_default_override() instead
of returning success.

Fixes: ea78ce4dab05 ("ice: add link lenient and default override support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agonet/switchdev: use struct_size over open coded arithmetic
Minghao Chi (CGEL ZTE) [Thu, 10 Feb 2022 06:10:08 +0000 (06:10 +0000)]
net/switchdev: use struct_size over open coded arithmetic

Replace zero-length array with flexible-array member and make use
of the struct_size() helper in kmalloc(). For example:

struct switchdev_deferred_item {
    ...
    unsigned long data[];
};

Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoipv4: Reject again rules with high DSCP values
Guillaume Nault [Thu, 10 Feb 2022 12:24:51 +0000 (13:24 +0100)]
ipv4: Reject again rules with high DSCP values

Commit 563f8e97e054 ("ipv4: Stop taking ECN bits into account in
fib4-rules") replaced the validation test on frh->tos. While the new
test is stricter for ECN bits, it doesn't detect the use of high order
DSCP bits. This would be fine if IPv4 could properly handle them. But
currently, most IPv4 lookups are done with the three high DSCP bits
masked. Therefore, using these bits doesn't lead to the expected
result.

Let's reject such configurations again, so that nobody starts to
use and make any assumption about how the stack handles the three high
order DSCP bits in fib4 rules.

Fixes: 563f8e97e054 ("ipv4: Stop taking ECN bits into account in fib4-rules")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-pf: Add TC feature for VFs
Subbaraya Sundeep [Thu, 10 Feb 2022 11:51:44 +0000 (17:21 +0530)]
octeontx2-pf: Add TC feature for VFs

This patch adds TC feature for VFs also. When MCAM
rules are allocated for a VF then either TC or ntuple
filters can be used. Below are the commands to use
TC feature for a VF(say lbk0):

devlink dev param set pci/0002:01:00.1 name mcam_count value 16 \
 cmode runtime
ethtool -K lbk0 hw-tc-offload on
ifconfig lbk0 up
tc qdisc add dev lbk0 ingress
tc filter add dev lbk0 parent ffff: protocol ip flower skip_sw \
 dst_mac 98:03:9b:83:aa:12 action police rate 100Mbit burst 5000

Also to modify any fields of the hardware context with
NIX_AQ_INSTOP_WRITE command then corresponding masks of those
fields must be set as per hardware. This was missing in
ingress ratelimiting context. This patch sets those masks also.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: make net->dev_unreg_count atomic
Eric Dumazet [Thu, 10 Feb 2022 02:59:32 +0000 (18:59 -0800)]
net: make net->dev_unreg_count atomic

Having to acquire rtnl from netdev_run_todo() for every dismantled
device is not desirable when/if rtnl is under stress.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mpls: Fix GCC 12 warning
Victor Erminpour [Thu, 10 Feb 2022 00:28:38 +0000 (16:28 -0800)]
net: mpls: Fix GCC 12 warning

When building with automatic stack variable initialization, GCC 12
complains about variables defined outside of switch case statements.
Move the variable outside the switch, which silences the warning:

./net/mpls/af_mpls.c:1624:21: error: statement will never be executed [-Werror=switch-unreachable]
  1624 |                 int err;
       |                     ^~~

Signed-off-by: Victor Erminpour <victor.erminpour@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoqed: prevent a fw assert during device shutdown
Venkata Sudheer Kumar Bhavaraju [Wed, 9 Feb 2022 19:28:14 +0000 (11:28 -0800)]
qed: prevent a fw assert during device shutdown

Device firmware can assert if the device shutdown path in driver
encounters an async. events from mfw (processed in
qed_mcp_handle_events()) after qed_mcp_unload_req() returns.
A call to qed_mcp_unload_req() currently marks the device as inactive
and thus stops any new events, but there is a windows where in-flight
events might still be received by the driver.

To prevent this race condition, atomically set QED_MCP_BYPASS_PROC_BIT
in qed_mcp_unload_req() to make sure qed_mcp_handle_events() ignores all
events. Wait for any event that might already be in-process to complete
by monitoring QED_MCP_IN_PROCESSING_BIT.

Signed-off-by: Pravin Kumar Ganesh Dhende <pdhende@marvell.com>
Signed-off-by: Venkata Sudheer Kumar Bhavaraju <vbhavaraju@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodpaa2-eth: unregister the netdev before disconnecting from the PHY
Robert-Ionut Alexa [Wed, 9 Feb 2022 15:57:43 +0000 (17:57 +0200)]
dpaa2-eth: unregister the netdev before disconnecting from the PHY

The netdev should be unregistered before we are disconnecting from the
MAC/PHY so that the dev_close callback is called and the PHY and the
phylink workqueues are actually stopped before we are disconnecting and
destroying the phylink instance.

Fixes: 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink")
Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoskbuff: cleanup double word in comment
Tom Rix [Wed, 9 Feb 2022 15:02:42 +0000 (07:02 -0800)]
skbuff: cleanup double word in comment

Remove the second 'to'.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: macb: Align the dma and coherent dma masks
Marc St-Amand [Wed, 9 Feb 2022 09:43:25 +0000 (15:13 +0530)]
net: macb: Align the dma and coherent dma masks

Single page and coherent memory blocks can use different DMA masks
when the macb accesses physical memory directly. The kernel is clever
enough to allocate pages that fit into the requested address width.

When using the ARM SMMU, the DMA mask must be the same for single
pages and big coherent memory blocks. Otherwise the translation
tables turn into one big mess.

  [   74.959909] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
  [   74.959989] arm-smmu fd800000.smmu: Unhandled context fault: fsr=0x402, iova=0x3165687460, fsynr=0x20001, cbfrsynra=0x877, cb=1
  [   75.173939] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
  [   75.173955] arm-smmu fd800000.smmu: Unhandled context fault: fsr=0x402, iova=0x3165687460, fsynr=0x20001, cbfrsynra=0x877, cb=1

Since using the same DMA mask does not hurt direct 1:1 physical
memory mappings, this commit always aligns DMA and coherent masks.

Signed-off-by: Marc St-Amand <mstamand@ciena.com>
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'ping6-cmsg'
David S. Miller [Thu, 10 Feb 2022 15:04:52 +0000 (15:04 +0000)]
Merge branch 'ping6-cmsg'

Jakub Kicinski says:

====================
net: ping6: support basic socket cmsgs

Add support for common SOL_SOCKET cmsgs in ICMPv6 sockets.
Extend the cmsg tests to cover more cmsgs and socket types.

SOL_IPV6 cmsgs to follow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: test standard socket cmsgs across UDP and ICMP sockets
Jakub Kicinski [Thu, 10 Feb 2022 00:36:49 +0000 (16:36 -0800)]
selftests: net: test standard socket cmsgs across UDP and ICMP sockets

Test TIMESTAMPING and TXTIME across UDP / ICMP and IP versions.

Before ICMPv6 support:
  # ./tools/testing/selftests/net/cmsg_time.sh
    Case ICMPv6  - ts cnt returned '0', expected '2'
    Case ICMPv6  - ts0 SCHED returned '', expected 'OK'
    Case ICMPv6  - ts0 SND returned '', expected 'OK'
    Case ICMPv6  - TXTIME abs returned '', expected 'OK'
    Case ICMPv6  - TXTIME rel returned '', expected 'OK'
  FAIL - 5/36 cases failed

After:
  # ./tools/testing/selftests/net/cmsg_time.sh
  OK

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: cmsg_sender: support Tx timestamping
Jakub Kicinski [Thu, 10 Feb 2022 00:36:48 +0000 (16:36 -0800)]
selftests: net: cmsg_sender: support Tx timestamping

Support requesting Tx timestamps:

 $ ./cmsg_sender -p i -t -4 $tgt 123 -d 1000
 SCHED ts0 61us
   SND ts0 1071us

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: cmsg_sender: support setting SO_TXTIME
Jakub Kicinski [Thu, 10 Feb 2022 00:36:47 +0000 (16:36 -0800)]
selftests: net: cmsg_sender: support setting SO_TXTIME

Add ability to send delayed packets.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: cmsg_so_mark: test with SO_MARK set by setsockopt
Jakub Kicinski [Thu, 10 Feb 2022 00:36:46 +0000 (16:36 -0800)]
selftests: net: cmsg_so_mark: test with SO_MARK set by setsockopt

Test if setting SO_MARK with setsockopt works and if cmsg
takes precedence over it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: cmsg_so_mark: test ICMP and RAW sockets
Jakub Kicinski [Thu, 10 Feb 2022 00:36:45 +0000 (16:36 -0800)]
selftests: net: cmsg_so_mark: test ICMP and RAW sockets

Use new capabilities of cmsg_sender to test ICMP and RAW sockets,
previously only UDP was tested.

Before SO_MARK support was added to ICMPv6:

  # ./cmsg_so_mark.sh
   Case ICMP rejection returned 0, expected 1
  FAIL - 1/12 cases failed

After:

  # ./cmsg_so_mark.sh
  OK

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: cmsg_sender: support icmp and raw sockets
Jakub Kicinski [Thu, 10 Feb 2022 00:36:44 +0000 (16:36 -0800)]
selftests: net: cmsg_sender: support icmp and raw sockets

Support sending fake ICMP(v6) messages and UDP via RAW sockets.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: make cmsg_so_mark ready for more options
Jakub Kicinski [Thu, 10 Feb 2022 00:36:43 +0000 (16:36 -0800)]
selftests: net: make cmsg_so_mark ready for more options

Parametrize the code so that it can support UDP and ICMP
sockets in the future, and more cmsg types.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: rename cmsg_so_mark
Jakub Kicinski [Thu, 10 Feb 2022 00:36:42 +0000 (16:36 -0800)]
selftests: net: rename cmsg_so_mark

Rename the file in prep for generalization.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ping6: support setting socket options via cmsg
Jakub Kicinski [Thu, 10 Feb 2022 00:36:41 +0000 (16:36 -0800)]
net: ping6: support setting socket options via cmsg

Minor reordering of the code and a call to sock_cmsg_send()
gives us support for setting the common socket options via
cmsg (the usual ones - SO_MARK, SO_TIMESTAMPING_OLD, SCM_TXTIME).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ping6: support packet timestamping
Jakub Kicinski [Thu, 10 Feb 2022 00:36:40 +0000 (16:36 -0800)]
net: ping6: support packet timestamping

Nothing prevents the user from requesting timestamping
on ping6 sockets, yet timestamps are not going to be reported.
Plumb the flags through.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ping6: remove a pr_debug() statement
Jakub Kicinski [Thu, 10 Feb 2022 00:36:39 +0000 (16:36 -0800)]
net: ping6: remove a pr_debug() statement

We have ftrace and BPF today, there's no need for printing arguments
at the start of a function.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'mt76-for-kvalo-2022-02-04' of https://github.com/nbd168/wireless into...
Kalle Valo [Thu, 10 Feb 2022 14:36:03 +0000 (16:36 +0200)]
Merge tag 'mt76-for-kvalo-2022-02-04' of https://github.com/nbd168/wireless into main

mt76 patches for 5.18

- mt7915 mcu code cleanup
- mt7916 support
- fixes for SDIO support
- fixes for DFS
- power management fixes
- stability improvements
- background radar detection support

2 years agoMerge tag 'ieee802154-for-davem-2022-02-10' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Thu, 10 Feb 2022 14:28:04 +0000 (14:28 +0000)]
Merge tag 'ieee802154-for-davem-2022-02-10' of git://git./linux/kernel/git/sschmidt/wpan-next

Stefan Schmidt says:

====================
pull-request: ieee802154-next 2022-02-10

An update from ieee802154 for your *net-next* tree.

There is more ongoing in ieee802154 than usual. This will be the first pull
request for this cycle, but I expect one more. Depending on review and rework
times.

Pavel Skripkin ported the atusb driver over to the new USB api to avoid unint
problems as well as making use of the modern api without kmalloc() needs in he
driver.

Miquel Raynal landed some changes to ensure proper frame checksum checking with
hwsim, documenting our use of wake and stop_queue and eliding a magic value by
using the proper define.

David Girault documented the address struct used in ieee802154.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'mips-fixes-5.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Thu, 10 Feb 2022 13:52:00 +0000 (05:52 -0800)]
Merge tag 'mips-fixes-5.17_3' of git://git./linux/kernel/git/mips/linux

Pull MIPS fix from Thomas Bogendoerfer:
 "Device tree fix for Ingenic CI20"

* tag 'mips-fixes-5.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: DTS: CI20: fix how ddc power is enabled

2 years agoMerge tag 'audit-pr-20220209' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoor...
Linus Torvalds [Thu, 10 Feb 2022 13:43:43 +0000 (05:43 -0800)]
Merge tag 'audit-pr-20220209' of git://git./linux/kernel/git/pcmoore/audit

Pull audit fix from Paul Moore:
 "Another audit fix, this time a single rather small but important fix
  for an oops/page-fault caused by improperly accessing userspace
  memory"

* tag 'audit-pr-20220209' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: don't deref the syscall args when checking the openat2 open_how::flags

2 years agotipc: improve size validations for received domain records
Jon Maloy [Sat, 5 Feb 2022 19:11:18 +0000 (14:11 -0500)]
tipc: improve size validations for received domain records

The function tipc_mon_rcv() allows a node to receive and process
domain_record structs from peer nodes to track their views of the
network topology.

This patch verifies that the number of members in a received domain
record does not exceed the limit defined by MAX_MON_DOMAIN, something
that may otherwise lead to a stack overflow.

tipc_mon_rcv() is called from the function tipc_link_proto_rcv(), where
we are reading a 32 bit message data length field into a uint16.  To
avert any risk of bit overflow, we add an extra sanity check for this in
that function.  We cannot see that happen with the current code, but
future designers being unaware of this risk, may introduce it by
allowing delivery of very large (> 64k) sk buffers from the bearer
layer.  This potential problem was identified by Eric Dumazet.

This fixes CVE-2022-0435

Reported-by: Samuel Page <samuel.page@appgate.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework")
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Samuel Page <samuel.page@appgate.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Thu, 10 Feb 2022 11:00:13 +0000 (11:00 +0000)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-02-09

This series contains updates to ice driver only.

Brett adds support for QinQ. This begins with code refactoring and
re-organization of VLAN configuration functions to allow for
introduction of VSI VLAN ops to enable setting and calling of
respective operations based on device support of single or double
VLANs. Implementations are added for outer VLAN support.

To support QinQ, the device must be set to double VLAN mode (DVM).
In order for this to occur, the DDP package and NVM must also support
DVM. Functions to determine compatibility and properly configure the
device are added as well as setting the proper bits to advertise and
utilize the proper offloads. Support for VIRTCHNL_VF_OFFLOAD_VLAN_V2
is also included to allow for VF to negotiate and utilize this
functionality.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agobrcmfmac: of: remove redundant variable len
Colin Ian King [Mon, 7 Feb 2022 13:33:29 +0000 (13:33 +0000)]
brcmfmac: of: remove redundant variable len

The variable len is being assigned bit is never used. The variable
and the strlen call are redundant and can be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220207133329.336664-1-colin.i.king@gmail.com
2 years agobrcmfmac: p2p: Replace one-element arrays with flexible-array members
Gustavo A. R. Silva [Fri, 4 Feb 2022 23:22:28 +0000 (17:22 -0600)]
brcmfmac: p2p: Replace one-element arrays with flexible-array members

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

This issue was found with the help of Coccinelle and audited and fixed,
manually.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220204232228.GA442895@embeddedor
2 years agortw89: coex: set EN bit to PLT register
Ping-Ke Shih [Tue, 8 Feb 2022 08:27:51 +0000 (16:27 +0800)]
rtw89: coex: set EN bit to PLT register

B_AX_PLT_EN is to enable polluted mechanism. If it is enabled and
gnt_bt = 1 while wlan TX, B_AX_BT_PLT_PKT_CNT counter will increase,
but TX counter to BB will not. Without this bit BTCoex mechanism might
have some problems.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220208082751.43553-1-pkshih@realtek.com
2 years agortw89: recover rates of rate adaptive mechanism
Chien-Hsun Liao [Tue, 8 Feb 2022 08:24:27 +0000 (16:24 +0800)]
rtw89: recover rates of rate adaptive mechanism

Some APs like CMW270 only support one phyrate and the function
rtw89_phy_ra_mask_rssi could disable that rate. To fix such problem, we
restore the rate mask if we find that the rate_mask is empty.
Also, apply missed legacy rates from sta->supp_rates[].

Signed-off-by: Chien-Hsun Liao <ben.liao@realtek.com>
Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220208082427.42433-3-pkshih@realtek.com
2 years agortw88: recover rates of rate adaptive mechanism
Chien-Hsun Liao [Tue, 8 Feb 2022 08:24:26 +0000 (16:24 +0800)]
rtw88: recover rates of rate adaptive mechanism

Some APs like CMW270 only support one phyrate and the function
rtw_update_rate_mask could disable that rate. To fix such problem, we
restore the rate mask if we find that the rate_mask is empty.

Signed-off-by: Chien-Hsun Liao <ben.liao@realtek.com>
Signed-off-by: Kuan-Chung Chen <damon.chen@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220208082427.42433-2-pkshih@realtek.com
2 years agortw89: declare AP mode support
Ping-Ke Shih [Mon, 7 Feb 2022 06:39:00 +0000 (14:39 +0800)]
rtw89: declare AP mode support

Things are ready for AP mode, so declare this driver can support it.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220207063900.43643-8-pkshih@realtek.com
2 years agortw89: debug: add stations entry to show ID assignment
Ping-Ke Shih [Mon, 7 Feb 2022 06:38:59 +0000 (14:38 +0800)]
rtw89: debug: add stations entry to show ID assignment

In order to trace the relation of IDs, we add this debugfs entry to make
them clear.

The output looks like:
  map:
          mac_id:    07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          addr_cam:  07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          bssid_cam: 01 00 00 00 00 00 00 00
          sec_cam:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  VIF [0] 94:08:53:8e:ef:21
          bssid_cam_idx=0
          addr_cam_idx=0
          -> bssid_cam_idx=0
          sec_cam_bitmap=00 00 00 00 00 00 00 00
  STA [1] 58:00:e3:bb:9c:4f
          addr_cam_idx=1
          -> bssid_cam_idx=0
          sec_cam_bitmap=00 00 00 00 00 00 00 00
  STA [2] 94:08:53:8e:ef:75
          addr_cam_idx=2
          -> bssid_cam_idx=0
          sec_cam_bitmap=00 00 00 00 00 00 00 00

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220207063900.43643-7-pkshih@realtek.com
2 years agortw89: implement ieee80211_ops::start_ap and stop_ap
Ping-Ke Shih [Mon, 7 Feb 2022 06:38:58 +0000 (14:38 +0800)]
rtw89: implement ieee80211_ops::start_ap and stop_ap

Configure firmware and hardware to run AP mode. The start_ap() setup
bssid, mac port, mac_id entry, and does RFK. The stop_ap() reset the
state.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220207063900.43643-6-pkshih@realtek.com
2 years agortw89: maintain assoc/disassoc STA states of firmware and hardware
Ping-Ke Shih [Mon, 7 Feb 2022 06:38:57 +0000 (14:38 +0800)]
rtw89: maintain assoc/disassoc STA states of firmware and hardware

In AP mode, when a STA associate to us, we need to create an entry in
firmware and hardware, and then they can transmit data properly.

The entry index called mac_id which is assigned when sta_add, and we ask
firmware to create an entry for an associated station. Also, the address
CAM should be filled so hardware can know which packet is ours, and lookup
the mac_id for further use.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220207063900.43643-5-pkshih@realtek.com
2 years agortw89: only STA mode change vif_type mapping dynamically
Ping-Ke Shih [Mon, 7 Feb 2022 06:38:56 +0000 (14:38 +0800)]
rtw89: only STA mode change vif_type mapping dynamically

vif_type mapping indicates hardware operating mode corresponding to vif
type. In STA mode, hardware mode should be INFRA or NO_LINK mode
dynamically according to association status. Since AP mode don't need to
change this by association status intuitively, just do the mapping in
STA mode.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220207063900.43643-4-pkshih@realtek.com
2 years agortw89: add addr_cam field to sta to support AP mode
Ping-Ke Shih [Mon, 7 Feb 2022 06:38:55 +0000 (14:38 +0800)]
rtw89: add addr_cam field to sta to support AP mode

In AP mode, each connected station needs an entry of address CAM. The
address CAM of vif is still needed to assit in AP itself.

For station mode, it still uses vif's address CAM.

Add a help macro rtw89_get_addr_cam_of() to get addr_cam from vif or sta
for all use cases.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220207063900.43643-3-pkshih@realtek.com
2 years agortw89: extend role_maintain to support AP mode
Ping-Ke Shih [Mon, 7 Feb 2022 06:38:54 +0000 (14:38 +0800)]
rtw89: extend role_maintain to support AP mode

Fill mac_id and self_role depends on the operation mode.

In AP mode, echo connected station has an unique mac_id, and each vif also
has one mac_id to represent itself.

The self_role is assigned to vif if the operation mode is decided, and
RTW89_SELF_ROLE_AP_CLIENT is assigned to the connected STA in AP mode,

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220207063900.43643-2-pkshih@realtek.com
2 years agortw88: fix use after free in rtw_hw_scan_update_probe_req()
Dan Carpenter [Thu, 3 Feb 2022 08:25:32 +0000 (11:25 +0300)]
rtw88: fix use after free in rtw_hw_scan_update_probe_req()

This code needs to use skb_queue_walk_safe() instead of skb_queue_walk()
because it frees the list iterator.

Fixes: d95984b5580d ("rtw88: fix memory overrun and memory leak during hw_scan")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220203082532.GA25151@kili
2 years agoMerge branch 'mptcp-fixes-for-5-17'
Jakub Kicinski [Thu, 10 Feb 2022 05:44:37 +0000 (21:44 -0800)]
Merge branch 'mptcp-fixes-for-5-17'

Mat Martineau says:

====================
mptcp: Fixes for 5.17

Patch 1 fixes a MPTCP selftest bug that combined the results of two
separate tests in the test output.

Patch 2 fixes a problem where advertised IPv6 addresses were not actually
available for incoming MP_JOIN requests.
====================

Link: https://lore.kernel.org/r/20220210012508.226880-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: netlink: process IPv6 addrs in creating listening sockets
Kishen Maloor [Thu, 10 Feb 2022 01:25:08 +0000 (17:25 -0800)]
mptcp: netlink: process IPv6 addrs in creating listening sockets

This change updates mptcp_pm_nl_create_listen_socket() to create
listening sockets bound to IPv6 addresses (where IPv6 is supported).

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Acked-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: add missing join check
Matthieu Baerts [Thu, 10 Feb 2022 01:25:07 +0000 (17:25 -0800)]
selftests: mptcp: add missing join check

This function also writes the name of the test with its ID, making clear
a new test has been executed.

Without that, the ADD_ADDR results from this test was appended at the
end of the previous test causing confusions. Especially when the second
test was failing, we had:

  17 signal invalid addresses     syn[ ok ] - synack[ ok ] - ack[ ok ]
                                  add[ ok ] - echo  [ ok ]
                                  add[fail] got 2 ADD_ADDR[s] expected 3

In fact, this 17th test was OK but not the 18th one.

Now we have:

  17 signal invalid addresses     syn[ ok ] - synack[ ok ] - ack[ ok ]
                                  add[ ok ] - echo  [ ok ]
  18 signal addresses race test   syn[fail] got 2 JOIN[s] syn expected 3
   - synack[fail] got 2 JOIN[s] synack expected
   - ack[fail] got 2 JOIN[s] ack expected 3
                                  add[fail] got 2 ADD_ADDR[s] expected 3

Fixes: 33c563ad28e3 ("selftests: mptcp: add_addr and echo race test")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Jakub Kicinski [Thu, 10 Feb 2022 05:35:07 +0000 (21:35 -0800)]
Merge git://git./linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) Conntrack sets on CHECKSUM_UNNECESSARY for UDP packet with no checksum,
   from Kevin Mitchell.

2) skb->priority support for nfqueue, from Nicolas Dichtel.

3) Remove conntrack extension register API, from Florian Westphal.

4) Move nat destroy hook to nf_nat_hook instead, to remove
   nf_ct_ext_destroy(), also from Florian.

5) Wrap pptp conntrack NAT hooks into single structure, from Florian Westphal.

6) Support for tcp option set to noop for nf_tables, also from Florian.

7) Do not run x_tables comment match from packet path in nf_tables,
   from Florian Westphal.

8) Replace spinlock by cmpxchg() loop to update missed ct event,
   from Florian Westphal.

9) Wrap cttimeout hooks into single structure, from Florian.

10) Add fast nft_cmp expression for up to 16-bytes.

11) Use cb->ctx to store context in ctnetlink dump, instead of using
    cb->args[], from Florian Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: ctnetlink: use dump structure instead of raw args
  nfqueue: enable to set skb->priority
  netfilter: nft_cmp: optimize comparison for 16-bytes
  netfilter: cttimeout: use option structure
  netfilter: ecache: don't use nf_conn spinlock
  netfilter: nft_compat: suppress comment match
  netfilter: exthdr: add support for tcp option removal
  netfilter: conntrack: pptp: use single option structure
  netfilter: conntrack: remove extension register api
  netfilter: conntrack: handle ->destroy hook via nat_ops instead
  netfilter: conntrack: move extension sizes into core
  netfilter: conntrack: make all extensions 8-byte alignned
  netfilter: nfqueue: enable to get skb->priority
  netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY
====================

Link: https://lore.kernel.org/r/20220209133616.165104-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
Sebastian Andrzej Siewior [Wed, 9 Feb 2022 18:56:57 +0000 (19:56 +0100)]
tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.

Commit
   9652dc2eb9e40 ("tcp: relax listening_hash operations")

removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.

On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves is also acquired with enabled bottom halves
followed by disabling bottom halves. This is the reverse locking order.
It has been observed with inet_listen_hashbucket::lock:

local_bh_disable() + spin_lock(&ilb->lock):
  inet_listen()
    inet_csk_listen_start()
      sk->sk_prot->hash() := inet_hash()
local_bh_disable()
__inet_hash()
  spin_lock(&ilb->lock);
    acquire(&ilb->lock);

Reverse order: spin_lock(&ilb2->lock) + local_bh_disable():
  tcp_seq_next()
    listening_get_next()
      spin_lock(&ilb2->lock);
acquire(&ilb2->lock);

  tcp4_seq_show()
    get_tcp4_sock()
      sock_i_ino()
read_lock_bh(&sk->sk_callback_lock);
  acquire(softirq_ctrl) // <---- whoops
  acquire(&sk->sk_callback_lock)

Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock. Split inet_unhash() and acquire the
listen_hashbucket lock without disabling bottom halves; the inet_ehash
lock with disabled bottom halves.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de
Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net
Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Jakub Kicinski [Thu, 10 Feb 2022 02:17:54 +0000 (18:17 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2022-02-09

We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).

The main changes are:

1) Add custom BPF allocator for JITs that pack multiple programs into a huge
   page to reduce iTLB pressure, from Song Liu.

2) Add __user tagging support in vmlinux BTF and utilize it from BPF
   verifier when generating loads, from Yonghong Song.

3) Add per-socket fast path check guarding from cgroup/BPF overhead when
   used by only some sockets, from Pavel Begunkov.

4) Continued libbpf deprecation work of APIs/features and removal of their
   usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
   and various others.

5) Improve BPF instruction set documentation by adding byte swap
   instructions and cleaning up load/store section, from Christoph Hellwig.

6) Switch BPF preload infra to light skeleton and remove libbpf dependency
   from it, from Alexei Starovoitov.

7) Fix architecture-agnostic macros in libbpf for accessing syscall
   arguments from BPF progs for non-x86 architectures,
   from Ilya Leoshkevich.

8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
   of 16-bit field with anonymous zero padding, from Jakub Sitnicki.

9) Add new bpf_copy_from_user_task() helper to read memory from a different
   task than current. Add ability to create sleepable BPF iterator progs,
   from Kenny Yu.

10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
    utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.

11) Generate temporary netns names for BPF selftests to avoid naming
    collisions, from Hangbin Liu.

12) Implement bpf_core_types_are_compat() with limited recursion for
    in-kernel usage, from Matteo Croce.

13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
    to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.

14) Misc minor fixes to libbpf and selftests from various folks.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
  selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
  bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
  libbpf: Fix compilation warning due to mismatched printf format
  selftests/bpf: Test BPF_KPROBE_SYSCALL macro
  libbpf: Add BPF_KPROBE_SYSCALL macro
  libbpf: Fix accessing the first syscall argument on s390
  libbpf: Fix accessing the first syscall argument on arm64
  libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
  selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
  libbpf: Fix accessing syscall arguments on riscv
  libbpf: Fix riscv register names
  libbpf: Fix accessing syscall arguments on powerpc
  selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
  libbpf: Add PT_REGS_SYSCALL_REGS macro
  selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
  bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
  bpf: Fix leftover header->pages in sparc and powerpc code.
  libbpf: Fix signedness bug in btf_dump_array_data()
  selftests/bpf: Do not export subtest as standalone test
  bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
  ...
====================

Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: drop_monitor: support drop reason
Menglong Dong [Wed, 9 Feb 2022 06:08:38 +0000 (14:08 +0800)]
net: drop_monitor: support drop reason

In the commit c504e5c2f964 ("net: skb: introduce kfree_skb_reason()")
drop reason is introduced to the tracepoint of kfree_skb. Therefore,
drop_monitor is able to report the drop reason to users by netlink.

The drop reasons are reported as string to users, which is exactly
the same as what we do when reporting it to ftrace.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220209060838.55513-1-imagedong@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: usb: qmi_wwan: Add support for Dell DW5829e
Slark Xiao [Wed, 9 Feb 2022 02:47:17 +0000 (10:47 +0800)]
net: usb: qmi_wwan: Add support for Dell DW5829e

Dell DW5829e same as DW5821e except the CAT level.
DW5821e supports CAT16 but DW5829e supports CAT9.
Also, DW5829e includes normal and eSIM type.
Please see below test evidence:

T:  Bus=04 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  5 Spd=5000 MxCh= 0
D:  Ver= 3.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs=  1
P:  Vendor=413c ProdID=81e6 Rev=03.18
S:  Manufacturer=Dell Inc.
S:  Product=DW5829e Snapdragon X20 LTE
S:  SerialNumber=0123456789ABCDEF
C:  #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=896mA
I:  If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I:  If#=0x1 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid
I:  If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#=0x3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#=0x5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option

T:  Bus=04 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  7 Spd=5000 MxCh= 0
D:  Ver= 3.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs=  1
P:  Vendor=413c ProdID=81e4 Rev=03.18
S:  Manufacturer=Dell Inc.
S:  Product=DW5829e-eSIM Snapdragon X20 LTE
S:  SerialNumber=0123456789ABCDEF
C:  #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=896mA
I:  If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I:  If#=0x1 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid
I:  If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#=0x3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#=0x5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option

Signed-off-by: Slark Xiao <slark_xiao@163.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20220209024717.8564-1-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoaudit: don't deref the syscall args when checking the openat2 open_how::flags
Paul Moore [Wed, 9 Feb 2022 19:49:38 +0000 (14:49 -0500)]
audit: don't deref the syscall args when checking the openat2 open_how::flags

As reported by Jeff, dereferencing the openat2 syscall argument in
audit_match_perm() to obtain the open_how::flags can result in an
oops/page-fault.  This patch fixes this by using the open_how struct
that we store in the audit_context with audit_openat2_how().

Independent of this patch, Richard Guy Briggs posted a similar patch
to the audit mailing list roughly 40 minutes after this patch was
posted.

Cc: stable@vger.kernel.org
Fixes: 1c30e3af8a79 ("audit: add support for the openat2 syscall")
Reported-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>