git.monstr.eu Git - linux-2.6-microblaze.git/log

net: stmmac: sti: convert to devm_stmmac_pltfr_probe()

Convert sti to use the generic devm_stmmac_pltfr_probe() which will
call plat_dat->init()/plat_dat->exit() as appropriate, thus
simplifying the code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1u4jMj-000rCM-31@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: sti: use phy_interface_mode_is_rgmii()

Replace the custom IS_PHY_IF_MODE_RGMII() macro with our generic
phy_interface_mode_is_rgmii() inline function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1u4jMd-000rCG-VU@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: networking: clarify intended audience of netdevices.rst

The netdevices doc is dangerously broad. At least make it clear
that it's intended for developers, not for users.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250415172653.811147-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: mediatek: init val in .phy_led_polarity_set for AN7581

Fix smatch warning for uninitialised val in .phy_led_polarity_set for
AN7581 driver.

Correctly init to 0 to set polarity high by default.

Reported-by: Simon Horman <horms@kernel.org>
Fixes: 6a325aed130b ("net: phy: mediatek: add Airoha PHY ID to SoC driver")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250415105313.3409-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: sun8i: use stmmac_pltfr_probe()

Using stmmac_pltfr_probe() simplifies the probe function. This will not
only call plat_dat->init (sun8i_dwmac_init), but also plat_dat->exit
(sun8i_dwmac_exit) appropriately if stmmac_dvr_probe() fails. This
results in an overall simplification of the glue driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Link: https://patch.msgid.link/E1u4dKb-000dV7-3B@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: intel: remove unnecessary setting max_speed

Phylink will already limit the MAC speed according to the interface,
so if 2500BASE-X is selected, the maximum speed will be 2.5G.
Similarly, if SGMII is selected, the maximum speed will be 1G.
It is, therefore, not necessary to set a speed limit. Remove setting
plat_dat->max_speed from this glue driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1u4dIh-000dT5-Kt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: bnxt: add support rx side device memory TCP

Currently, bnxt_en driver satisfies the requirements of the Device
memory TCP, which is HDS.
So, it implements rx-side Device memory TCP for bnxt_en driver.
It requires only converting the page API to netmem API.
`struct page` of agg rings are changed to `netmem_ref netmem` and
corresponding functions are changed to a variant of netmem API.

It also passes PP_FLAG_ALLOW_UNREADABLE_NETMEM flag to a parameter of
page_pool.
The netmem will be activated only when a user requests devmem TCP.

When netmem is activated, received data is unreadable and netmem is
disabled, received data is readable.
But drivers don't need to handle both cases because netmem core API will
handle it properly.
So, using proper netmem API is enough for drivers.

Device memory TCP can be tested with
tools/testing/selftests/drivers/net/hw/ncdevmem.
This is tested with BCM57504-N425G and firmware version 232.0.155.8/pkg
232.1.132.8.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Tested-by: David Wei <dw@davidwei.uk>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250415052458.1260575-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan743x: Allocate rings outside ZONE_DMA

The driver allocates ring elements using GFP_DMA flags. There is
no dependency from LAN743x hardware on memory allocation should be
in DMA_ZONE. Hence modifying the flags to use only GFP_ATOMIC. This
is consistent with other callers of lan743x_rx_init_ring_element().

Reported-by: Zhang, Liyin(CN) <Liyin.Zhang.CN@windriver.com>
Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250415044509.6695-1-thangaraj.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ethernet-ti-am65-cpsw-fix-mac-address-fetching'

Michael Walle says:

====================
net: ethernet: ti: am65-cpsw: Fix MAC address fetching

MAC addresses can be fetched from a NVMEM device. of_get_mac_address()
will return EPROBE_DEFER if that device is not available yet. That
isn't handled correctly by the driver and it will always fall back
to either a random MAC address or it's own "fetch by fuse" method.

Also, if the ethernet (sub)node has a link to the nvmem device,
it will fail to create a device link as the fwnode parameter isn't
populated. That's fixed in the first patch.
====================

Link: https://patch.msgid.link/20250414084336.4017237-1-mwalle@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: am65-cpsw: handle -EPROBE_DEFER

of_get_mac_address() might fetch the MAC address from NVMEM and that
driver might not have been loaded. In that case, -EPROBE_DEFER is
returned. Right now, this will trigger an immediate fallback to
am65_cpsw_am654_get_efuse_macid() possibly resulting in a random MAC
address although the MAC address is stored in the referenced NVMEM.

Fix it by handling the -EPROBE_DEFER return code correctly. This also
means that the creation of the MDIO device has to be moved to a later
stage as -EPROBE_DEFER must not be returned after child devices are
created.

Signed-off-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250414084336.4017237-3-mwalle@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: am65-cpsw: set fwnode for ports

fwnode needs to be set for a device for fw_devlink to be able to
track/enforce its dependencies correctly. Without this, you'll see error
messages like this when the supplier has probed and tries to make sure
all its fwnode consumers are linked to it using device links:

am65-cpsw-nuss 8000000.ethernet: Failed to create device link (0x180) with supplier ..

Reviewed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Michael Walle <mwalle@kernel.org>
Link: https://patch.msgid.link/20250414084336.4017237-2-mwalle@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ptp-driver-opt-in-for-supported-ptp-ioctl-flags'

Jacob Keller says:

====================
net: ptp: driver opt-in for supported PTP ioctl flags

Both the PTP_EXTTS_REQUEST(2) and PTP_PEROUT_REQUEST(2) ioctls take flags
from userspace to modify their behavior. Drivers are supposed to check
these flags, rejecting requests for flags they do not support.

Many drivers today do not check these flags, despite many attempts to
squash individual drivers as these mistakes are discovered. Additionally,
any new flags added can require updating every driver if their validation
checks are poorly implemented.

It is clear that driver authors will not reliably check for unsupported
flags. The root of the issue is that drivers must essentially opt out of
every flag, rather than opt in to the ones they support.

Instead, lets introduce .supported_perout_flags and .supported_extts_flags
to the ptp_clock_info structure. This is a pattern taken from several
ethtool ioctls which enabled validation to move out of the drivers and into
the shared ioctl handlers. This pattern has worked quite well and makes it
much more difficult for drivers to accidentally accept flags they do not
support.

With this approach, drivers which do not set the supported fields will have
the core automatically reject any request which has flags. Drivers must opt
in to each flag they support by adding it to the list, with the sole
exception being the PTP_ENABLE_FEATURE flag of the PTP_EXTTS_REQUEST ioctl
since it is entirely handled by the ptp_chardev.c file.

This change will ensure that all current and future drivers are safe for
extension when we need to extend these ioctls.

I opted to keep all the driver changes into one patch per ioctl type. The
changes are relatively small and straight forward. Splitting it per-driver
would make the series large, and also break flags between the introduction
of the supported field and setting it in each driver.

The non-Intel drivers are compile-tested only, and I would appreciate
confirmation and testing from their respective maintainers. (It is also
likely that I missed some of the driver authors especially for drivers
which didn't make any checks at all and do not set either of the supported
flags yet)

v1: https://lore.kernel.org/20250408-jk-supported-perout-flags-v1-0-d2f8e3df64f3@intel.com
====================

Link: https://patch.msgid.link/20250414-jk-supported-perout-flags-v2-0-f6b17d15475c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ptp: introduce .supported_perout_flags to ptp_clock_info

The PTP_PEROUT_REQUEST2 ioctl has gained support for flags specifying
specific output behavior including PTP_PEROUT_ONE_SHOT,
PTP_PEROUT_DUTY_CYCLE, PTP_PEROUT_PHASE.

Driver authors are notorious for not checking the flags of the request.
This results in misinterpreting the request, generating an output signal
that does not match the requested value. It is anticipated that even more
flags will be added in the future, resulting in even more broken requests.

Expecting these issues to be caught during review or playing whack-a-mole
after the fact is not a great solution.

Instead, introduce the supported_perout_flags field in the ptp_clock_info
structure. Update the core character device logic to explicitly reject any
request which has a flag not on this list.

This ensures that drivers must 'opt in' to the flags they support. Drivers
which don't set the .supported_perout_flags field will not need to check
that unsupported flags aren't passed, as the core takes care of this.

Update the drivers which do support flags to set this new field.

Note the following driver files set n_per_out to a non-zero value but did
not check the flags at all:

• drivers/ptp/ptp_clockmatrix.c
• drivers/ptp/ptp_idt82p33.c
• drivers/ptp/ptp_fc3.c
• drivers/net/ethernet/ti/am65-cpts.c
• drivers/net/ethernet/aquantia/atlantic/aq_ptp.c
• drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
• drivers/net/dsa/sja1105/sja1105_ptp.c
• drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c
• drivers/net/ethernet/mscc/ocelot_vsc7514.c
• drivers/net/ethernet/intel/i40e/i40e_ptp.c

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250414-jk-supported-perout-flags-v2-2-f6b17d15475c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ptp: introduce .supported_extts_flags to ptp_clock_info

The PTP_EXTTS_REQUEST(2) ioctl has a flags field which specifies how the
external timestamp request should behave. This includes which edge of the
signal to timestamp, as well as a specialized "offset" mode. It is expected
that more flags will be added in the future.

Driver authors routinely do not check the flags, often accepting requests
with flags which they do not support. Even drivers which do check flags may
not be future-proofed to reject flags not yet defined. Thus, any future
flag additions often require manually updating drivers to reject these
flags.

This approach of hoping we catch flag checks during review, or playing
whack-a-mole after the fact is the wrong approach.

Introduce the "supported_extts_flags" field to the ptp_clock_info
structure. This field defines the set of flags the device actually
supports.

Update the core character device logic to check this field and reject
unsupported requests. Getting this right is somewhat tricky. First, to
avoid unnecessary repetition and make basic functionality work when
.supported_extts_flags is 0, the core always accepts the PTP_ENABLE_FEATURE
flag. This flag is used to set the 'on' parameter to the .enable function
and is thus always 'supported' by all drivers.

For backwards compatibility, the PTP_RISING_EDGE and PTP_FALLING_EDGE flags
are merely "hints" when using the old PTP_EXTTS_REQUEST ioctl, and are not
expected to be enforced. If the user issues PTP_EXTTS_REQUEST2, the
PTP_STRICT_FLAGS flag is added which is supposed to inform the driver to
strictly validate the flags and reject unsupported requests. To handle
this, first check if the driver reports PTP_STRICT_FLAGS support. If it
does not, then always allow the PTP_RISING_EDGE and PTP_FALLING_EDGE flags.
This keeps backwards compatibility with the original PTP_EXTTS_REQUEST
ioctl where these flags are not guaranteed to be honored.

This way, drivers which do not set the supported_extts_flags will continue
to accept requests for the original PTP_EXTTS_REQUEST ioctl. The core will
automatically reject requests with new flags, and correctly reject requests
with PTP_STRICT_FLAGS, where the driver is supposed to strictly validate
the flags.

Update the various drivers, refactoring their validation logic into the
.supported_extts_flags field. For consistency and readability,
PTP_ENABLE_FEATURE is not set in the supported flags list, and
PTP_EXTTS_EDGES is expanded to PTP_RISING_EDGE | PTP_FALLING_EDGE in all
cases.

Note the following driver files set n_ext_ts to a non-zero value but did
not check flags at all:

• drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c
• drivers/net/ethernet/freescale/enetc/enetc_ptp.c
• drivers/net/ethernet/intel/i40e/i40e_ptp.c
• drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c
• drivers/net/ethernet/renesas/ravb_ptp.c
• drivers/net/ethernet/renesas/rtsn.c
• drivers/net/ethernet/renesas/rtsn.h
• drivers/net/ethernet/ti/am65-cpts.c
• drivers/net/ethernet/ti/cpts.h
• drivers/net/ethernet/ti/icssg/icss_iep.c
• drivers/net/ethernet/xscale/ptp_ixp46x.c
• drivers/net/phy/bcm-phy-ptp.c
• drivers/ptp/ptp_ocp.c
• drivers/ptp/ptp_pch.c
• drivers/ptp/ptp_qoriq.c

These drivers behavior does change slightly: they will now reject the
PTP_EXTTS_REQUEST2 ioctl, because they do not strictly validate their
flags. This also makes them no longer incorrectly accept PTP_EXT_OFFSET.

Also note that the renesas ravb driver does not support PTP_STRICT_FLAGS.
We could leave the .supported_extts_flags as 0, but I added the
PTP_RISING_EDGE | PTP_FALLING_EDGE since the driver previously manually
validated these flags. This is equivalent to 0 because the core will allow
these flags regardless unless PTP_STRICT_FLAGS is also set.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250414-jk-supported-perout-flags-v2-1-f6b17d15475c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: locally receive all multicast packets if IFF_ALLMULTI is set

If multicast snooping is enabled, multicast packets may not always end up
on the local bridge interface, if the host is not a member of the multicast
group. Similar to how IFF_PROMISC allows all packets to be received
locally, let IFF_ALLMULTI allow all multicast packets to be received.

OpenWrt uses a user space daemon for DHCPv6/RA/NDP handling, and in relay
mode it sets the ALLMULTI flag in order to receive all relevant queries on
the network.

This works for normal network interfaces and non-snooping bridges, but not
snooping bridges (unless multicast routing is enabled).

Reported-by: Felix Fietkau <nbd@nbd.name>
Closes: https://github.com/openwrt/openwrt/issues/15857#issuecomment-2662851243
Signed-off-by: Shengyu Qu <wiagn233@outlook.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/OSZPR01MB8434308370ACAFA90A22980798B32@OSZPR01MB8434.jpnprd01.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: remove redundant dependency on NETDEVICES for PHYLINK and PHYLIB

drivers/net/phy/Kconfig is included from drivers/net/Kconfig in an
"if NETDEVICES" section. Therefore we don't have to duplicate the
dependency here. And if e.g. PHYLINK is selected somewhere, then the
dependency is ignored anyway (see note in Kconfig help).

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/085892cd-aa11-4c22-bf8a-574a5c6dcd7c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tc: Return an error if filters try to attach too many actions

While developing the fix for the buffer sizing issue in [0], I noticed
that the kernel will happily accept a long list of actions for a filter,
and then just silently truncate that list down to a maximum of 32
actions.

That seems less than ideal, so this patch changes the action parsing to
return an error message and refuse to create the filter in this case.
This results in an error like:

# ip link add type veth
# tc qdisc replace dev veth0 root handle 1: fq_codel
# tc -echo filter add dev veth0 parent 1: u32 match u32 0 0 $(for i in $(seq 33); do echo action pedit munge ip dport set 22; done)
Error: Only 32 actions supported per filter.
We have an error talking to the kernel

Instead of just creating a filter with 32 actions and dropping the last
one.

This is obviously a change in UAPI. But seeing as creating more than 32
filters has never actually *worked*, it seems that returning an explicit
error is better, and any use cases that get broken by this were already
broken just in more subtle ways.

[0] https://lore.kernel.org/r/20250407105542.16601-1-toke@redhat.com

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409145523.164506-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeon_ep_vf: Remove octep_vf_wq

commit cb7dd712189f ("octeon_ep_vf: Add driver framework and device
initialization") added octep_vf_wq but it has never been used. Remove it.

Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Closes: https://lore.kernel.org/netdev/Z70bEoTKyeBau52q@gallifrey/
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250414-octeon-wq-v1-1-23700e4bd208@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-ingenic-cleanups'

Russell King says:

====================
net: stmmac: ingenic: cleanups

Another series for another stmmac glue platform.

Convert Ingenic to use the stmmac platform PM ops and the
devm_stmmac_pltfr_probe() helper.
====================

Link: https://patch.msgid.link/Z_0u9pA0Ziop-BuU@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: ingenic: convert to devm_stmmac_pltfr_probe()

As Ingenic now uses the stmmac platform PM ops, convert it to use
devm_stmmac_pltfr_probe() which will call the plat_dat->init() method
before stmmac_drv_probe() and appropriately cleaning up via the
->exit() method, thus simplifying the code. Using the devm_*()
variant also allows removal of the explicit call to
stmmac_pltfr_remove().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1u4M5S-000YGJ-9K@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: ingenic: convert to stmmac_pltfr_pm_ops

Convert the Ingenic glue driver to use the generic stmmac platform
power management operations.

In order to do this, we need to make ingenic_mac_init() arguments
compatible with plat_dat->init() by adding a plat_dat member to struct
ingenic_mac. This allows the custom suspend/resume operations to be
removed, and the PM ops pointer replaced with stmmac_pltfr_pm_ops.

This will adds runtime PM and noirq suspend/resume ops to this driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1u4M5N-000YGD-5i@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: imx: use stmmac_pltfr_probe()

Using stmmac_pltfr_probe() simplifies the probe function. This will not
only call plat_dat->init (imx_dwmac_init), but also plat_dat->exit
(imx_dwmac_exit) appropriately if stmmac_dvr_probe() fails. This
results in an overall simplification of the glue driver.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u4Flp-000XlM-Tb@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-stmmac-anarion-cleanups'

Russell King says:

====================
net: stmmac: anarion: cleanups

A series of cleanups to the anarion glue driver.

Clean up anarion_config_dt() error handling, printing a human readable
error rather than the numeric errno, and use ERR_CAST().

Using a switch statement with incorrect "fallthrough;" for RGMII vs
non-RGMII is unnecessary when we have phy_interface_mode_is_rgmii().
Convert to use the helper.

Use stmmac_pltfr_probe() rahter than open-coding the call to the
init function (which stmmac_pltfr_probe() will do for us.)

Finally, convert to use devm_stmmac_pltfr_probe() which allows the
removal of the .remove initialiser in the driver structure.

Not tested on hardware.
====================

Link: https://patch.msgid.link/Z_zP9BvZlqeq3Ssl@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: anarion: use devm_stmmac_pltfr_probe()

Convert anarion to use devm_stmmac_pltfr_probe() which allows the
removal of an explicit call to stmmac_pltfr_remove().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u4Flf-000XjS-Fi@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: anarion: use stmmac_pltfr_probe()

Rather than open-coding the call to anarion_gmac_init() and then
stmmac_dvr_probe(), omitting the cleanup of calling
anarion_gmac_exit(), use stmmac_pltfr_probe() which will handle this
for us.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u4Fla-000XjM-Bw@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: anarion: clean up interface parsing

anarion_config_dt() used a switch statement to check for the RGMII
modes, complete with an unnecessary "fallthrough", and also printed
the numerical value of the PHY interface mode on error. Clean this
up using the phy_interface_mode_is_rgmii() helper, and print the
English version of the PHY interface mode on error.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u4FlV-000XjG-83@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: anarion: clean up anarion_config_dt() error handling

When enabled, print a user friendly description of the error when
failing to ioremap() the control resource, and use ERR_CAST() when
propagating the error. This allows us to get rid of the "err" local
variable in anarion_config_dt().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u4FlQ-000XjA-2V@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-introduce-nlmsg_payload-helper'

Breno Leitao says:

====================
net: Introduce nlmsg_payload helper

In the current codebase, there are multiple instances where the
structure size is checked before assigning it to a Netlink message. This
check is crucial for ensuring that the structure is correctly mapped
onto the Netlink message, providing a layer of security.

To streamline this process, Jakub Kicinski suggested creating a helper
function, `nlmsg_payload`, which verifies if the structure fits within
the message. If it does, the function returns the data; otherwise, it
returns NULL. This approach simplifies the code and reduces redundancy.

This patchset introduces the `nlmsg_payload` helper and updates several
parts of the code to use it. Further updates will follow in subsequent
patchsets.

v1: https://lore.kernel.org/20250411-nlmsg-v1-0-ddd4e065cb15@debian.org
====================

Link: https://patch.msgid.link/20250414-nlmsg-v2-0-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fib_rules: Use nlmsg_payload in fib_{new,del}rule()

Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.

Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-10-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fib_rules: Use nlmsg_payload in fib_valid_dumprule_req

Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-9-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mpls: Use nlmsg_payload in mpls_valid_getroute_req

Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-8-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: Use nlmsg_payload in inet6_rtm_valid_getaddr_req

Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-7-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: Use nlmsg_payload in inet6_valid_dump_ifaddr_req

Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-6-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mpls: Use nlmsg_payload in mpls_valid_fib_dump_req

Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-5-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rtnetlink: Use nlmsg_payload in valid_fdb_dump_strict

Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-4-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Use nlmsg_payload in neigh_valid_get_req

Update neigh_valid_get_req function to utilize the new nlmsg_payload()
helper function.

This change improves code clarity and safety by ensuring that the
Netlink message payload is properly validated before accessing its data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-3-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: Use nlmsg_payload in neightbl_valid_dump_info

Update neightbl_valid_dump_info function to utilize the new
nlmsg_payload() helper function.

This change improves code clarity and safety by ensuring that the
Netlink message payload is properly validated before accessing its data.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-2-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: Introduce nlmsg_payload helper

Create a new helper function, nlmsg_payload(), to simplify checking and
retrieving Netlink message payloads.

This reduces boilerplate code for users who need to verify the message
length before accessing its data.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250414-nlmsg-v2-1-3d90cb42c6af@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'qed-deadcoding'

Dr. David Alan Gilbert says:

====================
qed deadcoding

  This is a set of deadcode removals for the qed ethernet
device.  I've tried to avoid removing anything that
are trivial firmware wrappers.

  One odd one I've not removed is qed_bw_update(),
it doesn't seem to be called but looks like the only
caller of the bw_update(..) method which qedf does
define.  Perhaps qed_bw_update is supposed to be called
somewhere?
====================

Link: https://patch.msgid.link/20250414005247.341243-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: Remove unused qed_db_recovery_dp

qed_db_recovery_dp() was added in 2018 as part of
commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
but has remained unused.

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250414005247.341243-6-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: Remove unused qed_print_mcp_trace_*

While most of the trace code is reachable by other routes
(I think mostly via the qed_features_lookup[] array), there
are a couple of unused wrappers.

qed_print_mcp_trace_line() and qed_print_mcp_trace_results_cont()
were added in 2018 as part of
commit a3f723079df8 ("qed*: Utilize FW 8.37.7.0")
but have remained unused.

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250414005247.341243-5-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: Remove unused qed_ptt_invalidate

qed_ptt_invalidate() was added in 2015 as part of
commit fe56b9e6a8d9 ("qed: Add module with basic common support")
but has remained unused.

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250414005247.341243-4-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: Remove unused qed_calc_*_ctx_validation functions

qed_calc_session_ctx_validation() and qed_calc_task_ctx_validation()
were added as part of 2017's
commit da09091732ae ("qed*: Utilize FW 8.33.1.0")

but have remained unused.

Remove them.

This leaves; con_region_offsets[], task_region_offsets[],
cdu_crc8_table and qed_calc_cdu_validation_byte() unused.

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250414005247.341243-3-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: Remove unused qed_memset_*ctx functions

qed_memset_session_ctx() and qed_memset_task_ctx() were added in 2017
as part of
commit da09091732ae ("qed*: Utilize FW 8.33.1.0")

but have not been used.

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250414005247.341243-2-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: remove device_phy_find_device

AFAICS this function has never had a user.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/ab7b8094-2eea-4e82-a047-fd60117f220b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mptcp-various-small-and-unrelated-improvements'

Matthieu Baerts says:

====================
mptcp: various small and unrelated improvements

Here are various unrelated patches:

- Patch 1: sched: remove unused structure.

- Patch 2: sched: split the validation part, a preparation for later.

- Patch 3: pm: clarify code, not to think there is a possible UaF.
Note: a previous version has already been sent individually to Netdev.

- Patch 4: subflow: simplify subflow_hmac_valid by passing subflow_req.

- Patch 5: mib: add counter for MPJoin rejected by the PM.

- Patch 6: selftests: validate this new MPJoinRejected counter.

- Patch 7: selftests: define nlh variable only where needed.

- Patch 8: selftests: show how to use IPPROTO_MPTCP with getaddrinfo.
Note: a previous version has already been sent individually to Netdev.

v1: https://lore.kernel.org/20250411-net-next-mptcp-sched-mib-sft-misc-v1-0-85ac8c6654c3@kernel.org
====================

Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-0-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: use IPPROTO_MPTCP for getaddrinfo

mptcp_connect.c is a startup tutorial of MPTCP programming, however
there is a lack of ai_protocol(IPPROTO_MPTCP) usage. Add comment for
getaddrinfo MPTCP support.

This patch first uses IPPROTO_MPTCP to get addrinfo, and if glibc
version is too old, it falls back to using IPPROTO_TCP.

Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-8-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: diag: drop nlh parameter of recv_nlmsg

It's strange that 'nlh' variable is set to NULL in get_mptcpinfo() and then
this NULL pointer is passed to recv_nlmsg(). In fact, this variable should
be defined in recv_nlmsg(), not get_mptcpinfo().

So this patch drops this useless 'nlh' parameter of recv_nlmsg() and define
'nlh' variable in recv_nlmsg().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-7-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: validate MPJoinRejected counter

The parent commit adds this new counter, incremented when receiving a
connection request, if the PM didn't allow the creation of new subflows.

Most of the time, it is then kept at 0, except when the PM limits cause
the receiver side to reject new MPJoin connections. This is the case in
the following tests:

- single subflow, limited by server
- multiple subflows, limited by server
- subflows limited by server w cookies
- userspace pm type rejects join
- userspace pm type prevents mp_prio

Simply set join_syn_rej=1 when checking the MPJoin counters for these
tests.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-6-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: add MPJoinRejected MIB counter

This counter is useful to understand why some paths are rejected, and
not created as expected.

It is incremented when receiving a connection request, if the PM didn't
allow the creation of new subflows.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-5-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pass right struct to subflow_hmac_valid

subflow_hmac_valid() needs to access the MPTCP socket and the subflow
request, but not the request sock that is passed in argument.

Instead, the subflow request can be directly passed to avoid getting it
via an additional cast.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-4-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: Return local variable instead of freed pointer

Commit e4c28e3d5c090 ("mptcp: pm: move generic PM helpers to pm.c")
removed an unnecessary if-check, which resulted in returning a freed
pointer.

This still works due to the implicit boolean conversion when returning
the freed pointer from mptcp_remove_anno_list_by_saddr(), but it can be
confusing and potentially error-prone. To improve clarity, add a local
variable to explicitly return a boolean value instead.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-3-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: sched: split validation part

A new interface .validate has been added in struct bpf_struct_ops
recently. This patch prepares a future struct_ops support by
implementing it as a new helper mptcp_validate_scheduler() for struct
mptcp_sched_ops.

In this helper, check whether the required ops "get_subflow" of struct
mptcp_sched_ops has been implemented.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-2-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: sched: remove mptcp_sched_data

This is a follow-up of commit b68b106b0f15 ("mptcp: sched: reduce size
for unused data"), now removing the mptcp_sched_data structure.

Now is a good time to do that, because the previously mentioned WIP work
has been updated, no longer depending on this structure.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-1-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: Update module description

Because of the addition of support for 25G/40G devices, update the module
description.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250414022421.375101-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ncsi: Fix GCPS 64-bit member variables

Correct Get Controller Packet Statistics (GCPS) 64-bit wide member
variables, as per DSP0222 v1.0.0 and forward specs. The Driver currently
collects these stats, but they are yet to be exposed to the user.
Therefore, no user impact.

Statistics fixes:
Total Bytes Received (byte range 28..35)
Total Bytes Transmitted (byte range 36..43)
Total Unicast Packets Received (byte range 44..51)
Total Multicast Packets Received (byte range 52..59)
Total Broadcast Packets Received (byte range 60..67)
Total Unicast Packets Transmitted (byte range 68..75)
Total Multicast Packets Transmitted (byte range 76..83)
Total Broadcast Packets Transmitted (byte range 84..91)
Valid Bytes Received (byte range 204..11)

Signed-off-by: Hari Kalavakunta <kalavakunta.hari.prasad@gmail.com>
Reviewed-by: Paul Fertser <fercerpav@gmail.com>
Link: https://patch.msgid.link/20250410012309.1343-1-kalavakunta.hari.prasad@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

tipc: Removing deprecated strncpy()

This patch suggests the replacement of strncpy with strscpy
as per Documentation/process/deprecated.
The strncpy() fails to guarantee NULL termination,
The function adds zero pads which isn't really convenient for short strings
as it may cause performance issues.

strscpy() is a preferred replacement because
it overcomes the limitations of strncpy mentioned above.

Compile Tested

Signed-off-by: Kevin Paul Reddy Janagari <kevinpaul468@gmail.com>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Tested-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20250411085010.6249-1-kevinpaul468@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-dsa-mt7530-modernize-mib-handling-fix'

Christian Marangi says:

====================
net: dsa: mt7530: modernize MIB handling + fix

This small series modernize MIB handling for MT7530 and also
implement .get_stats64.

It was reported that kernel and Switch MIB desync in scenario where
a packet is forwarded from a port to another. In such case, the
forwarding is offloaded and the kernel is not aware of the
transmitted packet. To handle this, read the counter directly
from Switch registers.
====================

Link: https://patch.msgid.link/20250410163022.3695-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mt7530: implement .get_stats64

It was reported that the internally calculated counter might differ from
the real one from the Switch MIB. This can happen if the switch directly
forward packets between the ports or offload small packets like ARP
request. In such case, the kernel counter will desync compared to the
real one transmitted and received by the Switch.

To correctly provide the real info to the kernel, implement .get_stats64
that will directly read the current MIB counter from the switch
register.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250410163022.3695-7-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mt7530: move remaining MIB counter to define

For consistency with the other MIB counter, move also the remaining MIB
counter to define and update the custom MIB table.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250410163022.3695-6-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mt7530: move pkt stats and err MIB counter to eth_mac stats API

Drop custom handling of TX/RX packet stats and error MIB counter and handle
them in the standard .get_eth_mac_stats API

The MIB entry are dropped from the custom MIB table and converted to
a define providing only the MIB offset.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250410163022.3695-5-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mt7530: move pause MIB counter to eth_ctrl stats API

Drop custom handling of TX/RX pause frame MIB counter and handle
them in the standard .get_eth_ctrl_stats API

The MIB entry are dropped from the custom MIB table and converted to
a define providing only the MIB offset.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250410163022.3695-4-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mt7530: move pkt size and rx err MIB counter to rmon stats API

Drop custom handling of packet size and RX error MIB counter and handle
them in the standard .get_rmon_stats API

The MIB entry are dropped from the custom MIB table and converted to
a define providing only the MIB offset.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250410163022.3695-3-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mt7530: generalize read port stats logic

In preparation for migration to use of standard MIB API, generalize the
read port stats logic to a dedicated function.

This will permit to manually provide the offset and size of the MIB
counter to directly access specific counter.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250410163022.3695-2-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'eth-fbnic-extend-hardware-stats-coverage'

Mohsin Bashir says:

====================
eth: fbnic: extend hardware stats coverage

This patch series extends the coverage for hardware stats reported via
`ethtool -S`, queue API, and rtnl link stats. The patchset is organized
as follow:

- The first patch adds locking support to protect hardware stats.
- The second patch provides coverage to the hardware queue stats.
- The third patch covers the RX buffer related stats.
- The fourth patch covers the TMI (TX MAC Interface) stats.
- The last patch cover the TTI (TX TEI Interface) stats.
====================

Link: https://patch.msgid.link/20250410070859.4160768-1-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: fbnic: add support for TTI HW stats

Add coverage for the TX Extension (TEI) Interface (TTI) stats. We are
tracking packets and control message drops because of credit exhaustion
on the TX interface.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-6-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: fbnic: add support for TMI stats

This patch add coverage for TMI stats including PTP stats and drop
stats.

PTP stats include illegal requests, bad timestamp and good timestamps.
The bad timestamp and illegal request counters are reported under as
`error` via `ethtool -T` Both these counters are individually being
reported via `ethtool -S`

The good timestamp stats are being reported as `pkts` via `ethtool -T`

ethtool -S eth0 | grep "ptp"
     ptp_illegal_req: 0
     ptp_good_ts: 0
     ptp_bad_ts: 0

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-5-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: fbnic: add coverage for RXB stats

This patch provides coverage to the RXB (RX Buffer) stats. RXB stats
are divided into 3 sections: RXB enqueue, RXB FIFO, and RXB dequeue
stats.

The RXB enqueue/dequeue stats are indexed from 0-3 and cater for the
input/output counters whereas, the RXB fifo stats are indexed from 0-7.

The RXB also supports pause frame stats counters which we are leaving
for a later patch.

ethtool -S eth0 | grep rxb
     rxb_integrity_err0: 0
     rxb_mac_err0: 0
     rxb_parser_err0: 0
     rxb_frm_err0: 0
     rxb_drbo0_frames: 1433543
     rxb_drbo0_bytes: 775949081
     ---
     ---
     rxb_intf3_frames: 1195711
     rxb_intf3_bytes: 739650210
     rxb_pbuf3_frames: 1195711
     rxb_pbuf3_bytes: 765948092

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-4-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: fbnic: add coverage for hw queue stats

This patch provides support for hardware queue stats and covers
packet errors for RX-DMA engine, RCQ drops and BDQ drops.

The packet errors are also aggregated with the `rx_errors` stats in the
`rtnl_link_stats` as well as with the `hw_drops` in the queue API.

The RCQ and BDQ drops are aggregated with `rx_over_errors` in the
`rtnl_link_stats` as well as with the `hw_drop_overruns` in the queue API.

ethtool -S eth0 | grep -E 'rde'
     rde_0_pkt_err: 0
     rde_0_pkt_cq_drop: 0
     rde_0_pkt_bdq_drop: 0
     ---
     ---
     rde_127_pkt_err: 0
     rde_127_pkt_cq_drop: 0
     rde_127_pkt_bdq_drop: 0

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-3-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eth: fbnic: add locking support for hw stats

This patch adds lock protection for the hardware statistics for fbnic.
The hardware statistics access via ndo_get_stats64 is not protected by
the rtnl_lock(). Since these stats can be accessed from different places
in the code such as service task, ethtool, Q-API, and net_device_ops, a
lock-less approach can lead to races.

Note that this patch is not a fix rather, just a prep for the subsequent
changes in this series.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410070859.4160768-2-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: Add RTL9300 MDIO driver

Add a driver for the MDIO controller on the RTL9300 family of Ethernet
switches with integrated SoC. There are 4 physical SMI interfaces on the
RTL9300 however access is done using the switch ports. The driver takes
the MDIO bus hierarchy from the DTS and uses this to configure the
switch ports so they are associated with the correct PHY. This mapping
is also used when dealing with software requests from phylib.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250409231554.3943115-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-stmmac-qcom-ethqos-simplifications'

Russell King says:

====================
net: stmmac: qcom-ethqos: simplifications

Remove unnecessary code from the qcom-ethqos glue driver.

Start by consistently using -> serdes_speed to set the speed of the
serdes PHY rather than sometimes using ->serdes_speed and sometimes
using ->speed.

This then allows the removal of ->speed in the second patch.

There is no need to set the maximum speed just because we're using
2500BASE-X - phylink already knows that 2500BASE-X can't support
faster speeds.

This then makes qcom_ethqos_speed_mode_2500() redundant as it's
setting the interface mode to the value that was determined in the
switch statement that already determined that the interface mode
had this value.

Not tested on hardware.
====================

Link: https://patch.msgid.link/Z_p0LzY2_HFupWK0@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: remove speed_mode_2500() method

qcom-ethqos doesn't need to implement the speed_mode_2500() method as
it is only setting priv->plat->phy_interface to 2500BASE-X, which is
already a pre-condition for assigning speed_mode_2500 in
qcom_ethqos_probe(). So, qcom_ethqos_speed_mode_2500() has no effect.
Remove it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1u3bYa-000EcW-H1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: remove unnecessary setting max_speed

Phylink will already limit the MAC speed according to the interface,
so if 2500BASE-X is selected, the maximum speed will be 2.5G. It is,
therefore, not necessary to set a speed limit. Remove setting
plat_dat->max_speed from this glue driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1u3bYV-000EcQ-Cv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: remove ethqos->speed

Rather than ethqos_fix_mac_speed() storing the speed in struct
qcom_ethqos and then functions that are only called from here reading
that speed, pass the speed to the called functions instead.

This removes all readers of this struct member, which then allows the
removal of the two places that set its value and the struct member.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1u3bYQ-000EcK-9K@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: qcom-ethqos: set serdes speed using serdes_speed

ethqos->serdes_speed represents the current speed the serdes was
configured for, which should be the same as ethqos->speed. Since we
wish to remove ethqos->speed to simplify the code, switch to using the
serdes_speed instead.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1u3bYL-000EcE-5c@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'rxrpc-afs-add-afs-gssapi-security-class-to-af_rxrpc-and-kafs'

David Howells says:

====================
rxrpc, afs: Add AFS GSSAPI security class to AF_RXRPC and kafs

Here's a set of patches to add basic support for the AFS GSSAPI security
class to AF_RXRPC and kafs.  It provides transport security for keys that
match the security index 6 (YFS) for connections to the AFS fileserver and
VL server.

Note that security index 4 (OpenAFS) can also be supported using this, but
it needs more work as it's slightly different.

The patches also provide the ability to secure the callback channel -
connections from the fileserver back to the client that are used to pass
file change notifications, amongst other things.  When challenged by the
fileserver, kafs will generate a token specific to that server and include
it in the RESPONSE packet as the appdata.  The server then extracts this
and uses it to send callback RPC calls back to the client.

It can also be used to provide transport security on the callback channel,
but a further set of patches is required to provide the token and key to
set that up when the client responds to the fileserver's challenge.

This makes use of the previously added crypto-krb5 library that is now
upstream (last commit fc0cf10c04f4).

This series of patches consist of the following parts:

(0) Update kdoc comments to remove some kdoc builder warnings.

(1) Push reponding to CHALLENGE packets over to recvmsg() or the kernel
     equivalent so that the application layer can include user-defined
     information in the RESPONSE packet.  In a follow-up patch set, this
     will allow the callback channel to be secured by the AFS filesystem.

(2) Add the AF_RXRPC RxGK security class that uses a key obtained from the
     AFS GSS security service to do Kerberos 5-based encryption instead of
     pcbc(fcrypt) and pcbc(des).

(3) Add support for callback channel encryption in kafs.

(4) Provide the test rxperf server module with some fixed krb5 keys.
====================

Link: https://patch.msgid.link/20250411095303.2316168-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: rxperf: Add test RxGK server keys

Add RxGK server keys of bytes containing { 0, 1, 2, 3, 4, ... } to the
server keyring for the rxperf test server. This allows the rxperf test
client to connect to it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-15-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Add more CHALLENGE/RESPONSE packet tracing

Add more tracing for CHALLENGE and RESPONSE packets. Currently, rxrpc only
has client-relevant tracepoints (rx_challenge and tx_response), but add the
server-side ones too.

Further, record the service ID in the rx_challenge tracepoint as well.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-14-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

afs: Use rxgk RESPONSE to pass token for callback channel

Implement in kafs the hook for adding appdata into a RESPONSE packet
generated in response to an RxGK CHALLENGE packet, and include the key for
securing the callback channel so that notifications from the fileserver get
encrypted.

This will be necessary when more complex notifications are used that convey
changed data around.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-13-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Display security params in the afs_cb_call tracepoint

Make the afs_cb_call tracepoint display some security parameters to make
debugging easier.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-12-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Allow the app to store private data on peer structs

Provide a way for the application (e.g. the afs filesystem) to store
private data on the rxrpc_peer structs for later retrieval via the call
object.

This will allow afs to store a pointer to the afs_server object on the
rxrpc_peer struct, thereby obviating the need for afs to keep lookup tables
by which it can associate an incoming call with server that transmitted it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-11-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: rxgk: Implement connection rekeying

Implement rekeying of connections with the RxGK security class. This
involves regenerating the keys with a different key number as part of the
input data after a certain amount of time or a certain amount of bytes
encrypted. Rekeying may be triggered by either end.

The LSW of the key number is inserted into the security-specific field in
the RX header, and we try and expand it to 32-bits to make it last longer.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-10-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)

Implement the basic parts of the yfs-rxgk security class (security index 6)
to support GSSAPI-negotiated security.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-9-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: rxgk: Provide infrastructure and key derivation

Provide some infrastructure for implementing the RxGK transport security
class:

(1) A definition of an encoding type, including:

- Relevant crypto-layer names
- Lengths of the crypto keys and checksums involved
- Crypto functions specific to the encoding type
- Crypto scheme used for that type

(2) A definition of a crypto scheme, including:

- Underlying crypto handlers
- The pseudo-random function, PRF, used in base key derivation
- Functions for deriving usage keys Kc, Ke and Ki
- Functions for en/decrypting parts of an sk_buff

(3) A key context, with the usage keys required for a derivative of a
     transport key for a specific key number.  This includes keys for
     securing packets for transmission, extracting received packets and
     dealing with response packets.

(3) A function to look up an encoding type by number.

(4) A function to set up a key context and derive the keys.

(5) A function to set up the keys required to extract the ticket obtained
     from the GSS negotiation in the server.

(6) Miscellaneous functions for context handling.

The keys and key derivation functions are described in:

tools.ietf.org/html/draft-wilkinson-afs3-rxgk-11

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-8-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Add YFS RxGK (GSSAPI) security class

Add support for the YFS-variant RxGK security class to support
GSSAPI-derived authentication.  This also allows the use of better crypto
over the rxkad security class.

The key payload is XDR encoded of the form:

    typedef int64_t opr_time;

    const AFSTOKEN_RK_TIX_MAX = 12000; /* Matches entry in rxkad.h */

    struct token_rxkad {
afs_int32 viceid;
afs_int32 kvno;
afs_int64 key;
afs_int32 begintime;
afs_int32 endtime;
afs_int32 primary_flag;
opaque ticket<AFSTOKEN_RK_TIX_MAX>;
    };

    struct token_rxgk {
opr_time begintime;
opr_time endtime;
afs_int64 level;
afs_int64 lifetime;
afs_int64 bytelife;
afs_int64 enctype;
opaque key<>;
opaque ticket<>;
    };

    const AFSTOKEN_UNION_NOAUTH = 0;
    const AFSTOKEN_UNION_KAD = 2;
    const AFSTOKEN_UNION_YFSGK = 6;

    union ktc_tokenUnion switch (afs_int32 type) {
case AFSTOKEN_UNION_KAD:
    token_rxkad kad;
case AFSTOKEN_UNION_YFSGK:
    token_rxgk  gk;
    };

    const AFSTOKEN_LENGTH_MAX = 16384;
    typedef opaque token_opaque<AFSTOKEN_LENGTH_MAX>;

    const AFSTOKEN_MAX = 8;
    const AFSTOKEN_CELL_MAX = 64;

    struct ktc_setTokenData {
afs_int32 flags;
string cell<AFSTOKEN_CELL_MAX>;
token_opaque tokens<AFSTOKEN_MAX>;
    };

The parser for the basic token struct is already present, as is the rxkad
token type.  This adds a parser for the rxgk token type.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-7-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Add the security index for yfs-rxgk

Add the security index and abort codes for the YFS variant of rxgk.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20250411095303.2316168-6-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE

Allow the app to request that CHALLENGEs be passed to it through an
out-of-band queue that allows recvmsg() to pick it up so that the app can
add data to it with sendmsg().

This will allow the application (AFS or userspace) to interact with the
process if it wants to and put values into user-defined fields. This will
be used by AFS when talking to a fileserver to supply that fileserver with
a crypto key by which callback RPCs can be encrypted (ie. notifications
from the fileserver to the client).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-5-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Remove some socket lock acquire/release annotations

Remove some socket lock acquire/release annotations as lock_sock() and
release_sock() don't have them and so the checker gets confused. Removing
all of them, however, causes warnings about "context imbalance" and "wrong
count at exit" to occur instead.

Probably lock_sock() and release_sock() should have annotations on
indicating their taking of sk_lock - there is a dep_map in socket_lock_t,
but I don't know if that matters to the static checker.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-4-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: Pull out certain app callback funcs into an ops table

A number of functions separately furnish an AF_RXRPC socket with callback
function pointers into a kernel app (such as the AFS filesystem) that is
using it. Replace most of these with an ops table for the entire socket.
This makes it easier to add more callback functions.

Note that the call incoming data processing callback is retaind as that
gets set to different things, depending on the type of op.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: kdoc: Update function descriptions and add link from rxrpc.rst

Update the kerneldoc function descriptions to add "Return:" sections for
AF_RXRPC exported functions that have return values to stop the kdoc
builder from throwing warnings.

Also add links from the rxrpc.rst API doc to add a function API reference
at the end. (Note that the API doc really needs updating, but that's
beyond this patchset).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5-hws-refactor-action-ste-handling'

Tariq Toukan says:

====================
net/mlx5: HWS, Refactor action STE handling

This patch series by Vlad refactors how action STEs are handled for
hardware steering.

Definitions
----------

* STE (Steering Table Entry): a building block for steering rules.
  Simple rules consist of a single STE that specifies both the match
  value and what actions to do. For more complex rules we have one or
  more match STEs that point to one or more action STEs. It is these
  action STEs which this patch series is primarily concerned with.

* RTC (Rule Table Context): a table that contains STEs. A matcher
  currently consists of a match RTC and, if necessary, an action RTC.
  This patch series decouples action RTCs from matchers and moves action
  RTCs to a central pool.

* Matcher: a logical container for steering rules. While the items above
  describe hardware concepts, a matcher is purely a software construct.

Current situation
-----------------

As mentioned above, a matcher currently consists of a match RTC (or
more, in case of complex matchers) and zero or one action STCs. An
action STC is only allocated if the matcher contains sufficiently
complicated action templates, or many actions.

When adding a rule, we decide based on its action template whether it
requires action STEs. If yes, we allocate the required number of action
STEs from the matcher's action STE.

When updating a rule, we need to prevent the rule ever being in an
invalid state. So we need to allocate and write new action STEs first,
then update the match STE to point to them, and finally release the old
action STEs. So there is a state when a rule needs double the action
STEs it normally uses.

Thus, for a given matcher of log_sz=N, log_action_ste_sz=A, the action
STC log_size is (N + A + 1). We need enough space to hold all the rules'
action STEs, and effectively double that space to account for the not
very common case of rules being updated. We could manage with much fewer
extra action STEs, but RTCs are allocated in powers of two. This results
in effective utilization of action RTCs of 50%, outside rule update
cases.

This is further complicated when resizing matchers. To avoid updating
all the rules to point to new match STEs, we keep existing action RTCs
around as resize_data, and only free them when the matcher is freed.

Action STE pool
---------------

This patch series decouples action RTCs from matchers by creating a
per-queue pool. When a rule needs to allocate action STEs it does so
from the pool, creating a new RTC if needed. During update two sets of
action STEs are in use, possibly from different RTCs.

The pool is sharded per-queue to avoid lock contention. Each per-queue
pool consists of 3 elements, corresponding to rx-only, tx-only and
rx-and-tx use cases. The series takes this approach because rules that
are bidirectional require that their action STEs have the same index in
the rx- and tx-RTCs, and using a single RTC would result in
unidirectional rules wasting the STEs for the unused direction.

Pool elements, in turn, consist of a list of RTCs. The driver
progressively allocates larger RTCs as they are needed to amortize the
cost of allocation.

Allocation of elements (STEs) inside RTCs is modelled by an existing
mechanism, somewhat confusingly also known as a pool. The first few
patches in the series refactor this abstraction to simplify it and adapt
it to the new schema.

Finally, this series implements periodic cleanup of unused action RTCs
as a new feature. Previously, once a matcher allocated an action RTC, it
would only be freed when the matcher is freed. This resulted in a lot of
wasted memory for matchers that had previously grown, but were now
mostly unused.

Conversely, action STE pools have a timestamp of when they were last
used. A cleanup routine periodically checks all pools. If a pool's last
usage was too far in the past, it is destroyed.

Benchmarks
----------

The test module creates a batch of (1 << 18) rules per queue and then
deletes them, in a loop. The rules are complex enough to require two
action STEs per rule.

Each queue is manipulated from a separate kernel workqueue, so there is
a 1:1 correspondence between threads and queues.

There are sleep statements between insert and delete batches so that
memory usage can be evaluated using `free -m`. The numbers below are the
diff between base memory usage (without the mlx5 module inserted) and
peak usage while running a test. The values are rounded to the nearest
hundred megabytes. The `queues` column lists how many queues the test
used.

queues          mem_before      mem_after
1               1300M            800M
4               4000M           2300M
8               7300M           3300M

Across all of the tests, insertion and deletion rates are the same
before and after these patches.

Summary of the patches
----------------------

* Patch 1: Fix matcher action template attach to avoid overrunning the
  buffer and correctly report errors.
* Patches 2-7: Cleanup the existing pool abstraction. Clarify semantics,
  and use cases, simplify API and callers.
* Patch 8: Implement the new action STE pool structure.
* Patch 9: Use the action STE pool when manipulating rules.
* Patch 10: Remove action RTC from matcher.
* Patch 11: Add logic to periodically check and free unused action RTCs.
* Patch 12: Export action STE tables in debugfs for our dump tool.
====================

Link: https://patch.msgid.link/1744312662-356571-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Export action STE tables to debugfs

Introduce a new type of dump object and dump all action STE tables,
along with information on their RTCs and STEs.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Hamdan Agbariya <hamdani@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1744312662-356571-13-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Free unused action STE tables

Periodically check for unused action STE tables and free their
associated resources. In order to do this safely, add a per-queue lock
to synchronize the garbage collect work with regular operations on
steering rules.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1744312662-356571-12-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Cleanup matcher action STE table

Remove the matcher action STE implementation now that the code uses
per-queue action STE pools. This also allows simplifying matcher code
because it is now only handling a single type of RTC/STE.

The matcher resize data is also going away. Matchers were saving old
action STE data because the rules still used it, but now that data lives
in the action STE pool and is no longer coupled to a matcher.

Furthermore, matchers no longer need to rehash a due to action template
addition. If a new action template needs more action STEs, we simply
update the matcher's num_of_action_stes and future rules will allocate
the correct number. Existing rules are unaffected by such an operation
and can continue to use their existing action STEs.

The range action was using the matcher action STE implementation, but
there was no reason to do this other than the container fitting the
purpose. Extract that information to a separate structure.

Finally, stop dumping per-matcher information about action RTCs,
because they no longer exist. A later patch in this series will add
support for dumping action STE pools.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1744312662-356571-11-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Use the new action STE pool

Use the central action STE pool when creating / updating rules.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1744312662-356571-10-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Implement action STE pool

Implement a per-queue pool of action STEs that match STEs can link to,
regardless of matcher.

The code relies on hints to optimize whether a given rule is added to
rx-only, tx-only or both. Correspondingly, action STEs need to be added
to different RTC for ingress or egress paths. For rx-and-tx rules, the
current rule implementation dictates that the offsets for a given rule
must be the same in both RTCs.

To avoid wasting STEs, each action STE pool element holds 3 pools:
rx-only, tx-only, and rx-and-tx, corresponding to the possible values of
the pool optimization enum. The implementation then chooses at rule
creation / update which of these elements to allocate from.

Each element holds multiple action STE tables, which wrap an RTC, an STE
range, the logic to buddy-allocate offsets from the range, and an STC
that allows match STEs to point to this table. When allocating offsets
from an element, we iterate through available action STE tables and, if
needed, create a new table.

Similar to the previous implementation, this iteration does not free any
resources. This is implemented in a subsequent patch.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1744312662-356571-9-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Fix pool size optimization

The optimization to create a size-one STE range for the unused direction
was broken. The hardware prevents us from creating RTCs over unallocated
STE space, so the only reason this has worked so far is because the
optimization was never used.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1744312662-356571-8-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Add fullness tracking to pool

Future users will need to query whether a pool is empty.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1744312662-356571-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, Cleanup after pool refactoring

Remove members which are now no longer used. In fact, many of the
`struct mlx5hws_pool_chunk` were not even written to beyond being
initialized, but they were used in various internals.

Also cleanup some local variables which made more sense when the API was
thicker.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1744312662-356571-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>