git.monstr.eu Git - linux-2.6-microblaze.git/log

ipv6: fib6: avoid indirect calls from fib6_rule_lookup

It was reported that a considerable amount of cycles were spent on the
expensive indirect calls on fib6_rule_lookup. This patch introduces an
inline helper called pol_route_func that uses the indirect_call_wrappers
to avoid the indirect calls.

This patch saves around 50ns per call.

Performance was measured on the receiver by checking the amount of
syncookies that server was able to generate under a synflood load.

Traffic was generated using trafgen[1] which was pushing around 1Mpps on
a single queue. Receiver was using only one rx queue which help to
create a bottle neck and make the experiment rx-bounded.

These are the syncookies generated over 10s from the different runs:

Whithout the patch:
TcpExtSyncookiesSent            3553749            0.0
TcpExtSyncookiesSent            3550895            0.0
TcpExtSyncookiesSent            3553845            0.0
TcpExtSyncookiesSent            3541050            0.0
TcpExtSyncookiesSent            3539921            0.0
TcpExtSyncookiesSent            3557659            0.0
TcpExtSyncookiesSent            3526812            0.0
TcpExtSyncookiesSent            3536121            0.0
TcpExtSyncookiesSent            3529963            0.0
TcpExtSyncookiesSent            3536319            0.0

With the patch:
TcpExtSyncookiesSent            3611786            0.0
TcpExtSyncookiesSent            3596682            0.0
TcpExtSyncookiesSent            3606878            0.0
TcpExtSyncookiesSent            3599564            0.0
TcpExtSyncookiesSent            3601304            0.0
TcpExtSyncookiesSent            3609249            0.0
TcpExtSyncookiesSent            3617437            0.0
TcpExtSyncookiesSent            3608765            0.0
TcpExtSyncookiesSent            3620205            0.0
TcpExtSyncookiesSent            3601895            0.0

Without the patch the average is 354263 pkt/s or 2822 ns/pkt and with
the patch the average is 360738 pkt/s or 2772 ns/pkt which gives an
estimate of 50 ns per packet.

[1] http://netsniff-ng.org/

Changelog since v1:
- Change ordering in the ICW (Paolo Abeni)

Cc: Luigi Rizzo <lrizzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

indirect_call_wrapper: extend indirect wrapper to support up to 4 calls

There are many places where 2 annotations are not enough. This patch
adds INDIRECT_CALL_3 and INDIRECT_CALL_4 to cover such cases.

Signed-off-by: Brian Vazquez <brianvv@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_dcb: Fix a spelling typo in spectrum_dcb.c

This patch fixes a spelling typo in spectrum_dcb.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: add keepalived rtm_protocol

Keepalived can set global static ip routes or virtual ip routes dynamically
following VRRP protocol states. Using a dedicated rtm_protocol will help
keeping track of it.

Changes in v2:
- fix tab/space indenting

Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-atlantic-additional-A2-features'

Igor Russkikh says:

====================
net: atlantic: additional A2 features

This patchset adds more features to A2:
* half duplex rates;
* EEE;
* flow control;
* link partner capabilities reporting;
* phy loopback.

Feature-wise A2 is almost on-par with A1 save for WoL and filtering, which
will be submitted as separate follow-up patchset(s).
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: A2: phy loopback support

This patch adds the phy loopback support on A2.

Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: A2: report link partner capabilities

This patch adds link partner capabilities reporting support on A2.
In particular, the following capabilities are available for reporting:
* link rate;
* EEE;
* flow control.

Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: A2: flow control support

This patch adds flow control support on A2.

Co-developed-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: A2: EEE support

This patch adds EEE support on A2.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Co-developed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: remove baseX usage

This patch removes 2.5G baseX wrong usage/reporting, since it shouldn't have
been mixed with baseT.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atlantic: A2: half duplex support

This patch adds support for 10M/100M/1G half duplex rates, which are
supported by A2 in additional to full duplex rates supported by A1.

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/core/devlink.c: remove new uninitialized_var() usage

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcindex_change: Remove redundant null check

arg cannot be NULL since its already being dereferenced
before. Remove the redundant NULL check.

Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mtk_eth_soc: use resolved link config in mac_link_up()

Convert the mtk_eth_soc driver to use the finalised link parameters in
mac_link_up() rather than the parameters in mac_config().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Multicast-improvement-in-Ocelot-and-Felix-drivers'

Vladimir Oltean says:

====================
Multicast improvement in Ocelot and Felix drivers

This series makes some basic multicast forwarding functionality work for
Felix DSA and for Ocelot switchdev. IGMP/MLD snooping in Felix is still
missing, and there are other improvements to be made in the general area
of multicast address filtering towards the CPU, but let's get these
hardware-specific fixes out of the way first.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb entries

The current procedure for installing a multicast address is hardcoded
for IPv4. But, in the ocelot hardware, there are 3 different procedures
for IPv4, IPv6 and for regular L2 multicast.

For IPv6 (33-33-xx-xx-xx-xx), it's the same as for IPv4
(01-00-5e-xx-xx-xx), except that the destination port mask is stuffed
into first 2 bytes of the MAC address except into first 3 bytes.

For plain Ethernet multicast, there's no port-in-address stuffing going
on, instead the DEST_IDX (pointer to PGID) is used there, just as for
unicast. So we have to use one of the nonreserved multicast PGIDs that
the hardware has allocated for this purpose.

This patch classifies the type of multicast address based on its first
bytes, then redirects to one of the 3 different hardware procedures.

Note that this gives us a really better way of redirecting PTP frames
sent at 01-1b-19-00-00-00 to the CPU. Previously, Yangbo Lu tried to add
a trapping rule for PTP EtherType but got a lot of pushback:

https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/

But right now, that isn't needed at all. The application stack (ptp4l)
does this for the PTP multicast addresses it's interested in (which are
configurable, and include 01-1b-19-00-00-00):

memset(&mreq, 0, sizeof(mreq));
mreq.mr_ifindex = index;
mreq.mr_type = PACKET_MR_MULTICAST;
mreq.mr_alen = MAC_LEN;
memcpy(mreq.mr_address, addr1, MAC_LEN);

err1 = setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mreq,
sizeof(mreq));

Into the kernel, this translates into a dev_mc_add on the switch network
interfaces, and our drivers know that it means they should translate it
into a host MDB address (make the CPU port be the destination).
Previously, this was broken because all mdb addresses were treated as
IPv4 (which 01-1b-19-00-00-00 obviously is not).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: introduce macros for iterating over PGIDs

The current iterators are impossible to understand at first glance
without switching back and forth between the definitions and their
actual use in the for loops.

So introduce some convenience names to help readability.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: call port mdb operations from ocelot

This adds the mdb hooks in felix and exports the mdb functions from
ocelot.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: make the NPI port a proper target for FDB and MDB

When used in DSA mode (as seen in Felix), the DEST_IDX in the MAC table
should point to the PGID for the CPU port (PGID_CPU) and not for the
Ethernet port where the CPU queues are redirected to (also known as Node
Processor Interface - NPI).

Because for Felix this distinction shouldn't really matter (from DSA
perspective, the NPI port _is_ the CPU port), make the ocelot library
act upon the CPU port when NPI mode is enabled. This has no effect for
the mscc_ocelot driver for VSC7514, because that does not use NPI (and
ocelot->npi is -1).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: fix encoding destination ports into multicast IPv4 address

The ocelot hardware designers have made some hacks to support multicast
IPv4 and IPv6 addresses. Normally, the MAC table matches on MAC
addresses and the destination ports are selected through the DEST_IDX
field of the respective MAC table entry. The DEST_IDX points to a Port
Group ID (PGID) which contains the bit mask of ports that frames should
be forwarded to. But there aren't a lot of PGIDs (only 80 or so) and
there are clearly many more IP multicast addresses than that, so it
doesn't scale to use this PGID mechanism, so something else was done.
Since the first portion of the MAC address is known, the hack they did
was to use a single PGID for _flooding_ unknown IPv4 multicast
(PGID_MCIPV4 == 62), but for known IP multicast, embed the destination
ports into the first 3 bytes of the MAC address recorded in the MAC
table.

The VSC7514 datasheet explains it like this:

    3.9.1.5 IPv4 Multicast Entries

    MAC table entries with the ENTRY_TYPE = 2 settings are interpreted
    as IPv4 multicast entries.
    IPv4 multicasts entries match IPv4 frames, which are classified to
    the specified VID, and which have DMAC = 0x01005Exxxxxx, where
    xxxxxx is the lower 24 bits of the MAC address in the entry.
    Instead of a lookup in the destination mask table (PGID), the
    destination set is programmed as part of the entry MAC address. This
    is shown in the following table.

    Table 78: IPv4 Multicast Destination Mask

        Destination Ports            Record Bit Field
        ---------------------------------------------
        Ports 10-0                   MAC[34-24]

    Example: All IPv4 multicast frames in VLAN 12 with MAC 01005E112233 are
    to be forwarded to ports 3, 8, and 9. This is done by inserting the
    following entry in the MAC table entry:
    VALID = 1
    VID = 12
    MAC = 0x000308112233
    ENTRY_TYPE = 2
    DEST_IDX = 0

But this procedure is not at all what's going on in the driver. In fact,
the code that embeds the ports into the MAC address looks like it hasn't
actually been tested. This patch applies the procedure described in the
datasheet.

Since there are many other fixes to be made around multicast forwarding
until it works properly, there is no real reason for this patch to be
backported to stable trees, or considered a real fix of something that
should have worked.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-Offload-TC-action-pedit-munge-tcp-udp-sport-dport'

Ido Schimmel says:

====================
mlxsw: Offload TC action pedit munge tcp/udp sport/dport

Petr says:

On Spectrum-2 and Spectrum-3, it is possible to overwrite L4 port number of
a TCP or UDP packet in the ACL engine. That corresponds to the pedit munges
of tcp and udp sport resp. dport fields. Offload these munges on the
systems where they are supported.

The current offloading code assumes that all systems support the same set
of fields. This now changes, so in patch #1 first split handling of pedit
munges by chip type. The analysis of which packet field a given munge
describes is kept generic.

Patch #2 introduces the new flexible action fields. Patch #3 then adds the
new pedit fields, and dispatches on them on Spectrum>1.

Patch #4 adds a forwarding selftest for pedit dsfield, applicable to SW as
well as HW datapaths.
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add a test for pedit munge tcp, udp sport, dport

Add a test that checks that pedit adjusts port numbers of tcp and udp
packets.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_acl: Support FLOW_ACTION_MANGLE for TCP, UDP ports

Spectrum-2 supports an ACL action L4_PORT, which allows TCP and UDP source
and destination port number change. Offload suitable mangles to this
action.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core_acl_flex_actions: Add L4_PORT_ACTION

Add fields related to L4_PORT_ACTION, which is used for changing of TCP and
UDP port numbers.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Split handling of pedit mangle by chip type

Certain ACL actions are only available on some Spectrum revisions. In
particular, L4_PORT_ACTION is not available on Spectrum-1. Introduce a
new ops struct intended to hold these differences, mlxsw_sp_rulei_ops.
Prime it with a sole member, act_mangle_field, meant for handling of
pedit mangles.

Create two ops structures, one for Spectrum-1, the other for Spectrum-2
and above. Add callbacks for act_mangle_field and dispatch to the common
handler.

Invoke mlxsw_sp_rulei_ops.act_mangle_field from the field mangler
instead of calling the common handler directly.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Add-Marvell-88E1340S-88E1548P-support'

Maxim Kochetkov says:

====================
Add Marvell 88E1340S, 88E1548P support

This patch series add new PHY id support.
Russell King asked to use single style for referencing functions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: Add Marvell 88E1548P support

Add support for this new phy ID.

Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: Add Marvell 88E1340S support

Add support for this new phy ID.

Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: use a single style for referencing functions

The kernel in general does not use &func referencing format.

Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8169-mark-device-as-detached-in-PCI-D3-and-improve-locking'

Heiner Kallweit says:

====================
r8169: mark device as detached in PCI D3 and improve locking

Mark the netdevice as detached whenever parent is in PCI D3hot and not
accessible. This mainly applies to runtime-suspend state.
In addition take RTNL lock in suspend calls, this allows to remove
the driver-specific mutex and improve PM callbacks in general.
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: improve rtl8169_runtime_resume

Simplify rtl8169_runtime_resume() by calling rtl8169_resume().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove driver-specific mutex

Now that the critical sections are protected with RTNL lock, we don't
need a separate mutex any longer.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: use RTNL to protect critical sections

Most relevant ops (open, close, ethtool ops) are protected with RTNL
lock by net core. Make sure that such ops can't be interrupted by
e.g. (runtime-)suspending by taking the RTNL lock in suspend ops
and the PCI error handler.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: add rtl8169_up

Factor out bringing device up to a new function rtl8169_up(), similar
to rtl8169_down() for bringing the device down.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove no longer needed checks for device being runtime-active

Because the netdevice is marked as detached now when parent is not
accessible we can remove quite some checks.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: mark device as not present when in PCI D3

Mark the netdevice as detached whenever we go into PCI D3hot.
This allows to remove some checks e.g. from ethtool ops because
dev_ethtool() checks for netif_device_present() in the beginning.

In this context move waking up the queue out of rtl_reset_work()
because in cases where netif_device_attach() is called afterwards
the queue should be woken up by the latter function only.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: try to runtime-resume detached device in __dev_open

A netdevice may be marked as detached because the parent is
runtime-suspended and not accessible whilst interface or link is down.
An example are PCI network devices that go into PCI D3hot, see e.g.
__igc_shutdown() or rtl8169_net_suspend().
If netdevice is down and marked as detached we can only open it if
we runtime-resume it before __dev_open() calls netif_device_present().

Therefore, if netdevice is detached, try to runtime-resume the parent
and only return with an error if it's still detached.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'prepare-dwmac-meson8b-for-G12A-specific-initialization'

Martin Blumenstingl says:

====================
prepare dwmac-meson8b for G12A specific initialization

Some users are reporting that RGMII (and sometimes also RMII) Ethernet
is not working for them on G12A/G12B/SM1 boards. Upon closer inspection
of the vendor code for these SoCs new register bits are found.

It's not clear yet how these registers work. Add a new compatible string
as the first preparation step to improve Ethernet support on these SoCs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-meson8b: add a compatible string for G12A SoCs

Amlogic Meson G12A, G12B and SM1 have the same (at least as far as we
know at the time of writing) PRG_ETHERNET glue register implementation.
This implementation however is slightly different from AXG as it now has
an undocument "auto cali idx val" register in PRG_ETH1[17:16] which
seems to be related to RGMII Ethernet.

Add a new compatible string for G12A SoCs so the logic for this new
register can be implemented in the future.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dwmac-meson: Add a compatible string for G12A onwards

Amlogic Meson G12A, G12B and SM1 have the same (at least as far as we
know at the time of writing) PRG_ETHERNET glue register implementation.
This implementation however is slightly different from AXG as it now has
an undocument "auto cali idx val" register in PRG_ETH1[17:16] which
seems to be related to RGMII Ethernet.

Add a compatible string for G12A and newer so the new registers can be
used.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'devlink-Add-board-serial_number-field-to-info_get-cb'

Vasundhara Volam says:

====================
devlink: Add board.serial_number field to info_get cb.

This patchset adds support for board.serial_number to devlink info_get
cb and also use it in bnxt_en driver.

Sample output:

$ devlink dev info pci/0000:af:00.1
pci/0000:af:00.1:
  driver bnxt_en
  serial_number 00-10-18-FF-FE-AD-1A-00
  board.serial_number 433551F+172300000
  versions:
      fixed:
        board.id 7339763 Rev 0.
        asic.id 16D7
        asic.rev 1
      running:
        fw 216.1.216.0
        fw.psid 0.0.0
        fw.mgmt 216.1.192.0
        fw.mgmt.api 1.10.1
        fw.ncsi 0.0.0.0
        fw.roce 216.1.16.0

v2:
- Modify board_serial_number to board.serial_number for maintaining
consistency.
- Combine 2 lines in second patchset as column limit is 100 now
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add board.serial_number field to info_get cb

Add board.serial_number field info to info_get cb via devlink,
if driver can fetch the information from the device.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Add support for board.serial_number to info_get cb.

Board serial number is a serial number, often available in PCI
*Vital Product Data*.

Also, update devlink-info.rst documentation file.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Cosmetic-cleanup-in-SJA1105-DSA-driver'

Vladimir Oltean says:

====================
Cosmetic cleanup in SJA1105 DSA driver

This removes the sparse warnings from the sja1105 driver and makes some
structures constant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: make the instantiations of struct sja1105_info constant

Since struct sja1105_private only holds a const pointer to one of these
structures based on device tree compatible string, the structures
themselves can be made const.

Also add an empty line between each structure definition, to appease
checkpatch.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: make config table operation structures constant

The per-chip instantiations of struct sja1105_table_ops and struct
sja1105_dynamic_table_ops can be made constant, so do that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: remove empty structures from config table ops

Sparse is complaining and giving the following warning message:
'Using plain integer as NULL pointer'.

This is not what's going on, instead {0} is used as a zero initializer
for the structure members, to indicate that the particular chip revision
does not support those particular config tables.

But since the config tables are declared globally, the unpopulated
elements are zero-initialized anyway. So, to make sparse shut up, let's
remove the zero initializers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-dsa-qca8k-Improve-SGMII-interface-handling'

Jonathan McDowell says:

====================
net: dsa: qca8k: Improve SGMII interface handling

This 3 patch series migrates the qca8k switch driver over to PHYLINK,
and then adds the SGMII clean-ups (i.e. the missing initialisation) on
top of that as a second patch. The final patch is a simple spelling fix
in a comment.

As before, tested with a device where the CPU connection is RGMII (i.e.
the common current use case) + one where the CPU connection is SGMII. I
don't have any devices where the SGMII interface is brought out to
something other than the CPU.

v5:
- Move spelling fix to separate patch
- Use ds directly rather than ds->priv
v4:
- Enable pcs_poll so we keep phylink updated when doing in-band
negotiation
- Explicitly check for PHY_INTERFACE_MODE_1000BASEX when setting SGMII
port mode.
- Address Vladimir's review comments
v3:
- Move phylink changes to separate patch
- Address rmk review comments
v2:
- Switch to phylink
- Avoid need for device tree configuration options
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Minor comment spelling fix

Signed-off-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Improve SGMII interface handling

This patch improves the handling of the SGMII interface on the QCA8K
devices. Previously the driver did no configuration of the port, even if
it was selected. We now configure it up in the appropriate
PHY/MAC/Base-X mode depending on what phylink tells us we are connected
to and ensure it is enabled.

Tested with a device where the CPU connection is RGMII (i.e. the common
current use case) + one where the CPU connection is SGMII. I don't have
any devices where the SGMII interface is brought out to something other
than the CPU.

Signed-off-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Switch to PHYLINK instead of PHYLIB

Update the driver to use the new PHYLINK callbacks, removing the
legacy adjust_link callback.

Signed-off-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bonding-initial-support-for-hardware-crypto-offload'

Jarod Wilson says:

====================
bonding: initial support for hardware crypto offload

This is an initial functional implementation for doing pass-through of
hardware encryption from bonding device to capable slaves, in active-backup
bond setups. This was developed and tested using ixgbe-driven Intel x520
interfaces with libreswan and a transport mode connection, primarily using
netperf, with assorted connection failures forced during transmission. The
failover works quite well in my testing, and overall performance is right
on par with offload when running on a bare interface, no bond involved.

Caveats: this is ONLY enabled for active-backup, because I'm not sure
how one would manage multiple offload handles for different devices all
running at the same time in the same xfrm, and it relies on some minor
changes to both the xfrm code and slave device driver code to get things
to behave, and I don't have immediate access to any other hardware that
could function similarly, but the NIC driver changes are minimal and
straight-forward enough that I've included what I think ought to be
enough for mlx5 devices too.

v2: reordered patches, switched (back) to using CONFIG_XFRM_OFFLOAD
to wrap the code additions and wrapped overlooked additions.
v3: rebase w/net-next open, add proper cc list to cover letter
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: support hardware encryption offload to slaves

Currently, this support is limited to active-backup mode, as I'm not sure
about the feasilibity of mapping an xfrm_state's offload handle to
multiple hardware devices simultaneously, and we rely on being able to
pass some hints to both the xfrm and NIC driver about whether or not
they're operating on a slave device.

I've tested this atop an Intel x520 device (ixgbe) using libreswan in
transport mode, succesfully achieving ~4.3Gbps throughput with netperf
(more or less identical to throughput on a bare NIC in this system),
as well as successful failover and recovery mid-netperf.

v2: just use CONFIG_XFRM_OFFLOAD for wrapping, isolate more code with it

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: become aware of when running as a bonding slave

I've been unable to get my hands on suitable supported hardware to date,
but I believe this ought to be all that is needed to enable the mlx5
driver to also work with bonding active-backup crypto offload passthru.

CC: Boris Pismenny <borisp@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Leon Romanovsky <leon@kernel.org>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe_ipsec: become aware of when running as a bonding slave

Slave devices in a bond doing hardware encryption also need to be aware
that they're slaves, so we operate on the slave instead of the bonding
master to do the actual hardware encryption offload bits.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Acked-by: Jeff Kirsher <Jeffrey.t.kirsher@intel.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: bail early on slave pass over skb

This is prep work for initial support of bonding hardware encryption
pass-through support. The bonding driver will fill in the slave_dev
pointer, and we use that to know not to skb_push() again on a given
skb that was already processed on the bond device.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'devlink-Support-get-set-mac-address-of-a-port-function'

Parav Pandit says:

====================
devlink: Support get,set mac address of a port function

Currently, ip link set dev <pfndev> vf <vf_num> <param> <value> has
below few limitations.

1. Command is limited to set VF parameters only.
It cannot set the default MAC address for the PCI PF.

2. It can be set only on system where PCI SR-IOV capability exists.
In smartnic based system, eswitch of a NIC resides on a different
embedded cpu which has the VF and PF representors for the SR-IOV
functions of a host system in which this smartnic is plugged-in.

3. It cannot setup the function attributes of sub-function described
in detail in comprehensive RFC [1] and [2].

This series covers the first small part to let user query and set MAC
address (hardware address) of a PCI PF/VF which is represented by
devlink port pcipf, pcivf port flavours respectively.

Whenever a devlink port manages a function connected to a devlink port,
it allows to query and set its hardware address.

Driver implements necessary get/set callback functions if it supports
port function for a given port type.

Patch summary:
Patch-1 Prepares devlink port fill routines for extack
Patch-2 and 3 extended devlink interface to get/set port function
attributes, mainly hardware address to start with.

Patch-2 Extended port dump command to query port function hardware
address
Patch-3 Introduces a command to set the hardware address of a port
function

Patch-4 to 9 refactors and implement devlink callbacks in mlx5_core
driver.
Patch-4 Constify the mac address pointer in set routines
Patch-5 Introduces eswich check helper to use in devlink facing
callbacks
Patch-6 Moves port index, port number conversion routine to eswitch
header file
Patch-7 Implements port function query devlink callback
Patch-8 Refactors mac address setting routine to uniformly use
state_lock
Patch-9 Implements port function set devlink callback

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: E-switch, Supporting setting devlink port function mac address

Enable user to set mac address of the PCI PF and VF port function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Split mac address setting function for using state_lock

Refactor mac address setting function to let caller hold the necessary
state_lock mutex, so that subsequent patch and use this helper routine.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: E-switch, Support querying port function mac address

Support querying mac address of the eswitch devlink port function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Move helper to eswitch layer

To use port number to port index conversion at eswitch level, move it to
eswitch header.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: E-switch, Introduce and use eswitch support check helper

Introduce an helper routine to get esw from a devlink device and use it
at eswitch callbacks and in subsequent patch.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Constify mac address pointer

Since none of the functions need to modify the input mac address,
constify them.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/devlink: Support setting hardware address of port function

PCI PF and VF devlink port can manage the function represented by a
devlink port.

Allow users to set port function's hardware address.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:55

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/devlink: Support querying hardware address of port function

PCI PF and VF devlink port can manage the function represented by
a devlink port.

Enable users to query port function's hardware address.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:66

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "enp6s0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "function": {
                "hw_addr": "00:11:22:33:44:66"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/devlink: Prepare devlink port functions to fill extack

Prepare devlink port related functions to optionally fill up
the extack information which will be used in subsequent patch by port
function attribute(s).

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Marvell-mvpp2-improvements'

Russell King says:

====================
Marvell mvpp2 improvements

This series primarily cleans up mvpp2, but also fixes a left-over from
91a208f2185a ("net: phylink: propagate resolved link config via
mac_link_up()").

Patch 1 introduces some port helpers:
  mvpp2_port_supports_xlg() - does the port support the XLG MAC
  mvpp2_port_supports_rgmii() - does the port support RGMII modes

Patch 2 introduces mvpp2_phylink_to_port(), rather than having repeated
  open coding of container_of().

Patch 3 introduces mvpp2_modify(), which reads-modifies-writes a
  register - I've converted the phylink specific code to use this
  helper.

Patch 4 moves the hardware control of the pause modes from
  mvpp2_xlg_config() (which is called via the phylink_config method)
  to mvpp2_mac_link_up() - a change that was missed in the above
  referenced commit.

v2: remove "inline" in patch 2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: set xlg flow control in mvpp2_mac_link_up()

Set the flow control settings in mvpp2_mac_link_up() for 10G links
just as we do for 1G and slower links. This is now the preferred
location.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add register modification helper

Add a helper to read-modify-write a register, and use it in the phylink
helpers.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add mvpp2_phylink_to_port() helper

Add a helper to convert the struct phylink_config pointer passed in
from phylink to the drivers internal struct mvpp2_port.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add port support helpers

The mvpp2 code has tests scattered amongst the code to determine
whether the port supports the XLG, and whether the port supports
RGMII mode.

Rather than having these tests scattered, provide a couple of helper
functions, so that future additions can ensure that they get these
tests correct.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Remove redundant skb null check

Remove the redundant null check for skb.

Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-remove-two-indirect-calls-from-xmit-path'

Eric Dumazet says:

====================
tcp: remove two indirect calls from xmit path

__tcp_transmit_skb() does two indirect calls per packet, lets get rid
of them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove indirect calls for icsk->icsk_af_ops->send_check

Mitigate RETPOLINE costs in __tcp_transmit_skb()
by using INDIRECT_CALL_INET() wrapper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove indirect calls for icsk->icsk_af_ops->queue_xmit

Mitigate RETPOLINE costs in __tcp_transmit_skb()
by using INDIRECT_CALL_INET() wrapper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

of: mdio: preserve phy dev_flags in of_phy_connect()

Replace assignment "=" with OR "|=" for "phy->dev_flags" so "dev_flags"
configured in phy probe() function can be preserved.

The idea is similar to commit e7312efbd5de ("net: phy: modify assignment
to OR for dev_flags in phy_attach_direct").

Signed-off-by: Tao Ren <rentao.bupt@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Avoid overwriting valid skb->napi_id

This will be useful to allow busy poll for tunneled traffic. In case of
busy poll for sessions over tunnels, the underlying physical device's
queues need to be polled.

Tunnels schedule NAPI either via netif_rx() for backlog queue or
schedule the gro_cell_poll(). netif_rx() propagates the valid skb->napi_id
to the socket. OTOH, gro_cell_poll() stamps the skb->napi_id again by
calling skb_mark_napi_id() with the tunnel NAPI which is not a busy poll
candidate. This was preventing tunneled traffic to use busy poll. A valid
NAPI ID in the skb indicates it was already marked for busy poll by a
NAPI driver and hence needs to be copied into the socket.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Remove redundant condition in qdisc_graft

parent cannot be NULL here since its in the else part
of the if (parent == NULL) condition. Remove the extra
check on parent pointer.

Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Ocelot-Felix-driver-cleanup'

Vladimir Oltean says:

====================
Ocelot/Felix driver cleanup

Some of the code in the mscc felix and ocelot drivers was added while in
a bit of a hurry. Let's take a moment and put things in relative order.

First 3 patches are sparse warning fixes.

Patches 4-9 perform some further splitting between mscc_felix,
mscc_ocelot, and the common hardware library they share. Meaning that
some code is being moved from the library into just the mscc_ocelot
module.

Patches 10-12 refactor the naming conventions in the existing VCAP code
(for tc-flower offload), since we're going to be adding some more code
for VCAP IS1 (previous tentatives already submitted here:
https://patchwork.ozlabs.org/project/netdev/cover/20200602051828.5734-1-xiaoliang.yang_1@nxp.com/),
and that code would be confusing to read and maintain using current
naming conventions.

No functional modification is intended. I checked that the VCAP IS2 code
still works by applying a tc ingress filter with an EtherType key and
'drop' action.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: unexpose ocelot_vcap_policer_{add,del}

Remove the function prototypes from ocelot_police.h and make these
functions static. We need to move them above their callers. Note that
moving the implementations to ocelot_police.c is not trivially possible
due to dependency on is2_entry_set() which is static to ocelot_vcap.c.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: generalize the "ACE/ACL" names

Access Control Lists (and their respective Access Control Entries) are
specifically entries in the VCAP IS2, the security enforcement block,
according to the documentation.
Let's rename the structures and functions to something more generic, so
that VCAP IS1 structures (which would otherwise have to be called
Ingress Classification Entries) can reuse the same code without
confusion.

Some renaming that was done:

struct ocelot_ace_rule -> struct ocelot_vcap_filter
struct ocelot_acl_block -> struct ocelot_vcap_block
enum ocelot_ace_type -> enum ocelot_vcap_key_type
struct ocelot_ace_vlan -> struct ocelot_vcap_key_vlan
enum ocelot_ace_action -> enum ocelot_vcap_action
struct ocelot_ace_stats -> struct ocelot_vcap_stats
enum ocelot_ace_type -> enum ocelot_vcap_key_type
struct ocelot_ace_frame_* -> struct ocelot_vcap_key_*

No functional change is intended.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: rename ocelot_ace.{c, h} to ocelot_vcap.{c,h}

Access Control Lists (and their respective Access Control Entries) are
specifically entries in the VCAP IS2, the security enforcement block,
according to the documentation.

Let's rename the files that deal with generic operations on the VCAP
TCAM, so that VCAP IS1 and ES0 can reuse the same code without
confusion.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: move net_device related functions to ocelot_net.c

The ocelot hardware library shouldn't contain too much net_device
specific code, since it is shared with DSA which abstracts that
structure away. So much as much of this code as possible into the
mscc_ocelot driver and outside of the common library.

We're making an exception for MDB and LAG code. That is not yet exported
to DSA, but when it will, most of the code that's already in ocelot.c
will remain there. So, there's no point in moving code to ocelot_net.c
just to move it back later.

We could have moved all net_device code to ocelot_vsc7514.c directly,
but let's operate under the assumption that if a new switchdev ocelot
driver gets added, it'll define its SoC-specific stuff in a new
ocelot_vsc*.c file and it'll reuse the rest of the code.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: move ocelot_regs.c into ocelot_vsc7514.c

ocelot_regs.c actually shouldn't be part of the common library. It
describes the register map of the VSC7514 switch. The way ocelot
switches work, they'll have highly optimized register maps, so another
SoC will likely have the same registers but laid out completely
different in memory (so there's little room for reusing this structure).
So move it to ocelot_vsc7514.c instead.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: rename MSCC_OCELOT_SWITCH_OCELOT to MSCC_OCELOT_SWITCH

Putting 'ocelot' in the config's name twice just to say that 'it's the
ocelot driver running on the ocelot SoC' is a bit confusing. Instead,
it's just the ocelot driver. Now that we've renamed the previous symbol
that was holding the MSCC_OCELOT_SWITCH_OCELOT into *_LIB (because
that's what it is), we're free to use this name for the driver.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: convert MSCC_OCELOT_SWITCH into a library

Hide the CONFIG_MSCC_OCELOT_SWITCH option from users. It is meant to be
only a hardware library which is selected by the drivers that use it
(ocelot, felix).

Since it is "selected" from Kconfig, all its dependencies are manually
transferred to the driver that selects it. This is because "select" in
Kconfig language is a bit of a mess, and doesn't handle dependencies of
selected options quite right.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: rename module to mscc_ocelot

mscc_ocelot is a slightly better name for a module than ocelot_board or
ocelot_vsc7514 is, so let's use that.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: rename ocelot_board.c to ocelot_vsc7514.c

To follow the model of felix and seville where we have one
platform-specific file, rename this file to the actual SoC it serves.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: access EtherType using __be16

Get rid of sparse "cast to restricted __be16" warnings.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: use plain int when interacting with TCAM tables

sparse is rightfully complaining about the fact that:

warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
   26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
      |                            ^
note: in expansion of macro ‘GENMASK_INPUT_CHECK’
   39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
      |   ^~~~~~~~~~~~~~~~~~~
note: in expansion of macro ‘GENMASK’
  127 |   mask = GENMASK(width, 0);
      |          ^~~~~~~

So replace the variables that go into GENMASK with plain, signed integer
types.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: make vcap is2 keys and actions static

Get rid of some sparse warnings.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Strict-mode-for-VRF'

Andrea Mayer says:

====================
Strict mode for VRF

This patch set adds the new "strict mode" functionality to the Virtual
Routing and Forwarding infrastructure (VRF). Hereafter we discuss the
requirements and the main features of the "strict mode" for VRF.

On VRF creation, it is necessary to specify the associated routing table used
during the lookup operations. Currently, there is no mechanism that avoids
creating multiple VRFs sharing the same routing table. In other words, it is not
possible to force a one-to-one relationship between a specific VRF and the table
associated with it.

The "strict mode" imposes that each VRF can be associated to a routing table
only if such routing table is not already in use by any other VRF.
In particular, the strict mode ensures that:

1) given a specific routing table, the VRF (if exists) is uniquely identified;
2) given a specific VRF, the related table is not shared with any other VRF.

Constraints (1) and (2) force a one-to-one relationship between each VRF and the
corresponding routing table.

The strict mode feature is designed to be network-namespace aware and it can be
directly enabled/disabled acting on the "strict_mode" parameter.
Read and write operations are carried out through the classic sysctl command on
net.vrf.strict_mode path, i.e: sysctl -w net.vrf.strict_mode=1.

Only two distinct values {0,1} are accepted by the strict_mode parameter:

- with strict_mode=0, multiple VRFs can be associated with the same table.
   This is the (legacy) default kernel behavior, the same that we experience
   when the strict mode patch set is not applied;

- with strict_mode=1, the one-to-one relationship between the VRFs and the
   associated tables is guaranteed. In this configuration, the creation of a VRF
   which refers to a routing table already associated with another VRF fails and
   the error is returned to the user.

The kernel keeps track of the associations between a VRF and the routing table
during the VRF setup, in the "management" plane. Therefore, the strict mode does
not impact the performance or the intrinsic functionality of the data plane in
any way.

When the strict mode is active it is always possible to disable the strict mode,
while the reverse operation is not always allowed.
Setting the strict_mode parameter to 0 is equivalent to removing the one-to-one
constraint between any single VRF and its associated routing table.

Conversely, if the strict mode is disabled and there are multiple VRFs that
refer to the same routing table, then it is prohibited to set the strict_mode
parameter to 1. In this configuration, any attempt to perform the operation will
lead to an error and it will be reported to the user.
To enable strict mode once again (by setting the strict_mode parameter to 1),
you must first remove all the VRFs that share common tables.

There are several use cases which can take advantage from the introduction of
the strict mode feature. In particular, the strict mode allows us to:

  i) guarantee the proper functioning of some applications which deal with
     routing protocols;

ii) perform some tunneling decap operations which require to use specific
     routing tables for segregating and forwarding the traffic.

Considering (i), the creation of different VRFs that point to the same table
leads to the situation where two different routing entities believe they have
exclusive access to the same table. This leads to the situation where different
routing daemons can conflict for gaining routes control due to overlapping
tables. By enabling strict mode it is possible to prevent this situation which
often occurs due to incorrect configurations done by the users.
The ability to enable/disable the strict mode functionality does not depend on
the tool used for configuring the networking. In essence, the strict mode patch
solves, at the kernel level, what some other patches [1] had tried to solve at
the userspace level (using only iproute2) with all the related problems.

Considering (ii), the introduction of the strict mode functionality allows us
implementing the SRv6 End.DT4 behavior. Such behavior terminates a SR tunnel and
it forwards the IPv4 traffic according to the routes present in the routing
table supplied during the configuration. The SRv6 End.DT4 can be realized
exploiting the routing capabilities made available by the VRF infrastructure.
This behavior could leverage a specific VRF for forcing the traffic to be
forwarded in accordance with the routes available in the VRF table.
Anyway, in order to make the End.DT4 properly work, it must be guaranteed that
the table used for the route lookup operations is bound to one and only one VRF.
In this way, it is possible to use the table for uniquely retrieving the
associated VRF and for routing packets.

I would like to thank David Ahern for his constant and valuable support during
the design and development phases of this patch set.

Comments, suggestions and improvements are very welcome!
====================

Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: add selftest for the VRF strict mode

The new strict mode functionality is tested in different configurations and
on different network namespaces.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

vrf: add l3mdev registration for table to VRF device lookup

During the initialization phase of the VRF module, the callback for table
to VRF device lookup is registered in l3mdev.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

vrf: add sysctl parameter for strict mode

Add net.vrf.strict_mode sysctl parameter.

When net.vrf.strict_mode=0 (default) it is possible to associate multiple
VRF devices to the same table. Conversely, when net.vrf.strict_mode=1 a
table can be associated to a single VRF device.

When switching from net.vrf.strict_mode=0 to net.vrf.strict_mode=1, a check
is performed to verify that all tables have at most one VRF associated,
otherwise the switch is not allowed.

The net.vrf.strict_mode parameter is per network namespace.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

vrf: track associations between VRF devices and tables

Add the data structures and the processing logic to keep track of the
associations between VRF devices and routing tables.
When a VRF is instantiated, it needs to refer to a given routing table.
For each table, we explicitly keep track of the existing VRFs that refer to
the table.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

l3mdev: add infrastructure for table to VRF mapping

Add infrastructure to l3mdev (the core code for Layer 3 master devices) in
order to find out the corresponding VRF device for a given table id.
Therefore, the l3mdev implementations:
- can register a callback that returns the device index of the l3mdev
associated with a given table id;
- can offer the lookup function (table to VRF device).

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_u32: Use struct_size() in kzalloc()

Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.

This code was detected with the help of Coccinelle and, audited and
fixed manually.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

taprio: Use struct_size() in kzalloc()

Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes. Also, remove unnecessary
variable _size_.

This code was detected with the help of Coccinelle and, audited and
fixed manually.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Clause-45-PHY-probing-improvements'

Russell King says:

====================
Clause 45 PHY probing improvements

Last time this series was posted back in May, Florian reviewed the
patches, which was the only feedback I received. I'm now posting
them without the RFC tag.

This series aims to improve the probing for Clause 45 PHYs.

The first four patches clean up get_phy_device() and called functions,
updating the kernel doc, adding information about the various error
return values.

We then provide better kerneldoc for get_phy_device(), describing what
is going on, and more importantly what the various return codes mean.

Patch 6 adds support for probing MMDs >= 8 to check for their presence.

Patch 7 changes get_phy_c45_ids() to only set the returned
devices_in_package if we successfully find a PHY.

Patch 8 splits the use of "devices in package" from the "mmds present".

Patch 9 expands our ID reading to cover the other MMDs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>