linux-2.6-microblaze.git
22 months agonet: phy: mxl-gpy: rename the FW type field name
Michael Walle [Tue, 12 Jul 2022 13:15:53 +0000 (15:15 +0200)]
net: phy: mxl-gpy: rename the FW type field name

Align the firmware field name with the reference manual where it is
called "major".

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: phy: mxl-gpy: cache PHY firmware version
Michael Walle [Tue, 12 Jul 2022 13:15:52 +0000 (15:15 +0200)]
net: phy: mxl-gpy: cache PHY firmware version

Cache the firmware version during probe. There is no need to read the
firmware version on every autonegotiation.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: phy: mxl-gpy: fix version reporting
Michael Walle [Tue, 12 Jul 2022 13:15:51 +0000 (15:15 +0200)]
net: phy: mxl-gpy: fix version reporting

The commit 09ce6b20103b ("net: phy: mxl-gpy: add temperature sensor")
will overwrite the return value and the reported version will be wrong.
Fix it.

Fixes: 09ce6b20103b ("net: phy: mxl-gpy: add temperature sensor")
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: ip6mr: add RTM_GETROUTE netlink op
David Lamparter [Tue, 12 Jul 2022 12:10:02 +0000 (14:10 +0200)]
net: ip6mr: add RTM_GETROUTE netlink op

The IPv6 multicast routing code previously implemented only the dump
variant of RTM_GETROUTE.  Implement single MFC item retrieval by copying
and adapting the respective IPv4 code.

Tested against FRRouting's IPv6 PIM stack.

Signed-off-by: David Lamparter <equinox@diac24.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'devlink-cosmetic-fixes'
David S. Miller [Wed, 13 Jul 2022 12:49:44 +0000 (13:49 +0100)]
Merge branch 'devlink-cosmetic-fixes'

Jiri Pirko says:

====================
net: devlink: devl_* cosmetic fixes

Hi. This patches just fixes some small cosmetic issues of devl_* related
functions which I found on the way.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: devlink: move unlocked function prototypes alongside the locked ones
Jiri Pirko [Tue, 12 Jul 2022 10:24:24 +0000 (12:24 +0200)]
net: devlink: move unlocked function prototypes alongside the locked ones

Maintain the same order as it is in devlink.c for function prototypes.
The most of the locked variants would very likely soon be removed
and the unlocked version would be the only one.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: devlink: use helpers to work with devlink->lock mutex
Jiri Pirko [Tue, 12 Jul 2022 10:24:23 +0000 (12:24 +0200)]
net: devlink: use helpers to work with devlink->lock mutex

As far as the lock helpers exist as the drivers need to work with the
devlink->lock mutex, use the helpers internally in devlink.c in order to
be consistent.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: devlink: fix unlocked vs locked functions descriptions
Jiri Pirko [Tue, 12 Jul 2022 10:24:22 +0000 (12:24 +0200)]
net: devlink: fix unlocked vs locked functions descriptions

To be unified with the rest of the code, the unlocked version (devl_*)
of function should have the same description in documentation as the
locked one. Add the missing documentation. Also, add "Context"
annotation for the locked versions where it is missing.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoocteontx2-af: Skip CGX/RPM probe incase of zero lmac count
Hariprasad Kelam [Tue, 12 Jul 2022 06:42:50 +0000 (12:12 +0530)]
octeontx2-af: Skip CGX/RPM probe incase of zero lmac count

In few error cases MAC(CGX/RPM) block is having 0 lmacs.
AF driver uses MAC block with lmac pair to get firmware
data etc. These commands will fail as there is no LMAC
associated with MAC block.

This patch skips the probe of these MAC blocks such that AF driver
uses correct MAC block and LMAC pair for firmware communication and
define new LMAC_AF_ERROR types for command timeout etc.

This patch also enables channel back pressure for all LMACs.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'prestera-port-range-filters'
David S. Miller [Wed, 13 Jul 2022 11:16:56 +0000 (12:16 +0100)]
Merge branch 'prestera-port-range-filters'

Maksym Glubokiy says:

====================
net: prestera: add support for port range filters

This adds support for port-range rules:

  $ tc qdisc add ... clsact
  $ tc filter add ... flower ... src_port <PMIN>-<PMAX> ...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
22 months agonet: prestera: add support for port range filters
Maksym Glubokiy [Mon, 11 Jul 2022 15:09:08 +0000 (18:09 +0300)]
net: prestera: add support for port range filters

This adds support for port-range rules:

  $ tc qdisc add ... clsact
  $ tc filter add ... flower ... src_port <PMIN>-<PMAX> ...

Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: extract port range fields from fl_flow_key
Maksym Glubokiy [Mon, 11 Jul 2022 15:09:07 +0000 (18:09 +0300)]
net: extract port range fields from fl_flow_key

So it can be used for port range filter offloading.

Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'prestera-mdb-offload'
David S. Miller [Wed, 13 Jul 2022 11:14:05 +0000 (12:14 +0100)]
Merge branch 'prestera-mdb-offload'

Oleksandr Mazur says:

====================
net: marvell: prestera: add MDB offloading support

This patch series adds support for the MDB handling for the marvell
Prestera Driver. It's used to propagate IGMP groups registered within
the Kernel to the underlying HW (offload registered groups).

Features:
 - define (and implement) main internal MDB-related objects;
 - define (and implement) main HW APIs for MDB handling;
 - add MDB handling support for both regular ports as well as Bond
   interfaces;
 - Mirrored behavior of Bridge driver upon multicast router appearance
   (all traffic flooded when there's no mcast router; mcast router
    receives all mcast traffic, and only group-specific registered mcast
    traffic is being received by ports who've explicitly joined any group
    when there's a registered mcast router);
 - Reworked prestera driver bridge flags (especially multicast)
   setting - thus making it possible to react over mcast disabled messages
   properly by offloading this state to the underlying HW
   (disabling multicast flooding);

Limitations:
 - Not full (partial) IGMPv3 support (due to limited switchdev
   notification capabilities:
     MDB events are being propagated only with a single MAC entry,
     while IGMPv3 has Group-Specific requests and responses);
 - It's possible that multiple IP groups would receive traffic from
   other groups, as MDB events are being propagated with a single MAC
   entry, which makes it possible to map a few IPs over same MAC;

Co-developed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
PATCH V5:
 - fix checkpatch errors (4/4).
 - remove function forward declarations, and move
   function implementations to match the removed
   forward declarations (4/4).
 - rebased changes on top of latest master.
PATCH V4:
 - fix clang warning - var uninitialized when used.
PATCH V3:
 - add missing function implementations to 2/4 (HW API implementation),
   only definitions were added int V1, V2, and implementation waas missed
   by mistake.

Reported-by: kernel test robot <lkp@intel.com>
 - fix compiletime warning (unused variable)

PATCH V2:
 - include all the patches of patch series (V1's been sent to
   closed net-next, also had not all patches included);
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: marvell: prestera: implement software MDB entries allocation
Oleksandr Mazur [Mon, 11 Jul 2022 11:28:22 +0000 (14:28 +0300)]
net: marvell: prestera: implement software MDB entries allocation

Define bridge MDB entry (software entry):
  - entry that get's created upon receiving MDB management events
    (create/delete), that inherently defines a software entry,
    which can be enabled (offloaded to the HW) or disabled (removed
    from HW).
    This separation is done to achieve a better highlevel
    management of HW resources - software MDB entry could exist,
    while it's not necessarily should be configured on the HW.
    For example: by default, the Linux behavior would not replicate
    multicast traffic to multicast group members if there's no
    active multicast router and thus - no actual multicast traffic
    can be received/sent. So, until multicast router appears on the
    system no HW configuration should be applied, although SW MDB entries
    should be tracked.
    Another example would be altering state of 'multicast enabled' on
    the bridge: MC_DISABLED should invoke disabling / clearing multicast
    groups of specified bridge on the HW, yet upon receiving 'multicast
    enabled' event, driver should reconfigure any existing software MDB
    groups on the HW.
    Keeping track of software MDB entries in such way makes it possible
    to properly react on such events.
Define bridge MDB port entry (software entry):
  - entry that helps keeping track (on software - driver - level) of which
    bridge mebemer interface joined any give MDB group;

Co-developed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: marvell: prestera: define and implement MDB / flood domain API for entries creat...
Oleksandr Mazur [Mon, 11 Jul 2022 11:28:21 +0000 (14:28 +0300)]
net: marvell: prestera: define and implement MDB / flood domain API for entries creation and deletion

Define and implement prestera API calls for managing MDB and
  flood domain (ports) entries (create / delete / find calls).

Co-developed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: marvell: prestera: define MDB/flood domain entries and HW API to offload them...
Oleksandr Mazur [Mon, 11 Jul 2022 11:28:20 +0000 (14:28 +0300)]
net: marvell: prestera: define MDB/flood domain entries and HW API to offload them to the HW

Define MDB entry that can be offloaded:
  - FDB entry, that defines an multicast group to which traffic can be
    replicated to;
Define flood domain:
  - Arrangement of ports (list), that have joined multicast group, which
    would receive and replicate to multicast traffic of specified group;
Define flood domain port:
  - single flood domain list entry, that is associated with any given
    bridge port interface (could be LAG interface or physical port-member).
    Applicable to both Q and D bridges;

Co-developed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: marvell: prestera: rework bridge flags setting
Oleksandr Mazur [Mon, 11 Jul 2022 11:28:19 +0000 (14:28 +0300)]
net: marvell: prestera: rework bridge flags setting

Separate flags to make it possible to alter them separately;
Move bridge flags setting logic from HW API level to prestera_main
  where it belongs;
Move bridge flags parsing (and setting using prestera API) to
  prestera_switchdev.c - module responsible for bridge operations
  handling;

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoip6_tunnel: allow to inherit from VLAN encapsulated IP
Matthias May [Mon, 11 Jul 2022 09:17:22 +0000 (11:17 +0200)]
ip6_tunnel: allow to inherit from VLAN encapsulated IP

The current code allows to inherit the TTL (hop_limit) from the
payload when skb->protocol is ETH_P_IP or ETH_P_IPV6.
However when the payload is VLAN encapsulated (e.g because the tunnel
is of type GRETAP), then this inheriting does not work, because the
visible skb->protocol is of type ETH_P_8021Q or ETH_P_8021AD.

Instead of skb->protocol, use skb_protocol().

Signed-off-by: Matthias May <matthias.may@westermo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoip6_gre: use actual protocol to select xmit
Matthias May [Mon, 11 Jul 2022 09:17:21 +0000 (11:17 +0200)]
ip6_gre: use actual protocol to select xmit

When the payload is a VLAN encapsulated IPv6/IPv6 frame, we can
skip the 802.1q/802.1ad ethertypes and jump to the actual protocol.
This way we treat IPv4/IPv6 frames as IP instead of as "other".

Signed-off-by: Matthias May <matthias.may@westermo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoip6_gre: set DSCP for non-IP
Matthias May [Mon, 11 Jul 2022 09:17:20 +0000 (11:17 +0200)]
ip6_gre: set DSCP for non-IP

The current code always forces a dscp of 0 for all non-IP frames.
However when setting a specific TOS with the command

ip link add name tep0 type ip6gretap local fdd1:ced0:5d88:3fce::1
remote fdd1:ced0:5d88:3fce::2 tos 0xa0

one would expect all GRE encapsulated frames to have a TOS of 0xA0.
and not only when the payload is IPv4/IPv6.

Signed-off-by: Matthias May <matthias.may@westermo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoip_tunnel: allow to inherit from VLAN encapsulated IP
Matthias May [Mon, 11 Jul 2022 09:17:19 +0000 (11:17 +0200)]
ip_tunnel: allow to inherit from VLAN encapsulated IP

The current code allows to inherit the TOS, TTL, DF from the payload
when skb->protocol is ETH_P_IP or ETH_P_IPV6.
However when the payload is VLAN encapsulated (e.g because the tunnel
is of type GRETAP), then this inheriting does not work, because the
visible skb->protocol is of type ETH_P_8021Q or ETH_P_8021AD.

Instead of skb->protocol, use skb_protocol().

Signed-off-by: Matthias May <matthias.may@westermo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoqlogic: qed: fix clang -Wformat warnings
Justin Stitt [Mon, 11 Jul 2022 23:24:04 +0000 (16:24 -0700)]
qlogic: qed: fix clang -Wformat warnings

When building with Clang we encounter these warnings:
| drivers/net/ethernet/qlogic/qed/qed_dev.c:416:30: error: format
| specifies type 'char' but the argument has type 'u32' (aka 'unsigned
| int') [-Werror,-Wformat] i);
-
| drivers/net/ethernet/qlogic/qed/qed_dev.c:630:13: error: format
| specifies type 'char' but the argument has type 'int' [-Werror,-Wformat]
| p_llh_info->num_ppfid - 1);

For the first warning, `i` is a u32 which is much wider than the format
specifier `%hhd` describes. This results in a loss of bits after 2^7.

The second warning involves implicit integer promotion as the resulting
type of addition cannot be smaller than an int.

example:
``
uint8_t a = 4, b = 7;
int size = sizeof(a + b - 1);
printf("%d\n", size);
// output: 4
```

See more:
(https://wiki.sei.cmu.edu/confluence/display/c/INT02-C.+Understand+integer+conversion+rules)
"Integer types smaller than int are promoted when an operation is
performed on them. If all values of the original type can be represented
as an int, the value of the smaller type is converted to an int;
otherwise, it is converted to an unsigned int."

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20220711232404.2189257-1-justinstitt@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoigb: add xdp frags support to ndo_xdp_xmit
Lorenzo Bianconi [Mon, 11 Jul 2022 23:07:51 +0000 (16:07 -0700)]
igb: add xdp frags support to ndo_xdp_xmit

Add the capability to map non-linear xdp frames in XDP_TX and
ndo_xdp_xmit callback.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220711230751.3124415-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'mptcp-support-changes-to-initial-subflow-priority'
Jakub Kicinski [Wed, 13 Jul 2022 01:37:22 +0000 (18:37 -0700)]
Merge branch 'mptcp-support-changes-to-initial-subflow-priority'

Mat Martineau says:

====================
mptcp: Support changes to initial subflow priority

This series updates the in-kernel MPTCP path manager to allow changes to
subflow priority for the first subflow created for each MPTCP connection
(the one created with the MP_CAPABLE handshake).

Patches 1 and 2 do some refactoring to simplify the new functionality.

Patch 3 introduces the new feature to change the initial subflow
priority and send the MP_PRIO header on that subflow.

Patch 4 cleans up code related to tracking endpoint ids on the initial
subflow.

Patch 5 adds a selftest to confirm that subflow priorities are updated
as expected.
====================

Link: https://lore.kernel.org/r/20220711191633.80826-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoselftests: mptcp: add MPC backup tests
Paolo Abeni [Mon, 11 Jul 2022 19:16:33 +0000 (12:16 -0700)]
selftests: mptcp: add MPC backup tests

Add a couple of test-cases covering the newly introduced
features - priority update for the MPC subflow.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomptcp: more accurate MPC endpoint tracking
Paolo Abeni [Mon, 11 Jul 2022 19:16:32 +0000 (12:16 -0700)]
mptcp: more accurate MPC endpoint tracking

Currently the id accounting for the ID 0 subflow is not correct:
at creation time we mark (correctly) as unavailable the endpoint
id corresponding the MPC subflow source address, while at subflow
removal time set as available the id 0.

With this change we track explicitly the endpoint id corresponding
to the MPC subflow so that we can mark it as available at removal time.
Additionally this allow deleting the initial subflow via the NL PM
specifying the corresponding endpoint id.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomptcp: allow the in kernel PM to set MPC subflow priority
Paolo Abeni [Mon, 11 Jul 2022 19:16:31 +0000 (12:16 -0700)]
mptcp: allow the in kernel PM to set MPC subflow priority

Any local endpoints configured on the address matching the
MPC subflow are currently ignored.

Specifically, setting a backup flag on them has no effect
on the first subflow, as the MPC handshake can't carry such
info.

This change refactors the MPC endpoint id accounting to
additionally fetch the priority info from the relevant endpoint
and eventually trigger the MP_PRIO handshake as needed.

As a result, the MPC subflow now switches to backup priority
after that the MPTCP socket is fully established, according
to the local endpoint configuration.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomptcp: address lookup improvements
Paolo Abeni [Mon, 11 Jul 2022 19:16:30 +0000 (12:16 -0700)]
mptcp: address lookup improvements

When looking-up a socket address in the endpoint list, we
must prefer port-based matches over address only match.

Ensure that port-based endpoints are listed first, using
head insertion for them. Additionally be sure that only
port-based endpoints carry a non zero port number.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agomptcp: introduce and use mptcp_pm_send_ack()
Paolo Abeni [Mon, 11 Jul 2022 19:16:29 +0000 (12:16 -0700)]
mptcp: introduce and use mptcp_pm_send_ack()

The in-kernel PM has a bit of duplicate code related to ack
generation. Create a new helper factoring out the PM-specific
needs and use it in a couple of places.

As a bonus, mptcp_subflow_send_ack() is not used anymore
outside its own compilation unit and can become static.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: ip_tunnel: use strscpy to replace strlcpy
XueBing Chen [Mon, 11 Jul 2022 13:55:37 +0000 (21:55 +0800)]
net: ip_tunnel: use strscpy to replace strlcpy

The strlcpy should not be used because it doesn't limit the source
length. Preferred is strscpy.

Signed-off-by: XueBing Chen <chenxuebing@jari.cn>
Link: https://lore.kernel.org/r/2a08f6c1.e30.181ed8b49ad.Coremail.chenxuebing@jari.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agotcp: make retransmitted SKB fit into the send window
Yonglong Li [Mon, 11 Jul 2022 09:47:18 +0000 (17:47 +0800)]
tcp: make retransmitted SKB fit into the send window

current code of __tcp_retransmit_skb only check TCP_SKB_CB(skb)->seq
in send window, and TCP_SKB_CB(skb)->seq_end maybe out of send window.
If receiver has shrunk his window, and skb is out of new window,  it
should retransmit a smaller portion of the payload.

test packetdrill script:
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
   +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0

   +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
   +0 > S 0:0(0)  win 65535 <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8>
 +.05 < S. 0:0(0) ack 1 win 6000 <mss 1000,nop,nop,sackOK>
   +0 > . 1:1(0) ack 1

   +0 write(3, ..., 10000) = 10000

   +0 > . 1:2001(2000) ack 1 win 65535
   +0 > . 2001:4001(2000) ack 1 win 65535
   +0 > . 4001:6001(2000) ack 1 win 65535

 +.05 < . 1:1(0) ack 4001 win 1001

and tcpdump show:
192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 1:2001, ack 1, win 65535, length 2000
192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 2001:4001, ack 1, win 65535, length 2000
192.168.226.67.55 > 192.0.2.1.8080: Flags [P.], seq 4001:5001, ack 1, win 65535, length 1000
192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 5001:6001, ack 1, win 65535, length 1000
192.0.2.1.8080 > 192.168.226.67.55: Flags [.], ack 4001, win 1001, length 0
192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 5001:6001, ack 1, win 65535, length 1000
192.168.226.67.55 > 192.0.2.1.8080: Flags [P.], seq 4001:5001, ack 1, win 65535, length 1000

when cient retract window to 1001, send window is [4001,5002],
but TLP send 5001-6001 packet which is out of send window.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1657532838-20200-1-git-send-email-liyonglong@chinatelecom.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonfp: support TX VLAN ctag insert in NFDK
Diana Wang [Mon, 11 Jul 2022 09:30:48 +0000 (11:30 +0200)]
nfp: support TX VLAN ctag insert in NFDK

Add support for TX VLAN ctag insert
which may be configured via ethtool.
e.g.
     # ethtool -K $DEV tx-vlan-offload on

The NIC supplies VLAN insert information as packet metadata.
The fields of this VLAN metadata including vlan_proto and vlan tag.

Configuration control bit NFP_NET_CFG_CTRL_TXVLAN_V2 is to
signal availability of ctag-insert features of the firmware.

NFDK is used to communicate via PCIE to NFP-3800 based NICs
while NFD3 is used for other NICs supported by the NFP driver.
This features is currently implemented only for NFD3 and
this patch adds support for it with NFDK.

Signed-off-by: Diana Wang <na.wang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220711093048.1911698-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonfp: fix clang -Wformat warnings
Justin Stitt [Tue, 12 Jul 2022 00:01:52 +0000 (17:01 -0700)]
nfp: fix clang -Wformat warnings

When building with Clang we encounter these warnings:
| drivers/net/ethernet/netronome/nfp/nfp_app.c:233:99: error: format
| specifies type 'unsigned char' but the argument has underlying type
| 'unsigned int' [-Werror,-Wformat] nfp_err(pf->cpp, "unknown FW app ID
| 0x%02hhx, driver too old or support for FW not built in\n", id);
-
| drivers/net/ethernet/netronome/nfp/nfp_main.c:396:11: error: format
| specifies type 'unsigned char' but the argument has type 'int'
| [-Werror,-Wformat] serial, interface >> 8, interface & 0xff);

Correct format specifier for `id` is `%x` since the default type for the
`nfp_app_id` enum is `unsigned int`. The second warning is also solved
by using the `%x` format specifier as the expressions involving
`interface` are implicity promoted to integers (%x is used to maintain
hexadecimal representation).

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220712000152.2292031-1-justinstitt@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'dt-bindings-net-convert-sff-sfp-to-dtschema'
Jakub Kicinski [Wed, 13 Jul 2022 00:27:20 +0000 (17:27 -0700)]
Merge branch 'dt-bindings-net-convert-sff-sfp-to-dtschema'

Ioana Ciornei says:

====================
dt-bindings: net: convert sff,sfp to dtschema

This patch set converts the sff,sfp to dtschema.

The first patch does a somewhat mechanical conversion without changing
anything else beside the format in which the dt binding is presented.

In the second patch we rename some dt nodes to be generic. The last two
patches change the GPIO related properties so that they uses the -gpios
preferred suffix. This way, all the DTBs are passing the validation
against the sff,sfp.yaml binding.
====================

Link: https://lore.kernel.org/r/20220707091437.446458-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoarch: arm64: dts: marvell: rename the sfp GPIO properties
Ioana Ciornei [Thu, 7 Jul 2022 09:14:37 +0000 (12:14 +0300)]
arch: arm64: dts: marvell: rename the sfp GPIO properties

Rename the GPIO related sfp properties to include the preffered -gpios
suffix. Also, with this change the dtb_check will no longer complain
when trying to verify the DTS against the sff,sfp.yaml binding.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoarch: arm64: dts: lx2160a-clearfog-itx: rename the sfp GPIO properties
Ioana Ciornei [Thu, 7 Jul 2022 09:14:36 +0000 (12:14 +0300)]
arch: arm64: dts: lx2160a-clearfog-itx: rename the sfp GPIO properties

Rename the 'mod-def0-gpio' property to 'mod-def0-gpios' so that we use
the preferred -gpios suffix. Also, with this change the dtb_check will
not complain when trying to verify the DTS against the sff,sfp.yaml
binding.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agodt-bindings: net: sff,sfp: rename example dt nodes to be more generic
Ioana Ciornei [Thu, 7 Jul 2022 09:14:35 +0000 (12:14 +0300)]
dt-bindings: net: sff,sfp: rename example dt nodes to be more generic

Rename the dt nodes shown in the sff,sfp.yaml examples so that they are
generic and not really tied to a specific platform.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agodt-bindings: net: convert sff,sfp to dtschema
Ioana Ciornei [Thu, 7 Jul 2022 09:14:34 +0000 (12:14 +0300)]
dt-bindings: net: convert sff,sfp to dtschema

Convert the sff,sfp.txt bindings to the DT schema format.
Also add the new path to the list of maintained files.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: change the type of ip_route_input_rcu to static
Zhengchao Shao [Mon, 11 Jul 2022 07:35:49 +0000 (15:35 +0800)]
net: change the type of ip_route_input_rcu to static

The type of ip_route_input_rcu should be static.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220711073549.8947-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agoMerge branch 'mlx5-devlink-mutex-removal-part-1'
Paolo Abeni [Tue, 12 Jul 2022 08:26:28 +0000 (10:26 +0200)]
Merge branch 'mlx5-devlink-mutex-removal-part-1'

Moshe Shemesh Says:
===================
1) Fix devlink lock in mlx5 devlink eswitch callbacks

Following the commit 14e426bf1a4d "devlink: hold the instance lock
during eswitch_mode callbacks" which takes devlink instance lock for all
devlink eswitch callbacks and adds a temporary workaround, this patchset
removes the workaround, replaces devlink API functions by devl_ API
where called from mlx5 driver eswitch callbacks flows and adds devlink
instance lock in other driver's path that leads to these functions.
While moving to devl_ API the patchset removes part of the devlink API
functions which mlx5 was the last one to use and so not used by any
driver now.

The patchset also remove DEVLINK_NL_FLAG_NO_LOCK flag from the callbacks
of port_new/port which are called only from mlx5 driver and the already
locked by the patchset as parallel paths to the eswitch callbacks using
devl_ API functions.

This patchset will be followed by another patchset that will remove
DEVLINK_NL_FLAG_NO_LOCK flag from devlink reload and devlink health
callbacks. Thus we will have all devlink callbacks locked and it will
pave the way to remove devlink mutex.
===================

Link: https://lore.kernel.org/r/20220711081408.69452-1-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agodevlink: Hold the instance lock in port_new / port_del callbacks
Moshe Shemesh [Mon, 11 Jul 2022 08:14:08 +0000 (01:14 -0700)]
devlink: Hold the instance lock in port_new / port_del callbacks

Let the core take the devlink instance lock around port_new and port_del
callbacks and remove the now redundant locking in the only driver that
currently use them.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet/mlx5: Remove devl_unlock from mlx5_devlink_eswitch_mode_set
Moshe Shemesh [Mon, 11 Jul 2022 08:14:07 +0000 (01:14 -0700)]
net/mlx5: Remove devl_unlock from mlx5_devlink_eswitch_mode_set

The callback mlx5_devlink_eswitch_mode_set() had unlocked devlink as a
temporary workaround once devlink instance lock was added to devlink
eswitch callbacks. Now that all flows triggered by this function
that took devlink lock are using devl_ API and all parallel paths are
locked we can remove this workaround.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet/mlx5: Use devl_ API in mlx5e_devlink_port_register
Moshe Shemesh [Mon, 11 Jul 2022 08:14:06 +0000 (01:14 -0700)]
net/mlx5: Use devl_ API in mlx5e_devlink_port_register

As part of the flows invoked by mlx5_devlink_eswitch_mode_set() get to
mlx5_rescan_drivers_locked() which can call mlx5e_probe()/mlx5e_remove
and register/unregister mlx5e driver ports accordingly. This can lead to
deadlock once mlx5_devlink_eswitch_mode_set() will use devlink lock.
Use devl_port_register/unregister() instead of
devlink_port_register/unregister() and add devlink instance locks in the
driver paths to this function to have it locked while calling devl_ API
function.

If remove or probe were called by module init or module cleanup flows,
need to lock devlink just before calling devl_port_register(), otherwise
it is called by attach/detach or register/unregister flow and we can
have the flow locked. Added flag to distinguish between these cases.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink locked.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agodevlink: Remove unused functions devlink_rate_leaf_create/destroy
Moshe Shemesh [Mon, 11 Jul 2022 08:14:05 +0000 (01:14 -0700)]
devlink: Remove unused functions devlink_rate_leaf_create/destroy

The previous patch removed the last usage of the functions
devlink_rate_leaf_create() and devlink_rate_nodes_destroy(). Thus,
remove these function from devlink API.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet/mlx5: Use devl_ API in mlx5_esw_devlink_sf_port_register
Moshe Shemesh [Mon, 11 Jul 2022 08:14:04 +0000 (01:14 -0700)]
net/mlx5: Use devl_ API in mlx5_esw_devlink_sf_port_register

The function mlx5_esw_devlink_sf_port_register() calls
devlink_port_register() and devlink_rate_leaf_create(). Use devl_ API to
call devl_port_register() and devl_rate_leaf_create() accordingly and
add devlink instance lock in driver paths to this function.

Similarly, use devl_ API to call devl_port_unregister() and
devl_rate_leaf_destroy() in mlx5_esw_devlink_sf_port_unregister() and
ensure locking devlink instance lock on all the paths to this function
too.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink lock held.

Note this patch is taking devlink lock on mlx5_devlink_sf_port_new/del()
which are devlink callbacks for port_new/del(). We will take these locks
off once these callbacks will be locked by devlink too.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register
Moshe Shemesh [Mon, 11 Jul 2022 08:14:03 +0000 (01:14 -0700)]
net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register

The function mlx5_esw_offloads_devlink_port_register() calls
devlink_port_register() and devlink_rate_leaf_create(). Use devl_ API to
call devl_port_register() and devl_rate_leaf_create() accordingly and
add devlink instance lock in driver paths to this function.

Similarly, use devl_ API to call devl_port_unregister() and
devl_rate_leaf_destroy() in mlx5_esw_offloads_devlink_port_unregister()
and ensure locking devlink instance lock on the paths to this function
too.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agodevlink: Remove unused function devlink_rate_nodes_destroy
Moshe Shemesh [Mon, 11 Jul 2022 08:14:02 +0000 (01:14 -0700)]
devlink: Remove unused function devlink_rate_nodes_destroy

The previous patch removed the last usage of the function
devlink_rate_nodes_destroy(). Thus, remove this function from devlink
API.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet/mlx5: Use devl_ API for rate nodes destroy
Moshe Shemesh [Mon, 11 Jul 2022 08:14:01 +0000 (01:14 -0700)]
net/mlx5: Use devl_ API for rate nodes destroy

Use devl_rate_nodes_destroy() instead of devlink_rate_nodes_destroy().
Add devlink instance lock in the driver paths to this function to have
it locked while calling devl_ API function.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agonet/mlx5: Remove devl_unlock from mlx5_eswtich_mode_callback_enter
Moshe Shemesh [Mon, 11 Jul 2022 08:14:00 +0000 (01:14 -0700)]
net/mlx5: Remove devl_unlock from mlx5_eswtich_mode_callback_enter

The function mlx5_eswtich_mode_callback_enter() was added as a temporary
workaround once devlink instance lock was added to devlink eswitch
callbacks. However, code review and testing show that all the callbacks
part to eswitch_mode_set don't take devlink instance lock in any flow
and so unlocking devlink instance lock while entering these functions is
not needed.

Remove devl_lock from mlx5_eswtich_mode_callback_enter() and devl_unlock
from mlx5_eswtich_mode_callback_exit(). Also remove the functions
mlx5_eswtich_mode_callback_enter()/exit() as they are not needed any
more. The callback eswitch_mode_set will be treated separately in the
following patches.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agoamd-xgbe: fix clang -Wformat warnings
Justin Stitt [Fri, 8 Jul 2022 23:26:53 +0000 (16:26 -0700)]
amd-xgbe: fix clang -Wformat warnings

When building with Clang we encounter the following warning:
| drivers/net/ethernet/amd/xgbe/xgbe-dcb.c:234:42: error: format specifies
| type 'unsigned char' but the argument has type '__u16' (aka 'unsigned
| short') [-Werror,-Wformat] pfc->pfc_cap, pfc->pfc_en, pfc->mbc,
| pfc->delay);

pfc->pfc_cap , pfc->pfc_cn, pfc->mbc are all of type `u8` while pfc->delay is
of type `u16`. The correct format specifiers `%hh[u|x]` were used for
the first three but not for pfc->delay, which is causing the warning
above.

Variadic functions (printf-like) undergo default argument promotion.
Documentation/core-api/printk-formats.rst specifically recommends using
the promoted-to-type's format flag. In this case `%d` (or `%x` to
maintain hex representation) should be used since both u8's and u16's
are fully representable by an int.

Moreover, C11 6.3.1.1 states:
(https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf) `If an int
can represent all values of the original type ..., the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions.`

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20220708232653.556488-1-justinstitt@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoatm: he: Use the bitmap API to allocate bitmaps
Christophe JAILLET [Sat, 9 Jul 2022 15:05:45 +0000 (17:05 +0200)]
atm: he: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/7f795bd6d5b2a00f581175b7069b229c2e5a4192.1657379127.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet/fq_impl: Use the bitmap API to allocate bitmaps
Christophe JAILLET [Sat, 9 Jul 2022 14:37:53 +0000 (16:37 +0200)]
net/fq_impl: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/c7bf099af07eb497b02d195906ee8c11fea3b3bd.1657377335.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: dsa: hellcreek: Use the bitmap API to allocate bitmaps
Christophe JAILLET [Sat, 9 Jul 2022 14:24:45 +0000 (16:24 +0200)]
net: dsa: hellcreek: Use the bitmap API to allocate bitmaps

Use devm_bitmap_zalloc() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/8306e2ae69a5d8553691f5d10a86a4390daf594b.1657376651.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'tls-rx-follow-ups-to-nopad'
Jakub Kicinski [Tue, 12 Jul 2022 02:48:36 +0000 (19:48 -0700)]
Merge branch 'tls-rx-follow-ups-to-nopad'

Jakub Kicinski says:

====================
tls: rx: follow-ups to NoPad

A few fixes for issues spotted by Maxim.
====================

Link: https://lore.kernel.org/r/20220709025255.323864-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoselftests: tls: add test for NoPad getsockopt
Jakub Kicinski [Sat, 9 Jul 2022 02:52:55 +0000 (19:52 -0700)]
selftests: tls: add test for NoPad getsockopt

Make sure setsockopt / getsockopt behave as expected.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agotls: rx: fix the NoPad getsockopt
Jakub Kicinski [Sat, 9 Jul 2022 02:52:54 +0000 (19:52 -0700)]
tls: rx: fix the NoPad getsockopt

Maxim reports do_tls_getsockopt_no_pad() will
always return an error. Indeed looks like refactoring
gone wrong - remove err and use value.

Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Fixes: 88527790c079 ("tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agotls: rx: add counter for NoPad violations
Jakub Kicinski [Sat, 9 Jul 2022 02:52:53 +0000 (19:52 -0700)]
tls: rx: add counter for NoPad violations

As discussed with Maxim add a counter for true NoPad violations.
This should help deployments catch unexpected padded records vs
just control records which always need re-encryption.

https: //lore.kernel.org/all/b111828e6ac34baad9f4e783127eba8344ac252d.camel@nvidia.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agotls: fix spelling of MIB
Jakub Kicinski [Sat, 9 Jul 2022 02:52:52 +0000 (19:52 -0700)]
tls: fix spelling of MIB

MIN -> MIB

Fixes: 88527790c079 ("tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agobcm63xx: fix Tx cleanup when NAPI poll budget is zero
Sieng-Piaw Liew [Fri, 8 Jul 2022 08:03:03 +0000 (16:03 +0800)]
bcm63xx: fix Tx cleanup when NAPI poll budget is zero

NAPI poll() function may be passed a budget value of zero, i.e. during
netpoll, which isn't NAPI context.
Therefore, napi_consume_skb() must be given budget value instead of
!force to truly discern netpoll-like scenarios.

Fixes: c63c615e22eb ("bcm63xx_enet: switch to napi_build_skb() to reuse skbuff_heads")
Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com>
Link: https://lore.kernel.org/r/20220708080303.298-1-liew.s.piaw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'octeontx2-exact-match-table'
Jakub Kicinski [Mon, 11 Jul 2022 17:52:18 +0000 (10:52 -0700)]
Merge branch 'octeontx2-exact-match-table'

Ratheesh Kannoth says:

====================
octeontx2: Exact Match Table.

Exact match table and Field hash support for CN10KB silicon
====================

Link: https://lore.kernel.org/r/20220708044151.2972645-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-af: Enable Exact match flag in kex profile
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:51 +0000 (10:11 +0530)]
octeontx2-af: Enable Exact match flag in kex profile

Enabled EXACT match flag in Kex default profile. Since
there is no space in key, NPC_PARSE_NIBBLE_ERRCODE
is removed

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-pf: Add support for exact match table.
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:50 +0000 (10:11 +0530)]
octeontx2-pf: Add support for exact match table.

NPC exact match table can support more entries than RPM
dmac filters. This requires field size of DMAC filter count
and index to be increased.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-af: Invoke exact match functions if supported
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:49 +0000 (10:11 +0530)]
octeontx2-af: Invoke exact match functions if supported

If exact match table is supported, call functions to add/del/update
entries in exact match table instead of RPM dmac filters

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-af: Wrapper functions for MAC addr add/del/update/reset
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:48 +0000 (10:11 +0530)]
octeontx2-af: Wrapper functions for MAC addr add/del/update/reset

These functions are wrappers for mac add/addr/del/update in
exact match table. These will be invoked from mbox handler routines
if exact matct table is supported and enabled.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2: Modify mbox request and response structures
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:47 +0000 (10:11 +0530)]
octeontx2: Modify mbox request and response structures

Exact match table modification requires wider fields as it has
more number of slots to fill in. Modifying an entry in exact match
table may cause hash collision and may be required to delete entry
from 4-way 2K table and add to fully associative 32 entry CAM table.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-af: Debugsfs support for exact match.
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:46 +0000 (10:11 +0530)]
octeontx2-af: Debugsfs support for exact match.

There debugfs files created.
1. General information on exact match table
2. Exact match table entries.
3. NPC mcam drop on hit count stats.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-af: Drop rules for NPC MCAM
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:45 +0000 (10:11 +0530)]
octeontx2-af: Drop rules for NPC MCAM

NPC exact match table installs drop on hit rules in
NPC mcam for each channel. This rule has broadcast and multicast
bits cleared. Exact match bit cleared and channel bits
set. If exact match table hit bit is 0, corresponding NPC mcam
drop rule will be hit for the packet and will be dropped.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-af: FLR handler for exact match table.
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:44 +0000 (10:11 +0530)]
octeontx2-af: FLR handler for exact match table.

FLR handler should remove/free all exact match table resources
corresponding to each interface.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-af: devlink configuration support
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:43 +0000 (10:11 +0530)]
octeontx2-af: devlink configuration support

CN10KB silicon supports Exact match feature. This feature can be disabled
through devlink configuration. Devlink command fails if DMAC filter rules
are already present. Once disabled, legacy RPM based DMAC filters will be
configured.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-af: Exact match scan from kex profile
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:42 +0000 (10:11 +0530)]
octeontx2-af: Exact match scan from kex profile

CN10KB silicon supports exact match table. Scanning KEX
profile should check for exact match feature is enabled
and then set profile masks properly.

These kex profile masks are required to configure NPC
MCAM drop rules. If there is a miss in exact match table,
these drop rules will drop those packets.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-af: Exact match support
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:41 +0000 (10:11 +0530)]
octeontx2-af: Exact match support

CN10KB silicon has support for exact match table. This table
can be used to match maimum 64 bit value of KPU parsed output.
Hit/non hit in exact match table can be used as a KEX key to
NPC mcam.

This patch makes use of Exact match table to increase number of
DMAC filters supported. NPC  mcam is no more need for each of these
DMAC entries as will be populated in Exact match table.

This patch implements following

1. Initialization of exact match table only for CN10KB.
2. Add/del/update interface function for exact match table.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoocteontx2-af: Use hashed field in MCAM key
Ratheesh Kannoth [Fri, 8 Jul 2022 04:41:40 +0000 (10:11 +0530)]
octeontx2-af: Use hashed field in MCAM key

CN10KB variant of CN10K series of silicons supports
a new feature where in a large protocol field
(eg 128bit IPv6 DIP) can be condensed into a small
hashed 32bit data. This saves a lot of space in MCAM key
and allows user to add more protocol fields into the filter.
A max of two such protocol data can be hashed.
This patch adds support for hashing IPv6 SIP and/or DIP.

Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agofddi/skfp: fix repeated words in comments
Jilin Yuan [Fri, 8 Jul 2022 15:03:54 +0000 (23:03 +0800)]
fddi/skfp: fix repeated words in comments

Delete the redundant word 'test'.

Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoethernet/via: fix repeated words in comments
Jilin Yuan [Fri, 8 Jul 2022 14:53:04 +0000 (22:53 +0800)]
ethernet/via: fix repeated words in comments

Delete the redundant word 'driver'.

Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Find dst with sk's xfrm policy not ctl_sk
sewookseo [Thu, 7 Jul 2022 10:01:39 +0000 (10:01 +0000)]
net: Find dst with sk's xfrm policy not ctl_sk

If we set XFRM security policy by calling setsockopt with option
IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
struct. However tcp_v6_send_response doesn't look up dst_entry with the
actual socket but looks up with tcp control socket. This may cause a
problem that a RST packet is sent without ESP encryption & peer's TCP
socket can't receive it.
This patch will make the function look up dest_entry with actual socket,
if the socket has XFRM policy(sock_policy), so that the TCP response
packet via this function can be encrypted, & aligned on the encrypted
TCP socket.

Tested: We encountered this problem when a TCP socket which is encrypted
in ESP transport mode encryption, receives challenge ACK at SYN_SENT
state. After receiving challenge ACK, TCP needs to send RST to
establish the socket at next SYN try. But the RST was not encrypted &
peer TCP socket still remains on ESTABLISHED state.
So we verified this with test step as below.
[Test step]
1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
2. Client tries a new connection on the same TCP ports(src & dst).
3. Server will return challenge ACK instead of SYN,ACK.
4. Client will send RST to server to clear the SOCKET.
5. Client will retransmit SYN to server on the same TCP ports.
[Expected result]
The TCP connection should be established.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Sehee Lee <seheele@google.com>
Signed-off-by: Sewook Seo <sewookseo@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Jakub Kicinski [Sat, 9 Jul 2022 19:24:15 +0000 (12:24 -0700)]
Merge https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2022-07-09

We've added 94 non-merge commits during the last 19 day(s) which contain
a total of 125 files changed, 5141 insertions(+), 6701 deletions(-).

The main changes are:

1) Add new way for performing BTF type queries to BPF, from Daniel Müller.

2) Add inlining of calls to bpf_loop() helper when its function callback is
   statically known, from Eduard Zingerman.

3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz.

4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM
   hooks, from Stanislav Fomichev.

5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko.

6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky.

7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP
   selftests, from Magnus Karlsson & Maciej Fijalkowski.

8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet.

9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki.

10) Sockmap optimizations around throughput of UDP transmissions which have been
    improved by 61%, from Cong Wang.

11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa.

12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend.

13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang.

14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma
    macro, from James Hilliard.

15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan.

16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits)
  selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
  bpf: Correctly propagate errors up from bpf_core_composites_match
  libbpf: Disable SEC pragma macro on GCC
  bpf: Check attach_func_proto more carefully in check_return_code
  selftests/bpf: Add test involving restrict type qualifier
  bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
  MAINTAINERS: Add entry for AF_XDP selftests files
  selftests, xsk: Rename AF_XDP testing app
  bpf, docs: Remove deprecated xsk libbpf APIs description
  selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage
  libbpf, riscv: Use a0 for RC register
  libbpf: Remove unnecessary usdt_rel_ip assignments
  selftests/bpf: Fix few more compiler warnings
  selftests/bpf: Fix bogus uninitialized variable warning
  bpftool: Remove zlib feature test from Makefile
  libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
  libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
  libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
  selftests/bpf: Add type match test against kernel's task_struct
  selftests/bpf: Add nested type to type based tests
  ...
====================

Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoixp4xx_eth: Set MAC address from device tree
Linus Walleij [Fri, 8 Jul 2022 23:55:30 +0000 (01:55 +0200)]
ixp4xx_eth: Set MAC address from device tree

If there is a MAC address specified in the device tree, then
use it. This is already perfectly legal to specify in accordance
with the generic ethernet-controller.yaml schema.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoixp4xx_eth: Fall back to random MAC address
Linus Walleij [Fri, 8 Jul 2022 23:55:29 +0000 (01:55 +0200)]
ixp4xx_eth: Fall back to random MAC address

If the firmware does not provide a MAC address to the driver,
fall back to generating a random MAC address.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoaf_unix: fix unix_sysctl_register() error path
Eric Dumazet [Fri, 8 Jul 2022 16:28:58 +0000 (16:28 +0000)]
af_unix: fix unix_sysctl_register() error path

We want to kfree(table) if @table has been kmalloced,
ie for non initial network namespace.

Fixes: 849d5aa3a1d8 ("af_unix: Do not call kmemdup() for init_net's sysctl table.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'mptcp-selftest-improvements-and-header-tweak'
David S. Miller [Sat, 9 Jul 2022 11:19:24 +0000 (12:19 +0100)]
Merge branch 'mptcp-selftest-improvements-and-header-tweak'

Mat Martineau says:

====================
mptcp: Self test improvements and a header tweak

Patch 1 moves a definition to a header so it can be used in a struct
declaration.

Patch 2 adjusts a time threshold for a selftest that runs much slower on
debug kernels (and even more on slow CI infrastructure), to reduce
spurious failures.

Patches 3 & 4 improve userspace PM test coverage.

Patches 5 & 6 clean up output from a test script and selftest helper
tool.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoselftests: mptcp: update pm_nl_ctl usage header
Geliang Tang [Fri, 8 Jul 2022 17:14:13 +0000 (10:14 -0700)]
selftests: mptcp: update pm_nl_ctl usage header

The usage header of pm_nl_ctl command doesn't match with the context. So
this patch adds the missing userspace PM keywords 'ann', 'rem', 'csf',
'dsf', 'events' and 'listen' in it.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoselftests: mptcp: avoid Terminated messages in userspace_pm
Geliang Tang [Fri, 8 Jul 2022 17:14:12 +0000 (10:14 -0700)]
selftests: mptcp: avoid Terminated messages in userspace_pm

There're some 'Terminated' messages in the output of userspace pm tests
script after killing './pm_nl_ctl events' processes:

Created network namespaces ns1, ns2          [OK]
./userspace_pm.sh: line 166: 13735 Terminated              ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1
./userspace_pm.sh: line 172: 13737 Terminated              ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1
Established IPv4 MPTCP Connection ns2 => ns1     [OK]
./userspace_pm.sh: line 166: 13753 Terminated              ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1
./userspace_pm.sh: line 172: 13755 Terminated              ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1
Established IPv6 MPTCP Connection ns2 => ns1     [OK]
ADD_ADDR 10.0.2.2 (ns2) => ns1, invalid token     [OK]

This patch adds a helper kill_wait(), in it using 'wait $pid 2>/dev/null'
commands after 'kill $pid' to avoid printing out these Terminated messages.
Use this helper instead of using 'kill $pid'.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoselftests: mptcp: userspace pm subflow tests
Geliang Tang [Fri, 8 Jul 2022 17:14:11 +0000 (10:14 -0700)]
selftests: mptcp: userspace pm subflow tests

This patch adds userspace pm subflow tests support for mptcp_join.sh
script. Add userspace pm create subflow and destroy test cases in
userspace_tests().

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoselftests: mptcp: userspace pm address tests
Geliang Tang [Fri, 8 Jul 2022 17:14:10 +0000 (10:14 -0700)]
selftests: mptcp: userspace pm address tests

This patch adds userspace pm tests support for mptcp_join.sh script. Add
userspace pm add_addr and rm_addr test cases in userspace_tests().

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoselftests: mptcp: tweak simult_flows for debug kernels
Paolo Abeni [Fri, 8 Jul 2022 17:14:09 +0000 (10:14 -0700)]
selftests: mptcp: tweak simult_flows for debug kernels

The mentioned test measures the transfer run-time to verify
that the user-space program is able to use the full aggregate B/W.

Even on (virtual) link-speed-bound tests, debug kernel can slow
down the transfer enough to cause sporadic test failures.

Instead of unconditionally raising the maximum allowed run-time,
tweak when the running kernel is a debug one, and use some simple/
rough heuristic to guess such scenarios.

Note: this intentionally avoids looking for /boot/config-<version> as
the latter file is not always available in our reference CI
environments.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agomptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.h
Geliang Tang [Fri, 8 Jul 2022 17:14:08 +0000 (10:14 -0700)]
mptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.h

Move macro MPTCPOPT_HMAC_LEN definition from net/mptcp/protocol.h to
include/net/mptcp.h.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agobcm63xx_enet: change the driver variables to static
Yang Yingliang [Thu, 7 Jul 2022 13:58:01 +0000 (21:58 +0800)]
bcm63xx_enet: change the driver variables to static

bcm63xx_enetsw_driver and bcm63xx_enet_driver are only used in
bcm63xx_enet.c now, change them to static.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220707135801.1483941-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: phylink: fix SGMII inband autoneg enable
Russell King (Oracle) [Thu, 7 Jul 2022 09:20:02 +0000 (10:20 +0100)]
net: phylink: fix SGMII inband autoneg enable

When we are operating in SGMII inband mode, it implies that there is a
PHY connected, and the ethtool advertisement for autoneg applies to
the PHY, not the SGMII link. When in 1000base-X mode, then this applies
to the 802.3z link and needs to be applied to the PCS.

Fix this.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1o9Ng2-005Qbe-3H@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoDocumentation: add a description for net.core.high_order_alloc_disable
Antoine Tenart [Thu, 7 Jul 2022 08:02:45 +0000 (10:02 +0200)]
Documentation: add a description for net.core.high_order_alloc_disable

A description is missing for the net.core.high_order_alloc_disable
option in admin-guide/sysctl/net.rst ; add it. The above sysctl option
was introduced by commit ce27ec60648d ("net: add high_order_alloc_disable
sysctl/static key").

Thanks to Eric for running again the benchmark cited in the above
commit, showing this knob is now mostly of historical importance.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220707080245.180525-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: rxrpc: fix clang -Wformat warning
Justin Stitt [Thu, 7 Jul 2022 18:20:52 +0000 (11:20 -0700)]
net: rxrpc: fix clang -Wformat warning

When building with Clang we encounter this warning:
| net/rxrpc/rxkad.c:434:33: error: format specifies type 'unsigned short'
| but the argument has type 'u32' (aka 'unsigned int') [-Werror,-Wformat]
| _leave(" = %d [set %hx]", ret, y);

y is a u32 but the format specifier is `%hx`. Going from unsigned int to
short int results in a loss of data. This is surely not intended
behavior. If it is intended, the warning should be suppressed through
other means.

This patch should get us closer to the goal of enabling the -Wformat
flag for Clang builds.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220707182052.769989-1-justinstitt@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'tls-pad-strparser-internal-header-decrypt_ctx-etc'
Jakub Kicinski [Sat, 9 Jul 2022 01:38:48 +0000 (18:38 -0700)]
Merge branch 'tls-pad-strparser-internal-header-decrypt_ctx-etc'

Jakub Kicinski says:

====================
tls: pad strparser, internal header, decrypt_ctx etc.

A grab bag of non-functional refactoring to make the series
which will let us decrypt into a fresh skb smaller.

Patches in this series are not strictly required to get the
decryption into a fresh skb going, they are more in the "things
which had been annoying me for a while" category.
====================

Link: https://lore.kernel.org/r/20220708010314.1451462-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agotls: rx: make tls_wait_data() return an recvmsg retcode
Jakub Kicinski [Fri, 8 Jul 2022 01:03:14 +0000 (18:03 -0700)]
tls: rx: make tls_wait_data() return an recvmsg retcode

tls_wait_data() sets the return code as an output parameter
and always returns ctx->recv_pkt on success.

Return the error code directly and let the caller read the skb
from the context. Use positive return code to indicate ctx->recv_pkt
is ready.

While touching the definition of the function rename it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agotls: create an internal header
Jakub Kicinski [Fri, 8 Jul 2022 01:03:13 +0000 (18:03 -0700)]
tls: create an internal header

include/net/tls.h is getting a little long, and is probably hard
for driver authors to navigate. Split out the internals into a
header which will live under net/tls/. While at it move some
static inlines with a single user into the source files, add
a few tls_ prefixes and fix spelling of 'proccess'.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agotls: rx: coalesce exit paths in tls_decrypt_sg()
Jakub Kicinski [Fri, 8 Jul 2022 01:03:12 +0000 (18:03 -0700)]
tls: rx: coalesce exit paths in tls_decrypt_sg()

Jump to the free() call, instead of having to remember
to free the memory in multiple places.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agotls: rx: wrap decrypt params in a struct
Jakub Kicinski [Fri, 8 Jul 2022 01:03:11 +0000 (18:03 -0700)]
tls: rx: wrap decrypt params in a struct

The max size of iv + aad + tail is 22B. That's smaller
than a single sg entry (32B). Don't bother with the
memory packing, just create a struct which holds the
max size of those members.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agotls: rx: always allocate max possible aad size for decrypt
Jakub Kicinski [Fri, 8 Jul 2022 01:03:10 +0000 (18:03 -0700)]
tls: rx: always allocate max possible aad size for decrypt

AAD size is either 5 or 13. Really no point complicating
the code for the 8B of difference. This will also let us
turn the chunked up buffer into a sane struct.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agostrparser: pad sk_skb_cb to avoid straddling cachelines
Jakub Kicinski [Fri, 8 Jul 2022 01:03:09 +0000 (18:03 -0700)]
strparser: pad sk_skb_cb to avoid straddling cachelines

sk_skb_cb lives within skb->cb[]. skb->cb[] straddles
2 cache lines, each containing 24B of data.
The first cache line does not contain much interesting
information for users of strparser, so pad things a little.
Previously strp_msg->full_len would live in the first cache
line and strp_msg->offset in the second.

We need to reorder the 8 byte temp_reg with struct tls_msg
to prevent a 4B hole which would push the struct over 48B.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoselftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
Maxim Mikityanskiy [Fri, 8 Jul 2022 13:03:19 +0000 (16:03 +0300)]
selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n

When CONFIG_NF_CONNTRACK=m, struct bpf_ct_opts and enum member
BPF_F_CURRENT_NETNS are not exposed. This commit allows building the
xdp_synproxy selftest in such cases. Note that nf_conntrack must be
loaded before running the test if it's compiled as a module.

This commit also allows this selftest to be successfully compiled when
CONFIG_NF_CONNTRACK is disabled.

One unused local variable of type struct bpf_ct_opts is also removed.

Fixes: fb5cd0ce70d4 ("selftests/bpf: Add selftests for raw syncookie helpers")
Reported-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220708130319.1016294-1-maximmi@nvidia.com
22 months agobpf: Correctly propagate errors up from bpf_core_composites_match
Daniel Müller [Thu, 7 Jul 2022 21:19:31 +0000 (21:19 +0000)]
bpf: Correctly propagate errors up from bpf_core_composites_match

This change addresses a comment made earlier [0] about a missing return
of an error when __bpf_core_types_match is invoked from
bpf_core_composites_match, which could have let to us erroneously
ignoring errors.

Regarding the typedef name check pointed out in the same context, it is
not actually an issue, because callers of the function perform a name
check for the root type anyway. To make that more obvious, let's add
comments to the function (similar to what we have for
bpf_core_types_are_compat, which is called in pretty much the same
context).

[0]: https://lore.kernel.org/bpf/165708121449.4919.13204634393477172905.git-patchwork-notify@kernel.org/T/#m55141e8f8cfd2e8d97e65328fa04852870d01af6

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220707211931.3415440-1-deso@posteo.net
22 months agolibbpf: Disable SEC pragma macro on GCC
James Hilliard [Wed, 6 Jul 2022 11:18:38 +0000 (05:18 -0600)]
libbpf: Disable SEC pragma macro on GCC

It seems the gcc preprocessor breaks with pragmas when surrounding
__attribute__.

Disable these pragmas on GCC due to upstream bugs see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90400

Fixes errors like:
error: expected identifier or '(' before '#pragma'
  106 | SEC("cgroup/bind6")
      | ^~~

error: expected '=', ',', ';', 'asm' or '__attribute__' before '#pragma'
  114 | char _license[] SEC("license") = "GPL";
      | ^~~

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220706111839.1247911-1-james.hilliard1@gmail.com