linux-2.6-microblaze.git
3 years agonet: mscc: ocelot: create TCAM skeleton from tc filter chains
Vladimir Oltean [Fri, 2 Oct 2020 12:02:22 +0000 (15:02 +0300)]
net: mscc: ocelot: create TCAM skeleton from tc filter chains

For Ocelot switches, there are 2 ingress pipelines for flow offload
rules: VCAP IS1 (Ingress Classification) and IS2 (Security Enforcement).
IS1 and IS2 support different sets of actions. The pipeline order for a
packet on ingress is:

Basic classification -> VCAP IS1 -> VCAP IS2

Furthermore, IS1 is looked up 3 times, and IS2 is looked up twice (each
TCAM entry can be configured to match only on the first lookup, or only
on the second, or on both etc).

Because the TCAMs are completely independent in hardware, and because of
the fixed pipeline, we actually have very limited options when it comes
to offloading complex rules to them while still maintaining the same
semantics with the software data path.

This patch maps flow offload rules to ingress TCAMs according to a
predefined chain index number. There is going to be a script in
selftests that clarifies the usage model.

There is also an egress TCAM (VCAP ES0, the Egress Rewriter), which is
modeled on top of the default chain 0 of the egress qdisc, because it
doesn't have multiple lookups.

Suggested-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Co-developed-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: introduce conversion helpers between port and netdev
Vladimir Oltean [Fri, 2 Oct 2020 12:02:21 +0000 (15:02 +0300)]
net: mscc: ocelot: introduce conversion helpers between port and netdev

Since the mscc_ocelot_switch_lib is common between a pure switchdev and
a DSA driver, the procedure of retrieving a net_device for a certain
port index differs, as those are registered by their individual
front-ends.

Up to now that has been dealt with by always passing the port index to
the switch library, but now, we're going to need to work with net_device
pointers from the tc-flower offload, for things like indev, or mirred.
It is not desirable to refactor that, so let's make sure that the flower
offload core has the ability to translate between a net_device and a
port index properly.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: offload multiple tc-flower actions in same rule
Vladimir Oltean [Fri, 2 Oct 2020 12:02:20 +0000 (15:02 +0300)]
net: mscc: ocelot: offload multiple tc-flower actions in same rule

At this stage, the tc-flower offload of mscc_ocelot can only delegate
rules to the VCAP IS2 security enforcement block. These rules have, in
hardware, separate bits for policing and for overriding the destination
port mask and/or copying to the CPU. So it makes sense that we attempt
to expose some more of that low-level complexity instead of simply
choosing between a single type of action.

Something similar happens with the VCAP IS1 block, where the same action
can contain enable bits for VLAN classification and for QoS
classification at the same time.

So model the action structure after the hardware description, and let
the high-level ocelot_flower.c construct an action vector from multiple
tc actions.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'mac80211-next-for-net-next-2020-10-02' of git://git.kernel.org/pub/scm...
David S. Miller [Fri, 2 Oct 2020 22:33:13 +0000 (15:33 -0700)]
Merge tag 'mac80211-next-for-net-next-2020-10-02' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Another set of changes, this time with:
 * lots more S1G band support
 * 6 GHz scanning, finally
 * kernel-doc fixes
 * non-split wiphy dump fixes in nl80211
 * various other small cleanups/features
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/smscx5xx: change to of_get_mac_address() eth_platform_get_mac_address()
Łukasz Stelmach [Wed, 30 Sep 2020 14:25:25 +0000 (16:25 +0200)]
net/smscx5xx: change to of_get_mac_address() eth_platform_get_mac_address()

Use more generic eth_platform_get_mac_address() which can get a MAC
address from other than DT platform specific sources too. Check if the
obtained address is valid.

Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: dsa: b53: Add missing reg property to example
Kurt Kanzenbach [Fri, 2 Oct 2020 06:20:51 +0000 (08:20 +0200)]
dt-bindings: net: dsa: b53: Add missing reg property to example

The switch has a certain MDIO address and this needs to be specified using the
reg property. Add it to the example.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-dsa-Improve-dsa_untag_bridge_pvid'
David S. Miller [Fri, 2 Oct 2020 20:36:07 +0000 (13:36 -0700)]
Merge branch 'net-dsa-Improve-dsa_untag_bridge_pvid'

Florian Fainelli says:

====================
net: dsa: Improve dsa_untag_bridge_pvid()

This patch series is based on the recent discussions with Vladimir:

https://lore.kernel.org/netdev/20201001030623.343535-1-f.fainelli@gmail.com/

the simplest way forward was to call dsa_untag_bridge_pvid() after
eth_type_trans() has been set which guarantees that skb->protocol is set
to a correct value and this allows us to utilize
__vlan_find_dev_deep_rcu() properly without playing or using the bridge
master as a net_device reference.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: Utilize __vlan_find_dev_deep_rcu()
Florian Fainelli [Fri, 2 Oct 2020 02:42:15 +0000 (19:42 -0700)]
net: dsa: Utilize __vlan_find_dev_deep_rcu()

Now that we are guaranteed that dsa_untag_bridge_pvid() is called after
eth_type_trans() we can utilize __vlan_find_dev_deep_rcu() which will
take care of finding an 802.1Q upper on top of a bridge master.

A common use case, prior to 12a1526d067 ("net: dsa: untag the bridge
pvid from rx skbs") was to configure a bridge 802.1Q upper like this:

ip link add name br0 type bridge vlan_filtering 0
ip link add link br0 name br0.1 type vlan id 1

in order to pop the default_pvid VLAN tag.

With this change we restore that behavior while still allowing the DSA
receive path to automatically pop the VLAN tag.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: Obtain VLAN protocol from skb->protocol
Florian Fainelli [Fri, 2 Oct 2020 02:42:14 +0000 (19:42 -0700)]
net: dsa: Obtain VLAN protocol from skb->protocol

Now that dsa_untag_bridge_pvid() is called after eth_type_trans() we are
guaranteed that skb->protocol will be set to a correct value, thus
allowing us to avoid calling vlan_eth_hdr().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: b53: Set untag_bridge_pvid
Florian Fainelli [Fri, 2 Oct 2020 02:42:13 +0000 (19:42 -0700)]
net: dsa: b53: Set untag_bridge_pvid

Indicate to the DSA receive path that we need to untage the bridge PVID,
this allows us to remove the dsa_untag_bridge_pvid() calls from
net/dsa/tag_brcm.c.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: Call dsa_untag_bridge_pvid() from dsa_switch_rcv()
Florian Fainelli [Fri, 2 Oct 2020 02:42:12 +0000 (19:42 -0700)]
net: dsa: Call dsa_untag_bridge_pvid() from dsa_switch_rcv()

When a DSA switch driver needs to call dsa_untag_bridge_pvid(), it can
set dsa_switch::untag_brige_pvid to indicate this is necessary.

This is a pre-requisite to making sure that we are always calling
dsa_untag_bridge_pvid() after eth_type_trans() has been called.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec...
David S. Miller [Fri, 2 Oct 2020 20:16:15 +0000 (13:16 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2020-10-02

1) Add a full xfrm compatible layer for 32-bit applications on
   64-bit kernels. From Dmitry Safonov.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetlink: fix policy dump leak
Johannes Berg [Fri, 2 Oct 2020 07:46:04 +0000 (09:46 +0200)]
netlink: fix policy dump leak

[ Upstream commit a95bc734e60449e7b073ff7ff70c35083b290ae9 ]

If userspace doesn't complete the policy dump, we leak the
allocated state. Fix this.

Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomac80211: avoid processing non-S1G elements on S1G band
Thomas Pedersen [Thu, 1 Oct 2020 17:47:48 +0000 (10:47 -0700)]
mac80211: avoid processing non-S1G elements on S1G band

In ieee80211_determine_chantype(), the sband->ht_cap was
being processed before S1G Operation element.  Since the
HT capability element should not be present on the S1G
band, avoid processing potential garbage by moving the
call to ieee80211_apply_htcap_overrides() to after the S1G
block.

Also, in case of a missing S1G Operation element, we would
continue trying to process non-S1G elements (and return
with a channel width of 20MHz). Instead, just assume
primary channel is equal to operating and infer the
operating width from the BSS channel, then return.

Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20201001174748.24520-1-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
3 years agonl80211: fix non-split wiphy information
Johannes Berg [Mon, 28 Sep 2020 11:07:18 +0000 (13:07 +0200)]
nl80211: fix non-split wiphy information

When dumping wiphy information, we try to split the data into
many submessages, but for old userspace we still support the
old mode where this doesn't happen.

However, in this case we were not resetting our state correctly
and dumping multiple messages for each wiphy, which would have
broken such older userspace.

This was broken pretty much immediately afterwards because it
only worked in the original commit where non-split dumps didn't
have any more data than split dumps...

Fixes: fe1abafd942f ("nl80211: re-add channel width and extended capa advertising")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200928130717.3e6d9c6bada2.Ie0f151a8d0d00a8e1e18f6a8c9244dd02496af67@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
3 years agonl80211: reduce non-split wiphy dump size
Johannes Berg [Mon, 28 Sep 2020 11:06:56 +0000 (13:06 +0200)]
nl80211: reduce non-split wiphy dump size

When wiphy dumps cannot be split, such as in events or with
older userspace that doesn't support it, the size can today
be too big.

Reduce it, by doing two things:

 1) remove data that couldn't have been present before the
    split capability was introduced since it's new, such as
    HE capabilities

 2) as suggested by Martin Willi, remove management frame
    subtypes from the split dumps, as just (1) isn't even
    enough due to other new code capabilities. This is fine
    as old consumers (really just wpa_supplicant) didn't
    check this data before they got support for split dumps.

Reported-by: Martin Willi <martin@strongswan.org>
Suggested-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Martin Willi <martin@strongswan.org>
Link: https://lore.kernel.org/r/20200928130655.53bce7873164.I71f06c9a221cd0630429a1a56eeae68a13beca61@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
3 years agolib8390: Use netif_msg_init to initialize msg_enable bits
Armin Wolf [Wed, 30 Sep 2020 20:44:56 +0000 (22:44 +0200)]
lib8390: Use netif_msg_init to initialize msg_enable bits

Use netif_msg_init() to process param settings
and use only the proper initialized value of
ei_local->msg_level for later processing;

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: realtek: Modify 2.5G PHY name to RTL8226
Willy Liu [Wed, 30 Sep 2020 06:48:58 +0000 (14:48 +0800)]
net: phy: realtek: Modify 2.5G PHY name to RTL8226

Realtek single-chip Ethernet PHY solutions can be separated as below:
10M/100Mbps: RTL8201X
1Gbps: RTL8211X
2.5Gbps: RTL8226/RTL8221X
RTL8226 is the first version for realtek that compatible 2.5Gbps single PHY.
Since RTL8226 is single port only, realtek changes its name to RTL8221B from
the second version.
PHY ID for RTL8226 is 0x001cc800 and RTL8226B/RTL8221B is 0x001cc840.

RTL8125 is not a single PHY solution, it integrates PHY/MAC/PCIE bus
controller and embedded memory.

Signed-off-by: Willy Liu <willy.liu@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocaif_virtio: Remove redundant initialization of variable err
Jing Xiangfeng [Wed, 30 Sep 2020 01:29:54 +0000 (09:29 +0800)]
caif_virtio: Remove redundant initialization of variable err

After commit a8c7687bf216 ("caif_virtio: Check that vringh_config is not
null"), the variable err is being initialized with '-EINVAL' that is
meaningless. So remove it.

Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet-sysfs: Fix inconsistent of format with argument type in net-sysfs.c
Ye Bin [Wed, 30 Sep 2020 01:08:38 +0000 (09:08 +0800)]
net-sysfs: Fix inconsistent of format with argument type in net-sysfs.c

Fix follow warnings:
[net/core/net-sysfs.c:1161]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'int'.
[net/core/net-sysfs.c:1162]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'int'.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agopktgen: Fix inconsistent of format with argument type in pktgen.c
Ye Bin [Wed, 30 Sep 2020 01:08:37 +0000 (09:08 +0800)]
pktgen: Fix inconsistent of format with argument type in pktgen.c

Fix follow warnings:
[net/core/pktgen.c:925]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'signed int'.
[net/core/pktgen.c:942]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'signed int'.
[net/core/pktgen.c:962]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'signed int'.
[net/core/pktgen.c:984]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'signed int'.
[net/core/pktgen.c:1149]: (warning) %d in format string (no. 1)
requires 'int' but the argument type is 'unsigned int'.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrivers/net/wan/hdlc_fr: Correctly handle special skb->protocol values
Xie He [Mon, 28 Sep 2020 12:56:43 +0000 (05:56 -0700)]
drivers/net/wan/hdlc_fr: Correctly handle special skb->protocol values

The fr_hard_header function is used to prepend the header to skbs before
transmission. It is used in 3 situations:
1) When a control packet is generated internally in this driver;
2) When a user sends an skb on an Ethernet-emulating PVC device;
3) When a user sends an skb on a normal PVC device.

These 3 situations need to be handled differently by fr_hard_header.
Different headers should be prepended to the skb in different situations.

Currently fr_hard_header distinguishes these 3 situations using
skb->protocol. For situation 1 and 2, a special skb->protocol value
will be assigned before calling fr_hard_header, so that it can recognize
these 2 situations. All skb->protocol values other than these special ones
are treated by fr_hard_header as situation 3.

However, it is possible that in situation 3, the user sends an skb with
one of the special skb->protocol values. In this case, fr_hard_header
would incorrectly treat it as situation 1 or 2.

This patch tries to solve this issue by using skb->dev instead of
skb->protocol to distinguish between these 3 situations. For situation
1, skb->dev would be NULL; for situation 2, skb->dev->type would be
ARPHRD_ETHER; and for situation 3, skb->dev->type would be ARPHRD_DLCI.

This way fr_hard_header would be able to distinguish these 3 situations
correctly regardless what skb->protocol value the user tries to use in
situation 3.

Cc: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Thu, 1 Oct 2020 21:29:01 +0000 (14:29 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-10-01

The following pull-request contains BPF updates for your *net-next* tree.

We've added 90 non-merge commits during the last 8 day(s) which contain
a total of 103 files changed, 7662 insertions(+), 1894 deletions(-).

Note that once bpf(/net) tree gets merged into net-next, there will be a small
merge conflict in tools/lib/bpf/btf.c between commit 1245008122d7 ("libbpf: Fix
native endian assumption when parsing BTF") from the bpf tree and the commit
3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness")
from the bpf-next tree. Correct resolution would be to stick with bpf-next, it
should look like:

  [...]
        /* check BTF magic */
        if (fread(&magic, 1, sizeof(magic), f) < sizeof(magic)) {
                err = -EIO;
                goto err_out;
        }
        if (magic != BTF_MAGIC && magic != bswap_16(BTF_MAGIC)) {
                /* definitely not a raw BTF */
                err = -EPROTO;
                goto err_out;
        }

        /* get file size */
  [...]

The main changes are:

1) Add bpf_snprintf_btf() and bpf_seq_printf_btf() helpers to support displaying
   BTF-based kernel data structures out of BPF programs, from Alan Maguire.

2) Speed up RCU tasks trace grace periods by a factor of 50 & fix a few race
   conditions exposed by it. It was discussed to take these via BPF and
   networking tree to get better testing exposure, from Paul E. McKenney.

3) Support multi-attach for freplace programs, needed for incremental attachment
   of multiple XDP progs using libxdp dispatcher model, from Toke Høiland-Jørgensen.

4) libbpf support for appending new BTF types at the end of BTF object, allowing
   intrusive changes of prog's BTF (useful for future linking), from Andrii Nakryiko.

5) Several BPF helper improvements e.g. avoid atomic op in cookie generator and add
   a redirect helper into neighboring subsys, from Daniel Borkmann.

6) Allow map updates on sockmaps from bpf_iter context in order to migrate sockmaps
   from one to another, from Lorenz Bauer.

7) Fix 32 bit to 64 bit assignment from latest alu32 bounds tracking which caused
   a verifier issue due to type downgrade to scalar, from John Fastabend.

8) Follow-up on tail-call support in BPF subprogs which optimizes x64 JIT prologue
   and epilogue sections, from Maciej Fijalkowski.

9) Add an option to perf RB map to improve sharing of event entries by avoiding remove-
   on-close behavior. Also, add BPF_PROG_TEST_RUN for raw_tracepoint, from Song Liu.

10) Fix a crash in AF_XDP's socket_release when memory allocation for UMEMs fails,
    from Magnus Karlsson.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-ravb-Add-support-for-explicit-internal-clock-delay-c
David S. Miller [Thu, 1 Oct 2020 19:53:31 +0000 (12:53 -0700)]
Merge branch 'net-ravb-Add-support-for-explicit-internal-clock-delay-c
onfiguration'

Geert Uytterhoeven says:

====================
net/ravb: Add support for explicit internal clock delay configuration

Some Renesas EtherAVB variants support internal clock delay
configuration, which can add larger delays than the delays that are
typically supported by the PHY (using an "rgmii-*id" PHY mode, and/or
"[rt]xc-skew-ps" properties).

Historically, the EtherAVB driver configured these delays based on the
"rgmii-*id" PHY mode.  This caused issues with PHY drivers that
implement PHY internal delays properly[1].  Hence a backwards-compatible
workaround was added by masking the PHY mode[2].

This patch series implements the next step of the plan outlined in [3],
and adds proper support for explicit configuration of the MAC internal
clock delays using new "[rt]x-internal-delay-ps" properties.  If none of
these properties is present, the driver falls back to the old handling.

This can be considered the MAC counterpart of commit 9150069bf5fc0e86
("dt-bindings: net: Add tx and rx internal delays"), which applies to
the PHY.  Note that unlike commit 92252eec913b2dd5 ("net: phy: Add a
helper to return the index for of the internal delay"), no helpers are
provided to parse the DT properties, as so far there is a single user
only, which supports only zero or a single fixed value.  Of course such
helpers can be added later, when the need arises, or when deemed useful
otherwise.

This series consists of 3 parts:
  1. DT binding updates documenting the new properties, for both the
     generic ethernet-controller and the EtherAVB-specific bindings,
  2. Conversion to json-schema of the Renesas EtherAVB DT bindings.
     Technically, the conversion is independent of all of the above.
     I included it in this series, as it shows how all sanity checks on
     "[rt]x-internal-delay-ps" values are implemented as DT binding
     checks,
  3. EtherAVB driver update implementing support for the new properties.

Given Rob has provided his acks for the DT binding updates, all of this
can be merged through net-next.

Changes compared to v3[4]:
  - Add Reviewed-by,
  - Drop the DT updates, as they will be merged through renesas-devel and
    arm-soc, and have a hard dependency on this series.

Changes compared to v2[5]:
  - Update recently added board DTS files,
  - Add Reviewed-by.

Changes compared to v1[6]:
  - Added "[PATCH 1/7] dt-bindings: net: ethernet-controller: Add
    internal delay properties",
  - Replace "renesas,[rt]xc-delay-ps" by "[rt]x-internal-delay-ps",
  - Incorporated EtherAVB DT binding conversion to json-schema,
  - Add Reviewed-by.

Impacted, tested:
  - Salvator-X(S) with R-Car H3 ES1.0 and ES2.0, M3-W, and M3-N.

Not impacted, tested:
  - Ebisu with R-Car E3.

Impacted, not tested:
  - Salvator-X(S) with other SoC variants,
  - ULCB with R-Car H3/M3-W/M3-N variants,
  - V3MSK and Eagle with R-Car V3M,
  - Draak with R-Car V3H,
  - HiHope RZ/G2[MN] with RZ/G2M or RZ/G2N,
  - Beacon EmbeddedWorks RZ/G2M Development Kit.

To ease testing, I have pushed this series and the DT updates to the
topic/ravb-internal-clock-delays-v4 branch of my renesas-drivers
repository at
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git.

Thanks for applying!

References:
  [1] Commit bcf3440c6dd78bfe ("net: phy: micrel: add phy-mode support
      for the KSZ9031 PHY")
  [2] Commit 9b23203c32ee02cd ("ravb: Mask PHY mode to avoid inserting
      delays twice").
      https://lore.kernel.org/r/20200529122540.31368-1-geert+renesas@glider.be/
  [3] https://lore.kernel.org/r/CAMuHMdU+MR-2tr3-pH55G0GqPG9HwH3XUd=8HZxprFDMGQeWUw@mail.gmail.com/
  [4] https://lore.kernel.org/linux-devicetree/20200819134344.27813-1-geert+renesas@glider.be/
  [5] https://lore.kernel.org/linux-devicetree/20200706143529.18306-1-geert+renesas@glider.be/
  [6] https://lore.kernel.org/linux-devicetree/20200619191554.24942-1-geert+renesas@glider.be/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoravb: Add support for explicit internal clock delay configuration
Geert Uytterhoeven [Thu, 1 Oct 2020 10:10:08 +0000 (12:10 +0200)]
ravb: Add support for explicit internal clock delay configuration

Some EtherAVB variants support internal clock delay configuration, which
can add larger delays than the delays that are typically supported by
the PHY (using an "rgmii-*id" PHY mode, and/or "[rt]xc-skew-ps"
properties).

Historically, the EtherAVB driver configured these delays based on the
"rgmii-*id" PHY mode.  This caused issues with PHY drivers that
implement PHY internal delays properly[1].  Hence a backwards-compatible
workaround was added by masking the PHY mode[2].

Add proper support for explicit configuration of the MAC internal clock
delays using the new "[rt]x-internal-delay-ps" properties.
Fall back to the old handling if none of these properties is present.

[1] Commit bcf3440c6dd78bfe ("net: phy: micrel: add phy-mode support for
    the KSZ9031 PHY")
[2] Commit 9b23203c32ee02cd ("ravb: Mask PHY mode to avoid inserting
    delays twice").

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoravb: Split delay handling in parsing and applying
Geert Uytterhoeven [Thu, 1 Oct 2020 10:10:07 +0000 (12:10 +0200)]
ravb: Split delay handling in parsing and applying

Currently, full delay handling is done in both the probe and resume
paths.  Split it in two parts, so the resume path doesn't have to redo
the parsing part over and over again.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: renesas,etheravb: Convert to json-schema
Geert Uytterhoeven [Thu, 1 Oct 2020 10:10:06 +0000 (12:10 +0200)]
dt-bindings: net: renesas,etheravb: Convert to json-schema

Convert the Renesas Ethernet AVB (EthernetAVB-IF) Device Tree binding
documentation to json-schema.

Add missing properties.
Update the example to match reality.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: renesas,ravb: Document internal clock delay properties
Geert Uytterhoeven [Thu, 1 Oct 2020 10:10:05 +0000 (12:10 +0200)]
dt-bindings: net: renesas,ravb: Document internal clock delay properties

Some EtherAVB variants support internal clock delay configuration, which
can add larger delays than the delays that are typically supported by
the PHY (using an "rgmii-*id" PHY mode, and/or "[rt]xc-skew-ps"
properties).

Add properties for configuring the internal MAC delays.
These properties are mandatory, even when specified as zero, to
distinguish between old and new DTBs.

Update the (bogus) example accordingly.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: ethernet-controller: Add internal delay properties
Geert Uytterhoeven [Thu, 1 Oct 2020 10:10:04 +0000 (12:10 +0200)]
dt-bindings: net: ethernet-controller: Add internal delay properties

Internal Receive and Transmit Clock Delays are a common setting for
RGMII capable devices.

While these delays are typically applied by the PHY, some MACs support
configuring internal clock delay settings, too.  Hence add standardized
properties to configure this.

This is the MAC counterpart of commit 9150069bf5fc0e86 ("dt-bindings:
net: Add tx and rx internal delays"), which applies to the PHY.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'mlx5-updates-2020-09-30' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Thu, 1 Oct 2020 19:24:52 +0000 (12:24 -0700)]
Merge tag 'mlx5-updates-2020-09-30' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-09-30

Updates and cleanups for mlx5 driver:

1) From Ariel, Dan Carpenter and Gostavo, Fixes to the previous
   mlx5 Connection track series.

2) From Yevgeny, trivial cleanups for Software steering

3) From Hamdan, Support for Flow source hint in software steering and
   E-Switch

4) From Parav and Sunil, Small and trivial E-Switch updates and
   cleanups in preparation for mlx5 Sub-functions support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'introduce BPF_F_PRESERVE_ELEMS'
Alexei Starovoitov [Thu, 1 Oct 2020 06:18:12 +0000 (23:18 -0700)]
Merge branch 'introduce BPF_F_PRESERVE_ELEMS'

Song Liu says:

====================
This set introduces BPF_F_PRESERVE_ELEMS to perf event array for better
sharing of perf event. By default, perf event array removes the perf event
when the map fd used to add the event is closed. With BPF_F_PRESERVE_ELEMS
set, however, the perf event will stay in the array until it is removed, or
the map is closed.
---
Changes v3 => v5:
1. Clean up in selftest. (Alexei)

Changes v2 => v3:
1. Move perf_event_fd_array_map_free() to avoid unnecessary forward
   declaration. (Daniel)

Changes v1 => v2:
1. Rename the flag as BPF_F_PRESERVE_ELEMS. (Alexei, Daniel)
2. Simplify the code and selftest. (Daniel, Alexei)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoselftests/bpf: Add tests for BPF_F_PRESERVE_ELEMS
Song Liu [Wed, 30 Sep 2020 22:49:27 +0000 (15:49 -0700)]
selftests/bpf: Add tests for BPF_F_PRESERVE_ELEMS

Add tests for perf event array with and without BPF_F_PRESERVE_ELEMS.

Add a perf event to array via fd mfd. Without BPF_F_PRESERVE_ELEMS, the
perf event is removed when mfd is closed. With BPF_F_PRESERVE_ELEMS, the
perf event is removed when the map is freed.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200930224927.1936644-3-songliubraving@fb.com
3 years agobpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array
Song Liu [Wed, 30 Sep 2020 22:49:26 +0000 (15:49 -0700)]
bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array

Currently, perf event in perf event array is removed from the array when
the map fd used to add the event is closed. This behavior makes it
difficult to the share perf events with perf event array.

Introduce perf event map that keeps the perf event open with a new flag
BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not
removed when the original map fd is closed. Instead, the perf event will
stay in the map until 1) it is explicitly removed from the array; or 2)
the array is freed.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com
3 years agonet/mlx5e: Fix potential null pointer dereference
Gustavo A. R. Silva [Fri, 25 Sep 2020 16:49:13 +0000 (11:49 -0500)]
net/mlx5e: Fix potential null pointer dereference

Calls to kzalloc() and kvzalloc() should be null-checked
in order to avoid any potential failures. In this case,
a potential null pointer dereference.

Fix this by adding null checks for _parse_attr_ and _flow_
right after allocation.

Addresses-Coverity-ID: 1497154 ("Dereference before null check")
Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Fix a use after free on error in mlx5_tc_ct_shared_counter_get()
Dan Carpenter [Mon, 28 Sep 2020 09:05:56 +0000 (12:05 +0300)]
net/mlx5e: Fix a use after free on error in mlx5_tc_ct_shared_counter_get()

This code frees "shared_counter" and then dereferences on the next line
to get the error code.

Fixes: 1edae2335adf ("net/mlx5e: CT: Use the same counter for both directions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Fix dereference on pointer attr after null check
Ariel Levkovich [Mon, 28 Sep 2020 16:34:10 +0000 (19:34 +0300)]
net/mlx5: Fix dereference on pointer attr after null check

When removing a flow from the slow path fdb, a flow attr struct is
allocated for the rule removal process. If the allocation fails the
code prints a warning message but continues with the removal flow
which include dereferencing a pointer which could be null.
Fix this by exiting the function in case the attr allocation failed.

Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Use dma device access helper
Parav Pandit [Wed, 9 Sep 2020 17:41:38 +0000 (20:41 +0300)]
net/mlx5: Use dma device access helper

Use the PCI device directly for dma accesses as non PCI device unlikely
support IOMMU and dma mappings.
Introduce and use helper routine to access DMA device.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-Switch, Support flow source for local vport
Hamdan Igbaria [Sun, 30 Aug 2020 09:32:37 +0000 (12:32 +0300)]
net/mlx5: E-Switch, Support flow source for local vport

Set flow source as hint for local vport.

Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Move devlink eswitch ports closer to eswitch
Parav Pandit [Mon, 31 Aug 2020 19:47:47 +0000 (22:47 +0300)]
net/mlx5: E-switch, Move devlink eswitch ports closer to eswitch

Currently devlink eswitch ports are registered and unregistered by the
representor layer.
However it is better to register them at eswitch layer so that in future
user initiated command port add and delete commands can also
register/unregister devlink ports without depending on representor layer.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Use helper function to load unload representor
Parav Pandit [Mon, 31 Aug 2020 18:18:59 +0000 (21:18 +0300)]
net/mlx5: E-switch, Use helper function to load unload representor

To register and unregister devlink ports when loading/unload
representors, refactor the code to helper functions.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Add helper to check egress ACL need
Parav Pandit [Wed, 2 Sep 2020 05:59:30 +0000 (08:59 +0300)]
net/mlx5: E-switch, Add helper to check egress ACL need

Currently only VF vports need egress ACL table.
Add a generic helper to check whether a vport need egress
ACL table or not.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Use PF num in metadata reg c0
sunils [Thu, 10 Sep 2020 21:13:26 +0000 (00:13 +0300)]
net/mlx5: E-switch, Use PF num in metadata reg c0

Currently only 256 vports can be supported as only 8 bits are
reserved for them and 8 bits are reserved for vhca_ids in
metadata reg c0. To support more than 256 vports, replace
vhca_id with a unique shorter 4-bit PF number which covers
upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095.
This will continue to generate unique metadata even if
multiple PCI devices have same switch_id.

Signed-off-by: sunils <sunils@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Add support for rule creation with flow source hint
Hamdan Igbaria [Mon, 15 Jun 2020 15:18:14 +0000 (18:18 +0300)]
net/mlx5: DR, Add support for rule creation with flow source hint

Skip the rule according to flow arrival source, in case of RX and the
source is local port skip and in case of TX and the source is uplink
skip, we get this info according to the flow source hint we get from
upper layers when creating the rule.
This is needed because for example in case of FDB table which has a TX
and RX tables and we are inserting a rule with an encap action which
is only a TX action, in this case rule will fail on RX, so we can rely
on the flow source hint and skip RX in such case.
Until now we relied on metadata regc_0 that upper layer mapped the
port in the regc_0, but the problem is that upper layer did not always
use regc_0 for port mapping, so now we added support to flow source
hint which upper layers will pass to SW steering when creating a rule.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Call ste_builder directly with tag pointer
Yevgeny Kliteynik [Mon, 31 Aug 2020 08:56:40 +0000 (11:56 +0300)]
net/mlx5: DR, Call ste_builder directly with tag pointer

Instead of getting the tag in each function, call the builder
directly with the tag. This will allow to use the same function
for building the tag and the bitmask.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Remove unneeded local variable
Yevgeny Kliteynik [Mon, 31 Aug 2020 08:57:28 +0000 (11:57 +0300)]
net/mlx5: DR, Remove unneeded local variable

The misc3 variable is used only once and can be dropped.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Remove unneeded vlan check from L2 builder
Yevgeny Kliteynik [Mon, 31 Aug 2020 08:57:14 +0000 (11:57 +0300)]
net/mlx5: DR, Remove unneeded vlan check from L2 builder

When we create a matcher we check that all fields are consumed.
There is no need for this specific check. This keeps the STE
builder functions simple and clean.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Remove unneeded check from source port builder
Yevgeny Kliteynik [Mon, 31 Aug 2020 08:56:59 +0000 (11:56 +0300)]
net/mlx5: DR, Remove unneeded check from source port builder

Mask validity for ste builders is checked by mlx5dr_ste_build_pre_check
during matcher creation.
It already checks the mask value of source_vport, so removing
this duplicated check.
Also, moving there the check of source_eswitch_owner_vhca_id mask.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Replace the check for valid STE entry
Yevgeny Kliteynik [Mon, 31 Aug 2020 08:56:07 +0000 (11:56 +0300)]
net/mlx5: DR, Replace the check for valid STE entry

Validity check is done by reading the next lu_type from the STE,
this check can be replaced by checking the refcount.
This will make the check independent on internal STE structure.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agoselftests/bpf: Fix alignment of .BTF_ids
Jean-Philippe Brucker [Wed, 30 Sep 2020 09:36:01 +0000 (11:36 +0200)]
selftests/bpf: Fix alignment of .BTF_ids

Fix a build failure on arm64, due to missing alignment information for
the .BTF_ids section:

resolve_btfids.test.o: in function `test_resolve_btfids':
tools/testing/selftests/bpf/prog_tests/resolve_btfids.c:140:(.text+0x29c): relocation truncated to fit: R_AARCH64_LDST32_ABS_LO12_NC against `.BTF_ids'
ld: tools/testing/selftests/bpf/prog_tests/resolve_btfids.c:140: warning: one possible cause of this error is that the symbol is being referenced in the indicated code as if it had a larger alignment than was declared where it was defined

In vmlinux, the .BTF_ids section is aligned to 4 bytes by vmlinux.lds.h.
In test_progs however, .BTF_ids doesn't have alignment constraints. The
arm64 linker expects the btf_id_set.cnt symbol, a u32, to be naturally
aligned but finds it misaligned and cannot apply the relocation. Enforce
alignment of .BTF_ids to 4 bytes.

Fixes: cd04b04de119 ("selftests/bpf: Add set test to resolve_btfids")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20200930093559.2120126-1-jean-philippe@linaro.org
3 years agoMerge branch 'drop_monitor-Convert-to-use-devlink-tracepoint'
David S. Miller [Thu, 1 Oct 2020 01:01:27 +0000 (18:01 -0700)]
Merge branch 'drop_monitor-Convert-to-use-devlink-tracepoint'

Ido Schimmel says:

====================
drop_monitor: Convert to use devlink tracepoint

Drop monitor is able to monitor both software and hardware originated
drops. Software drops are monitored by having drop monitor register its
probe on the 'kfree_skb' tracepoint. Hardware originated drops are
monitored by having devlink call into drop monitor whenever it receives
a dropped packet from the underlying hardware.

This patch set converts drop monitor to monitor both software and
hardware originated drops in the same way - by registering its probe on
the relevant tracepoint.

In addition to drop monitor being more consistent, it is now also
possible to build drop monitor as module instead of as a builtin and
still monitor hardware originated drops. Initially, CONFIG_NET_DEVLINK
implied CONFIG_NET_DROP_MONITOR, but after commit def2fbffe62c
("kconfig: allow symbols implied by y to become m") we can have
CONFIG_NET_DEVLINK=y and CONFIG_NET_DROP_MONITOR=m and hardware
originated drops will not be monitored.

Patch set overview:

Patch #1 adds a tracepoint in devlink for trap reports.

Patch #2 prepares probe functions in drop monitor for the new
tracepoint.

Patch #3 converts drop monitor to use the new tracepoint.

Patches #4-#6 perform cleanups after the conversion.

Patch #7 adds a test case for drop monitor. Both software originated
drops and hardware originated drops (using netdevsim) are tested.

Tested:

| CONFIG_NET_DEVLINK | CONFIG_NET_DROP_MONITOR | Build | SW drops | HW drops |
| -------------------|-------------------------|-------|----------|----------|
|          y         |            y            |   v   |     v    |     v    |
|          y         |            m            |   v   |     v    |     v    |
|          y         |            n            |   v   |     x    |     x    |
|          n         |            y            |   v   |     v    |     x    |
|          n         |            m            |   v   |     v    |     x    |
|          n         |            n            |   v   |     x    |     x    |
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: net: Add drop monitor test
Ido Schimmel [Tue, 29 Sep 2020 08:15:56 +0000 (11:15 +0300)]
selftests: net: Add drop monitor test

Test that drop monitor correctly captures both software and hardware
originated packet drops.

# ./drop_monitor_tests.sh

Software drops test
    TEST: Capturing active software drops                               [ OK ]
    TEST: Capturing inactive software drops                             [ OK ]

Hardware drops test
    TEST: Capturing active hardware drops                               [ OK ]
    TEST: Capturing inactive hardware drops                             [ OK ]

Tests passed:   4
Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrop_monitor: Filter control packets in drop monitor
Ido Schimmel [Tue, 29 Sep 2020 08:15:55 +0000 (11:15 +0300)]
drop_monitor: Filter control packets in drop monitor

Previously, devlink called into drop monitor in order to report hardware
originated drops / exceptions. devlink intentionally filtered control
packets and did not pass them to drop monitor as they were not dropped
by the underlying hardware.

Now drop monitor registers its probe on a generic 'devlink_trap_report'
tracepoint and should therefore perform this filtering itself instead of
having devlink do that.

Add the trap type as metadata and have drop monitor ignore control
packets.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrop_monitor: Remove duplicate struct
Ido Schimmel [Tue, 29 Sep 2020 08:15:54 +0000 (11:15 +0300)]
drop_monitor: Remove duplicate struct

'struct net_dm_hw_metadata' is a duplicate of 'struct
devlink_trap_metadata'.

Remove the former and simplify the code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrop_monitor: Remove no longer used functions
Ido Schimmel [Tue, 29 Sep 2020 08:15:53 +0000 (11:15 +0300)]
drop_monitor: Remove no longer used functions

The old probe functions that were invoked by drop monitor code are no
longer called and can thus be removed. They were replaced by actual
probe functions that are registered on the recently introduced
'devlink_trap_report' tracepoint.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrop_monitor: Convert to using devlink tracepoint
Ido Schimmel [Tue, 29 Sep 2020 08:15:52 +0000 (11:15 +0300)]
drop_monitor: Convert to using devlink tracepoint

Convert drop monitor to use the recently introduced
'devlink_trap_report' tracepoint instead of having devlink call into
drop monitor.

This is both consistent with software originated drops ('kfree_skb'
tracepoint) and also allows drop monitor to be built as a module and
still report hardware originated drops.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrop_monitor: Prepare probe functions for devlink tracepoint
Ido Schimmel [Tue, 29 Sep 2020 08:15:51 +0000 (11:15 +0300)]
drop_monitor: Prepare probe functions for devlink tracepoint

Drop monitor supports two alerting modes: Summary and packet. Prepare a
probe function for each, so that they could be later registered on the
devlink tracepoint by calling register_trace_devlink_trap_report(),
based on the configured alerting mode.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodevlink: Add a tracepoint for trap reports
Ido Schimmel [Tue, 29 Sep 2020 08:15:50 +0000 (11:15 +0300)]
devlink: Add a tracepoint for trap reports

Add a tracepoint for trap reports so that drop monitor could register
its probe on it. Use trace_devlink_trap_report_enabled() to avoid
wasting cycles setting the trap metadata if the tracepoint is not
enabled.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'linux-can-next-for-5.10-20200930' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Wed, 30 Sep 2020 22:21:43 +0000 (15:21 -0700)]
Merge tag 'linux-can-next-for-5.10-20200930' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2020-09-30

this is a pull request of 13 patches for net-next.

The first 10 target the mcp25xxfd driver (which is renamed to mcp251xfd during
this series).

The first two patches are by Thomas Kopp, which adds reference to the just
related errata and updates the documentation and log messages.

Dan Carpenter's patch fixes a resource leak during ifdown.

A patch by me adds the missing initialization of a variable.

Oleksij Rempel updates the DT binding documentation as requested by Rob
Herring.

The next 5 patches are by Thomas Kopp and me. During review Geert Uytterhoeven
suggested to use "microchip,mcp251xfd" instead of "microchip,mcp25xxfd" as the
DT autodetection compatible to avoid clashes with future but incompatible
devices. We decided not only to rename the compatible but the whole driver from
"mcp25xxfd" to "mcp251xfd". This is done in several patches.

Joakim Zhang contributes three patches for the flexcan driver. The first one
adds support for the ECC feature, which is implemented on some modern IP cores,
by initializing the controller's memory during ifup. The next patch adds
support for the i.MX8MP (which supports ECC) and the last patch properly
disables the runtime PM if device registration fails.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'ionic-watchdog-training'
David S. Miller [Wed, 30 Sep 2020 22:11:09 +0000 (15:11 -0700)]
Merge branch 'ionic-watchdog-training'

Shannon Nelson says:

====================
ionic watchdog training

Our link watchdog displayed a couple of unfriendly behaviors in some recent
stress testing.  These patches change the startup and stop timing in order
to be sure that expected structures are ready to be used by the watchdog.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: prevent early watchdog check
Shannon Nelson [Wed, 30 Sep 2020 17:48:28 +0000 (10:48 -0700)]
ionic: prevent early watchdog check

In one corner case scenario, the driver device lif setup can
get delayed such that the ionic_watchdog_cb() timer goes off
before the ionic->lif is set, thus causing a NULL pointer panic.
We catch the problem by checking for a NULL lif just a little
earlier in the callback.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: stop watchdog timer earlier on remove
Shannon Nelson [Wed, 30 Sep 2020 17:48:27 +0000 (10:48 -0700)]
ionic: stop watchdog timer earlier on remove

We need to be better at making sure we don't have a link check
watchdog go off while we're shutting things down, so let's stop
the timer as soon as we start the remove.

Meanwhile, since that was the only thing in
ionic_dev_teardown(), simplify and remove that function.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'tcp-exponential-backoff-in-tcp_send_ack'
David S. Miller [Wed, 30 Sep 2020 21:21:30 +0000 (14:21 -0700)]
Merge branch 'tcp-exponential-backoff-in-tcp_send_ack'

Eric Dumazet says:

====================
tcp: exponential backoff in tcp_send_ack()

We had outages caused by repeated skb allocation failures in tcp_send_ack()

It is time to add exponential backoff to reduce number of attempts.
Before doing so, first patch removes icsk_ack.blocked to make
room for a new field (icsk_ack.retry)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: add exponential backoff in __tcp_send_ack()
Eric Dumazet [Wed, 30 Sep 2020 12:54:57 +0000 (05:54 -0700)]
tcp: add exponential backoff in __tcp_send_ack()

Whenever host is under very high memory pressure,
__tcp_send_ack() skb allocation fails, and we setup
a 200 ms (TCP_DELACK_MAX) timer before retrying.

On hosts with high number of TCP sockets, we can spend
considerable amount of cpu cycles in these attempts,
add high pressure on various spinlocks in mm-layer,
ultimately blocking threads attempting to free space
from making any progress.

This patch adds standard exponential backoff to avoid
adding fuel to the fire.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoinet: remove icsk_ack.blocked
Eric Dumazet [Wed, 30 Sep 2020 12:54:56 +0000 (05:54 -0700)]
inet: remove icsk_ack.blocked

TCP has been using it to work around the possibility of tcp_delack_timer()
finding the socket owned by user.

After commit 6f458dfb4092 ("tcp: improve latencies of timer triggered events")
we added TCP_DELACK_TIMER_DEFERRED atomic bit for more immediate recovery,
so we can get rid of icsk_ack.blocked

This frees space that following patch will reuse.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: macb: move pdata to private header
Alexandre Belloni [Wed, 30 Sep 2020 10:50:59 +0000 (12:50 +0200)]
net: macb: move pdata to private header

struct macb_platform_data is only used by macb_pci to register the platform
device, move its definition to cadence/macb.h and remove platform_data/macb.h

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mlxsw-PFC-and-headroom-selftests'
David S. Miller [Wed, 30 Sep 2020 21:06:54 +0000 (14:06 -0700)]
Merge branch 'mlxsw-PFC-and-headroom-selftests'

Petr Machata says:

====================
mlxsw: PFC and headroom selftests

Recent changes in the headroom management code made it clear that an
automated way of testing this functionality is needed. This patchset brings
two tests: a synthetic headroom behavior test, which verifies mechanics of
headroom management. And a PFC test, which verifies whether this behavior
actually translates into a working lossless configuration.

Both of these tests rely on mlnx_qos[1], a tool that interfaces with Linux
DCB API. The tool was originally written to work with Mellanox NICs, but
does not actually rely on anything Mellanox-specific, and can be used for
mlxsw as well as for any other NIC-like driver. Unlike Open LLDP it does
support buffer commands and permits a fire-and-forget approach to
configuration, which makes it very handy for writing of selftests.

Patches #1-#3 extend the selftest devlink_lib.sh in various ways. Patch #4
then adds a helper wrapper for mlnx_qos to mlxsw's qos_lib.sh.

Patch #5 adds a test for management of port headroom.

Patch #6 adds a PFC test.

[1] https://github.com/Mellanox/mlnx-tools/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: mlxsw: Add a PFC test
Petr Machata [Wed, 30 Sep 2020 10:49:12 +0000 (12:49 +0200)]
selftests: mlxsw: Add a PFC test

Add a test for PFC. Runs 10MB of traffic through a bottleneck and checks
that none of it gets lost.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: mlxsw: Add headroom handling test
Petr Machata [Wed, 30 Sep 2020 10:49:11 +0000 (12:49 +0200)]
selftests: mlxsw: Add headroom handling test

Add a test for headroom configuration. This covers projection of ETS
configuration to ingress, PFC, adjustments for MTU, the qdisc / TC
mode and the effect of egress SPAN session on buffer configuration.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: mlxsw: qos_lib: Add a wrapper for running mlnx_qos
Petr Machata [Wed, 30 Sep 2020 10:49:10 +0000 (12:49 +0200)]
selftests: mlxsw: qos_lib: Add a wrapper for running mlnx_qos

mlnx_qos is a script for configuration of DCB. Despite the name it is not
actually Mellanox-specific in any way. It is currently the only ad-hoc tool
available (in contrast to a daemon that manages an interface on an ongoing
basis). However, it is very verbose and parsing out error messages is not
really possible. Add a wrapper that makes it easier to use the tool.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: devlink_lib: Support port-less topologies
Petr Machata [Wed, 30 Sep 2020 10:49:09 +0000 (12:49 +0200)]
selftests: forwarding: devlink_lib: Support port-less topologies

Some selftests may not need any actual ports. Technically those are not
forwarding selftests, but devlink_lib can still be handy. Fall back on
NETIF_NO_CABLE in those cases.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: devlink_lib: Add devlink_cell_size_get()
Petr Machata [Wed, 30 Sep 2020 10:49:08 +0000 (12:49 +0200)]
selftests: forwarding: devlink_lib: Add devlink_cell_size_get()

Add a helper that answers the cell size of the devlink device.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: devlink_lib: Split devlink_..._set() into save & set
Petr Machata [Wed, 30 Sep 2020 10:49:07 +0000 (12:49 +0200)]
selftests: forwarding: devlink_lib: Split devlink_..._set() into save & set

Changing pool type from static to dynamic causes reinterpretation of
threshold values. They therefore need to be saved before pool type is
changed, then the pool type can be changed, and then the new values need
to be set up.

For that reason, set cannot subsume save, because it would be saving the
wrong thing, with possibly a nonsensical value, and restore would then fail
to restore the nonsensical value.

Thus extract a _save() from each of the relevant _set()'s. This way it is
possible to save everything up front, then to tweak it, and then restore in
the required order.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocan: flexcan: disable runtime PM if register flexcandev failed
Joakim Zhang [Tue, 29 Sep 2020 21:15:57 +0000 (05:15 +0800)]
can: flexcan: disable runtime PM if register flexcandev failed

Disable runtime PM if register flexcandev failed, and balance reference
of usage_count.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20200929211557.14153-4-qiangqing.zhang@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: flexcan: add flexcan driver for i.MX8MP
Joakim Zhang [Tue, 29 Sep 2020 21:15:56 +0000 (05:15 +0800)]
can: flexcan: add flexcan driver for i.MX8MP

Add flexcan driver for i.MX8MP, which supports CAN FD and ECC.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20200929211557.14153-3-qiangqing.zhang@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: flexcan: initialize all flexcan memory for ECC function
Joakim Zhang [Tue, 29 Sep 2020 21:15:55 +0000 (05:15 +0800)]
can: flexcan: initialize all flexcan memory for ECC function

One issue was reported at a baremetal environment, which is used for
FPGA verification. "The first transfer will fail for extended ID
format(for both 2.0B and FD format), following frames can be transmitted
and received successfully for extended format, and standard format don't
have this issue. This issue occurred randomly with high possiblity, when
it occurs, the transmitter will detect a BIT1 error, the receiver a CRC
error. According to the spec, a non-correctable error may cause this
transfer failure."

With FLEXCAN_QUIRK_DISABLE_MECR quirk, it supports correctable errors,
disable non-correctable errors interrupt and freeze mode. Platform has
ECC hardware support, but select this quirk, this issue may not come to
light. Initialize all FlexCAN memory before accessing them, at least it
can avoid non-correctable errors detected due to memory uninitialized.
The internal region can't be initialized when the hardware doesn't support
ECC.

According to IMX8MPRM, Rev.C, 04/2020. There is a NOTE at the section
11.8.3.13 Detection and correction of memory errors:
"All FlexCAN memory must be initialized before starting its operation in
order to have the parity bits in memory properly updated. CTRL2[WRMFRZ]
grants write access to all memory positions that require initialization,
ranging from 0x080 to 0xADF and from 0xF28 to 0xFFF when the CAN FD feature
is enabled. The RXMGMASK, RX14MASK, RX15MASK, and RXFGMASK registers need to
be initialized as well. MCR[RFEN] must not be set during memory initialization."

Memory range from 0x080 to 0xADF, there are reserved memory (unimplemented
by hardware, e.g. only configure 64 MBs), these memory can be initialized or not.
In this patch, initialize all flexcan memory which includes reserved memory.

In this patch, create FLEXCAN_QUIRK_SUPPORT_ECC for platforms which has ECC
feature. If you have a ECC platform in your hand, please select this
qurik to initialize all flexcan memory firstly, then you can select
FLEXCAN_QUIRK_DISABLE_MECR to only enable correctable errors.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20200929211557.14153-2-qiangqing.zhang@nxp.com
[mkl: wrap long lines]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: rename all remaining occurrence to mcp251xfd
Marc Kleine-Budde [Wed, 30 Sep 2020 08:49:00 +0000 (10:49 +0200)]
can: mcp251xfd: rename all remaining occurrence to mcp251xfd

In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver,
which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming,
but incompatible chips.

In the previous patch the autodetect DT compatbile has been renamed to
"microchip,mcp251xfd", this patch changes all non user facing occurrence of
"mcp25xxfd" to "mcp251xfd" and "MCP25XXFD" to "MCP251XFD".

[1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Link: https://lore.kernel.org/r/20200930091424.792165-10-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: rename all user facing strings to mcp251xfd
Marc Kleine-Budde [Wed, 30 Sep 2020 08:49:00 +0000 (10:49 +0200)]
can: mcp251xfd: rename all user facing strings to mcp251xfd

In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver,
which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming,
but incompatible chips.

In the previous patch the autodetect DT compatbile has been renamed to
"microchip,mcp251xfd", this patch changes all user facing strings from
"mcp25xxfd" to "mcp251xfd" and "MCP25XXFD" to "MCP251XFD", including:
- kconfig symbols
- name of kernel module
- DT and SPI compatible

[1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Link: https://lore.kernel.org/r/20200930091424.792165-9-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: rename driver files and subdir to mcp251xfd
Marc Kleine-Budde [Wed, 30 Sep 2020 08:49:00 +0000 (10:49 +0200)]
can: mcp251xfd: rename driver files and subdir to mcp251xfd

In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver,
which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming,
but incompatible chips.

In the previous patch the autodetect DT compatbile has been renamed to
"microchip,mcp251xfd", this patch changes the name of the driver subdir and the
individual files accordinly.

[1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Link: https://lore.kernel.org/r/20200930091424.792165-8-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agoselftests/bpf: Test "incremental" btf_dump in C format
Andrii Nakryiko [Tue, 29 Sep 2020 23:28:43 +0000 (16:28 -0700)]
selftests/bpf: Test "incremental" btf_dump in C format

Add test validating that btf_dump works fine with BTFs that are modified and
incrementally generated.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929232843.1249318-5-andriin@fb.com
3 years agolibbpf: Make btf_dump work with modifiable BTF
Andrii Nakryiko [Tue, 29 Sep 2020 23:28:40 +0000 (16:28 -0700)]
libbpf: Make btf_dump work with modifiable BTF

Ensure that btf_dump can accommodate new BTF types being appended to BTF
instance after struct btf_dump was created. This came up during attemp to
use btf_dump for raw type dumping in selftests, but given changes are not
excessive, it's good to not have any gotchas in API usage, so I decided to
support such use case in general.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929232843.1249318-2-andriin@fb.com
3 years agoMerge branch 'Various BPF helper improvements'
Alexei Starovoitov [Wed, 30 Sep 2020 18:50:35 +0000 (11:50 -0700)]
Merge branch 'Various BPF helper improvements'

Daniel Borkmann says:

====================
This series adds two BPF helpers, that is, one for retrieving the classid
of an skb and another one to redirect via the neigh subsystem, and improves
also the cookie helpers by removing the atomic counter. I've also added
the bpf_tail_call_static() helper to the libbpf API that we've been using
in Cilium for a while now, and last but not least the series adds a few
selftests. For details, please check individual patches, thanks!

v3 -> v4:
  - Removed out_rec error path (Martin)
  - Integrate BPF_F_NEIGH flag into rejecting invalid flags (Martin)
    - I think this way it's better to avoid bit overlaps given it's
      right in the place that would need to be extended on new flags
v2 -> v3:
  - Removed double skb->dev = dev assignment (David)
  - Added headroom check for v6 path (David)
  - Set set flowi4_proto for ip_route_output_flow (David)
  - Rebased onto latest bpf-next
v1 -> v2:
  - Rework cookie generator to support nested contexts (Eric)
  - Use ip_neigh_gw6() and container_of() (David)
  - Rename __throw_build_bug() and improve comments (Andrii)
  - Use bpf_tail_call_static() also in BPF samples (Maciej)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf, selftests: Add redirect_neigh selftest
Daniel Borkmann [Wed, 30 Sep 2020 15:18:20 +0000 (17:18 +0200)]
bpf, selftests: Add redirect_neigh selftest

Add a small test that exercises the new redirect_neigh() helper for the
IPv4 and IPv6 case.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/0fc7d9c5f9a6cc1c65b0d3be83b44b1ec9889f43.1601477936.git.daniel@iogearbox.net
3 years agobpf, selftests: Use bpf_tail_call_static where appropriate
Daniel Borkmann [Wed, 30 Sep 2020 15:18:19 +0000 (17:18 +0200)]
bpf, selftests: Use bpf_tail_call_static where appropriate

For those locations where we use an immediate tail call map index use the
newly added bpf_tail_call_static() helper.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/3cfb2b799a62d22c6e7ae5897c23940bdcc24cbc.1601477936.git.daniel@iogearbox.net
3 years agobpf, libbpf: Add bpf_tail_call_static helper for bpf programs
Daniel Borkmann [Wed, 30 Sep 2020 15:18:18 +0000 (17:18 +0200)]
bpf, libbpf: Add bpf_tail_call_static helper for bpf programs

Port of tail_call_static() helper function from Cilium's BPF code base [0]
to libbpf, so others can easily consume it as well. We've been using this
in production code for some time now. The main idea is that we guarantee
that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the
JITed BPF insns with direct jumps instead of having to fall back to using
expensive retpolines. By using inline asm, we guarantee that the compiler
won't merge the call from different paths with potentially different
content of r2/r3.

We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable())
in different places as a neat trick to trigger compilation errors when
compiler does not remove code at compilation time. This works for the BPF
back end as it does not implement the __builtin_trap().

  [0] https://github.com/cilium/cilium/commit/f5537c26020d5297b70936c6b7d03a1e412a1035

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net
3 years agobpf: Add redirect_neigh helper as redirect drop-in
Daniel Borkmann [Wed, 30 Sep 2020 15:18:17 +0000 (17:18 +0200)]
bpf: Add redirect_neigh helper as redirect drop-in

Add a redirect_neigh() helper as redirect() drop-in replacement
for the xmit side. Main idea for the helper is to be very similar
in semantics to the latter just that the skb gets injected into
the neighboring subsystem in order to let the stack do the work
it knows best anyway to populate the L2 addresses of the packet
and then hand over to dev_queue_xmit() as redirect() does.

This solves two bigger items: i) skbs don't need to go up to the
stack on the host facing veth ingress side for traffic egressing
the container to achieve the same for populating L2 which also
has the huge advantage that ii) the skb->sk won't get orphaned in
ip_rcv_core() when entering the IP routing layer on the host stack.

Given that skb->sk neither gets orphaned when crossing the netns
as per 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing
the skb.") the helper can then push the skbs directly to the phys
device where FQ scheduler can do its work and TCP stack gets proper
backpressure given we hold on to skb->sk as long as skb is still
residing in queues.

With the helper used in BPF data path to then push the skb to the
phys device, I observed a stable/consistent TCP_STREAM improvement
on veth devices for traffic going container -> host -> host ->
container from ~10Gbps to ~15Gbps for a single stream in my test
environment.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
3 years agobpf, net: Rework cookie generator as per-cpu one
Daniel Borkmann [Wed, 30 Sep 2020 15:18:16 +0000 (17:18 +0200)]
bpf, net: Rework cookie generator as per-cpu one

With its use in BPF, the cookie generator can be called very frequently
in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg)
and attached to the root cgroup, for example, when used in v1/v2 mixed
environments. In particular, when there's a high churn on sockets in the
system there can be many parallel requests to the bpf_get_socket_cookie()
and bpf_get_netns_cookie() helpers which then cause contention on the
atomic counter.

As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino
allocator"), add a small helper library that both can use for the 64 bit
counters. Given this can be called from different contexts, we also need
to deal with potential nested calls even though in practice they are
considered extremely rare. One idea as suggested by Eric Dumazet was
to use a reverse counter for this situation since we don't expect 64 bit
overflows anyways; that way, we can avoid bigger gaps in the 64 bit
counter space compared to just batch-wise increase. Even on machines
with small number of cores (e.g. 4) the cookie generation shrinks from
min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run
in parallel from multiple CPUs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
3 years agobpf: Add classid helper only based on skb->sk
Daniel Borkmann [Wed, 30 Sep 2020 15:18:15 +0000 (17:18 +0200)]
bpf: Add classid helper only based on skb->sk

Similarly to 5a52ae4e32a6 ("bpf: Allow to retrieve cgroup v1 classid
from v2 hooks"), add a helper to retrieve cgroup v1 classid solely
based on the skb->sk, so it can be used as key as part of BPF map
lookups out of tc from host ns, in particular given the skb->sk is
retained these days when crossing net ns thanks to 9c4c325252c5
("skbuff: preserve sock reference when scrubbing the skb."). This
is similar to bpf_skb_cgroup_id() which implements the same for v2.
Kubernetes ecosystem is still operating on v1 however, hence net_cls
needs to be used there until this can be dropped in with the v2
helper of bpf_skb_cgroup_id().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net
3 years agobpf: fix raw_tp test run in preempt kernel
Song Liu [Tue, 29 Sep 2020 22:29:49 +0000 (15:29 -0700)]
bpf: fix raw_tp test run in preempt kernel

In preempt kernel, BPF_PROG_TEST_RUN on raw_tp triggers:

[   35.874974] BUG: using smp_processor_id() in preemptible [00000000]
code: new_name/87
[   35.893983] caller is bpf_prog_test_run_raw_tp+0xd4/0x1b0
[   35.900124] CPU: 1 PID: 87 Comm: new_name Not tainted 5.9.0-rc6-g615bd02bf #1
[   35.907358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1ubuntu1 04/01/2014
[   35.916941] Call Trace:
[   35.919660]  dump_stack+0x77/0x9b
[   35.923273]  check_preemption_disabled+0xb4/0xc0
[   35.928376]  bpf_prog_test_run_raw_tp+0xd4/0x1b0
[   35.933872]  ? selinux_bpf+0xd/0x70
[   35.937532]  __do_sys_bpf+0x6bb/0x21e0
[   35.941570]  ? find_held_lock+0x2d/0x90
[   35.945687]  ? vfs_write+0x150/0x220
[   35.949586]  do_syscall_64+0x2d/0x40
[   35.953443]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by calling migrate_disable() before smp_processor_id().

Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agocan: mcp25xxfd: narrow down wildcards in device tree bindings to "microchip,mcp251xfd"
Thomas Kopp [Wed, 30 Sep 2020 09:14:22 +0000 (11:14 +0200)]
can: mcp25xxfd: narrow down wildcards in device tree bindings to "microchip,mcp251xfd"

The wildcard should be narrowed down to prevent existing and future devices
that are not compatible from matching. It is very unlikely that incompatible
devices will be released that do not match the wildcard.

Discussion Reference: https://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200930091423.755-1-thomas.kopp@microchip.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agodt-binding: can: mcp251xfd: narrow down wildcards in device tree bindings to "microch...
Thomas Kopp [Wed, 30 Sep 2020 09:14:23 +0000 (11:14 +0200)]
dt-binding: can: mcp251xfd: narrow down wildcards in device tree bindings to "microchip,mcp251xfd"

The wildcard should be narrowed down to prevent existing and future devices
that are not compatible from matching. It is very unlikely that incompatible
devices will be released that do not match the wildcard.

This is the documentation part of the commit.

Discussion Reference: https://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200930091423.755-2-thomas.kopp@microchip.com
[mkl: rename file, too]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agodt-binding: can: mcp25xxfd: documentation fixes
Oleksij Rempel [Wed, 23 Sep 2020 12:53:01 +0000 (14:53 +0200)]
dt-binding: can: mcp25xxfd: documentation fixes

Apply following fixes:
- Use 'interrupts'. (interrupts-extended will automagically be supported
  by the tools)
- *-supply is always a single item. So, drop maxItems=1
- add "additionalProperties: false" flag to detect unneeded properties.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20200923125301.27200-1-o.rempel@pengutronix.de
Reported-by: Rob Herring <robh@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Fixes: 1b5a78e69c1f ("dt-binding: can: mcp25xxfd: document device tree bindings")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp25xxfd: mcp25xxfd_irq(): add missing initialization of variable set_normal...
Marc Kleine-Budde [Wed, 23 Sep 2020 11:44:36 +0000 (13:44 +0200)]
can: mcp25xxfd: mcp25xxfd_irq(): add missing initialization of variable set_normal mode

This patch fixes the following warning:

    drivers/net/can/spi/mcp25xxfd/mcp25xxfd-core.c:2155 mcp25xxfd_irq()
    error: uninitialized symbol 'set_normal_mode'.

by adding the missing initialization.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Link: https://lore.kernel.org/r/20200923114726.2704426-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp25xxfd: mcp25xxfd_ring_free(): fix memory leak during cleanup
Dan Carpenter [Wed, 23 Sep 2020 11:27:52 +0000 (14:27 +0300)]
can: mcp25xxfd: mcp25xxfd_ring_free(): fix memory leak during cleanup

This loop doesn't free the first element of the array.  The "i > 0" has
to be changed to "i >= 0".

Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20200923112752.GA1473821@mwanda
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp25xxfd: mcp25xxfd_probe(): add SPI clk limit related errata information
Thomas Kopp [Fri, 25 Sep 2020 06:56:06 +0000 (08:56 +0200)]
can: mcp25xxfd: mcp25xxfd_probe(): add SPI clk limit related errata information

This patch adds a reference to the recent released MCP2517FD and MCP2518FD
errata sheets and paste the explanation.

The driver already implements the proposed fix.

Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200925065606.358-1-thomas.kopp@microchip.com
[mkl: split into two patches, adjust subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp25xxfd: mcp25xxfd_handle_eccif(): add ECC related errata and update log messages
Thomas Kopp [Fri, 25 Sep 2020 06:56:06 +0000 (08:56 +0200)]
can: mcp25xxfd: mcp25xxfd_handle_eccif(): add ECC related errata and update log messages

This patch adds a reference to the recent released MCP2517FD and MCP2518FD
errata sheets and paste the explanation.

The single error correction does not always work, so always indicate that a
single error occurred. If the location of the ECC error is outside of the
TX-RAM always use netdev_notice() to log the problem. For ECC errors in the
TX-RAM, there is a recovery procedure.

Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200925065606.358-1-thomas.kopp@microchip.com
[mkl: split into two patches, adjust subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agoMerge branch 'HW-support-for-VCAP-IS1-and-ES0-in-mscc_ocelot'
David S. Miller [Wed, 30 Sep 2020 01:26:42 +0000 (18:26 -0700)]
Merge branch 'HW-support-for-VCAP-IS1-and-ES0-in-mscc_ocelot'

Vladimir Oltean says:

====================
HW support for VCAP IS1 and ES0 in mscc_ocelot

The patches from RFC series "Offload tc-flower to mscc_ocelot switch
using VCAP chains" have been split into 2:
https://patchwork.ozlabs.org/project/netdev/list/?series=204810&state=*

This is the boring part, that deals with the prerequisites, and not with
tc-flower integration. Apart from the initialization of some hardware
blocks, which at this point still don't do anything, no new
functionality is introduced.

- Key and action field offsets are defined for the supported switches.
- VCAP properties are added to the driver for the new TCAM blocks. But
  instead of adding them manually as was done for IS2, which is error
  prone, the driver is refactored to read these parameters from
  hardware, which is possible.
- Some improvements regarding the processing of struct ocelot_vcap_filter.
- Extending the code to be compatible with full and quarter keys.

This series was tested, along with other patches not yet submitted, on
the Felix and Seville switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: look up the filters in flower_stats() and flower_destroy()
Vladimir Oltean [Tue, 29 Sep 2020 22:27:33 +0000 (01:27 +0300)]
net: mscc: ocelot: look up the filters in flower_stats() and flower_destroy()

Currently a new filter is created, containing just enough correct
information to be able to call ocelot_vcap_block_find_filter_by_index()
on it.

This will be limiting us in the future, when we'll have more metadata
associated with a filter, which will matter in the stats() and destroy()
callbacks, and which we can't make up on the spot. For example, we'll
start "offloading" some dummy tc filter entries for the TCAM skeleton,
but we won't actually be adding them to the hardware, or to block->rules.
So, it makes sense to avoid deleting those rules too. That's the kind of
thing which is difficult to determine unless we look up the real filter.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: add a new ocelot_vcap_block_find_filter_by_id function
Vladimir Oltean [Tue, 29 Sep 2020 22:27:32 +0000 (01:27 +0300)]
net: mscc: ocelot: add a new ocelot_vcap_block_find_filter_by_id function

And rename the existing find to ocelot_vcap_block_find_filter_by_index.
The index is the position in the TCAM, and the id is the flow cookie
given by tc.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: rename variable 'cnt' in vcap_data_offset_get()
Vladimir Oltean [Tue, 29 Sep 2020 22:27:31 +0000 (01:27 +0300)]
net: mscc: ocelot: rename variable 'cnt' in vcap_data_offset_get()

The 'cnt' variable is actually used for 2 purposes, to hold the number
of sub-words per VCAP entry, and the number of sub-words per VCAP
action.

In fact, I'm pretty sure these 2 numbers can never be different from one
another. By hardware definition, the entry (key) TCAM rows are divided
into the same number of sub-words as its associated action RAM rows.
But nonetheless, let's at least rename the variables such that
observations like this one are easier to make in the future.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: rename variable 'count' in vcap_data_offset_get()
Vladimir Oltean [Tue, 29 Sep 2020 22:27:30 +0000 (01:27 +0300)]
net: mscc: ocelot: rename variable 'count' in vcap_data_offset_get()

This gets rid of one of the 2 variables named, very generically,
"count".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>