linux-2.6-microblaze.git
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Thu, 1 Oct 2020 21:29:01 +0000 (14:29 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-10-01

The following pull-request contains BPF updates for your *net-next* tree.

We've added 90 non-merge commits during the last 8 day(s) which contain
a total of 103 files changed, 7662 insertions(+), 1894 deletions(-).

Note that once bpf(/net) tree gets merged into net-next, there will be a small
merge conflict in tools/lib/bpf/btf.c between commit 1245008122d7 ("libbpf: Fix
native endian assumption when parsing BTF") from the bpf tree and the commit
3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness")
from the bpf-next tree. Correct resolution would be to stick with bpf-next, it
should look like:

  [...]
        /* check BTF magic */
        if (fread(&magic, 1, sizeof(magic), f) < sizeof(magic)) {
                err = -EIO;
                goto err_out;
        }
        if (magic != BTF_MAGIC && magic != bswap_16(BTF_MAGIC)) {
                /* definitely not a raw BTF */
                err = -EPROTO;
                goto err_out;
        }

        /* get file size */
  [...]

The main changes are:

1) Add bpf_snprintf_btf() and bpf_seq_printf_btf() helpers to support displaying
   BTF-based kernel data structures out of BPF programs, from Alan Maguire.

2) Speed up RCU tasks trace grace periods by a factor of 50 & fix a few race
   conditions exposed by it. It was discussed to take these via BPF and
   networking tree to get better testing exposure, from Paul E. McKenney.

3) Support multi-attach for freplace programs, needed for incremental attachment
   of multiple XDP progs using libxdp dispatcher model, from Toke Høiland-Jørgensen.

4) libbpf support for appending new BTF types at the end of BTF object, allowing
   intrusive changes of prog's BTF (useful for future linking), from Andrii Nakryiko.

5) Several BPF helper improvements e.g. avoid atomic op in cookie generator and add
   a redirect helper into neighboring subsys, from Daniel Borkmann.

6) Allow map updates on sockmaps from bpf_iter context in order to migrate sockmaps
   from one to another, from Lorenz Bauer.

7) Fix 32 bit to 64 bit assignment from latest alu32 bounds tracking which caused
   a verifier issue due to type downgrade to scalar, from John Fastabend.

8) Follow-up on tail-call support in BPF subprogs which optimizes x64 JIT prologue
   and epilogue sections, from Maciej Fijalkowski.

9) Add an option to perf RB map to improve sharing of event entries by avoiding remove-
   on-close behavior. Also, add BPF_PROG_TEST_RUN for raw_tracepoint, from Song Liu.

10) Fix a crash in AF_XDP's socket_release when memory allocation for UMEMs fails,
    from Magnus Karlsson.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-ravb-Add-support-for-explicit-internal-clock-delay-c
David S. Miller [Thu, 1 Oct 2020 19:53:31 +0000 (12:53 -0700)]
Merge branch 'net-ravb-Add-support-for-explicit-internal-clock-delay-c
onfiguration'

Geert Uytterhoeven says:

====================
net/ravb: Add support for explicit internal clock delay configuration

Some Renesas EtherAVB variants support internal clock delay
configuration, which can add larger delays than the delays that are
typically supported by the PHY (using an "rgmii-*id" PHY mode, and/or
"[rt]xc-skew-ps" properties).

Historically, the EtherAVB driver configured these delays based on the
"rgmii-*id" PHY mode.  This caused issues with PHY drivers that
implement PHY internal delays properly[1].  Hence a backwards-compatible
workaround was added by masking the PHY mode[2].

This patch series implements the next step of the plan outlined in [3],
and adds proper support for explicit configuration of the MAC internal
clock delays using new "[rt]x-internal-delay-ps" properties.  If none of
these properties is present, the driver falls back to the old handling.

This can be considered the MAC counterpart of commit 9150069bf5fc0e86
("dt-bindings: net: Add tx and rx internal delays"), which applies to
the PHY.  Note that unlike commit 92252eec913b2dd5 ("net: phy: Add a
helper to return the index for of the internal delay"), no helpers are
provided to parse the DT properties, as so far there is a single user
only, which supports only zero or a single fixed value.  Of course such
helpers can be added later, when the need arises, or when deemed useful
otherwise.

This series consists of 3 parts:
  1. DT binding updates documenting the new properties, for both the
     generic ethernet-controller and the EtherAVB-specific bindings,
  2. Conversion to json-schema of the Renesas EtherAVB DT bindings.
     Technically, the conversion is independent of all of the above.
     I included it in this series, as it shows how all sanity checks on
     "[rt]x-internal-delay-ps" values are implemented as DT binding
     checks,
  3. EtherAVB driver update implementing support for the new properties.

Given Rob has provided his acks for the DT binding updates, all of this
can be merged through net-next.

Changes compared to v3[4]:
  - Add Reviewed-by,
  - Drop the DT updates, as they will be merged through renesas-devel and
    arm-soc, and have a hard dependency on this series.

Changes compared to v2[5]:
  - Update recently added board DTS files,
  - Add Reviewed-by.

Changes compared to v1[6]:
  - Added "[PATCH 1/7] dt-bindings: net: ethernet-controller: Add
    internal delay properties",
  - Replace "renesas,[rt]xc-delay-ps" by "[rt]x-internal-delay-ps",
  - Incorporated EtherAVB DT binding conversion to json-schema,
  - Add Reviewed-by.

Impacted, tested:
  - Salvator-X(S) with R-Car H3 ES1.0 and ES2.0, M3-W, and M3-N.

Not impacted, tested:
  - Ebisu with R-Car E3.

Impacted, not tested:
  - Salvator-X(S) with other SoC variants,
  - ULCB with R-Car H3/M3-W/M3-N variants,
  - V3MSK and Eagle with R-Car V3M,
  - Draak with R-Car V3H,
  - HiHope RZ/G2[MN] with RZ/G2M or RZ/G2N,
  - Beacon EmbeddedWorks RZ/G2M Development Kit.

To ease testing, I have pushed this series and the DT updates to the
topic/ravb-internal-clock-delays-v4 branch of my renesas-drivers
repository at
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git.

Thanks for applying!

References:
  [1] Commit bcf3440c6dd78bfe ("net: phy: micrel: add phy-mode support
      for the KSZ9031 PHY")
  [2] Commit 9b23203c32ee02cd ("ravb: Mask PHY mode to avoid inserting
      delays twice").
      https://lore.kernel.org/r/20200529122540.31368-1-geert+renesas@glider.be/
  [3] https://lore.kernel.org/r/CAMuHMdU+MR-2tr3-pH55G0GqPG9HwH3XUd=8HZxprFDMGQeWUw@mail.gmail.com/
  [4] https://lore.kernel.org/linux-devicetree/20200819134344.27813-1-geert+renesas@glider.be/
  [5] https://lore.kernel.org/linux-devicetree/20200706143529.18306-1-geert+renesas@glider.be/
  [6] https://lore.kernel.org/linux-devicetree/20200619191554.24942-1-geert+renesas@glider.be/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoravb: Add support for explicit internal clock delay configuration
Geert Uytterhoeven [Thu, 1 Oct 2020 10:10:08 +0000 (12:10 +0200)]
ravb: Add support for explicit internal clock delay configuration

Some EtherAVB variants support internal clock delay configuration, which
can add larger delays than the delays that are typically supported by
the PHY (using an "rgmii-*id" PHY mode, and/or "[rt]xc-skew-ps"
properties).

Historically, the EtherAVB driver configured these delays based on the
"rgmii-*id" PHY mode.  This caused issues with PHY drivers that
implement PHY internal delays properly[1].  Hence a backwards-compatible
workaround was added by masking the PHY mode[2].

Add proper support for explicit configuration of the MAC internal clock
delays using the new "[rt]x-internal-delay-ps" properties.
Fall back to the old handling if none of these properties is present.

[1] Commit bcf3440c6dd78bfe ("net: phy: micrel: add phy-mode support for
    the KSZ9031 PHY")
[2] Commit 9b23203c32ee02cd ("ravb: Mask PHY mode to avoid inserting
    delays twice").

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoravb: Split delay handling in parsing and applying
Geert Uytterhoeven [Thu, 1 Oct 2020 10:10:07 +0000 (12:10 +0200)]
ravb: Split delay handling in parsing and applying

Currently, full delay handling is done in both the probe and resume
paths.  Split it in two parts, so the resume path doesn't have to redo
the parsing part over and over again.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: renesas,etheravb: Convert to json-schema
Geert Uytterhoeven [Thu, 1 Oct 2020 10:10:06 +0000 (12:10 +0200)]
dt-bindings: net: renesas,etheravb: Convert to json-schema

Convert the Renesas Ethernet AVB (EthernetAVB-IF) Device Tree binding
documentation to json-schema.

Add missing properties.
Update the example to match reality.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: renesas,ravb: Document internal clock delay properties
Geert Uytterhoeven [Thu, 1 Oct 2020 10:10:05 +0000 (12:10 +0200)]
dt-bindings: net: renesas,ravb: Document internal clock delay properties

Some EtherAVB variants support internal clock delay configuration, which
can add larger delays than the delays that are typically supported by
the PHY (using an "rgmii-*id" PHY mode, and/or "[rt]xc-skew-ps"
properties).

Add properties for configuring the internal MAC delays.
These properties are mandatory, even when specified as zero, to
distinguish between old and new DTBs.

Update the (bogus) example accordingly.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: ethernet-controller: Add internal delay properties
Geert Uytterhoeven [Thu, 1 Oct 2020 10:10:04 +0000 (12:10 +0200)]
dt-bindings: net: ethernet-controller: Add internal delay properties

Internal Receive and Transmit Clock Delays are a common setting for
RGMII capable devices.

While these delays are typically applied by the PHY, some MACs support
configuring internal clock delay settings, too.  Hence add standardized
properties to configure this.

This is the MAC counterpart of commit 9150069bf5fc0e86 ("dt-bindings:
net: Add tx and rx internal delays"), which applies to the PHY.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'mlx5-updates-2020-09-30' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Thu, 1 Oct 2020 19:24:52 +0000 (12:24 -0700)]
Merge tag 'mlx5-updates-2020-09-30' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-09-30

Updates and cleanups for mlx5 driver:

1) From Ariel, Dan Carpenter and Gostavo, Fixes to the previous
   mlx5 Connection track series.

2) From Yevgeny, trivial cleanups for Software steering

3) From Hamdan, Support for Flow source hint in software steering and
   E-Switch

4) From Parav and Sunil, Small and trivial E-Switch updates and
   cleanups in preparation for mlx5 Sub-functions support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'introduce BPF_F_PRESERVE_ELEMS'
Alexei Starovoitov [Thu, 1 Oct 2020 06:18:12 +0000 (23:18 -0700)]
Merge branch 'introduce BPF_F_PRESERVE_ELEMS'

Song Liu says:

====================
This set introduces BPF_F_PRESERVE_ELEMS to perf event array for better
sharing of perf event. By default, perf event array removes the perf event
when the map fd used to add the event is closed. With BPF_F_PRESERVE_ELEMS
set, however, the perf event will stay in the array until it is removed, or
the map is closed.
---
Changes v3 => v5:
1. Clean up in selftest. (Alexei)

Changes v2 => v3:
1. Move perf_event_fd_array_map_free() to avoid unnecessary forward
   declaration. (Daniel)

Changes v1 => v2:
1. Rename the flag as BPF_F_PRESERVE_ELEMS. (Alexei, Daniel)
2. Simplify the code and selftest. (Daniel, Alexei)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoselftests/bpf: Add tests for BPF_F_PRESERVE_ELEMS
Song Liu [Wed, 30 Sep 2020 22:49:27 +0000 (15:49 -0700)]
selftests/bpf: Add tests for BPF_F_PRESERVE_ELEMS

Add tests for perf event array with and without BPF_F_PRESERVE_ELEMS.

Add a perf event to array via fd mfd. Without BPF_F_PRESERVE_ELEMS, the
perf event is removed when mfd is closed. With BPF_F_PRESERVE_ELEMS, the
perf event is removed when the map is freed.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200930224927.1936644-3-songliubraving@fb.com
3 years agobpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array
Song Liu [Wed, 30 Sep 2020 22:49:26 +0000 (15:49 -0700)]
bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array

Currently, perf event in perf event array is removed from the array when
the map fd used to add the event is closed. This behavior makes it
difficult to the share perf events with perf event array.

Introduce perf event map that keeps the perf event open with a new flag
BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not
removed when the original map fd is closed. Instead, the perf event will
stay in the map until 1) it is explicitly removed from the array; or 2)
the array is freed.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com
3 years agonet/mlx5e: Fix potential null pointer dereference
Gustavo A. R. Silva [Fri, 25 Sep 2020 16:49:13 +0000 (11:49 -0500)]
net/mlx5e: Fix potential null pointer dereference

Calls to kzalloc() and kvzalloc() should be null-checked
in order to avoid any potential failures. In this case,
a potential null pointer dereference.

Fix this by adding null checks for _parse_attr_ and _flow_
right after allocation.

Addresses-Coverity-ID: 1497154 ("Dereference before null check")
Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Fix a use after free on error in mlx5_tc_ct_shared_counter_get()
Dan Carpenter [Mon, 28 Sep 2020 09:05:56 +0000 (12:05 +0300)]
net/mlx5e: Fix a use after free on error in mlx5_tc_ct_shared_counter_get()

This code frees "shared_counter" and then dereferences on the next line
to get the error code.

Fixes: 1edae2335adf ("net/mlx5e: CT: Use the same counter for both directions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Fix dereference on pointer attr after null check
Ariel Levkovich [Mon, 28 Sep 2020 16:34:10 +0000 (19:34 +0300)]
net/mlx5: Fix dereference on pointer attr after null check

When removing a flow from the slow path fdb, a flow attr struct is
allocated for the rule removal process. If the allocation fails the
code prints a warning message but continues with the removal flow
which include dereferencing a pointer which could be null.
Fix this by exiting the function in case the attr allocation failed.

Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Use dma device access helper
Parav Pandit [Wed, 9 Sep 2020 17:41:38 +0000 (20:41 +0300)]
net/mlx5: Use dma device access helper

Use the PCI device directly for dma accesses as non PCI device unlikely
support IOMMU and dma mappings.
Introduce and use helper routine to access DMA device.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-Switch, Support flow source for local vport
Hamdan Igbaria [Sun, 30 Aug 2020 09:32:37 +0000 (12:32 +0300)]
net/mlx5: E-Switch, Support flow source for local vport

Set flow source as hint for local vport.

Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Move devlink eswitch ports closer to eswitch
Parav Pandit [Mon, 31 Aug 2020 19:47:47 +0000 (22:47 +0300)]
net/mlx5: E-switch, Move devlink eswitch ports closer to eswitch

Currently devlink eswitch ports are registered and unregistered by the
representor layer.
However it is better to register them at eswitch layer so that in future
user initiated command port add and delete commands can also
register/unregister devlink ports without depending on representor layer.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Use helper function to load unload representor
Parav Pandit [Mon, 31 Aug 2020 18:18:59 +0000 (21:18 +0300)]
net/mlx5: E-switch, Use helper function to load unload representor

To register and unregister devlink ports when loading/unload
representors, refactor the code to helper functions.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Add helper to check egress ACL need
Parav Pandit [Wed, 2 Sep 2020 05:59:30 +0000 (08:59 +0300)]
net/mlx5: E-switch, Add helper to check egress ACL need

Currently only VF vports need egress ACL table.
Add a generic helper to check whether a vport need egress
ACL table or not.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: E-switch, Use PF num in metadata reg c0
sunils [Thu, 10 Sep 2020 21:13:26 +0000 (00:13 +0300)]
net/mlx5: E-switch, Use PF num in metadata reg c0

Currently only 256 vports can be supported as only 8 bits are
reserved for them and 8 bits are reserved for vhca_ids in
metadata reg c0. To support more than 256 vports, replace
vhca_id with a unique shorter 4-bit PF number which covers
upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095.
This will continue to generate unique metadata even if
multiple PCI devices have same switch_id.

Signed-off-by: sunils <sunils@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Add support for rule creation with flow source hint
Hamdan Igbaria [Mon, 15 Jun 2020 15:18:14 +0000 (18:18 +0300)]
net/mlx5: DR, Add support for rule creation with flow source hint

Skip the rule according to flow arrival source, in case of RX and the
source is local port skip and in case of TX and the source is uplink
skip, we get this info according to the flow source hint we get from
upper layers when creating the rule.
This is needed because for example in case of FDB table which has a TX
and RX tables and we are inserting a rule with an encap action which
is only a TX action, in this case rule will fail on RX, so we can rely
on the flow source hint and skip RX in such case.
Until now we relied on metadata regc_0 that upper layer mapped the
port in the regc_0, but the problem is that upper layer did not always
use regc_0 for port mapping, so now we added support to flow source
hint which upper layers will pass to SW steering when creating a rule.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Call ste_builder directly with tag pointer
Yevgeny Kliteynik [Mon, 31 Aug 2020 08:56:40 +0000 (11:56 +0300)]
net/mlx5: DR, Call ste_builder directly with tag pointer

Instead of getting the tag in each function, call the builder
directly with the tag. This will allow to use the same function
for building the tag and the bitmask.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Remove unneeded local variable
Yevgeny Kliteynik [Mon, 31 Aug 2020 08:57:28 +0000 (11:57 +0300)]
net/mlx5: DR, Remove unneeded local variable

The misc3 variable is used only once and can be dropped.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Remove unneeded vlan check from L2 builder
Yevgeny Kliteynik [Mon, 31 Aug 2020 08:57:14 +0000 (11:57 +0300)]
net/mlx5: DR, Remove unneeded vlan check from L2 builder

When we create a matcher we check that all fields are consumed.
There is no need for this specific check. This keeps the STE
builder functions simple and clean.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Remove unneeded check from source port builder
Yevgeny Kliteynik [Mon, 31 Aug 2020 08:56:59 +0000 (11:56 +0300)]
net/mlx5: DR, Remove unneeded check from source port builder

Mask validity for ste builders is checked by mlx5dr_ste_build_pre_check
during matcher creation.
It already checks the mask value of source_vport, so removing
this duplicated check.
Also, moving there the check of source_eswitch_owner_vhca_id mask.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Replace the check for valid STE entry
Yevgeny Kliteynik [Mon, 31 Aug 2020 08:56:07 +0000 (11:56 +0300)]
net/mlx5: DR, Replace the check for valid STE entry

Validity check is done by reading the next lu_type from the STE,
this check can be replaced by checking the refcount.
This will make the check independent on internal STE structure.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agoselftests/bpf: Fix alignment of .BTF_ids
Jean-Philippe Brucker [Wed, 30 Sep 2020 09:36:01 +0000 (11:36 +0200)]
selftests/bpf: Fix alignment of .BTF_ids

Fix a build failure on arm64, due to missing alignment information for
the .BTF_ids section:

resolve_btfids.test.o: in function `test_resolve_btfids':
tools/testing/selftests/bpf/prog_tests/resolve_btfids.c:140:(.text+0x29c): relocation truncated to fit: R_AARCH64_LDST32_ABS_LO12_NC against `.BTF_ids'
ld: tools/testing/selftests/bpf/prog_tests/resolve_btfids.c:140: warning: one possible cause of this error is that the symbol is being referenced in the indicated code as if it had a larger alignment than was declared where it was defined

In vmlinux, the .BTF_ids section is aligned to 4 bytes by vmlinux.lds.h.
In test_progs however, .BTF_ids doesn't have alignment constraints. The
arm64 linker expects the btf_id_set.cnt symbol, a u32, to be naturally
aligned but finds it misaligned and cannot apply the relocation. Enforce
alignment of .BTF_ids to 4 bytes.

Fixes: cd04b04de119 ("selftests/bpf: Add set test to resolve_btfids")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20200930093559.2120126-1-jean-philippe@linaro.org
3 years agoMerge branch 'drop_monitor-Convert-to-use-devlink-tracepoint'
David S. Miller [Thu, 1 Oct 2020 01:01:27 +0000 (18:01 -0700)]
Merge branch 'drop_monitor-Convert-to-use-devlink-tracepoint'

Ido Schimmel says:

====================
drop_monitor: Convert to use devlink tracepoint

Drop monitor is able to monitor both software and hardware originated
drops. Software drops are monitored by having drop monitor register its
probe on the 'kfree_skb' tracepoint. Hardware originated drops are
monitored by having devlink call into drop monitor whenever it receives
a dropped packet from the underlying hardware.

This patch set converts drop monitor to monitor both software and
hardware originated drops in the same way - by registering its probe on
the relevant tracepoint.

In addition to drop monitor being more consistent, it is now also
possible to build drop monitor as module instead of as a builtin and
still monitor hardware originated drops. Initially, CONFIG_NET_DEVLINK
implied CONFIG_NET_DROP_MONITOR, but after commit def2fbffe62c
("kconfig: allow symbols implied by y to become m") we can have
CONFIG_NET_DEVLINK=y and CONFIG_NET_DROP_MONITOR=m and hardware
originated drops will not be monitored.

Patch set overview:

Patch #1 adds a tracepoint in devlink for trap reports.

Patch #2 prepares probe functions in drop monitor for the new
tracepoint.

Patch #3 converts drop monitor to use the new tracepoint.

Patches #4-#6 perform cleanups after the conversion.

Patch #7 adds a test case for drop monitor. Both software originated
drops and hardware originated drops (using netdevsim) are tested.

Tested:

| CONFIG_NET_DEVLINK | CONFIG_NET_DROP_MONITOR | Build | SW drops | HW drops |
| -------------------|-------------------------|-------|----------|----------|
|          y         |            y            |   v   |     v    |     v    |
|          y         |            m            |   v   |     v    |     v    |
|          y         |            n            |   v   |     x    |     x    |
|          n         |            y            |   v   |     v    |     x    |
|          n         |            m            |   v   |     v    |     x    |
|          n         |            n            |   v   |     x    |     x    |
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: net: Add drop monitor test
Ido Schimmel [Tue, 29 Sep 2020 08:15:56 +0000 (11:15 +0300)]
selftests: net: Add drop monitor test

Test that drop monitor correctly captures both software and hardware
originated packet drops.

# ./drop_monitor_tests.sh

Software drops test
    TEST: Capturing active software drops                               [ OK ]
    TEST: Capturing inactive software drops                             [ OK ]

Hardware drops test
    TEST: Capturing active hardware drops                               [ OK ]
    TEST: Capturing inactive hardware drops                             [ OK ]

Tests passed:   4
Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrop_monitor: Filter control packets in drop monitor
Ido Schimmel [Tue, 29 Sep 2020 08:15:55 +0000 (11:15 +0300)]
drop_monitor: Filter control packets in drop monitor

Previously, devlink called into drop monitor in order to report hardware
originated drops / exceptions. devlink intentionally filtered control
packets and did not pass them to drop monitor as they were not dropped
by the underlying hardware.

Now drop monitor registers its probe on a generic 'devlink_trap_report'
tracepoint and should therefore perform this filtering itself instead of
having devlink do that.

Add the trap type as metadata and have drop monitor ignore control
packets.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrop_monitor: Remove duplicate struct
Ido Schimmel [Tue, 29 Sep 2020 08:15:54 +0000 (11:15 +0300)]
drop_monitor: Remove duplicate struct

'struct net_dm_hw_metadata' is a duplicate of 'struct
devlink_trap_metadata'.

Remove the former and simplify the code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrop_monitor: Remove no longer used functions
Ido Schimmel [Tue, 29 Sep 2020 08:15:53 +0000 (11:15 +0300)]
drop_monitor: Remove no longer used functions

The old probe functions that were invoked by drop monitor code are no
longer called and can thus be removed. They were replaced by actual
probe functions that are registered on the recently introduced
'devlink_trap_report' tracepoint.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrop_monitor: Convert to using devlink tracepoint
Ido Schimmel [Tue, 29 Sep 2020 08:15:52 +0000 (11:15 +0300)]
drop_monitor: Convert to using devlink tracepoint

Convert drop monitor to use the recently introduced
'devlink_trap_report' tracepoint instead of having devlink call into
drop monitor.

This is both consistent with software originated drops ('kfree_skb'
tracepoint) and also allows drop monitor to be built as a module and
still report hardware originated drops.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrop_monitor: Prepare probe functions for devlink tracepoint
Ido Schimmel [Tue, 29 Sep 2020 08:15:51 +0000 (11:15 +0300)]
drop_monitor: Prepare probe functions for devlink tracepoint

Drop monitor supports two alerting modes: Summary and packet. Prepare a
probe function for each, so that they could be later registered on the
devlink tracepoint by calling register_trace_devlink_trap_report(),
based on the configured alerting mode.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodevlink: Add a tracepoint for trap reports
Ido Schimmel [Tue, 29 Sep 2020 08:15:50 +0000 (11:15 +0300)]
devlink: Add a tracepoint for trap reports

Add a tracepoint for trap reports so that drop monitor could register
its probe on it. Use trace_devlink_trap_report_enabled() to avoid
wasting cycles setting the trap metadata if the tracepoint is not
enabled.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'linux-can-next-for-5.10-20200930' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Wed, 30 Sep 2020 22:21:43 +0000 (15:21 -0700)]
Merge tag 'linux-can-next-for-5.10-20200930' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2020-09-30

this is a pull request of 13 patches for net-next.

The first 10 target the mcp25xxfd driver (which is renamed to mcp251xfd during
this series).

The first two patches are by Thomas Kopp, which adds reference to the just
related errata and updates the documentation and log messages.

Dan Carpenter's patch fixes a resource leak during ifdown.

A patch by me adds the missing initialization of a variable.

Oleksij Rempel updates the DT binding documentation as requested by Rob
Herring.

The next 5 patches are by Thomas Kopp and me. During review Geert Uytterhoeven
suggested to use "microchip,mcp251xfd" instead of "microchip,mcp25xxfd" as the
DT autodetection compatible to avoid clashes with future but incompatible
devices. We decided not only to rename the compatible but the whole driver from
"mcp25xxfd" to "mcp251xfd". This is done in several patches.

Joakim Zhang contributes three patches for the flexcan driver. The first one
adds support for the ECC feature, which is implemented on some modern IP cores,
by initializing the controller's memory during ifup. The next patch adds
support for the i.MX8MP (which supports ECC) and the last patch properly
disables the runtime PM if device registration fails.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'ionic-watchdog-training'
David S. Miller [Wed, 30 Sep 2020 22:11:09 +0000 (15:11 -0700)]
Merge branch 'ionic-watchdog-training'

Shannon Nelson says:

====================
ionic watchdog training

Our link watchdog displayed a couple of unfriendly behaviors in some recent
stress testing.  These patches change the startup and stop timing in order
to be sure that expected structures are ready to be used by the watchdog.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: prevent early watchdog check
Shannon Nelson [Wed, 30 Sep 2020 17:48:28 +0000 (10:48 -0700)]
ionic: prevent early watchdog check

In one corner case scenario, the driver device lif setup can
get delayed such that the ionic_watchdog_cb() timer goes off
before the ionic->lif is set, thus causing a NULL pointer panic.
We catch the problem by checking for a NULL lif just a little
earlier in the callback.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoionic: stop watchdog timer earlier on remove
Shannon Nelson [Wed, 30 Sep 2020 17:48:27 +0000 (10:48 -0700)]
ionic: stop watchdog timer earlier on remove

We need to be better at making sure we don't have a link check
watchdog go off while we're shutting things down, so let's stop
the timer as soon as we start the remove.

Meanwhile, since that was the only thing in
ionic_dev_teardown(), simplify and remove that function.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'tcp-exponential-backoff-in-tcp_send_ack'
David S. Miller [Wed, 30 Sep 2020 21:21:30 +0000 (14:21 -0700)]
Merge branch 'tcp-exponential-backoff-in-tcp_send_ack'

Eric Dumazet says:

====================
tcp: exponential backoff in tcp_send_ack()

We had outages caused by repeated skb allocation failures in tcp_send_ack()

It is time to add exponential backoff to reduce number of attempts.
Before doing so, first patch removes icsk_ack.blocked to make
room for a new field (icsk_ack.retry)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: add exponential backoff in __tcp_send_ack()
Eric Dumazet [Wed, 30 Sep 2020 12:54:57 +0000 (05:54 -0700)]
tcp: add exponential backoff in __tcp_send_ack()

Whenever host is under very high memory pressure,
__tcp_send_ack() skb allocation fails, and we setup
a 200 ms (TCP_DELACK_MAX) timer before retrying.

On hosts with high number of TCP sockets, we can spend
considerable amount of cpu cycles in these attempts,
add high pressure on various spinlocks in mm-layer,
ultimately blocking threads attempting to free space
from making any progress.

This patch adds standard exponential backoff to avoid
adding fuel to the fire.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoinet: remove icsk_ack.blocked
Eric Dumazet [Wed, 30 Sep 2020 12:54:56 +0000 (05:54 -0700)]
inet: remove icsk_ack.blocked

TCP has been using it to work around the possibility of tcp_delack_timer()
finding the socket owned by user.

After commit 6f458dfb4092 ("tcp: improve latencies of timer triggered events")
we added TCP_DELACK_TIMER_DEFERRED atomic bit for more immediate recovery,
so we can get rid of icsk_ack.blocked

This frees space that following patch will reuse.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: macb: move pdata to private header
Alexandre Belloni [Wed, 30 Sep 2020 10:50:59 +0000 (12:50 +0200)]
net: macb: move pdata to private header

struct macb_platform_data is only used by macb_pci to register the platform
device, move its definition to cadence/macb.h and remove platform_data/macb.h

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mlxsw-PFC-and-headroom-selftests'
David S. Miller [Wed, 30 Sep 2020 21:06:54 +0000 (14:06 -0700)]
Merge branch 'mlxsw-PFC-and-headroom-selftests'

Petr Machata says:

====================
mlxsw: PFC and headroom selftests

Recent changes in the headroom management code made it clear that an
automated way of testing this functionality is needed. This patchset brings
two tests: a synthetic headroom behavior test, which verifies mechanics of
headroom management. And a PFC test, which verifies whether this behavior
actually translates into a working lossless configuration.

Both of these tests rely on mlnx_qos[1], a tool that interfaces with Linux
DCB API. The tool was originally written to work with Mellanox NICs, but
does not actually rely on anything Mellanox-specific, and can be used for
mlxsw as well as for any other NIC-like driver. Unlike Open LLDP it does
support buffer commands and permits a fire-and-forget approach to
configuration, which makes it very handy for writing of selftests.

Patches #1-#3 extend the selftest devlink_lib.sh in various ways. Patch #4
then adds a helper wrapper for mlnx_qos to mlxsw's qos_lib.sh.

Patch #5 adds a test for management of port headroom.

Patch #6 adds a PFC test.

[1] https://github.com/Mellanox/mlnx-tools/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: mlxsw: Add a PFC test
Petr Machata [Wed, 30 Sep 2020 10:49:12 +0000 (12:49 +0200)]
selftests: mlxsw: Add a PFC test

Add a test for PFC. Runs 10MB of traffic through a bottleneck and checks
that none of it gets lost.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: mlxsw: Add headroom handling test
Petr Machata [Wed, 30 Sep 2020 10:49:11 +0000 (12:49 +0200)]
selftests: mlxsw: Add headroom handling test

Add a test for headroom configuration. This covers projection of ETS
configuration to ingress, PFC, adjustments for MTU, the qdisc / TC
mode and the effect of egress SPAN session on buffer configuration.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: mlxsw: qos_lib: Add a wrapper for running mlnx_qos
Petr Machata [Wed, 30 Sep 2020 10:49:10 +0000 (12:49 +0200)]
selftests: mlxsw: qos_lib: Add a wrapper for running mlnx_qos

mlnx_qos is a script for configuration of DCB. Despite the name it is not
actually Mellanox-specific in any way. It is currently the only ad-hoc tool
available (in contrast to a daemon that manages an interface on an ongoing
basis). However, it is very verbose and parsing out error messages is not
really possible. Add a wrapper that makes it easier to use the tool.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: devlink_lib: Support port-less topologies
Petr Machata [Wed, 30 Sep 2020 10:49:09 +0000 (12:49 +0200)]
selftests: forwarding: devlink_lib: Support port-less topologies

Some selftests may not need any actual ports. Technically those are not
forwarding selftests, but devlink_lib can still be handy. Fall back on
NETIF_NO_CABLE in those cases.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: devlink_lib: Add devlink_cell_size_get()
Petr Machata [Wed, 30 Sep 2020 10:49:08 +0000 (12:49 +0200)]
selftests: forwarding: devlink_lib: Add devlink_cell_size_get()

Add a helper that answers the cell size of the devlink device.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoselftests: forwarding: devlink_lib: Split devlink_..._set() into save & set
Petr Machata [Wed, 30 Sep 2020 10:49:07 +0000 (12:49 +0200)]
selftests: forwarding: devlink_lib: Split devlink_..._set() into save & set

Changing pool type from static to dynamic causes reinterpretation of
threshold values. They therefore need to be saved before pool type is
changed, then the pool type can be changed, and then the new values need
to be set up.

For that reason, set cannot subsume save, because it would be saving the
wrong thing, with possibly a nonsensical value, and restore would then fail
to restore the nonsensical value.

Thus extract a _save() from each of the relevant _set()'s. This way it is
possible to save everything up front, then to tweak it, and then restore in
the required order.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocan: flexcan: disable runtime PM if register flexcandev failed
Joakim Zhang [Tue, 29 Sep 2020 21:15:57 +0000 (05:15 +0800)]
can: flexcan: disable runtime PM if register flexcandev failed

Disable runtime PM if register flexcandev failed, and balance reference
of usage_count.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20200929211557.14153-4-qiangqing.zhang@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: flexcan: add flexcan driver for i.MX8MP
Joakim Zhang [Tue, 29 Sep 2020 21:15:56 +0000 (05:15 +0800)]
can: flexcan: add flexcan driver for i.MX8MP

Add flexcan driver for i.MX8MP, which supports CAN FD and ECC.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20200929211557.14153-3-qiangqing.zhang@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: flexcan: initialize all flexcan memory for ECC function
Joakim Zhang [Tue, 29 Sep 2020 21:15:55 +0000 (05:15 +0800)]
can: flexcan: initialize all flexcan memory for ECC function

One issue was reported at a baremetal environment, which is used for
FPGA verification. "The first transfer will fail for extended ID
format(for both 2.0B and FD format), following frames can be transmitted
and received successfully for extended format, and standard format don't
have this issue. This issue occurred randomly with high possiblity, when
it occurs, the transmitter will detect a BIT1 error, the receiver a CRC
error. According to the spec, a non-correctable error may cause this
transfer failure."

With FLEXCAN_QUIRK_DISABLE_MECR quirk, it supports correctable errors,
disable non-correctable errors interrupt and freeze mode. Platform has
ECC hardware support, but select this quirk, this issue may not come to
light. Initialize all FlexCAN memory before accessing them, at least it
can avoid non-correctable errors detected due to memory uninitialized.
The internal region can't be initialized when the hardware doesn't support
ECC.

According to IMX8MPRM, Rev.C, 04/2020. There is a NOTE at the section
11.8.3.13 Detection and correction of memory errors:
"All FlexCAN memory must be initialized before starting its operation in
order to have the parity bits in memory properly updated. CTRL2[WRMFRZ]
grants write access to all memory positions that require initialization,
ranging from 0x080 to 0xADF and from 0xF28 to 0xFFF when the CAN FD feature
is enabled. The RXMGMASK, RX14MASK, RX15MASK, and RXFGMASK registers need to
be initialized as well. MCR[RFEN] must not be set during memory initialization."

Memory range from 0x080 to 0xADF, there are reserved memory (unimplemented
by hardware, e.g. only configure 64 MBs), these memory can be initialized or not.
In this patch, initialize all flexcan memory which includes reserved memory.

In this patch, create FLEXCAN_QUIRK_SUPPORT_ECC for platforms which has ECC
feature. If you have a ECC platform in your hand, please select this
qurik to initialize all flexcan memory firstly, then you can select
FLEXCAN_QUIRK_DISABLE_MECR to only enable correctable errors.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20200929211557.14153-2-qiangqing.zhang@nxp.com
[mkl: wrap long lines]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: rename all remaining occurrence to mcp251xfd
Marc Kleine-Budde [Wed, 30 Sep 2020 08:49:00 +0000 (10:49 +0200)]
can: mcp251xfd: rename all remaining occurrence to mcp251xfd

In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver,
which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming,
but incompatible chips.

In the previous patch the autodetect DT compatbile has been renamed to
"microchip,mcp251xfd", this patch changes all non user facing occurrence of
"mcp25xxfd" to "mcp251xfd" and "MCP25XXFD" to "MCP251XFD".

[1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Link: https://lore.kernel.org/r/20200930091424.792165-10-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: rename all user facing strings to mcp251xfd
Marc Kleine-Budde [Wed, 30 Sep 2020 08:49:00 +0000 (10:49 +0200)]
can: mcp251xfd: rename all user facing strings to mcp251xfd

In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver,
which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming,
but incompatible chips.

In the previous patch the autodetect DT compatbile has been renamed to
"microchip,mcp251xfd", this patch changes all user facing strings from
"mcp25xxfd" to "mcp251xfd" and "MCP25XXFD" to "MCP251XFD", including:
- kconfig symbols
- name of kernel module
- DT and SPI compatible

[1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Link: https://lore.kernel.org/r/20200930091424.792165-9-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp251xfd: rename driver files and subdir to mcp251xfd
Marc Kleine-Budde [Wed, 30 Sep 2020 08:49:00 +0000 (10:49 +0200)]
can: mcp251xfd: rename driver files and subdir to mcp251xfd

In [1] Geert noted that the autodetect compatible for the mcp25xxfd driver,
which is "microchip,mcp25xxfd" might be too generic and overlap with upcoming,
but incompatible chips.

In the previous patch the autodetect DT compatbile has been renamed to
"microchip,mcp251xfd", this patch changes the name of the driver subdir and the
individual files accordinly.

[1] http://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Link: https://lore.kernel.org/r/20200930091424.792165-8-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agoselftests/bpf: Test "incremental" btf_dump in C format
Andrii Nakryiko [Tue, 29 Sep 2020 23:28:43 +0000 (16:28 -0700)]
selftests/bpf: Test "incremental" btf_dump in C format

Add test validating that btf_dump works fine with BTFs that are modified and
incrementally generated.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929232843.1249318-5-andriin@fb.com
3 years agolibbpf: Make btf_dump work with modifiable BTF
Andrii Nakryiko [Tue, 29 Sep 2020 23:28:40 +0000 (16:28 -0700)]
libbpf: Make btf_dump work with modifiable BTF

Ensure that btf_dump can accommodate new BTF types being appended to BTF
instance after struct btf_dump was created. This came up during attemp to
use btf_dump for raw type dumping in selftests, but given changes are not
excessive, it's good to not have any gotchas in API usage, so I decided to
support such use case in general.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929232843.1249318-2-andriin@fb.com
3 years agoMerge branch 'Various BPF helper improvements'
Alexei Starovoitov [Wed, 30 Sep 2020 18:50:35 +0000 (11:50 -0700)]
Merge branch 'Various BPF helper improvements'

Daniel Borkmann says:

====================
This series adds two BPF helpers, that is, one for retrieving the classid
of an skb and another one to redirect via the neigh subsystem, and improves
also the cookie helpers by removing the atomic counter. I've also added
the bpf_tail_call_static() helper to the libbpf API that we've been using
in Cilium for a while now, and last but not least the series adds a few
selftests. For details, please check individual patches, thanks!

v3 -> v4:
  - Removed out_rec error path (Martin)
  - Integrate BPF_F_NEIGH flag into rejecting invalid flags (Martin)
    - I think this way it's better to avoid bit overlaps given it's
      right in the place that would need to be extended on new flags
v2 -> v3:
  - Removed double skb->dev = dev assignment (David)
  - Added headroom check for v6 path (David)
  - Set set flowi4_proto for ip_route_output_flow (David)
  - Rebased onto latest bpf-next
v1 -> v2:
  - Rework cookie generator to support nested contexts (Eric)
  - Use ip_neigh_gw6() and container_of() (David)
  - Rename __throw_build_bug() and improve comments (Andrii)
  - Use bpf_tail_call_static() also in BPF samples (Maciej)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf, selftests: Add redirect_neigh selftest
Daniel Borkmann [Wed, 30 Sep 2020 15:18:20 +0000 (17:18 +0200)]
bpf, selftests: Add redirect_neigh selftest

Add a small test that exercises the new redirect_neigh() helper for the
IPv4 and IPv6 case.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/0fc7d9c5f9a6cc1c65b0d3be83b44b1ec9889f43.1601477936.git.daniel@iogearbox.net
3 years agobpf, selftests: Use bpf_tail_call_static where appropriate
Daniel Borkmann [Wed, 30 Sep 2020 15:18:19 +0000 (17:18 +0200)]
bpf, selftests: Use bpf_tail_call_static where appropriate

For those locations where we use an immediate tail call map index use the
newly added bpf_tail_call_static() helper.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/3cfb2b799a62d22c6e7ae5897c23940bdcc24cbc.1601477936.git.daniel@iogearbox.net
3 years agobpf, libbpf: Add bpf_tail_call_static helper for bpf programs
Daniel Borkmann [Wed, 30 Sep 2020 15:18:18 +0000 (17:18 +0200)]
bpf, libbpf: Add bpf_tail_call_static helper for bpf programs

Port of tail_call_static() helper function from Cilium's BPF code base [0]
to libbpf, so others can easily consume it as well. We've been using this
in production code for some time now. The main idea is that we guarantee
that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the
JITed BPF insns with direct jumps instead of having to fall back to using
expensive retpolines. By using inline asm, we guarantee that the compiler
won't merge the call from different paths with potentially different
content of r2/r3.

We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable())
in different places as a neat trick to trigger compilation errors when
compiler does not remove code at compilation time. This works for the BPF
back end as it does not implement the __builtin_trap().

  [0] https://github.com/cilium/cilium/commit/f5537c26020d5297b70936c6b7d03a1e412a1035

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net
3 years agobpf: Add redirect_neigh helper as redirect drop-in
Daniel Borkmann [Wed, 30 Sep 2020 15:18:17 +0000 (17:18 +0200)]
bpf: Add redirect_neigh helper as redirect drop-in

Add a redirect_neigh() helper as redirect() drop-in replacement
for the xmit side. Main idea for the helper is to be very similar
in semantics to the latter just that the skb gets injected into
the neighboring subsystem in order to let the stack do the work
it knows best anyway to populate the L2 addresses of the packet
and then hand over to dev_queue_xmit() as redirect() does.

This solves two bigger items: i) skbs don't need to go up to the
stack on the host facing veth ingress side for traffic egressing
the container to achieve the same for populating L2 which also
has the huge advantage that ii) the skb->sk won't get orphaned in
ip_rcv_core() when entering the IP routing layer on the host stack.

Given that skb->sk neither gets orphaned when crossing the netns
as per 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing
the skb.") the helper can then push the skbs directly to the phys
device where FQ scheduler can do its work and TCP stack gets proper
backpressure given we hold on to skb->sk as long as skb is still
residing in queues.

With the helper used in BPF data path to then push the skb to the
phys device, I observed a stable/consistent TCP_STREAM improvement
on veth devices for traffic going container -> host -> host ->
container from ~10Gbps to ~15Gbps for a single stream in my test
environment.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
3 years agobpf, net: Rework cookie generator as per-cpu one
Daniel Borkmann [Wed, 30 Sep 2020 15:18:16 +0000 (17:18 +0200)]
bpf, net: Rework cookie generator as per-cpu one

With its use in BPF, the cookie generator can be called very frequently
in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg)
and attached to the root cgroup, for example, when used in v1/v2 mixed
environments. In particular, when there's a high churn on sockets in the
system there can be many parallel requests to the bpf_get_socket_cookie()
and bpf_get_netns_cookie() helpers which then cause contention on the
atomic counter.

As similarly done in f991bd2e1421 ("fs: introduce a per-cpu last_ino
allocator"), add a small helper library that both can use for the 64 bit
counters. Given this can be called from different contexts, we also need
to deal with potential nested calls even though in practice they are
considered extremely rare. One idea as suggested by Eric Dumazet was
to use a reverse counter for this situation since we don't expect 64 bit
overflows anyways; that way, we can avoid bigger gaps in the 64 bit
counter space compared to just batch-wise increase. Even on machines
with small number of cores (e.g. 4) the cookie generation shrinks from
min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run
in parallel from multiple CPUs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
3 years agobpf: Add classid helper only based on skb->sk
Daniel Borkmann [Wed, 30 Sep 2020 15:18:15 +0000 (17:18 +0200)]
bpf: Add classid helper only based on skb->sk

Similarly to 5a52ae4e32a6 ("bpf: Allow to retrieve cgroup v1 classid
from v2 hooks"), add a helper to retrieve cgroup v1 classid solely
based on the skb->sk, so it can be used as key as part of BPF map
lookups out of tc from host ns, in particular given the skb->sk is
retained these days when crossing net ns thanks to 9c4c325252c5
("skbuff: preserve sock reference when scrubbing the skb."). This
is similar to bpf_skb_cgroup_id() which implements the same for v2.
Kubernetes ecosystem is still operating on v1 however, hence net_cls
needs to be used there until this can be dropped in with the v2
helper of bpf_skb_cgroup_id().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net
3 years agobpf: fix raw_tp test run in preempt kernel
Song Liu [Tue, 29 Sep 2020 22:29:49 +0000 (15:29 -0700)]
bpf: fix raw_tp test run in preempt kernel

In preempt kernel, BPF_PROG_TEST_RUN on raw_tp triggers:

[   35.874974] BUG: using smp_processor_id() in preemptible [00000000]
code: new_name/87
[   35.893983] caller is bpf_prog_test_run_raw_tp+0xd4/0x1b0
[   35.900124] CPU: 1 PID: 87 Comm: new_name Not tainted 5.9.0-rc6-g615bd02bf #1
[   35.907358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1ubuntu1 04/01/2014
[   35.916941] Call Trace:
[   35.919660]  dump_stack+0x77/0x9b
[   35.923273]  check_preemption_disabled+0xb4/0xc0
[   35.928376]  bpf_prog_test_run_raw_tp+0xd4/0x1b0
[   35.933872]  ? selinux_bpf+0xd/0x70
[   35.937532]  __do_sys_bpf+0x6bb/0x21e0
[   35.941570]  ? find_held_lock+0x2d/0x90
[   35.945687]  ? vfs_write+0x150/0x220
[   35.949586]  do_syscall_64+0x2d/0x40
[   35.953443]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by calling migrate_disable() before smp_processor_id().

Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agocan: mcp25xxfd: narrow down wildcards in device tree bindings to "microchip,mcp251xfd"
Thomas Kopp [Wed, 30 Sep 2020 09:14:22 +0000 (11:14 +0200)]
can: mcp25xxfd: narrow down wildcards in device tree bindings to "microchip,mcp251xfd"

The wildcard should be narrowed down to prevent existing and future devices
that are not compatible from matching. It is very unlikely that incompatible
devices will be released that do not match the wildcard.

Discussion Reference: https://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200930091423.755-1-thomas.kopp@microchip.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agodt-binding: can: mcp251xfd: narrow down wildcards in device tree bindings to "microch...
Thomas Kopp [Wed, 30 Sep 2020 09:14:23 +0000 (11:14 +0200)]
dt-binding: can: mcp251xfd: narrow down wildcards in device tree bindings to "microchip,mcp251xfd"

The wildcard should be narrowed down to prevent existing and future devices
that are not compatible from matching. It is very unlikely that incompatible
devices will be released that do not match the wildcard.

This is the documentation part of the commit.

Discussion Reference: https://lore.kernel.org/r/CAMuHMdVkwGjr6dJuMyhQNqFoJqbh6Ec5V2b5LenCshwpM2SDsQ@mail.gmail.com

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200930091423.755-2-thomas.kopp@microchip.com
[mkl: rename file, too]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agodt-binding: can: mcp25xxfd: documentation fixes
Oleksij Rempel [Wed, 23 Sep 2020 12:53:01 +0000 (14:53 +0200)]
dt-binding: can: mcp25xxfd: documentation fixes

Apply following fixes:
- Use 'interrupts'. (interrupts-extended will automagically be supported
  by the tools)
- *-supply is always a single item. So, drop maxItems=1
- add "additionalProperties: false" flag to detect unneeded properties.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20200923125301.27200-1-o.rempel@pengutronix.de
Reported-by: Rob Herring <robh@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Fixes: 1b5a78e69c1f ("dt-binding: can: mcp25xxfd: document device tree bindings")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp25xxfd: mcp25xxfd_irq(): add missing initialization of variable set_normal...
Marc Kleine-Budde [Wed, 23 Sep 2020 11:44:36 +0000 (13:44 +0200)]
can: mcp25xxfd: mcp25xxfd_irq(): add missing initialization of variable set_normal mode

This patch fixes the following warning:

    drivers/net/can/spi/mcp25xxfd/mcp25xxfd-core.c:2155 mcp25xxfd_irq()
    error: uninitialized symbol 'set_normal_mode'.

by adding the missing initialization.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Link: https://lore.kernel.org/r/20200923114726.2704426-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp25xxfd: mcp25xxfd_ring_free(): fix memory leak during cleanup
Dan Carpenter [Wed, 23 Sep 2020 11:27:52 +0000 (14:27 +0300)]
can: mcp25xxfd: mcp25xxfd_ring_free(): fix memory leak during cleanup

This loop doesn't free the first element of the array.  The "i > 0" has
to be changed to "i >= 0".

Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20200923112752.GA1473821@mwanda
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp25xxfd: mcp25xxfd_probe(): add SPI clk limit related errata information
Thomas Kopp [Fri, 25 Sep 2020 06:56:06 +0000 (08:56 +0200)]
can: mcp25xxfd: mcp25xxfd_probe(): add SPI clk limit related errata information

This patch adds a reference to the recent released MCP2517FD and MCP2518FD
errata sheets and paste the explanation.

The driver already implements the proposed fix.

Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200925065606.358-1-thomas.kopp@microchip.com
[mkl: split into two patches, adjust subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: mcp25xxfd: mcp25xxfd_handle_eccif(): add ECC related errata and update log messages
Thomas Kopp [Fri, 25 Sep 2020 06:56:06 +0000 (08:56 +0200)]
can: mcp25xxfd: mcp25xxfd_handle_eccif(): add ECC related errata and update log messages

This patch adds a reference to the recent released MCP2517FD and MCP2518FD
errata sheets and paste the explanation.

The single error correction does not always work, so always indicate that a
single error occurred. If the location of the ECC error is outside of the
TX-RAM always use netdev_notice() to log the problem. For ECC errors in the
TX-RAM, there is a recovery procedure.

Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
Link: https://lore.kernel.org/r/20200925065606.358-1-thomas.kopp@microchip.com
[mkl: split into two patches, adjust subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agoMerge branch 'HW-support-for-VCAP-IS1-and-ES0-in-mscc_ocelot'
David S. Miller [Wed, 30 Sep 2020 01:26:42 +0000 (18:26 -0700)]
Merge branch 'HW-support-for-VCAP-IS1-and-ES0-in-mscc_ocelot'

Vladimir Oltean says:

====================
HW support for VCAP IS1 and ES0 in mscc_ocelot

The patches from RFC series "Offload tc-flower to mscc_ocelot switch
using VCAP chains" have been split into 2:
https://patchwork.ozlabs.org/project/netdev/list/?series=204810&state=*

This is the boring part, that deals with the prerequisites, and not with
tc-flower integration. Apart from the initialization of some hardware
blocks, which at this point still don't do anything, no new
functionality is introduced.

- Key and action field offsets are defined for the supported switches.
- VCAP properties are added to the driver for the new TCAM blocks. But
  instead of adding them manually as was done for IS2, which is error
  prone, the driver is refactored to read these parameters from
  hardware, which is possible.
- Some improvements regarding the processing of struct ocelot_vcap_filter.
- Extending the code to be compatible with full and quarter keys.

This series was tested, along with other patches not yet submitted, on
the Felix and Seville switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: look up the filters in flower_stats() and flower_destroy()
Vladimir Oltean [Tue, 29 Sep 2020 22:27:33 +0000 (01:27 +0300)]
net: mscc: ocelot: look up the filters in flower_stats() and flower_destroy()

Currently a new filter is created, containing just enough correct
information to be able to call ocelot_vcap_block_find_filter_by_index()
on it.

This will be limiting us in the future, when we'll have more metadata
associated with a filter, which will matter in the stats() and destroy()
callbacks, and which we can't make up on the spot. For example, we'll
start "offloading" some dummy tc filter entries for the TCAM skeleton,
but we won't actually be adding them to the hardware, or to block->rules.
So, it makes sense to avoid deleting those rules too. That's the kind of
thing which is difficult to determine unless we look up the real filter.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: add a new ocelot_vcap_block_find_filter_by_id function
Vladimir Oltean [Tue, 29 Sep 2020 22:27:32 +0000 (01:27 +0300)]
net: mscc: ocelot: add a new ocelot_vcap_block_find_filter_by_id function

And rename the existing find to ocelot_vcap_block_find_filter_by_index.
The index is the position in the TCAM, and the id is the flow cookie
given by tc.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: rename variable 'cnt' in vcap_data_offset_get()
Vladimir Oltean [Tue, 29 Sep 2020 22:27:31 +0000 (01:27 +0300)]
net: mscc: ocelot: rename variable 'cnt' in vcap_data_offset_get()

The 'cnt' variable is actually used for 2 purposes, to hold the number
of sub-words per VCAP entry, and the number of sub-words per VCAP
action.

In fact, I'm pretty sure these 2 numbers can never be different from one
another. By hardware definition, the entry (key) TCAM rows are divided
into the same number of sub-words as its associated action RAM rows.
But nonetheless, let's at least rename the variables such that
observations like this one are easier to make in the future.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: rename variable 'count' in vcap_data_offset_get()
Vladimir Oltean [Tue, 29 Sep 2020 22:27:30 +0000 (01:27 +0300)]
net: mscc: ocelot: rename variable 'count' in vcap_data_offset_get()

This gets rid of one of the 2 variables named, very generically,
"count".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: calculate vcap offsets correctly for full and quarter entries
Xiaoliang Yang [Tue, 29 Sep 2020 22:27:29 +0000 (01:27 +0300)]
net: mscc: ocelot: calculate vcap offsets correctly for full and quarter entries

When calculating the offsets for the current entry within the row and
placing them inside struct vcap_data, the function assumes half key
entry (2 keys per row).

This patch modifies the vcap_data_offset_get() function to calculate a
correct data offset when the setting VCAP Type-Group of a key to
VCAP_TG_FULL or VCAP_TG_QUARTER.

This is needed because, for example, VCAP ES0 only supports full keys.

Also rename the 'count' variable to 'num_entries_per_row' to make the
function just one tiny bit easier to follow.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: parse flower action before key
Vladimir Oltean [Tue, 29 Sep 2020 22:27:28 +0000 (01:27 +0300)]
net: mscc: ocelot: parse flower action before key

When we'll make the switch to multiple chain offloading, we'll want to
know first what VCAP block the rule is offloaded to. This impacts what
keys are available. Since the VCAP block is determined by what actions
are used, parse the action first.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: remove unneeded VCAP parameters for IS2
Vladimir Oltean [Tue, 29 Sep 2020 22:27:27 +0000 (01:27 +0300)]
net: mscc: ocelot: remove unneeded VCAP parameters for IS2

Now that we are deriving these from the constants exposed by the
hardware, we can delete the static info we're keeping in the driver.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: automatically detect VCAP constants
Vladimir Oltean [Tue, 29 Sep 2020 22:27:26 +0000 (01:27 +0300)]
net: mscc: ocelot: automatically detect VCAP constants

The numbers in struct vcap_props are not intuitive to derive, because
they are not a straightforward copy-and-paste from the reference manual
but instead rely on a fairly detailed level of understanding of the
layout of an entry in the TCAM and in the action RAM. For this reason,
bugs are very easy to introduce here.

Ease the work of hardware porters and read from hardware the constants
that were exported for this particular purpose. Note that this implies
that struct vcap_props can no longer be const.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: add definitions for VCAP ES0 keys, actions and target
Vladimir Oltean [Tue, 29 Sep 2020 22:27:25 +0000 (01:27 +0300)]
net: mscc: ocelot: add definitions for VCAP ES0 keys, actions and target

As a preparation step for the offloading to ES0, let's create the
infrastructure for talking with this hardware block.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: add definitions for VCAP IS1 keys, actions and target
Vladimir Oltean [Tue, 29 Sep 2020 22:27:24 +0000 (01:27 +0300)]
net: mscc: ocelot: add definitions for VCAP IS1 keys, actions and target

As a preparation step for the offloading to IS1, let's create the
infrastructure for talking with this hardware block.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: generalize existing code for VCAP
Vladimir Oltean [Tue, 29 Sep 2020 22:27:23 +0000 (01:27 +0300)]
net: mscc: ocelot: generalize existing code for VCAP

In the Ocelot switches there are 3 TCAMs: VCAP ES0, IS1 and IS2, which
have the same configuration interface, but different sets of keys and
actions. The driver currently only supports VCAP IS2.

In preparation of VCAP IS1 and ES0 support, the existing code must be
generalized to work with any VCAP.

In that direction, we should move the structures that depend upon VCAP
instantiation, like vcap_is2_keys and vcap_is2_actions, out of struct
ocelot and into struct vcap_props .keys and .actions, a structure that
is replicated 3 times, once per VCAP. We'll pass that structure as an
argument to each function that does the key and action packing - only
the control logic needs to distinguish between ocelot->vcap[VCAP_IS2]
or IS1 or ES0.

Another change is to make use of the newly introduced ocelot_target_read
and ocelot_target_write API, since the 3 VCAPs have the same registers
but put at different addresses.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: return error if VCAP filter is not found
Xiaoliang Yang [Tue, 29 Sep 2020 22:27:22 +0000 (01:27 +0300)]
net: mscc: ocelot: return error if VCAP filter is not found

Although it doesn't look like it is possible to hit these conditions
from user space, there are 2 separate, but related, issues.

First, the ocelot_vcap_block_get_filter_index function, née
ocelot_ace_rule_get_index_id prior to the aae4e500e106 ("net: mscc:
ocelot: generalize the "ACE/ACL" names") rename, does not do what the
author probably intended. If the desired filter entry is not present in
the ACL block, this function returns an index equal to the total number
of filters, instead of -1, which is maybe what was intended, judging
from the curious initialization with -1, and the "++index" idioms.
Either way, none of the callers seems to expect this behavior.

Second issue, the callers don't actually check the return value at all.
So in case the filter is not found in the rule list, propagate the
return code.

So update the callers and also take the opportunity to get rid of the
odd coding idioms that appear to work but don't.

Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: introduce a new ocelot_target_{read,write} API
Vladimir Oltean [Tue, 29 Sep 2020 22:27:21 +0000 (01:27 +0300)]
net: mscc: ocelot: introduce a new ocelot_target_{read,write} API

There are some targets (register blocks) in the Ocelot switch that are
instantiated more than once. For example, the VCAP IS1, IS2 and ES0
blocks all share the same register layout for interacting with the cache
for the TCAM and the action RAM.

For the VCAPs, the procedure for servicing them is actually common. We
just need an API specifying which VCAP we are talking to, and we do that
via these raw ocelot_target_read and ocelot_target_write accessors.

In plain ocelot_read, the target is encoded into the register enum
itself:

u16 target = reg >> TARGET_OFFSET;

For the VCAPs, the registers are currently defined like this:

enum ocelot_reg {
[...]
S2_CORE_UPDATE_CTRL = S2 << TARGET_OFFSET,
S2_CORE_MV_CFG,
S2_CACHE_ENTRY_DAT,
S2_CACHE_MASK_DAT,
S2_CACHE_ACTION_DAT,
S2_CACHE_CNT_DAT,
S2_CACHE_TG_DAT,
[...]
};

which is precisely what we want to avoid, because we'd have to duplicate
the same register map for S1 and for S0, and then figure out how to pass
VCAP instance-specific registers to the ocelot_read calls (basically
another lookup table that undoes the effect of shifting with
TARGET_OFFSET).

So for some targets, propose a more raw API, similar to what is
currently done with ocelot_port_readl and ocelot_port_writel. Those
targets can only be accessed with ocelot_target_{read,write} and not
with ocelot_{read,write} after the conversion, which is fine.

The VCAP registers are not actually modified to use this new API as of
this patch. They will be modified in the next one.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mvneta: avoid possible cache misses in mvneta_rx_swbm
Lorenzo Bianconi [Tue, 29 Sep 2020 21:58:57 +0000 (23:58 +0200)]
net: mvneta: avoid possible cache misses in mvneta_rx_swbm

Do not use rx_desc pointers if possible since rx descriptors are stored in
uncached memory and dereferencing rx_desc pointers generate extra loads.
This patch improves XDP_DROP performance of ~ 110Kpps (700Kpps vs 590Kpps)
on Marvell Espressobin

Analyzed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolibbpf: Compile in PIC mode only for shared library case
Andrii Nakryiko [Tue, 29 Sep 2020 22:06:04 +0000 (15:06 -0700)]
libbpf: Compile in PIC mode only for shared library case

Libbpf compiles .o's for static and shared library modes separately, so no
need to specify -fPIC for both. Keep it only for shared library mode.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200929220604.833631-3-andriin@fb.com
3 years agolibbpf: Compile libbpf under -O2 level by default and catch extra warnings
Andrii Nakryiko [Tue, 29 Sep 2020 22:06:03 +0000 (15:06 -0700)]
libbpf: Compile libbpf under -O2 level by default and catch extra warnings

For some reason compiler doesn't complain about uninitialized variable, fixed
in previous patch, if libbpf is compiled without -O2 optimization level. So do
compile it with -O2 and never let similar issue slip by again. -Wall is added
unconditionally, so no need to specify it again.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200929220604.833631-2-andriin@fb.com
3 years agolibbpf: Fix uninitialized variable in btf_parse_type_sec
Andrii Nakryiko [Tue, 29 Sep 2020 22:06:02 +0000 (15:06 -0700)]
libbpf: Fix uninitialized variable in btf_parse_type_sec

Fix obvious unitialized variable use that wasn't reported by compiler. libbpf
Makefile changes to catch such errors are added separately.

Fixes: 3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200929220604.833631-1-andriin@fb.com
3 years agoMerge branch 'bpf, x64: optimize JIT's pro/epilogue'
Alexei Starovoitov [Tue, 29 Sep 2020 23:47:39 +0000 (16:47 -0700)]
Merge branch 'bpf, x64: optimize JIT's pro/epilogue'

Maciej Fijalkowski says:

====================
Hi!

This small set can be considered as a followup after recent addition of
support for tailcalls in bpf subprograms and is focused on optimizing
x64 JIT prologue and epilogue sections.

Turns out the popping tail call counter is not needed anymore and %rsp
handling when stack depth is 0 can be skipped.

For longer explanations, please see commit messages.

Thank you,
Maciej
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agobpf: x64: Do not emit sub/add 0, %rsp when !stack_depth
Maciej Fijalkowski [Tue, 29 Sep 2020 20:46:53 +0000 (22:46 +0200)]
bpf: x64: Do not emit sub/add 0, %rsp when !stack_depth

There is no particular reason for keeping the "sub 0, %rsp" insn within
the BPF's x64 JIT prologue.

When tail call code was skipping the whole prologue section these 7
bytes that represent the rsp subtraction could not be simply discarded
as the jump target address would be broken. An option to address that
would be to substitute it with nop7.

Right now tail call is skipping only first 11 bytes of target program's
prologue and "sub X, %rsp" is the first insn that is processed, so if
stack depth is zero then this insn could be omitted without the need for
nop7 swap.

Therefore, do not emit the "sub 0, %rsp" in prologue when program is not
making use of R10 register. Also, make the emission of "add X, %rsp"
conditional in tail call code logic and take into account the presence
of mentioned insn when calculating the jump offsets.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929204653.4325-3-maciej.fijalkowski@intel.com
3 years agobpf, x64: Drop "pop %rcx" instruction on BPF JIT epilogue
Maciej Fijalkowski [Tue, 29 Sep 2020 20:46:52 +0000 (22:46 +0200)]
bpf, x64: Drop "pop %rcx" instruction on BPF JIT epilogue

Back when all of the callee-saved registers where always pushed to stack
in x64 JIT prologue, tail call counter was placed at the bottom of the
BPF program's stack frame that had a following layout:

+-------------+
|  ret addr   |
+-------------+
|     rbp     | <- rbp
+-------------+
|             |
| free space  |
| from:       |
| sub $x,%rsp |
|             |
+-------------+
|     rbx     |
+-------------+
|     r13     |
+-------------+
|     r14     |
+-------------+
|     r15     |
+-------------+
|  tail call  | <- rsp
|   counter   |
+-------------+

In order to restore the callee saved registers, epilogue needed to
explicitly toss away the tail call counter via "pop %rbx" insn, so that
%rsp would be back at the place where %r15 was stored.

Currently, the tail call counter is placed on stack *before* the callee
saved registers (brackets on rbx through r15 mean that they are now
pushed to stack only if they are used):

+-------------+
|  ret addr   |
+-------------+
|     rbp     | <- rbp
+-------------+
|             |
| free space  |
| from:       |
| sub $x,%rsp |
|             |
+-------------+
|  tail call  |
|   counter   |
+-------------+
(     rbx     )
+-------------+
(     r13     )
+-------------+
(     r14     )
+-------------+
(     r15     ) <- rsp
+-------------+

For the record, the epilogue insns consist of (assuming all of the
callee saved registers are used by program):
pop    %r15
pop    %r14
pop    %r13
pop    %rbx
pop    %rcx
leaveq
retq

"pop %rbx" for getting rid of tail call counter was not an option
anymore as it would overwrite the restored value of %rbx register, so it
was changed to use the %rcx register.

Since epilogue can start popping the callee saved registers right away
without any additional work, the "pop %rcx" could be dropped altogether
as "leave" insn will simply move the %rbp to %rsp. IOW, tail call
counter does not need the explicit handling.

Having in mind the explanation above and the actual reason for that,
let's piggy back on "leave" insn for discarding the tail call counter
from stack and remove the "pop %rcx" from epilogue.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929204653.4325-2-maciej.fijalkowski@intel.com
3 years agoselftests/bpf: Fix endianness issues in sk_lookup/ctx_narrow_access
Ilya Leoshkevich [Tue, 29 Sep 2020 20:18:14 +0000 (22:18 +0200)]
selftests/bpf: Fix endianness issues in sk_lookup/ctx_narrow_access

This test makes a lot of narrow load checks while assuming little
endian architecture, and therefore fails on s390.

Fix by introducing LSB and LSW macros and using them to perform narrow
loads.

Fixes: 0ab5539f8584 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929201814.44360-1-iii@linux.ibm.com
3 years agolib8390: Replace panic() call with BUILD_BUG_ON
Armin Wolf [Tue, 29 Sep 2020 17:13:26 +0000 (19:13 +0200)]
lib8390: Replace panic() call with BUILD_BUG_ON

Replace panic() call in lib8390.c with BUILD_BUG_ON()
since checking the size of struct e8390_pkt_hdr should
happen at compile-time.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-in_interrupt-cleanup-and-fixes'
David S. Miller [Tue, 29 Sep 2020 21:02:55 +0000 (14:02 -0700)]
Merge branch 'net-in_interrupt-cleanup-and-fixes'

Thomas Gleixner says:

====================
net: in_interrupt() cleanup and fixes

in the discussion about preempt count consistency accross kernel configurations:

  https://lore.kernel.org/r/20200914204209.256266093@linutronix.de/

Linus clearly requested that code in drivers and libraries which changes
behaviour based on execution context should either be split up so that
e.g. task context invocations and BH invocations have different interfaces
or if that's not possible the context information has to be provided by the
caller which knows in which context it is executing.

This includes conditional locking, allocation mode (GFP_*) decisions and
avoidance of code paths which might sleep.

In the long run, usage of 'preemptible, in_*irq etc.' should be banned from
driver code completely.

This is the second version of the first batch of related changes. V1 can be
found here:

     https://lore.kernel.org/r/20200927194846.045411263@linutronix.de

Changes vs. V1:

  - Rebased to net-next

  - Fixed the half done rename sillyness in the ENIC patch.

  - Fixed the IONIC driver fallout.

  - Picked up the SFC fix from Edward and adjusted the GFP_KERNEL change
    accordingly.

  - Addressed the review comments vs. BCRFMAC.

  - Collected Reviewed/Acked-by tags as appropriate.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtlwifi: Replace in_interrupt() for context detection
Sebastian Andrzej Siewior [Tue, 29 Sep 2020 20:25:45 +0000 (22:25 +0200)]
net: rtlwifi: Replace in_interrupt() for context detection

rtl_lps_enter() and rtl_lps_leave() are using in_interrupt() to detect
whether it is safe to acquire a mutex or if it is required to defer to a
workqueue.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be seperated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

in_interrupt() also is only partially correct because it fails to chose the
correct code path when just preemption or interrupts are disabled.

Add an argument 'may_block' to both functions and adjust the callers to
pass the context information.

The following call chains were analyzed to be safe to block:

    rtl_watchdog_wq_callback()
      rlf_lps_leave/enter()

    rtl_op_suspend()
      rtl_lps_leave()

    rtl_op_bss_info_changed()
      rtl_lps_leave()

    rtl_op_sw_scan_start()
      rtl_lps_leave()

The following call chains were analyzed to be unsafe to block:

    _rtl_pci_interrupt()
      _rtl_pci_rx_interrupt()
  rtl_lps_leave()

    _rtl_pci_interrupt()
      _rtl_pci_rx_interrupt()
        rtl_is_special_data()
  rtl_lps_leave()

    _rtl_pci_interrupt()
      _rtl_pci_rx_interrupt()
        rtl_is_special_data()
  setup_special_tx()
    rtl_lps_leave()

    _rtl_pci_interrupt()
      _rtl_pci_tx_isr
        rtl_lps_leave()

      halbtc_leave_lps()
        rtl_lps_leave()

This leaves four callers of rtl_lps_enter/leave() where the analyzis
stopped dead in the maze of several nested pointer based callchains and
lack of rtlwifi hardware to debug this via tracing:

     halbtc_leave_lps(), halbtc_enter_lps(), halbtc_normal_lps(),
     halbtc_pre_normal_lps()

These four have been cautionally marked to be unable to block which is the
safe option, but the rtwifi wizards should be able to clarify that.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtlwifi: Remove in_interrupt() from debug macro
Sebastian Andrzej Siewior [Tue, 29 Sep 2020 20:25:44 +0000 (22:25 +0200)]
net: rtlwifi: Remove in_interrupt() from debug macro

The usage of in_interrupt() in drivers in is phased out.

rtl_dbg() a printk based debug aid is using in_interrupt() in the
underlying C function _rtl_dbg_out() which is almost identical to
_rtl_dbg_print(). The only difference is the printout of in_interrupt().

The decoding of in_interrupt() as hexvalue is non-trivial and aside of
being phased out for driver usage the return value is just by chance the
masked preempt count value and not a boolean.

These home brewn printk debug aids are tedious to work with and provide
only minimal context.  They should be replaced by trace_printk() or a debug
tracepoint which automatically records all context information.

To make progress on the in_interrupt() cleanup, make rtl_dbg() use
_rtl_dbg_print() and remove _rtl_dbg_out().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: rtlwifi: Remove void* casts related to delayed work
Sebastian Andrzej Siewior [Tue, 29 Sep 2020 20:25:43 +0000 (22:25 +0200)]
net: rtlwifi: Remove void* casts related to delayed work

INIT_DELAYED_WORK() takes two arguments: A pointer to the delayed work and
a function reference for the callback.

The rtl code casts all function references to (void *) because the
callbacks in use are not matching the required function signature. That's
error prone and bad pratice.

Some of the callback functions are also global, but only used in a single
file.

Clean the mess up by:

  - Adding the proper arguments to the callback functions and using them in
    the container_of() constructs correctly which removes the hideous
    container_of_dwork_rtl() macro as well.

  - Removing the type cast at the initializers

  - Making the unnecessary global functions static

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>