git.monstr.eu Git - linux-2.6-microblaze.git/log

net: dsa: mv88e6xxx: Fix sparse warnings from GENMASK

Oddly, GENMASK() requires signed bit numbers, so that it can compare
them for < 0. If passed an unsigned type, we get warnings about the
test never being true.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-dsa-warnings'

Andrew Lunn says:

====================
net: dsa: Fix C=1 W=1 warnings

Mostly not using __be16 when decoding packet contents.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: tag_qca.c: Fix warning for __be16 vs u16

net/dsa/tag_qca.c:48:15: warning: incorrect type in assignment (different base types)
net/dsa/tag_qca.c:48:15:    expected unsigned short [usertype]
net/dsa/tag_qca.c:48:15:    got restricted __be16 [usertype]
net/dsa/tag_qca.c:68:13: warning: incorrect type in assignment (different base types)
net/dsa/tag_qca.c:68:13:    expected restricted __be16 [usertype] hdr
net/dsa/tag_qca.c:68:13:    got int
net/dsa/tag_qca.c:71:16: warning: restricted __be16 degrades to integer
net/dsa/tag_qca.c:81:17: warning: restricted __be16 degrades to integer

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: tag_mtk: Fix warnings for __be16

net/dsa/tag_mtk.c:84:13: warning: incorrect type in assignment (different base types)
net/dsa/tag_mtk.c:84:13: expected restricted __be16 [usertype] hdr
net/dsa/tag_mtk.c:84:13: got int
net/dsa/tag_mtk.c:94:17: warning: restricted __be16 degrades to integer

The result of a ntohs() is not __be16, but u16.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: tag_lan9303: Fix __be16 warnings

net/dsa/tag_lan9303.c:76:24: warning: incorrect type in assignment (different base types)
net/dsa/tag_lan9303.c:76:24:    expected unsigned short [usertype]
net/dsa/tag_lan9303.c:76:24:    got restricted __be16 [usertype]
net/dsa/tag_lan9303.c:80:24: warning: incorrect type in assignment (different base types)
net/dsa/tag_lan9303.c:80:24:    expected unsigned short [usertype]
net/dsa/tag_lan9303.c:80:24:    got restricted __be16 [usertype]
net/dsa/tag_lan9303.c:106:31: warning: restricted __be16 degrades to integer
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16

Make use of __be16 where appropriate to fix these warnings.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: tag_ksz: Fix __be16 warnings

cpu_to_be16 returns a __be16 value. So what it is assigned to needs to
have the same type to avoid warnings.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Add __percpu property to prevent warnings

net/dsa/slave.c:505:13: warning: incorrect type in initializer (different address spaces)
net/dsa/slave.c:505:13: expected void const [noderef] <asn:3> *__vpp_verify
net/dsa/slave.c:505:13: got struct pcpu_sw_netstats *

Add the needed _percpu property to prevent this warning.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Phylink-integration-improvements-for-Felix-DSA-driver'

Vladimir Oltean says:

====================
Phylink integration improvements for Felix DSA driver

This is an overhaul of the Felix switch driver's phylink operations.

Patches 1, 3, 4 and 5 are cleanup, patch 2 is adding a new feature and
and patch 6 is adaptation to the new format of an existing phylink API
(mac_link_up).

Changes since v2:
- Replaced "PHYLINK" with "phylink".
- Rewrote commit message of patch 5/6.

Changes since v1:
- Now using phy_clear_bits and phy_set_bits instead of plain writes to
MII_BMCR. This combines former patches 1/7 and 6/7 into a single new
patch 1/6.
- Updated commit message of patch 5/6.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: use resolved link config in mac_link_up()

Phylink now requires that parameters established through
auto-negotiation be written into the MAC at the time of the
mac_link_up() callback. In the case of felix, that means taking the port
out of reset, setting the correct timers for PAUSE frames, and
enabling/disabling TX flow control.

This patch also splits the inband and noinband configuration of the
vsc9959 PCS (currently found in a function called "init") into 2
different functions, which have a nomenclature closer to phylink:
"config", for inband setup, and "link_up", for noinband (forced) setup.

This is necessary as a preparation step for giving up control of the PCS
to phylink, which will be done in further patch series.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: delete .phylink_mac_an_restart code

Phylink uses the .mac_an_restart method to offer the user an
implementation of the "ethtool -r" behavior, when the media-side auto
negotiation can be restarted by the local MAC PCS. This is the case for
fiber modes 1000Base-X and 2500Base-X (IEEE clause 37) that don't have
an Ethernet PHY connected locally, and the media is connected to the MAC
PCS directly.

On the other hand, the Cisco SGMII and USXGMII standards also have an
auto negotiation mechanism based on IEEE 802.3 clause 37 (their
respective specs require a MAC PCS and a PHY PCS to implement the same
state machine, which is described in IEEE 802.3 "Auto-Negotiation Figure
37-6"), so the ability to restart auto-negotiation is intrinsically
symmetrical (the MAC PCS can do it too).

However, it appears that not all SGMII/USXGMII PHYs have logic to
restart the MDI-side auto-negotiation process when they detect a
transition of the SGMII link from data mode to configuration mode.
Some do (VSC8234) and some don't (AR8033, MV88E1111). IEEE and/or Cisco
specification wordings to not help to prove whether propagating the "AN
restart" event from MII side ("mr_restart_an") to MDI side
("mr_restart_negotiation") is required behavior - neither of them
specifies any mandatory interaction between the clause 37 AN state
machine from Figure 37-6 and the clause 28 AN state machine from Figure
28-18.

Therefore, even if a certain behavior could be proven as being required,
real-life SGMII/USXGMII PHYs are inconsistent enough that a clause 37 AN
restart cannot be used by phylink to reliably trigger a media-side
renegotiation, when the user requests it via ethtool.

The only remaining use that the .mac_an_restart callback might possibly
have, given what we know now, is to implement some silicon quirks, but
so far that has proven to not be necessary.

So remove this code for now, since it never gets called and we don't
foresee any circumstance in which it might be, either.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: set proper pause frame timers based on link speed

state->speed holds a value of 10, 100, 1000 or 2500, but
SYS_MAC_FC_CFG_FC_LINK_SPEED expects a value in the range 0, 1, 2 or 3.

So set the correct speed encoding into this register.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: unconditionally configure MAC speed to 1000Mbps

In VSC9959, the PCS is the one who performs rate adaptation (symbol
duplication) to the speed negotiated by the PHY. The MAC is unaware of
that and must remain configured for gigabit. If it is configured at
OCELOT_SPEED_10 or OCELOT_SPEED_100, it'll start transmitting PAUSE
frames out of control and never recover, _even if_ we then reconfigure
it at OCELOT_SPEED_1000 afterwards.

This patch fixes a bug that luckily did not have any functional impact.
We were writing 10, 100, 1000 etc into this 2-bit field in
DEV_CLOCK_CFG, but the hardware expects values in the range 0, 1, 2, 3.
So all speed values were getting truncated to 0, which is
OCELOT_SPEED_2500, and which also appears to be fine.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: support half-duplex link modes

Ping tested:

  [   11.808455] mscc_felix 0000:00:00.5 swp0: Link is Up - 1Gbps/Full - flow control rx/tx
  [   11.816497] IPv6: ADDRCONF(NETDEV_CHANGE): swp0: link becomes ready

  [root@LS1028ARDB ~] # ethtool -s swp0 advertise 0x4
  [   18.844591] mscc_felix 0000:00:00.5 swp0: Link is Down
  [   22.048337] mscc_felix 0000:00:00.5 swp0: Link is Up - 100Mbps/Half - flow control off

  [root@LS1028ARDB ~] # ip addr add 192.168.1.1/24 dev swp0

  [root@LS1028ARDB ~] # ping 192.168.1.2
  PING 192.168.1.2 (192.168.1.2): 56 data bytes
  (...)
  ^C--- 192.168.1.2 ping statistics ---
  3 packets transmitted, 3 packets received, 0% packet loss
  round-trip min/avg/max = 0.383/0.611/1.051 ms

  [root@LS1028ARDB ~] # ethtool -s swp0 advertise 0x10
  [  355.637747] mscc_felix 0000:00:00.5 swp0: Link is Down
  [  358.788034] mscc_felix 0000:00:00.5 swp0: Link is Up - 1Gbps/Half - flow control off

  [root@LS1028ARDB ~] # ping 192.168.1.2
  PING 192.168.1.2 (192.168.1.2): 56 data bytes
  (...)
  ^C
  --- 192.168.1.2 ping statistics ---
  16 packets transmitted, 16 packets received, 0% packet loss
  round-trip min/avg/max = 0.301/0.384/1.138 ms

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: clarify the intention of writes to MII_BMCR

The driver appears to write to BMCR_SPEED and BMCR_DUPLEX, fields which
are read-only, since they are actually configured through the
vendor-specific IF_MODE (0x14) register.

But the reason we're writing back the read-only values of MII_BMCR is to
alter these writable fields:

BMCR_RESET
BMCR_LOOPBACK
BMCR_ANENABLE
BMCR_PDOWN
BMCR_ISOLATE
BMCR_ANRESTART

In particular, the only field which is really relevant to this driver is
BMCR_ANENABLE. Clarify that intention by spelling it out, using
phy_set_bits and phy_clear_bits.

The driver also made a few writes to BMCR_RESET and BMCR_ANRESTART which
are unnecessary and may temporarily disrupt the link to the PHY. Remove
them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qlogic-use-generic-power-management'

Vaibhav Gupta says:

====================
qlogic: use generic power management

Linux Kernel Mentee: Remove Legacy Power Management.

The purpose of this patch series is to remove legacy power management callbacks
from qlogic ethernet drivers.

The callbacks performing suspend() and resume() operations are still calling
pci_save_state(), pci_set_power_state(), etc. and handling the power management
themselves, which is not recommended.

The conversion requires the removal of the those function calls and change the
callback definition accordingly and make use of dev_pm_ops structure.

All patches are compile-tested only.

V2: Fix unused variable warning in v1.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qlcninc: use generic power management

With legacy PM, drivers themselves were responsible for managing the
device's power states and taking care of register states. And they use PCI
helper functions to do it.

After upgrading to the generic structure, PCI core will take care of
required tasks and drivers should do only device-specific operations.

.suspend() calls __qlcnic_shutdown, which then calls qlcnic_82xx_shutdown;
.resume() calls __qlcnic_resume, which then calls qlcnic_82xx_resume;

Both ...82xx..() are define in
drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c and are used only in
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c.

Hence upgrade them and remove PCI function calls, like pci_save_state() and
pci_enable_wake(), inside them

Compile-tested only.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen_nic: use generic power management

With legacy PM, drivers themselves were responsible for managing the
device's power states and takes care of register states. And they use PCI
helper functions to do it.

After upgrading to the generic structure, PCI core will take care of
required tasks and drivers should do only device-specific operations.

In this driver:
netxen_nic_resume() calls netxen_nic_attach_func() which then invokes PCI
helper functions like pci_enable_device(), pci_set_power_state() and
pci_restore_state(). Other function:
- netxen_io_slot_reset()
also calls netxen_nic_attach_func().

Also, netxen_io_slot_reset() returns specific value based on the return value
of netxen_nic_attach_func() as whole. Thus, cannot simply move some piece of
code from netxen_nic_attach_func() to it.

Hence, define a new function netxen_nic_attach_late_func() to do the tasks
which has to be done after PCI helper functions have done their job.

Now, netxen_nic_attach_func() invokes netxen_nic_attach_late_func(), thus
netxen_io_slot_reset() behaves normally.
And, netxen_nic_resume() calls netxen_nic_attach_late_func() to avoid PCI
helper functions calls.

Compile-tested only.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: remove unused private members

Private structure members live_ports, on_ports, rx_ports, tx_ports are
initialized but not used anywhere. Let's remove them.

Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: split adjust_link() in phylink_mac_link_{up|down}()

The DSA subsystem moved to phylink and adjust_link() became deprecated in
the process. This patch removes adjust_link from the KSZ DSA switches and
adds phylink_mac_link_up() and phylink_mac_link_down().

Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mptcp-add-REUSEADDR-REUSEPORT-V6ONLY-setsockopt-support'

Florian Westphal says:

====================
mptcp: add REUSEADDR/REUSEPORT/V6ONLY setsockopt support

restarting an mptcp-patched sshd yields following error:

  sshd: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
  sshd: error: setsockopt IPV6_V6ONLY: Operation not supported
  sshd: error: Bind to port 22 on :: failed: Address already in use.
  sshd: fatal: Cannot bind any address.

This series adds support for the needed setsockopts:

First patch skips the generic SOL_SOCKET handler for MPTCP:
in mptcp case, the setsockopt needs to alter the tcp socket, not the mptcp
parent socket.

Second patch adds minimal SOL_SOCKET support: REUSEPORT and REUSEADDR.
Rest is still handled by the generic SOL_SOCKET code.

Last patch adds IPV6ONLY support.  This makes ipv6 work for openssh:
It creates two listening sockets, before this patch, binding the ipv6
socket will fail because the port is already bound by the ipv4 one.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: support IPV6_V6ONLY setsockopt

Without this, Opensshd fails to open an ipv6 socket listening
socket:
error: setsockopt IPV6_V6ONLY: Operation not supported
error: Bind to port 22 on :: failed: Address already in use.

Opensshd opens an ipv4 and and ipv6 listening socket, but because
IPV6_V6ONLY setsockopt fails, the port number is already in use.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: add REUSEADDR/REUSEPORT support

This will e.g. make 'sshd restart' work when MPTCP is used, as we will
now set this option on the listener socket instead of only the mptcp
socket (where it has no effect).

We still need to copy the setting to the master socket so that a
subsequent getsockopt() returns the expected value.

Reported-by: Christoph Paasch <cpaasch@apple.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use mptcp setsockopt function for SOL_SOCKET on mptcp sockets

setsockopt(mptcp_fd, SOL_SOCKET, ...)... appears to work (returns 0),
but it has no effect -- this is because the MPTCP layer never has a
chance to copy the settings to the subflow socket.

Skip the generic handling for the mptcp case and instead call the
mptcp specific handler instead for SOL_SOCKET too.

Next patch adds more specific handling for SOL_SOCKET to mptcp.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/net: update initializer syntax to use c99 designators

Before, clang version 9 threw errors such as: error:
use of GNU old-style field designator extension [-Werror,-Wgnu-designator]
                { tstamp: true, swtstamp: true }
                  ^~~~~~~
                  .tstamp =
Fix these warnings in tools/testing/selftests/net in the same manner as
commit 121e357ac728 ("selftests/harness: Update named initializer syntax").
N.B. rxtimestamp.c is the only affected file in the directory.

Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnx2x-Perform-IdleChk-dump'

Sudarsana Reddy Kalluru says:

====================
bnx2x: Perform IdleChk dump.

Idlechk test verifies that the chip is in idle state. If there are any
errors, Idlechk dump would capture the same. This data will help in
debugging the device related issues.
The patch series adds driver support for dumping IdleChk data during the
debug dump collection.
Patch (1) adds register definitions required in this implementation.
Patch (2) adds the implementation for Idlechk tests.
Patch (3) adds driver changes to invoke Idlechk implementation.

Changes from previous version:
-------------------------------
v3: Combined the test data creation and implementation to a single patch.
v2: Addressed issues reported by kernel test robot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Perform Idlechk dump during the debug collection.

The patch adds driver changes to perform Idlechk dump during the debug
data collection.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Add support for idlechk tests.

This patch populates a database of idlechk tests (registers and
predicates) and performs the idlechk using this data.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Add Idlechk related register definitions.

The patch adds register definitions required for Idlechk implementation.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-07-04

The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 17 day(s) which contain
a total of 106 files changed, 5233 insertions(+), 1283 deletions(-).

The main changes are:

1) bpftool ability to show PIDs of processes having open file descriptors
   for BPF map/program/link/BTF objects, relying on BPF iterator progs
   to extract this info efficiently, from Andrii Nakryiko.

2) Addition of BPF iterator progs for dumping TCP and UDP sockets to
   seq_files, from Yonghong Song.

3) Support access to BPF map fields in struct bpf_map from programs
   through BTF struct access, from Andrey Ignatov.

4) Add a bpf_get_task_stack() helper to be able to dump /proc/*/stack
   via seq_file from BPF iterator progs, from Song Liu.

5) Make SO_KEEPALIVE and related options available to bpf_setsockopt()
   helper, from Dmitry Yakunin.

6) Optimize BPF sk_storage selection of its caching index, from Martin
   KaFai Lau.

7) Removal of redundant synchronize_rcu()s from BPF map destruction which
   has been a historic leftover, from Alexei Starovoitov.

8) Several improvements to test_progs to make it easier to create a shell
   loop that invokes each test individually which is useful for some CIs,
   from Jesper Dangaard Brouer.

9) Fix bpftool prog dump segfault when compiled without skeleton code on
   older clang versions, from John Fastabend.

10) Bunch of cleanups and minor improvements, from various others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mvpp2-XDP-support'

Matteo Croce says:

====================
mvpp2: XDP support

Add XDP support to mvpp2. This series converts the driver to the
page_pool API for RX buffer management, and adds native XDP support.

XDP support comes with extack error reporting and statistics as well.

These are the performance numbers, as measured by Sven:

SKB fwd page pool:
Rx bps     390.38 Mbps
Rx pps     762.46 Kpps

XDP fwd:
Rx bps     1.39 Gbps
Rx pps     2.72 Mpps

XDP Drop:
eth0: 12.9 Mpps
eth1: 4.1 Mpps
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mvpp2: xdp ethtool stats

Add ethtool statistics for XDP.

Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mvpp2: XDP TX support

Add the transmit part of XDP support, which includes:
- support for XDP_TX in mvpp2_xdp()
- .ndo_xdp_xmit hook for AF_XDP and XDP_REDIRECT with mvpp2 as destination

mvpp2_xdp_submit_frame() is a generic function which is called by
mvpp2_xdp_xmit_back() when doing XDP_TX, and by mvpp2_xdp_xmit when
doing AF_XDP or XDP_REDIRECT target.

The buffer allocation has been reworked to be able to map the buffers
as DMA_FROM_DEVICE or DMA_BIDIRECTIONAL depending if native XDP is
in use or not.

Co-developed-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mvpp2: add basic XDP support

Add XDP native support.
By now only XDP_DROP, XDP_PASS and XDP_REDIRECT
verdicts are supported.

Co-developed-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mvpp2: use page_pool allocator

Use the page_pool API for memory management.
This is a prerequisite for native XDP support.

Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mvpp2: refactor BM pool init percpu code

In mvpp2_swf_bm_pool_init_percpu(), a reference to a struct
mvpp2_bm_pool is obtained traversing multiple structs, when a
local variable already points to the same object.

Fix it and, while at it, give the variable a meaningful name.

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/net: add ipv6 test coverage in rxtimestamp test

Add the options --ipv4, --ipv6 to specify running over ipv4 and/or
ipv6. If neither is specified, then run both.

Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-ipa-fix-HOLB-timer-register-use'

Alex Elder says:

====================
net: ipa: fix HOLB timer register use

The function ipa_reg_init_hol_block_timer_val() generates the value
to write into the HOL_BLOCK_TIMER endpoint configuration register,
to represent a given timeout value (in microseconds). It only
supports a timer value of 0 though, in part because that's
sufficient, but mainly because there was some confusion about
how the register is formatted in newer hardware.

I got clarification about the register format, so this series fixes
ipa_reg_init_hol_block_timer_val() to work for any supported delay
value.

The delay is based on the IPA core clock, so determining the value
to write for a given period requires access to the current core
clock rate. So the first patch just creates a new function to
provide that.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: fix HOLB timer calculation

For IPA v4.2, the exact interpretation of the register that defines
the timeout for avoiding head-of-line blocking was a little unclear.
We're only assigning a 0 timeout to it right now, so that wasn't
very important.  But now that I know how it's supposed to work, I'm
fixing it.

The register represents a tick counter, where each tick is equal to
128 IPA core clock cycles.  For IPA v3.5.1, the register contains
a simple counter value.  But for IPA v4.2, the register contains two
fields, base and scale, which approximate the tick counter as:
    ticks = base << scale
The base and scale values to use for a given tick count are computed
using clever bit operations, and measures are taken to make the
resulting time period as close as possible to that requested.

There's no need for ipa_endpoint_init_hol_block_timer() to return
an error, so change its return type to void.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: introduce ipa_clock_rate()

Create a new function that returns the current rate of the IPA core
clock.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6/ping: set skb->mark on icmpv6 sockets

IPv6 ping sockets route based on fwmark, but do not yet set skb->mark.
Add this. IPv4 ping sockets also do both.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: Fix compilation error of bpf_iter_task_stack.c

BPF selftests show a compilation error as follows:

libbpf: invalid relo for 'entries' in special section 0xfff2; forgot to
initialize global var?..

Fix it by initializing 'entries' to zeros.

Fixes: c7568114bc56 ("selftests/bpf: Add bpf_iter test with bpf_get_task_stack()")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200703181719.3747072-1-songliubraving@fb.com

bpf: Fix bpftool without skeleton code enabled

Fix segfault from bpftool by adding emit_obj_refs_plain when skeleton
code is disabled.

Tested by deleting BUILD_BPF_SKELS in Makefile. We found this doing
backports for Cilium when a testing image pulled in latest bpf-next
bpftool, but kept using an older clang-7.

  # ./bpftool prog show
  Error: bpftool built without PID iterator support
  3: cgroup_skb  tag 7be49e3934a125ba  gpl
          loaded_at 2020-07-01T08:01:29-0700  uid 0
  Segmentation fault

Fixes: d53dee3fe013 ("tools/bpftool: Show info for processes holding BPF map/prog/link/btf FDs")
Reported-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/159375071997.14984.17404504293832961401.stgit@john-XPS-13-9370

net/xen-netfront: add kernel TX timestamps

This adds kernel TX timestamps to the xen-netfront driver. Tested with chrony
on an AWS EC2 instance.

Signed-off-by: Daniel Drown <dan-netdev@drown.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Allow changing carrier from user-space

The GENET driver interfaces with internal MoCA interface as well as
external MoCA chips like the BCM6802/6803 through a fixed link
interface. It is desirable for the mocad user-space daemon to be able to
control the carrier state based upon out of band messages that it
receives from the MoCA chip.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2020-07-02' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-07-02

Rx and Tx devlink health reporters enhancements.

1) Code cleanup
2) devlink output format improvements
3) Print more useful info on devlink health diagnose output
4) TX timeout recovery, on a single SQ recover failure, stop the loop
and reset all rings (re-open netdev).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Enhance TX timeout recovery

Upon a TX timeout handle, if the TX reporter was not able to recover
from the error, reopen the channels. If tried to reopen channels, do not
loop over TX queues for timeout.

With that, the reporters state and separation will better
expose the driver's state.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Enhance ICOSQ data on RX reporter's diagnose

When the RQ is in striding RQ mode, it uses the ICOSQ as a helper queue.
In this mode, RX reporter dumps more info about the ICOSQ and its
related CQ.

$ devlink health diagnose pci/0000:00:0b.0 reporter rx
Common config:
    RQ:
      type: 2 stride size: 2048 size: 8
      CQ:
        stride size: 64 size: 1024
RQs:
    channel ix: 0 rqn: 2413 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7
    CQ:
      cqn: 1032 HW status: 0 ci: 0 size: 1024
    EQ:
      eqn: 7 irqn: 42 vecidx: 1 ci: 93 size: 2048
    ICOSQ:
      sqn: 2411 HW state: 1 cc: 74 pc: 74 WQE size: 128
      CQ:
        cqn: 1029 cc: 8 size: 128
    channel ix: 1 rqn: 2418 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7
    CQ:
      cqn: 1036 HW status: 0 ci: 0 size: 1024
    EQ:
      eqn: 8 irqn: 43 vecidx: 2 ci: 2 size: 2048
    ICOSQ:
      sqn: 2416 HW state: 1 cc: 74 pc: 74 WQE size: 128
      CQ:
        cqn: 1033 cc: 8 size: 128

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add EQ info to TX/RX reporter's diagnose

Enhance TX/RX reporter's diagnose to include info about the
corresponding EQ.

$ devlink health diagnose pci/0000:00:0b.0 reporter rx
Common config:
    RQ:
      type: 2 stride size: 2048 size: 8
      CQ:
        stride size: 64 size: 1024
RQs:
    channel ix: 0 rqn: 1713 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
     CQ:
       cqn: 1032 HW status: 0 ci: 0 size: 1024
     EQ:
       eqn: 7 irqn: 42 vecidx: 1 ci: 93 size: 2048
     channel ix: 1 rqn: 1718 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
     CQ:
       cqn: 1036 HW status: 0 ci: 0 size: 1024
     EQ:
       eqn: 8 irqn: 43 vecidx: 2 ci: 2 size: 2048

$ devlink health diagnose pci/0000:00:0b.0 reporter tx
Common Config:
    SQ:
      stride size: 64 size: 1024
      CQ:
        stride size: 64 size: 1024
SQs:
   channel ix: 0 tc: 0 txq ix: 0 sqn: 1712 HW state: 1 stopped: false cc: 91 pc: 91
   CQ:
     cqn: 1030 HW status: 0 ci: 91 size: 1024
   EQ:
     eqn: 7 irqn: 42 vecidx: 1 ci: 93 size: 2048
   channel ix: 1 tc: 0 txq ix: 1 sqn: 1717 HW state: 1 stopped: false cc: 0 pc: 0
   CQ:
     cqn: 1034 HW status: 0 ci: 0 size: 1024
   EQ:
     eqn: 8 irqn: 43 vecidx: 2 ci: 2 size: 2048

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Enhance CQ data on diagnose output

Add CQ's consumer index and size to the CQ's diagnose output retruved on
RX/TX reporter diadgnose.

$ devlink health diagnose pci/0000:00:0b.0 reporter rx
Common config:
    RQ:
      type: 2 stride size: 2048 size: 8
      CQ:
        stride size: 64 size: 1024
RQs:
    channel ix: 0 rqn: 2413 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
    CQ:
      cqn: 1032 HW status: 0 ci: 0 size: 1024
    channel ix: 1 rqn: 2418 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
    CQ:
      cqn: 1036 HW status: 0 ci: 0 size: 1024

$ devlink health diagnose pci/0000:00:0b.0 reporter tx
Common Config:
    SQ:
      stride size: 64 size: 1024
      CQ:
        stride size: 64 size: 1024
SQs:
    channel ix: 0 tc: 0 txq ix: 0 sqn: 2412 HW state: 1 stopped: false cc: 0 pc: 0
    CQ:
      cqn: 1030 HW status: 0 ci: 0 size: 1024
    channel ix: 1 tc: 0 txq ix: 1 sqn: 2417 HW state: 1 stopped: false cc: 5 pc: 5
    CQ:
      cqn: 1034 HW status: 0 ci: 5 size: 1024

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Rename reporter's helpers

Change prefix to match resident file:
%s/mlx5e_reporter_cq_diagnose/mlx5e_health_cq_diag_fmsg
%s/mlx5e_reporter_cq_common_diagnose/mlx5e_health_cq_common_diag_fmsg
%s/mlx5e_reporter_named_obj_nest_start/mlx5e_health_fmsg_named_obj_nest_start
%s/mlx5e_reporter_named_obj_nest_end/mlx5e_health_fmsg_named_obj_nest_end

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add helper to get the RQ WQE counter

Add a helper which retrieves the RQ's WQE counter. Use this helper in
the RX reporter diagnose callback.

$ devlink health diagnose pci/0000:00:0b.0 reporter rx
Common config:
  RQ:
     type: 2 stride size: 2048 size: 8
     CQ:
      stride size: 64 size: 1024
RQs:
   channel ix: 0 rqn: 2113 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
   CQ:
    cqn: 1032 HW status: 0
   channel ix: 1 rqn: 2118 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
   CQ:
    cqn: 1036 HW status: 0

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add helper to get RQ WQE's head

Add helper which retrieves the RQ WQE's head. Use this helper in RX
reporter diagnose callback.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Move RQ helpers to txrx.h

Use txrx.h to contain helper function regarding TX/RX. In the coming
patches, I will add more RQ helpers.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Align RX/TX reporters diagnose output format

Change the hierarchy of the RX reporter 'Common config' in the diagnose
output to match the 'Common config' of the TX reporter which reflects
that CQ is a helper to the traffic queues.

Before:
$ devlink health diagnose pci/0000:00:0b.0 reporter rx
Common config:
    RQ:
      type: 2 stride size: 2048 size: 8
    CQ:
      stride size: 64 size: 1024
    RQs:
    ...

After:
$ devlink health diagnose pci/0000:00:0b.0 reporter rx
Common config:
    RQ:
      type: 2 stride size: 2048 size: 8
      CQ:
        stride size: 64 size: 1024
    RQs:
    ...

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove redundant RQ state query

When received a CQE error, the driver inspect the syndrome given by the
firmware. RQ recovery is initiated only as a result of a fatal syndrome;
syndrome which set the RQ into an error state. Hence no need to query
the RQ state at the beginning of the recovery process. Add additional
debug prints before recovering.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add a flush timeout define

During queue's recovery, driver waits for flush. The flush timeout is
set to 2 seconds. Add a define for this value for the benefit of RX and
TX reporters.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Change reporters create functions to return void

Creation of devlink health reporters is not fatal for mlx5e instance load.
In case of error in reporter's creation, the return value is ignored.
Change all reporters creation functions to return void.

In addition, with this change, a failure in creating a reporter, will not
prevent the driver from trying to create the next reporter in the list.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

bpf: Fix build without CONFIG_STACKTRACE

Without CONFIG_STACKTRACE stack_trace_save_tsk() is not defined. Let
get_callchain_entry_for_task() to always return NULL in such cases.

Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200703024537.79971-1-songliubraving@fb.com

Merge branch 'sfc-prerequisites-for-EF100-driver-part-3'

Edward Cree says:

====================
sfc: prerequisites for EF100 driver, part 3

Continuing on from [1] and [2], this series assembles the last pieces
of the common codebase that will be used by the forthcoming EF100
driver.
Patch #1 also adds a minor feature to EF10 (setting MTU on VFs) since
EF10 supports the same MCDI extension which that feature will use on
EF100.
Patches #5 & #7, while they should have no externally-visible effect
on driver functionality, change how that functionality is implemented
and how the driver represents TXQ configuration internally, so are
not mere cleanup/refactoring like most of these prerequisites have
(from the perspective of the existing sfc driver) been.

Changes in v2:
* Patch #1: use efx_mcdi_set_mtu() directly, instead of as a fallback,
in the mtu_only case (Jakub)
* Patch #3: fix symbol collision in non-modular builds by renaming
interrupt_mode to efx_interrupt_mode (kernel test robot)
* Patch #6: check for failure of netif_set_real_num_[tr]x_queues (Jakub)
* Patch #12: cleaner solution for ethtool drvinfo (Jakub, David)

[1]: https://lore.kernel.org/netdev/20200629.173812.1532344417590172093.davem@davemloft.net/T/
[2]: https://lore.kernel.org/netdev/20200630.130923.402514193016248355.davem@davemloft.net/T/
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc_ef100: helper function to set default RSS table of given size

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc_ef100: NVRAM selftest support code

We have yet another new scheme for NVRAM, and a corresponding new MCDI.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc_ef100: populate BUFFER_SIZE_BYTES in INIT_RXQ

The QDMA subsystem on EF100 needs this information.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc_ef100: add EF100 to NIC-revision enumeration

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: get drvinfo driver name from outside the common code

Since ethtool_common.o will be built into both sfc and sfc_ef100 drivers,
it can't use KBUILD_MODNAME directly. Instead, make it reference a
string provided by the individual driver code.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: initialise RSS context ID to 'no RSS context' in efx_init_struct()

Previously this was only happening in ef10-specific code.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: commonise efx_fini_dmaq

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: factor out efx_mcdi_filter_table_down() from _remove()

_down() merely removes all our filters and VLANs, it doesn't free
efx->filter_state itself.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: don't call tx_limit_len if NIC type doesn't have one

EF100 doesn't need to split up large DMAs.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: assign TXQs without gaps

Since we only allocate VIs for the number of TXQs we actually need, we
cannot naively use "channel * TXQ_TYPES + txq" for the TXQ number, as
this has gaps (when efx->tx_queues_per_channel < EFX_TXQ_TYPES) and
thus overruns the driver's VI allocations, causing the firmware to
reject the MC_CMD_INIT_TXQ based on INSTANCE.
Thus, we distinguish INSTANCE (stored in tx_queue->queue) from LABEL
(tx_queue->label); the former is allocated starting from 0 in
efx_set_channels(), while the latter is simply the txq type (index in
channel->tx_queue array).
To simplify things, rather than changing tx_queues_per_channel after
setting up TXQs, make Siena always probe its HIGHPRI queues at start
of day, rather than deferring it until tc mqprio enables them.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: commonise netif_set_real_num[tr]x_queues calls

While we're at it, also check them for failure.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: make tx_queues_per_channel variable at runtime

Siena needs four TX queues (csum * highpri), EF10 needs two (csum),
and EF100 only needs one (as checksumming is controlled entirely by
the transmit descriptor). Rather than having various bits of ad-hoc
code to decide which queues to set up etc., put the knowledge of how
many TXQs a channel has in one place.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: move modparam 'rss_cpus' out of common channel code

Instead of exposing this old module parameter on the new driver (thus
having to keep it forever after for compatibility), let's confine it
to the old one; if we find later that we need the feature, we ought
to support it properly, with ethtool set-channels.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: move modparam 'interrupt_mode' out of common channel code

EF100 only supports MSI-X, so there's no need for the new driver to
expose this old module parameter.
Since it's now visible to the linker, we have to rename it internally
to efx_interrupt_mode to avoid symbol collisions in non-modular
builds.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: remove max_interrupt_mode

All NICs supported by this driver are capable of MSI-X interrupts (only
Falcon A1 wasn't, and that's now hived off into its own driver), so no
need for a nic-type parameter. Besides, the code that checked it was
buggy anyway (the following assignment that checked min_interrupt_mode
overrode it).

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: support setting MTU even if not privileged to configure MAC fully

Unprivileged functions (such as VFs) may set their MTU by use of the
'control' field of MC_CMD_SET_MAC_EXT, as used in efx_mcdi_set_mtu().
If calling efx_ef10_mac_reconfigure() from efx_change_mtu(), and the
NIC supports the above (SET_MAC_ENHANCED capability), use it rather
than efx_mcdi_set_mac().

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netfront: remove redundant assignment to variable 'act'

The variable act is being initialized with a value that is
never read and it is being updated later with a new value. The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-ipa-simplify-endpoint-programming'

Alex Elder says:

====================
net: ipa: simplify endpoint programming

Add tests to functions so they don't update undefined endpoint
registers, rather than requiring the caller to avoid calling them.

Move the call to a workaround function required when suspending
inside the function that puts an endpoint into suspend mode. This
requires moving a few functions (which are otherwise unchanged).

Then simplify ipa_endpoint_program() to call essentially all
endpoint register update functions unconditionally.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: simplify ipa_endpoint_program()

Have functions that write endpoint configuration registers return
immediately if they are not valid for the direction of transfer for
the endpoint. This allows most of the calls in ipa_endpoint_program()
to be made unconditionally. Reorder the register writes to match
the order of their definition (based on offset).

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: move version test inside ipa_endpoint_program_suspend()

IPA version 4.0+ does not support endpoint suspend. Put a test at
the top of ipa_endpoint_program_suspend() that returns immediately
if suspend is not supported rather than making that check in the caller.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: always handle suspend workaround

IPA version 3.5.1 has a hardware quirk that requires special
handling if an RX endpoint is suspended while aggregation is active.
This handling is implemented by ipa_endpoint_suspend_aggr().

Have ipa_endpoint_program_suspend() be responsible for calling
ipa_endpoint_suspend_aggr() if suspend mode is being enabled on
an endpoint. If the endpoint does not support aggregation, or if
aggregation isn't active, this call will continue to have no effect.

Move the definition of ipa_endpoint_suspend_aggr() up in the file so
its definition precedes the new earlier reference to it. This
requires ipa_endpoint_aggr_active() and ipa_endpoint_force_close()
to be moved as well.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: move version test inside ipa_endpoint_program_delay()

IPA version 4.2 has a hardware quirk that affects endpoint delay
mode, so it isn't used there. Isolate the test that avoids using
delay mode for that version inside ipa_endpoint_program_delay(),
rather than making that check in the caller.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: Mark PM functions as __maybe_unused

In certain configurations without power management support, the
following warnings happen:

drivers/net/ethernet/mellanox/mlx4/main.c:4388:12:
warning: 'mlx4_resume' defined but not used [-Wunused-function]
4388 | static int mlx4_resume(struct device *dev_d)
| ^~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx4/main.c:4373:12: warning:
'mlx4_suspend' defined but not used [-Wunused-function]
4373 | static int mlx4_suspend(struct device *dev_d)
| ^~~~~~~~~~~~

Mark these functions as __maybe_unused to make it clear to the
compiler that this is going to happen based on the configuration,
which is the standard for these types of functions.

Fixes: 0e3e206a3e12 ("mlx4: use generic power management")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ksz884x: mark pcidev_suspend() as __maybe_unused

In certain configurations without power management support, gcc report
the following warning:

drivers/net/ethernet/micrel/ksz884x.c:7182:12: warning:
'pcidev_suspend' defined but not used [-Wunused-function]
7182 | static int pcidev_suspend(struct device *dev_d)
| ^~~~~~~~~~~~~~

Mark pcidev_suspend() as __maybe_unused to make it clear.

Fixes: 64120615d140 ("ksz884x: use generic power management")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-macb-few-code-cleanups'

Claudiu Beznea says:

====================
net: macb: few code cleanups

Patches in this series cleanup a bit macb code.

Changes in v2:
- in patch 2/4 use hweight32() instead of hweight_long()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: remove is_udp variable

Remove is_udp variable that is used in only one place and use
ip_hdr(skb)->protocol == IPPROTO_UDP check instead.

Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: do not initialize queue variable

Do not initialize queue variable. It is already initialized in for loops.

Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: use hweight32() to count set bits in queue_mask

Use hweight32() to count set bits in queue_mask.

Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: do not set again bit 0 of queue_mask

Bit 0 of queue_mask is set at the beginning of
macb_probe_queues() function. Do not set it again after reading
DGFG6 but instead use "|=" operator.

Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge-mrp-Add-support-for-getting-the-status'

Horatiu Vultur says:

====================
bridge: mrp: Add support for getting the status

This patch series extends the MRP netlink interface to allow the userspace
daemon to get the status of the MRP instances in the kernel.

v3:
  - remove misleading comment
  - fix to use correctly the RCU

v2:
  - fix sparse warnings
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Extend br_fill_ifinfo to return MPR status

This patch extends the function br_fill_ifinfo to return also the MRP
status for each instance on a bridge. It also adds a new filter
RTEXT_FILTER_MRP to return the MRP status only when this is set, not to
interfer with the vlans. The MRP status is return only on the bridge
interfaces.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: mrp: Add br_mrp_fill_info

Add the function br_mrp_fill_info which populates the MRP attributes
regarding the status of each MRP instance.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: uapi: mrp: Extend MRP attributes to get the status

Add MRP attribute IFLA_BRIDGE_MRP_INFO to allow the userspace to get the
current state of the MRP instances. This is a nested attribute that
contains other attributes like, ring id, index of primary and secondary
port, priority, ring state, ring role.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: selftests: Restore netns after each test

It is common for networking tests creating its netns and making its own
setting under this new netns (e.g. changing tcp sysctl).  If the test
forgot to restore to the original netns, it would affect the
result of other tests.

This patch saves the original netns at the beginning and then restores it
after every test.  Since the restore "setns()" is not expensive, it does it
on all tests without tracking if a test has created a new netns or not.

The new restore_netns() could also be done in test__end_subtest() such
that each subtest will get an automatic netns reset.  However,
the individual test would lose flexibility to have total control
on netns for its own subtests.  In some cases, forcing a test to do
unnecessary netns re-configure for each subtest is time consuming.
e.g. In my vm, forcing netns re-configure on each subtest in sk_assign.c
increased the runtime from 1s to 8s.  On top of that,  test_progs.c
is also doing per-test (instead of per-subtest) cleanup for cgroup.
Thus, this patch also does per-test restore_netns().  The only existing
per-subtest cleanup is reset_affinity() and no test is depending on this.
Thus, it is removed from test__end_subtest() to give a consistent
expectation to the individual tests.  test_progs.c only ensures
any affinity/netns/cgroup change made by an earlier test does not
affect the following tests.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200702004858.2103728-1-kafai@fb.com

bpf: selftests: A few improvements to network_helpers.c

This patch makes a few changes to the network_helpers.c

1) Enforce SO_RCVTIMEO and SO_SNDTIMEO
   This patch enforces timeout to the network fds through setsockopt
   SO_RCVTIMEO and SO_SNDTIMEO.

   It will remove the need for SOCK_NONBLOCK that requires a more demanding
   timeout logic with epoll/select, e.g. epoll_create, epoll_ctrl, and
   then epoll_wait for timeout.

   That removes the need for connect_wait() from the
   cgroup_skb_sk_lookup.c. The needed change is made in
   cgroup_skb_sk_lookup.c.

2) start_server():
   Add optional addr_str and port to start_server().
   That removes the need of the start_server_with_port().  The caller
   can pass addr_str==NULL and/or port==0.

   I have a future tcp-hdr-opt test that will pass a non-NULL addr_str
   and it is in general useful for other future tests.

   "int timeout_ms" is also added to control the timeout
   on the "accept(listen_fd)".

3) connect_to_fd(): Fully use the server_fd.
   The server sock address has already been obtained from
   getsockname(server_fd).  The sockaddr includes the family,
   so the "int family" arg is redundant.

   Since the server address is obtained from server_fd,  there
   is little reason not to get the server's socket type from the
   server_fd also.  getsockopt(server_fd) can be used to do that,
   so "int type" arg is also removed.

   "int timeout_ms" is added.

4) connect_fd_to_fd():
   "int timeout_ms" is added.
   Some code is also refactored to connect_fd_to_addr() which is
   shared with connect_to_fd().

5) Preserve errno:
   Some callers need to check errno, e.g. cgroup_skb_sk_lookup.c.
   Make changes to do it more consistently in save_errno_close()
   and log_err().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200702004852.2103003-1-kafai@fb.com

Merge branch 'mptcp-add-receive-buffer-auto-tuning'

Florian Westphal says:

====================
mptcp: add receive buffer auto-tuning

First patch extends the test script to allow for reproducible results.
Second patch adds receive auto-tuning. Its based on what TCP is doing,
only difference is that we use the largest RTT of any of the subflows
and that we will update all subflows with the new value.

Else, we get spurious packet drops because the mptcp work queue might
not be able to move packets from subflow socket to master socket
fast enough. Without the adjustment, TCP may drop the packets because
the subflow socket is over its rcvbuffer limit.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: add receive buffer auto-tuning

When mptcp is used, userspace doesn't read from the tcp (subflow)
socket but from the parent (mptcp) socket receive queue.

skbs are moved from the subflow socket to the mptcp rx queue either from
'data_ready' callback (if mptcp socket can be locked), a work queue, or
the socket receive function.

This means tcp_rcv_space_adjust() is never called and thus no receive
buffer size auto-tuning is done.

An earlier (not merged) patch added tcp_rcv_space_adjust() calls to the
function that moves skbs from subflow to mptcp socket.
While this enabled autotuning, it also meant tuning was done even if
userspace was reading the mptcp socket very slowly.

This adds mptcp_rcv_space_adjust() and calls it after userspace has
read data from the mptcp socket rx queue.

Its very similar to tcp_rcv_space_adjust, with two differences:

1. The rtt estimate is the largest one observed on a subflow
2. The rcvbuf size and window clamp of all subflows is adjusted
   to the mptcp-level rcvbuf.

Otherwise, we get spurious drops at tcp (subflow) socket level if
the skbs are not moved to the mptcp socket fast enough.

Before:
time mptcp_connect.sh -t -f $((4*1024*1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap
[..]
ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration 40823ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration 23119ms) [ OK ]
ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5421ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration 41446ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration 23427ms) [ OK ]
ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5426ms) [ OK ]
Time: 1396 seconds

After:
ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration  5417ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration  5427ms) [ OK ]
ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5422ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration  5415ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration  5422ms) [ OK ]
ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5423ms) [ OK ]
Time: 296 seconds

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mptcp: add option to specify size of file to transfer

The script generates two random files that are then sent via tcp and
mptcp connections.

In order to compare throughput over consecutive runs add an option
to provide the file size on the command line: "-f 128000".

Also add an option, -t, to enable tcp tests. This is useful to
compare throughput of mptcp connections and tcp connections.

Example: run tests with a 4mb file size, 300ms delay 0.01% loss,
default gso/tso/gro settings and with large write/blocking io:

mptcp_connect.sh -t -f $((4 * 1024 * 1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: Allow changing default qdisc to FQ-PIE

Similar to fq_codel and the other qdiscs that can set as default,
fq_pie is also suitable for general use without explicit configuration,
which makes it a valid choice for this.

This is useful in situations where a painless out-of-the-box solution
for reducing bufferbloat is desired but fq_codel is not necessarily the
best choice. For example, fq_pie can be better for DASH streaming, but
there could be more cases where it's the better choice of the two simple
AQMs available in the kernel.

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2020-07-01

This series contains updates to all Intel drivers, but a majority of the
changes are to the i40e driver.

Jeff converts 'fall through' comments to the 'fallthrough;' keyword for
all Intel drivers. Removed unnecessary delay in the ixgbe ethtool
diagnostics test.

Arkadiusz implements Total Port Shutdown for i40e. This is the revised
patch based on Jakub's feedback from an earlier submission of this
patch, where additional code comments and description was needed to
describe the functionality.

Wei Yongjun fixes return error code for iavf_init_get_resources().

Magnus optimizes XDP code in i40e; starting with AF_XDP zero-copy
transmit completion path. Then by only executing a division when
necessary in the napi_poll data path. Move the check for transmit ring
full outside the send loop to increase performance.

Ciara add XDP ring statistics to i40e and the ability to dump these
statistics and descriptors.

Tony fixes reporting iavf statistics.

Radoslaw adds support for 2.5 and 5 Gbps by implementing the newer ethtool
ksettings API in ixgbe.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2020-07-01

This series contains updates to the ice driver only.

Jacob implements a devlink region for device capabilities.

Bruce removes structs containing only one-element arrays that are either
unused or only used for indexing. Instead, use pointer arithmetic or
other indexing to access the elements. Converts "C struct hack"
variable-length types to the preferred C99 flexible array member.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>