git.monstr.eu Git - linux-2.6-microblaze.git/log

mptcp: move sockopt function into a new file

The MPTCP sockopt implementation is going to be much
more big and complex soon. Let's move it to a different
source file.

No functional change intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: revert "mptcp: forbit mcast-related sockopt on MPTCP sockets"

This change reverts commit 86581852d771 ("mptcp: forbit mcast-related sockopt on MPTCP sockets").

As announced in the cover letter of the mentioned patch above, the
following commits introduce a larger MPTCP sockopt implementation
refactor.

This time, we switch from a blocklist to an allowlist. This is safer for
the future where new sockoptions could be added while not being fully
supported with MPTCP sockets and thus causing unstabilities.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: move tx cleanup processing out of interrupt

Tx queue cleanup happens in interrupt handler on same core as rx queue
processing. Both can take considerable amount of processing in high
packet-per-second scenarios.

Sending big amounts of packets can stall the rx processing which is
unfair and also can lead to out-of-memory condition since
__dev_kfree_skb_irq queues the skbs for later kfree in softirq which
is not allowed to happen with heavy load in interrupt handler.

This puts tx cleanup in its own napi and enables threaded napi to
allow the rx/tx queue processing to happen on different cores.

The ability to sustain equal amounts of tx/rx traffic increased:
from 280Kpps to 1130Kpps on Threadripper 3960X with upcoming
Mikrotik 10/25G NIC,
from 520Kpps to 850Kpps on Intel i3-3320 with Mikrotik RB44Ge adapter.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'BR_FDB_LOCAL'

Vladimir Oltean says:

====================
Pass the BR_FDB_LOCAL information to switchdev drivers

Bridge FDB entries with the is_local flag are entries which are
terminated locally and not forwarded. Switchdev drivers might want to be
notified of these addresses so they can trap them. If they don't program
these entries to hardware, there is no guarantee that they will do the
right thing with these entries, and they won't be, let's say, flooded.

Ideally none of the switchdev drivers should ignore these entries, but
having access to the is_local bit is the bare minimum change that should
be done in the bridge layer, before this is even possible.

These 2 changes are extracted from the larger "RX filtering in DSA"
series:
https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260-8-olteanv@gmail.com/
https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260-9-olteanv@gmail.com/
and submitted separately, because they touch all switchdev drivers,
while the rest is mostly specific to DSA.

This change is not a functional one, in the sense that everybody still
ignores the local FDB entries, but this will be changed by further
patches at least for DSA.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: switchdev: include local flag in FDB notifications

As explained in bugfix commit 6ab4c3117aec ("net: bridge: don't notify
switchdev for local FDB addresses") as well as in this discussion:
https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/

the switchdev notifiers for FDB entries managed to have a zero-day bug,
which was that drivers would not know what to do with local FDB entries,
because they were not told that they are local. The bug fix was to
simply not notify them of those addresses.

Let us now add the 'is_local' bit to bridge FDB entries, and make all
drivers ignore these entries by their own choice.

Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: switchdev: refactor br_switchdev_fdb_notify

Instead of having to add more and more arguments to
br_switchdev_fdb_call_notifiers, get rid of it and build the info
struct directly in br_switchdev_fdb_notify.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

scm: fix a typo in put_cmsg()

We need to store cmlen instead of len in cm->cmsg_len.

Fixes: 38ebcf5096a8 ("scm: optimize put_cmsg()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ehtool-fec-stats'

Jakub Kicinski says:

====================
ethtool: add standard FEC statistics

This set adds uAPI for reporting standard FEC statistics, and
implements it in a handful of drivers.

The statistics are taken from the IEEE standard, with one
extra seemingly popular but not standard statistics added.

The implementation is similar to that of the pause frame
statistics, user requests the stats by setting a bit
(ETHTOOL_FLAG_STATS) in the common ethtool header of
ETHTOOL_MSG_FEC_GET.

Since standard defines the statistics per lane what's
reported is both total and per-lane counters:

# ethtool -I --show-fec eth0
FEC parameters for eth0:
Configured FEC encodings: None
Active FEC encoding: None
Statistics:
  corrected_blocks: 256
    Lane 0: 255
    Lane 1: 1
  uncorrectable_blocks: 145
    Lane 0: 128
    Lane 1: 17

v2: check for errors in mlx5 register access
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: implement ethtool::get_fec_stats

Report corrected bits.

v2: catch reg access errors (Saeed)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: ef10: implement ethtool::get_fec_stats

Report what appears to be the standard block counts:
- 30.5.1.1.17 aFECCorrectedBlocks
- 30.5.1.1.18 aFECUncorrectableBlocks

Don't report the per-lane symbol counts, if those really
count symbols they are not what the standard calls for
(even if symbols seem like the most useful thing to count.)

Fingers crossed that fec_corrected_errors is not in symbols.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: implement ethtool::get_fec_stats

Report corrected bits.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: add FEC statistics

Similarly to pause statistics add stats for FEC.

The IEEE standard mandates two sets of counters:
- 30.5.1.1.17 aFECCorrectedBlocks
- 30.5.1.1.18 aFECUncorrectableBlocks
where block is a block of bits FEC operates on.
Each of these counters is defined per lane (PCS instance).

Multiple vendors provide number of corrected _bits_ rather
than/as well as blocks.

This set adds the 2 standard-based block counters and a extra
one for corrected bits.

Counters are exposed to user space via netlink in new attributes.
Each attribute carries an array of u64s, first element is
the total count, and the following ones are a per-lane break down.

Much like with pause stats the operation will not fail when driver
does not implement the get_fec_stats callback (nor can the driver
fail the operation by returning an error). If stats can't be
reported the relevant attributes will be empty.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: fec_prepare_data() - jump to error handling

Refactor fec_prepare_data() a little bit to skip the body
of the function and exit on error. Currently the code
depends on the fact that we only have one call which
may fail between ethnl_ops_begin() and ethnl_ops_complete()
and simply saves the error code. This will get hairy with
the stats also being queried.

No functional changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: move ethtool_stats_init

We'll need it for FEC stats as well.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

scm: optimize put_cmsg()

Calling two copy_to_user() for very small regions has very high overhead.

Switch to inlined unsafe_put_user() to save one stac/clac sequence,
and avoid copy_to_user().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enetc: convert to schedule_work()

Convert system_wq queue_work() to schedule_work() which is
a wrapper around it, since the former is a rare construct.

Fixes: 7294380c5211 ("enetc: support PTP Sync packet one-step timestamping")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-next'

Huazhong Tan says:

====================
net: hns3: updates for -next

This series adds support for pushing link status to VFs for
the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: VF not request link status when PF support push link status feature

To reduce the processing of unnecessary mailbox command when PF supports
actively push its link status to VFs, VFs stop sending request link
status command in periodic service task in this case.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: PF add support for pushing link status to VFs

Previously, VF updates its link status every second by send query command
to PF in periodic service task. If link stats of PF is changed, VF may
need at most one second to update its link status.

To reduce delay of link status between PF and VFs, PF actively push its
link status to VFs when its link status is updated. And to let VF know
PF supports this new feature, the link status changed mailbox command
adds one bit to indicate it.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: at803x: select correct page on config init

The Atheros AR8031 and AR8033 expose different registers for SGMII/Fiber
as well as the copper side of the PHY depending on the BT_BX_REG_SEL bit
in the chip configure register.

The driver assumes the copper side is selected on probe, but this might
not be the case depending which page was last selected by the
bootloader. Notably, Ubiquiti UniFi bootloaders show this behavior.

Select the copper page when probing to circumvent this.

Signed-off-by: David Bauer <mail@david-bauer.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-04-14

This series contains updates to ice driver only.

Bruce changes and removes open coded values to instead use existing
kernel defines and suppresses false cppcheck issues.

Ani adds new VSI states to track netdev allocation and registration. He
also removes leading underscores in the ice_pf_state enum.

Jesse refactors ITR by introducing helpers to reduce duplicated code and
structures to simplify checking of ITR mode. He also triggers a software
interrupt when exiting napi poll or busy-poll to ensure all work is
processed. Modifies /proc/iomem to display driver name instead of PCI
address. He also changes the checks of vsi->type to use a local variable
in ice_vsi_rebuild() and removes an unneeded struct member.

Jake replaces the driver's adaptive interrupt moderation algorithm to
use the kernel's DIM library implementation.

Scott reworks module reads to reduce the number of reads needed and
remove excessive increment of QSFP page.

Brett sets the vsi->vf_id to invalid for non-VF VSIs.

Paul removes the return value from ice_vsi_manage_rss_lut() as it's not
communicating anything critical. He also reduces the scope of a
variable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ice: reduce scope of variable

The scope of this variable can be reduced so do that.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: remove return variable

We were saving the return value from ice_vsi_manage_rss_lut(), but
the errors from that function are not critical so change it to
return void and remove the code that saved the value.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: suppress false cppcheck issues

Silence false errors, warnings and style issues reported by cppcheck.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Set vsi->vf_id as ICE_INVAL_VFID for non VF VSI types

Currently the vsi->vf_id is set only for ICE_VSI_VF and it's left as 0
for all other VSI types. This is confusing and could be problematic
since 0 is a valid vf_id. Fix this by always setting non VF VSI types to
ICE_INVAL_VFID.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: remove unused struct member

The only time you can ever have a rq_last_status is if
a firmware event was somehow reporting a status on the receive
queue, which are generally firmware initiated events or
mailbox messages from a VF. Mostly this struct member was unused.

Fix this problem by still printing the value of the field in a debug
print, but don't store the value forever in a struct, potentially
creating opportunities for callers to use the wrong struct member.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: use local for consistency

Do a minor refactor on ice_vsi_rebuild to use a local
variable to store vsi->type.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: print name in /proc/iomem

The driver previously printed it's PCI address in
the name field for the pci resource, which when displayed
via /proc/iomem, would print the same thing twice.

It's more useful for debugging to see the driver name, as
most other modules do.

Here's a diff of before and after this change:
     99100000-991fffff : 0000:3b:00.1
   9a000000-a04fffff : PCI Bus 0000:3b
     9a000000-9bffffff : 0000:3b:00.1
-      9a000000-9bffffff : 0000:3b:00.1
+      9a000000-9bffffff : ice
     9c000000-9dffffff : 0000:3b:00.0
-      9c000000-9dffffff : 0000:3b:00.0
+      9c000000-9dffffff : ice
     9e000000-9effffff : 0000:3b:00.1
     9f000000-9fffffff : 0000:3b:00.0
     a0000000-a000ffff : 0000:3b:00.1

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Reimplement module reads used by ethtool

There was an excessive increment of the QSFP page, which is
now fixed. Additionally, this new update now reads 8 bytes
at a time and will retry each request if the module/bus is
busy.

Also, prevent reading from upper pages if module does not
support those pages.

Signed-off-by: Scott W Taylor <scott.w.taylor@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: refactor ITR data structures

Use a dedicated bitfield in order to both increase
the amount of checking around the length of ITR writes
as well as simplify the checks of dynamic mode.

Basically unpack the "high bit means dynamic" logic
into bitfields.

Also, remove some unused ITR defines.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: manage interrupts during poll exit

The driver would occasionally miss that there were outstanding
descriptors to clean when exiting busy/napi poll. This issue has
been in the code since the introduction of the ice driver.

Attempt to "catch" any remaining work by triggering a software
interrupt when exiting napi poll or busy-poll. This will not
cause extra interrupts in the case of normal execution.

This issue was found when running sfnt-pingpong, with busy
poll enabled, and typically with larger I/O sizes like > 8192,
the program would occasionally report > 1 second maximums
to complete a ping pong.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: replace custom AIM algorithm with kernel's DIM library

The ice driver has support for adaptive interrupt moderation, an
algorithm for tuning the interrupt rate dynamically. This algorithm
is based on various assumptions about ring size, socket buffer size,
link speed, SKB overhead, ethernet frame overhead and more.

The Linux kernel has support for a dynamic interrupt moderation
algorithm known as "dimlib". Replace the custom driver-specific
implementation of dynamic interrupt moderation with the kernel's
algorithm.

The Intel hardware has a different hardware implementation than the
originators of the dimlib code had to work with, which requires the
driver to use a slightly different set of inputs for the actual
moderation values, while getting all the advice from dimlib of
better/worse, shift left or right.

The change made for this implementation is to use a pair of values
for each of the 5 "slots" that the dimlib moderation expects, and
the driver will program those pairs when dimlib recommends a slot to
use. The currently implementation uses two tables, one for receive
and one for transmit, and the pairs of values in each slot set the
maximum delay of an interrupt and a maximum number of interrupts per
second (both expressed in microseconds).

There are two separate kinds of bugs fixed by using DIMLIB, one is
UDP single stream send was too slow, and the other is that 8K
ping-pong was going to the most aggressive moderation and has much
too high latency.

The overall result of using DIMLIB is that we meet or exceed our
performance expectations set based on the old algorithm.

Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: refactor interrupt moderation writes

Introduce several new helpers for writing ITR and GLINT_RATE
registers, and refactor the code calling them. This resulted
in removal of several duplicate functions and rolled a bunch
of simple code back into the calling routines.

In particular this removes some code that was doing both
a store and a set in a helper function, which seems better
done as separate tasks in the caller (and generally takes
less lines of code even with a tiny bit of repetition).

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add new VSI states to track netdev alloc/registration

Add two new VSI states, one to track if a netdev for the VSI has been
allocated and the other to track if the netdev has been registered.
Call unregister_netdev/free_netdev only when the corresponding state
bits are set.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Drop leading underscores in enum ice_pf_state

Remove the leading underscores in enum ice_pf_state. This is not really
communicating anything and is unnecessary. No functional change.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: use kernel definitions for IANA protocol ports and ether-types

The well-known IANA protocol port 3260 (iscsi-target 0x0cbc) and the
ether-types 0x8906 (ETH_P_FCOE) and 0x8914 (ETH_P_FIP) are already defined
in kernel header files. Use those definitions instead of open-coding the
same.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge tag 'linux-can-next-for-5.13-20210414' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2021-04-14

this is a pull request of a single patch for net-next/master.

Vincent Mailhol's patch fixes a NULL pointer dereference when handling
error frames in the etas_es58x driver, which has been added in the
previous PR.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/packet: remove data races in fanout operations

af_packet fanout uses RCU rules to ensure f->arr elements
are not dismantled before RCU grace period.

However, it lacks rcu accessors to make sure KCSAN and other tools
wont detect data races. Stupid compilers could also play games.

Fixes: dc99f600698d ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: propagate error code and extack from br_mc_disabled_update

Some Ethernet switches might only be able to support disabling multicast
snooping globally, which is an issue for example when several bridges
span the same physical device and request contradictory settings.

Propagate the return value of br_mc_disabled_update() such that this
limitation is transmitted correctly to user-space.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2021-04-13' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-04-13

mlx5 core and netdev driver updates

1) E-Switch updates from Parav,
  1.1) Devlink parameter to control mlx5 metadata enablement for E-Switch
  1.2) Trivial cleanups for E-Switch code
  1.3) Dynamically allocate vport steering namespaces only when required

2) From Jianbo, Use variably sized data structures for Software steering

3) Several minor cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: enetc: fetch MAC address from device tree

Normally, the bootloader will already initialize the MAC address
registers of the ENETC and the driver will just use them or generate a
random one, if it is not initialized.

Add a new way to provide the MAC address: via device tree. Besides the
usual 'mac-address' property, there is also the possibility to fetch it
via a NVMEM provider. The sl28 board stores the MAC address in the SPI
NOR flash OTP region. Having this will allow linux to fetch the MAC
address from there without being dependent on the bootloader.

No in-tree boards have the device tree properties set, thus for these,
this is a no-op.

Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Remove duplicate argument

Fix the following coccicheck warning:

./drivers/net/ethernet/sfc/enum.h:80:7-28: duplicated argument to |

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

skbuff: revert "skbuff: remove some unnecessary operation in skb_segment_list()"

the commit 1ddc3229ad3c ("skbuff: remove some unnecessary operation
in skb_segment_list()") introduces an issue very similar to the
one already fixed by commit 53475c5dd856 ("net: fix use-after-free when
UDP GRO with shared fraglist").

If the GSO skb goes though skb_clone() and pskb_expand_head() before
entering skb_segment_list(), the latter will unshare the frag_list
skbs and will release the old list. With the reverted commit in place,
when skb_segment_list() completes, skb->next points to the just
released list, and later on the kernel will hit UaF.

Note that since commit e0e3070a9bc9 ("udp: properly complete L4 GRO
over UDP tunnel packet") the critical scenario can be reproduced also
receiving UDP over vxlan traffic with:

NIC (NETIF_F_GRO_FRAGLIST enabled) -> vxlan -> UDP sink

Attaching a packet socket to the NIC will cause skb_clone() and the
tunnel decapsulation will call pskb_expand_head().

Fixes: 1ddc3229ad3c ("skbuff: remove some unnecessary operation in skb_segment_list()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: idt77252: remove unused function

Fix the following clang warning:

drivers/atm/idt77252.c:1787:1: warning: unused function
'idt77252_fbq_level' [-Wunused-function].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2021-04-14

Not much this time:

1) Simplification of some variable calculations in esp4 and esp6.
From Jiapeng Chong and Junlin Yang.

2) Fix a clang Wformat warning in esp6 and ah6.
From Arnd Bergmann.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: add support for pause ethtool ops

This adds support for the [g|s]et_pauseparam ethtool ops. It considers
that the chip doesn't support pause frame use in jumbo mode.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
10GbE Intel Wired LAN Driver Updates 2021-04-13

This series contains updates to ixgbe and ixgbevf driver.

Jostar Yang adds support for BCM54616s PHY for ixgbe.

Chen Lin removes an unused function pointer for ixgbe and ixgbevf.

Bhaskar Chowdhury fixes a typo in ixgbe.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Add support for external trigger timestamping

The Synopsis MAC controller supports auxiliary snapshot feature that
allows user to store a snapshot of the system time based on an external
event.

This patch add supports to the above mentioned feature. Users will be
able to triggered capturing the time snapshot from user-space using
application such as testptp or any other applications that uses the
PTP_EXTTS_REQUEST ioctl request.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Tan Tee Min <tee.min.tan@intel.com>
Co-developed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'marvell-88x2222-improvements'

Ivan Bornyakov says:

====================
net: phy: marvell-88x2222: a couple of improvements

First, there are some SFP modules that only uses RX_LOS for link
indication. Add check that link is operational before actual read of
line-side status.

Second, it is invalid to set 10G speed without autonegotiation,
according to phy_ethtool_ksettings_set(). Implement switching between
10GBase-R and 1000Base-X/SGMII if autonegotiation can't complete but
there is signal in line.

Changelog:
  v1 -> v2:
    * make checking that link is operational more friendly for
      trancievers without SFP cages.
    * split swapping 1G/10G modes into non-functional and functional
      commits for the sake of easier review.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell-88x2222: swap 1G/10G modes on autoneg

Setting 10G without autonegotiation is invalid according to
phy_ethtool_ksettings_set(). Thus, we need to set it during
autonegotiation.

If 1G autonegotiation can't complete for quite a time, but there is
signal in line, switch line interface type to 10GBase-R, if supported,
in hope for link to be established.

And vice versa. If 10GBase-R link can't be established for quite a time,
and autonegotiation is enabled, and there is signal in line, switch line
interface type to appropriate 1G mode, i.e. 1000Base-X or SGMII, if
supported.

Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell-88x2222: move read_status after config_aneg

No functional changes, just move read link status routines below
autonegotiation configuration to make future functional changes more
distinct.

Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell-88x2222: check that link is operational

Some SFP modules uses RX_LOS for link indication. In such cases link
will be always up, even without cable connected. RX_LOS changes will
trigger link_up()/link_down() upstream operations. Thus, check that SFP
link is operational before actual read link status.

If there is no SFP cage connected to the tranciever, check only PMD
Recieve Signal Detect register.

Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Fix RQ creation flow for queues which doesn't support XDP

Allow to create an RQ which is not registered as an XDP RQ. For example:
the trap-RQ doesn't register as an XDP RQ.

Fixes: 869c5f926247 ("net/mlx5e: Generalize open RQ")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Replace spaces with tab at the start of a line

There should be no spaces at the start of the line.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Remove return statement exist at the end of void function

void function return statements are not generally useful.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add a blank line after declarations

There should be a blank lines after declarations.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Fix bit-wise and with zero

The bit-wise and of the action field with MLX5_ACCEL_ESP_ACTION_DECRYPT
is incorrect as MLX5_ACCEL_ESP_ACTION_DECRYPT is zero and not intended
to be a bit-flag. Fix this by using the == operator as was originally
intended.

Addresses-Coverity: ("Logically dead code")
Fixes: 7dfee4b1d79e ("net/mlx5: IPsec, Refactor SA handle creation and destruction")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Alloc cmd buffer with kvzalloc() instead of kzalloc()

The cmd size is 8K so use kvzalloc().

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Use variably sized data structures for different actions

mlx5dr_action is a generally used data structure, and there is an
union for different types of actions in it. The size of mlx5dr_action
is about 72 bytes, but for those actions with fewer fields, most of
the allocated memory is wasted.
Remove this union, and mlx5dr_action becomes a generic action header.
Then actions are dynamically allocated with needed memory, the data
for each action is stored right after the header.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: SF, Reuse stored hardware function id

SF's hardware function id is already stored in mlx5_sf. Reuse it,
instead of querying the hw table.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: SF, Use device pointer directly

At many places in the code, device pointer is directly available. Make
use of it, instead of accessing it from the table.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-Switch, Initialize eswitch acls ns when eswitch is enabled

Currently eswitch flow steering (FS) namespace of vport's ingress and
egress ACL are enabled when FS layer is initialized. This is done even
when eswitch is diabled. This demands that total eswitch ports to be
known to FS layer without eswitch in use.

Given the FS core is not dependent on eswitch, make namespace init and
cleanup routines as helper routines to be invoked only when eswitch is
needed.

With this change, ingress and egress ACL namespaces are created only
when eswitch legacy/offloads mode is enabled.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-Switch, Move legacy code to a individual file

Currently eswitch offers two modes. Legacy and offloads.
Offloads code is already in its own file eswitch_offloads.c

However eswitch.c contains the eswitch legacy code and common
infrastructure code.

To enable future extensions and to better manage generic common eswitch
infrastructure code, move the legacy code to its own legacy.c file.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-Switch, Convert a macro to a helper routine

Convert ESW_ALLOWED macro to a helper routine so that it can be used in
other eswitch files.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-Switch Make cleanup sequence mirror of init

Make cleanup sequence mirror of init sequence for cleaning up reps
and freeing vports.

Also when reps initialization fails, there is no need to perform reps
cleanup.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-Switch, Make vport number u16

Vport number is 16-bit field in hardware. Make it u16.

Move location of vport in the structure so that it reduces a hole
in the structure.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-Switch, Skip querying SF enabled bits

With vhca events, SF state is queried through the VHCA events. Device no
longer expects SF bitmap in the query eswitch functions command.

Hence, remove it to simplify the code.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-Switch, let user to enable disable metadata

Currently each packet inserted in eswitch is tagged with a internal
metadata to indicate source vport. Metadata tagging is not always
needed. Metadata insertion is needed for multi-port RoCE, failover
between representors and stacked devices. In many other cases,
metadata enablement is not needed.

Metadata insertion slows down the packet processing rate of the E-switch
when it is in switchdev mode.

Below table show performance gain with metadata disabled for VXLAN
offload rules in both SMFS and DMFS steering mode on ConnectX-5 device.

----------------------------------------------
| steering | metadata | pkt size | rx pps    |
| mode     |          |          | (million) |
----------------------------------------------
| smfs     | disabled | 128Bytes | 42        |
----------------------------------------------
| smfs     | enabled  | 128Bytes | 36        |
----------------------------------------------
| dmfs     | disabled | 128Bytes | 42        |
----------------------------------------------
| dmfs     | enabled  | 128Bytes | 36        |
----------------------------------------------

Hence, allow user to disable metadata using driver specific devlink
parameter. Metadata setting of the eswitch is applicable only for the
switchdev mode.

Example to show and disable metadata before changing eswitch mode:
$ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
pci/0000:06:00.0:
  name esw_port_metadata type driver-specific
    values:
      cmode runtime value true

$ devlink dev param set pci/0000:06:00.0 \
  name esw_port_metadata value false cmode runtime

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
changelog:
v1->v2:
- added performance numbers in commit log
- updated commit log and documentation for switchdev mode
- added explicit note on when user can disable metadata in
   documentation

can: etas_es58x: fix null pointer dereference when handling error frames

During the handling of CAN bus errors, a CAN error SKB is allocated
using alloc_can_err_skb(). Even if the allocation of the SKB fails,
the function continues in order to do the stats handling.

All access to the can_frame pointer (cf) should be guarded by an if
statement:
if (cf)

However, the increment of the rx_bytes stats:
netdev->stats.rx_bytes += cf->can_dlc;
dereferences the cf pointer and was not guarded by an if condition
leading to a NULL pointer dereference if the can_err_skb() function
failed.

Replacing the cf->can_dlc by the macro CAN_ERR_DLC (which is the
length of any CAN error frames) solves this NULL pointer dereference.

Fixes: 8537257874e9 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces")
Link: https://lore.kernel.org/r/20210413114242.2760-1-mailhol.vincent@wanadoo.fr
Reported-by: Arunachalam Santhanam <arunachalam.santhanam@in.bosch.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

net: ethernet: intel: Fix a typo in the file ixgbe_dcb_nl.c

s/Reprogam/Reprogram/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

net: intel: Remove unused function pointer typedef ixgbe_mc_addr_itr

Remove the 'ixgbe_mc_addr_itr' typedef as it is not used.

Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbe: Support external GBE SerDes PHY BCM54616s

The Broadcom PHY is used in switches, so add the ID, and hook it up.

This upstreams the Linux kernel patch from the network operating system
SONiC from February 2020 [1].

[1]: https://github.com/Azure/sonic-linux-kernel/pull/122

Signed-off-by: Jostar Yang <jostar_yang@accton.com>
Signed-off-by: Guohan Lu <lguohan@gmail.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

net: Space: remove hp100 probe

The driver was removed last year, but the static initialization got left
behind by accident.

Fixes: a10079c66290 ("staging: remove hp100 driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dpaa2-switch-tc-hw-offload'

Ioana Ciornei says:

====================
dpaa2-switch: add tc hardware offload on ingress traffic

This patch set adds tc hardware offload on ingress traffic in
dpaa2-switch. The cls flower and matchall classifiers are supported
using the same ACL infrastructure supported by the dpaa2-switch.

The first patch creates a new structure to hold all the necessary
information related to an ACL table. This structure is used in the next
patches to create a link between each switch port and the table used.
Multiple ports can share the same ACL table when they also share the
ingress tc block. Also, some small changes in the priority of the
default STP trap is done in the second patch.

The support for cls flower is added in the 3rd patch, while the 4th
one builds on top of the infrastructure put in place and adds cls
matchall support.

The following flow keys are supported:
- Ethernet: dst_mac/src_mac
- IPv4: dst_ip/src_ip/ip_proto/tos
- VLAN: vlan_id/vlan_prio/vlan_tpid/vlan_dei
- L4: dst_port/src_port

Each filter can support only one action from the following list:
- drop
- mirred egress redirect
- trap

With the last patch, we reuse the dpaa2_switch_acl_entry_add() function
added previously instead of open-coding the install of a new ACL entry
into the table.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: reuse dpaa2_switch_acl_entry_add() for STP frames trap

Since we added the dpaa2_switch_acl_entry_add() function in the previous
patches to hide all the details of actually adding the ACL entry by
issuing a firmware command, let's use it also for adding a CPU trap for
the STP frames.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: add tc matchall filter support

Add support TC_SETUP_CLSMATCHALL by using the same ACL table entries
framework as for tc flower. Adding a matchall rule is done by installing
an entry which has a mask of all zeroes, thus matching on any packet.

This can be used as a catch-all type of rule if used correctly, ie the
priority of the matchall filter should be kept as the lowest one in the
entire filter block.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: add tc flower hardware offload on ingress traffic

This patch adds support for tc flower hardware offload on the ingress
path. Shared filter blocks are supported by sharing a single ACL table
between multiple ports.

The following flow keys are supported:
- Ethernet: dst_mac/src_mac
- IPv4: dst_ip/src_ip/ip_proto/tos
- VLAN: vlan_id/vlan_prio/vlan_tpid/vlan_dei
- L4: dst_port/src_port

As per flow actions, the following are supported:
- drop
- mirred egress redirect
- trap
Each ACL entry (filter) can be setup with only one of the listed
actions.

A sorted single linked list is used to keep the ACL entries by their
order of priority. When adding a new filter, this enables us to quickly
ascertain if the new entry has the highest priority of the entire block
or if we should make some space in the ACL table by increasing the
priority of the filters already in the table.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: install default STP trap rule with the highest priority

Change the default ACL trap rule for STP frames to have the highest
priority.

In the same ACL table will reside both default rules added by the driver
for its internal use as well as rules added with tc flower. In this
case, the default rules such as the STP one that we already have should
have the highest priority.

Also, remove the check for a full ACL table since we already know that
it's sized so that we don't hit this case. The last thing changes is
that default trap filters will not be counted in the acl_tbl's num_rules
variable since their number doesn't change.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-switch: create a central dpaa2_switch_acl_tbl structure

Introduce a new structure - dpaa2_switch_acl_tbl - to hold all data
related to an ACL table: number of rules added, ACL table id, etc.
This will be used more in the next patches when adding support for
sharing an ACL table between ports.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: return -EFAULT if copy_to_user() fails

The copy_to_user() function returns the number of bytes that it wasn't
able to copy. We want to return -EFAULT to the user.

Fixes: fee6efce565d ("ionic: add hw timestamp support files")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'stmmac-xdp-zc'

Ong Boon Leong says:

====================
stmmac: add XDP ZC support

This is the v2 patch series to add XDP ZC support to stmmac driver.

Summary of v2 patch change:-

6/7: fix synchronize_rcu() is called stmmac_disable_all_queues() that is
     used by ndo_setup_tc().

########################################################################

Continuous burst traffics are generated by pktgen script and in the midst
of each packet processing operation by xdpsock the following tc-loop.sh
script is looped continuously:-

#!/bin/bash
tc qdisc del dev eth0 parent root
tc qdisc add dev eth0 ingress
tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 1@1 1@2 1@3 hw 0
tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 0 hw_tc 0
tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 1 hw_tc 1
tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 2 hw_tc 2
tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 3 hw_tc 3
tc qdisc list dev eth0
tc filter show dev eth0 ingress

On different ssh terminal
$ while true; do ./tc-loop.sh; sleep 1; done

The v2 patch series have been tested using the xdpsock app:
$ ./xdpsock -i eth0 -l -z

From xdpsock poller pps report and dmesg, we don't find any warning
related to rcu and the only difference when the script is executed is
the pps rate drops momentarily.

sock0@eth0:0 l2fwd xdp-drv
                   pps            pkts           1.00
rx                 436347         191361334
tx                 436411         191361334

sock0@eth0:0 l2fwd xdp-drv
                   pps            pkts           1.00
rx                 254117         191615476
tx                 254053         191615412

sock0@eth0:0 l2fwd xdp-drv
                   pps            pkts           1.00
rx                 466395         192081924
tx                 466395         192081860

sock0@eth0:0 l2fwd xdp-drv
                   pps            pkts           1.00
rx                 287410         192369365
tx                 287474         192369365

sock0@eth0:0 l2fwd xdp-drv
                   pps            pkts           1.00
rx                 395853         192765329
tx                 395789         192765265

sock0@eth0:0 l2fwd xdp-drv
                   pps            pkts           1.00
rx                 466132         193231514
tx                 466132         193231450

########################################################################

Based on the above result, the fix looks promising. Appreciate that if
community can help to review the patch series and provide me feedback
for improvement.
====================

net: stmmac: Add TX via XDP zero-copy socket

We add the support of XDP ZC TX submission and cleaning into
stmmac_tx_clean(). The function is made to clean as many TX complete
frames as possible, i.e. limit by priv->dma_tx_size instead of NAPI
budget. For TX ring that is associated with XSK pool, the function
stmmac_xdp_xmit_zc() is introduced to TX frame buffers from XSK pool by
using xsk_tx_peek_desc(). To make stmmac_tx_clean() support the cleaning
of XSK TX frames, STMMAC_TXBUF_T_XSK_TX TX buffer type is introduced.

As stmmac_tx_clean() uses the return value to cue whether NAPI function
should continue to poll, we augment the caller of stmmac_tx_clean() to
pass NAPI budget instead of priv->dma_tx_size through 'budget' input and
made stmmac_tx_clean() to always clean up-to the TX ring size instead.
This allows us to use the return boolean status of stmmac_xdp_xmit_zc()
to decide if XSK TX work is done or not: If true, set 'xmits' to return
'budget - 1' so that NAPI poll may exit. Else, set 'xmits' to return
'budget' to make NAPI poll continue to poll since XSK TX work is not
done. Finally, at the end of stmmac_tx_clean(), the function now take
a maximum value between 'count' and 'xmits' so that status from both
TX cleaning and XSK TX (only for XDP ZC) is considered.

This patch adds a new NAPI poll called stmmac_napi_poll_rxtx() that is
meant to be enabled/disabled for RX and TX ring that are bound to XSK
pool. This NAPI poll function starts with cleaning TX ring, then submits
XSK TX frames to TX ring before proceed to perform RX operations, i.e.
, receiving RX frames and replenishing RX ring with RX free buffers
obtained from XSK pool. Therefore, during XSK RX and TX setup, the driver
enables stmmac_napi_poll_rxtx() for RX and TX operations, then during
XSK RX and TX pool tear-down, the driver reenables the exisiting
independent NAPI poll functions accordingly: stmmac_napi_poll_rx() and
stmmac_napi_poll_tx().

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Enable RX via AF_XDP zero-copy

This patch adds the support for receiving packet via AF_XDP zero-copy
mechanism.

XDP ZC uses 1:1 mapping of XDP buffer to receive packet, therefore the
use of split header is not used currently. The 'xdp_buff' is declared as
union together with a struct that contains 'page', 'addr' and
'page_offset' that are associated with primary buffer.

RX buffers are now allocated either via page_pool or xsk pool. For RX
buffers from xsk_pool they are allocated and deallocated using below
functions:

* stmmac_alloc_rx_buffers_zc(struct stmmac_priv *priv, u32 queue)
* dma_free_rx_xskbufs(struct stmmac_priv *priv, u32 queue)

With above functions now available, we then extend the following driver
functions to support XDP ZC:
* stmmac_reinit_rx_buffers()
* __init_dma_rx_desc_rings()
* init_dma_rx_desc_rings()
* __free_dma_rx_desc_resources()

Note: stmmac_alloc_rx_buffers_zc() may return -ENOMEM due to RX XDP
buffer pool is not allocated (e.g. samples/bpf/xdpsock TX-only). But,
it is still ok to let TX XDP ZC to continue, therefore, the -ENOMEM
is silently ignored to let the driver succcessfully transition to XDP
ZC mode for the said RX and TX queue.

As XDP ZC buffer size is different, the DMA buffer size is required
to be reprogrammed accordingly for RX DMA/Queue that is populated with
XDP buffer from XSK pool.

Next, to add or remove per-queue XSK pool, stmmac_xdp_setup_pool()
will call stmmac_xdp_enable_pool() or stmmac_xdp_disable_pool()
that in-turn coordinates the tearing down and setting up RX ring via
RX buffers and descriptors removal and reallocation through
stmmac_disable_rx_queue() and stmmac_enable_rx_queue(). In addition,
stmmac_xsk_wakeup() is added to initiate XDP RX buffer replenishing
by signalling user application to add available XDP frames back to
FILL queue.

For RX processing using XDP zero-copy buffer, stmmac_rx_zc() is
introduced which is implemented with the assumption that RX split
header is disabled. For XDP verdict is XDP_PASS, the XDP buffer is
copied into a sk_buff allocated through stmmac_construct_skb_zc()
and sent to Linux network GRO inside stmmac_dispatch_skb_zc(). Free RX
buffers are then replenished using stmmac_rx_refill_zc()

v2: introduce __stmmac_disable_all_queues() to contain the original code
    that does napi_disable() and then make stmmac_setup_tc_block_cb()
    to use it. Move synchronize_rcu() into stmmac_disable_all_queues()
    that eventually calls __stmmac_disable_all_queues(). Then,
    make both stmmac_release() and stmmac_suspend() to use
    stmmac_disable_all_queues(). Thanks David Miller for spotting the
    synchronize_rcu() issue in v1 patch.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Refactor __stmmac_xdp_run_prog for XDP ZC

Prepare stmmac_xdp_run_prog() for AF_XDP zero-copy support which will be
added by upcoming patches by splitting out the XDP verdict processing
into __stmmac_xdp_run_prog() and it callable for XDP ZC path which does
not need to verify bpf_prog is not NULL.

The stmmac_xdp_run_prog() is used for regular XDP Rx path which requires
bpf_prog to be verified.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: rearrange RX and TX desc init into per-queue basis

Below functions are made to be per-queue in preparation of XDP ZC:

__init_dma_rx_desc_rings(struct stmmac_priv *priv, u32 queue, gfp_t flags)
__init_dma_tx_desc_rings(struct stmmac_priv *priv, u32 queue)

The original functions below are stay maintained for all queue usage:

init_dma_rx_desc_rings(struct net_device *dev, gfp_t flags)
init_dma_tx_desc_rings(struct net_device *dev)

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: refactor stmmac_init_rx_buffers for stmmac_reinit_rx_buffers

The per-queue RX buffer allocation in stmmac_reinit_rx_buffers() can be
made to use stmmac_alloc_rx_buffers() by merging the page_pool alloc
checks for "buf->page" and "buf->sec_page" in stmmac_init_rx_buffers().

This is in preparation for XSK pool allocation later.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: introduce dma_recycle_rx_skbufs for stmmac_reinit_rx_buffers

Rearrange RX buffer page_pool recycling logics into dma_recycle_rx_skbufs,
so that we prepare stmmac_reinit_rx_buffers() for XSK pool expansion.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: rearrange RX buffer allocation and free functions

This patch restructures the per RX queue buffer allocation from page_pool
to stmmac_alloc_rx_buffers().

We also rearrange dma_free_rx_skbufs() so that it can be used in
init_dma_rx_desc_rings() during freeing of RX buffer in the event of
page_pool allocation failure to replace the more efficient method earlier.
The replacement is needed to make the RX buffer alloc and free method
scalable to XDP ZC xsk_pool alloc and free later.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipa-SM8350-SoC'

Alex Elder says:

====================
net: ipa: add support for the SM8350 SoC

This small series adds IPA driver support for the Qualcomm SM8350
SoC, which implements IPA v4.9.

The first patch updates the DT binding, and depends on a previous
patch that has already been accepted into net-next.

The second just defines the IPA v4.9 configuration data file.

(Device Tree files to support this SoC will be sent separately and
will go through the Qualcomm tree.)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: add IPA v4.9 configuration data

Add support for the SM8350 SoC, which includes IPA version 4.9.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: qcom,ipa: add support for SM8350

Add support for "qcom,sm8350-ipa", which uses IPA v4.9.

Use "enum" rather than "oneOf/const ..." to specify compatible
strings, as suggested by Rob Herring.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: git_ts_info bit shifters

All the uses of HWTSTAMP_FILTER_* values need to be
bit shifters, not straight values.

v2: fixed subject and added Cc Dan and SoB Allen

Fixes: f8ba81da73fc ("ionic: add ethtool support for PTP")
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: queue reset work in system_long_wq

The reset process for ibmvnic commonly takes multiple seconds, clearly
making it inappropriate for schedule_work/system_wq. The reason to make
this change is that ibmvnic's use of the default system-wide workqueue
for a relatively long-running work item can negatively affect other
workqueue users. So, queue the relatively slow reset job to the
system_long_wq.

Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-5.13-20210413' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2021-04-13

this is a pull request of 14 patches for net-next/master.

The first patch is by Yoshihiro Shimoda and updates the DT bindings
for the rcar_can driver.

Vincent Mailhol contributes 3 patches that add support for several
ETAS USB CAN adapters.

The final 10 patches are by me and clean up the peak_usb CAN driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rsi: remove unused including <linux/version.h>

Fix the following versioncheck warning:
./drivers/net/wireless/rsi/rsi_91x_ps.c: 19 linux/version.h not needed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: st-nci: remove unnecessary label

in st_nci_spi_write function, first assign a value to a variable then
goto exit label. return statement just follow the label and exit label
just used once, so we should directly return and remove exit label.

Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: improve failover sysfs entry

The current implementation relies on H_IOCTL call to issue a
H_SESSION_ERR_DETECTED command to let the hypervisor to send a failover
signal. However, it may not work if there is no backup device or if
the vnic is already in error state,
e.g., "ibmvnic 30000003 env3: rx buffer returned with rc 6".
Add a last resort, that is to schedule a failover reset via CRQ command.

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

icmp: ICMPV6: pass RFC 8335 reply messages to ping_rcv

The current icmp_rcv function drops all unknown ICMP types, including
ICMP_EXT_ECHOREPLY (type 43). In order to parse Extended Echo Reply messages, we have
to pass these packets to the ping_rcv function, which does not do any
other filtering and passes the packet to the designated socket.

Pass incoming RFC 8335 ICMP Extended Echo Reply packets to the ping_rcv
handler instead of discarding the packet.

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'non-platform-devices-of_get_mac_address'

Michael Walle says:

====================
of: net: support non-platform devices in of_get_mac_address()

of_get_mac_address() is commonly used to fetch the MAC address
from the device tree. It also supports reading it from a NVMEM
provider. But the latter is only possible for platform devices,
because only platform devices are searched for a matching device
node.

Add a second method to fetch the NVMEM cell by a device tree node
instead of a "struct device".

Moreover, the NVMEM subsystem will return dynamically allocated
data which has to be freed after use. Currently, this is handled
by allocating a device resource manged buffer to store the MAC
address. of_get_mac_address() then returns a pointer to this
buffer. Without a device, this trick is not possible anymore.
Thus, change the of_get_mac_address() API to have the caller
supply a buffer.

It was considered to use the network device to attach the buffer
to, but then the order matters and netdev_register() has to be
called before of_get_mac_address(). No driver does it this way.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

of: net: fix of_get_mac_addr_nvmem() for non-platform devices

of_get_mac_address() already supports fetching the MAC address by an
nvmem provider. But until now, it was just working for platform devices.
Esp. it was not working for DSA ports and PCI devices. It gets more
common that PCI devices have a device tree binding since SoCs contain
integrated root complexes.

Use the nvmem of_* binding to fetch the nvmem cells by a struct
device_node. We still have to try to read the cell by device first
because there might be a nvmem_cell_lookup associated with that device.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>