linux-2.6-microblaze.git
2 years agoocteontx2-pf: Add check for non zero mcam flows
Sunil Goutham [Fri, 20 Aug 2021 08:25:22 +0000 (13:55 +0530)]
octeontx2-pf: Add check for non zero mcam flows

This patch ensures that mcam flows are allocated
before adding or destroying the flows.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotools/net: Use bitwise instead of arithmetic operator for flags
jing yangyang [Fri, 20 Aug 2021 03:35:27 +0000 (20:35 -0700)]
tools/net: Use bitwise instead of arithmetic operator for flags

This silences the following coccinelle warning:

"WARNING: sum of probable bitmasks, consider |"

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: jing yangyang <jing.yangyang@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'ipa-kill-off-ipa_clock_get'
David S. Miller [Fri, 20 Aug 2021 13:45:48 +0000 (14:45 +0100)]
Merge branch 'ipa-kill-off-ipa_clock_get'

Alex Elder says:

====================
net: ipa: kill off ipa_clock_get()

This series replaces the remaining uses of ipa_clock_get() with
calls to pm_runtime_get_sync() instead.  It replaces all calls to
ipa_clock_put() with calls to pm_runtime_put().

This completes the preparation for enabling automated suspend under
the control of the power management core code.  The next patch (in
an upcoming series) enables that.  Then the "ipa_clock" files and
symbols will switch to using an "ipa_power" naming convention instead.

Additional info

It is possible for pm_runtime_get_sync() to return an error.  There
are really three cases, identified by return value:
  - 1, meaning power was already active
  - 0, meaning power was not previously active, but is now
  - EACCES, meaning runtime PM is disabled
One additional case is EINVAL, meaning a previous suspend or resume
(or idle) call returned an error.  But we have always assumed this
won't happen (we previously didn't even check for an error).

But because we use pm_runtime_force_suspend() to implement system
suspend, there's a chance we'd get an EACCES error (the first thing
that function does is disable runtime suspend).  Individual patches
explain what happens in that case, but generally we just accept that
it could be an unlikely problem (occurring only at startup time).

Similarly, pm_runtime_put() could return an error.  There too, we
ignore EINVAL, assuming the IPA suspend and resume operations won't
produce an error.  EBUSY and EPERM are not applicable, EAGAIN is not
expected (and harmless).  We should never get EACCES (runtime
suspend disabled), because pm_runtime_put() calls match prior
pm_runtime_get_sync() calls, and a system suspend will not be
started while a runtime suspend or resume is underway.  In summary,
the value returned from pm_runtime_put() is not meaningful, so we
explicitly ignore it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: kill ipa_clock_get()
Alex Elder [Thu, 19 Aug 2021 22:19:27 +0000 (17:19 -0500)]
net: ipa: kill ipa_clock_get()

The only remaining user of the ipa_clock_{get,put}() interface is
ipa_isr_thread().  Replace calls to ipa_clock_get() there calling
pm_runtime_get_sync() instead.  And call pm_runtime_put() there
rather than ipa_clock_put().  Warn if we ever get an error.

With that, we can get rid of ipa_clock_get() and ipa_clock_put().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: don't use ipa_clock_get() in "ipa_modem.c"
Alex Elder [Thu, 19 Aug 2021 22:19:26 +0000 (17:19 -0500)]
net: ipa: don't use ipa_clock_get() in "ipa_modem.c"

When we open or close the modem network device we need to ensure the
hardware is powered.  Replace the callers of ipa_clock_get() found
in ipa_open() and ipa_stop() with calls to pm_runtime_get_sync().
If an error is returned, simply return that error to the caller
(without any error or warning message).  This could conceivably
occur if the function was called while the system was suspended,
but that really shouldn't happen.  Replace corresponding calls to
ipa_clock_put() with pm_runtime_put() also.

If the modem crashes we also need to ensure the hardware is powered
to recover.  If getting power returns an error there's not much we
can do, but at least report the error.  (Ideally the remoteproc SSR
code would ensure the AP was not suspended when it sends the
notification, but that is not (yet) the case.)

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: don't use ipa_clock_get() in "ipa_uc.c"
Alex Elder [Thu, 19 Aug 2021 22:19:25 +0000 (17:19 -0500)]
net: ipa: don't use ipa_clock_get() in "ipa_uc.c"

Replace the ipa_clock_get() call in ipa_uc_clock() when taking the
"proxy" clock reference for the microcontroller with a call to
pm_runtime_get_sync().  Replace calls of ipa_clock_put() for the
microcontroller with pm_runtime_put() calls instead.

There is a chance we get an error when taking the microcontroller
power reference.  This is an unlikely scenario, where system suspend
is initiated just before we learn the modem is booting.  For now
we'll just accept that this could occur, and report it if it does.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: don't use ipa_clock_get() in "ipa_smp2p.c"
Alex Elder [Thu, 19 Aug 2021 22:19:24 +0000 (17:19 -0500)]
net: ipa: don't use ipa_clock_get() in "ipa_smp2p.c"

If the "modem-init" Device Tree property is present for a platform,
the modem performs early IPA hardware initialization, and signals
this is complete with an "ipa-setup-ready" SMP2P interrupt.  This
triggers a call to ipa_setup(), which requires the hardware to be
powered.

Replace the call to ipa_clock_get() in this case with a call to
pm_runtime_get_sync().  And replace the corresponding calls to
ipa_clock_put() with calls to pm_runtime_put() instead.

There is a chance we get an error when taking this power reference.
This is an unlikely scenario, where system suspend is initiated just
before the modem signals it has finished initializing the IPA
hardware.  For now we'll just accept that this could occur, and
report it if it does.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: don't use ipa_clock_get() in "ipa_main.c"
Alex Elder [Thu, 19 Aug 2021 22:19:23 +0000 (17:19 -0500)]
net: ipa: don't use ipa_clock_get() in "ipa_main.c"

We need the hardware to be powered starting at the config stage of
initialization when the IPA driver probes.  And we need it powered
when the driver is removed, at least until the deconfig stage has
completed.

Replace callers of ipa_clock_get() in ipa_probe() and ipa_exit(),
calling pm_runtime_get_sync() instead.  Replace the corresponding
callers of ipa_clock_put(), calling pm_runtime_put() instead.

The only error we expect when getting power would occur when the
system is suspended.  The ->probe and ->remove driver callbacks
won't be called when suspended, so issue a WARN() call if an error
is seen getting power.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: fix TX queue race
Alex Elder [Thu, 19 Aug 2021 21:12:28 +0000 (16:12 -0500)]
net: ipa: fix TX queue race

Jakub Kicinski pointed out a race condition in ipa_start_xmit() in a
recently-accepted series of patches:
  https://lore.kernel.org/netdev/20210812195035.2816276-1-elder@linaro.org/
We are stopping the modem TX queue in that function if the power
state is not active.  We restart the TX queue again once hardware
resume is complete.

  TX path                       Power Management
  -------                       ----------------
  pm_runtime_get(); no power    Start resume
  Stop TX queue                      ...
  pm_runtime_put()              Resume complete
  return NETDEV_TX_BUSY         Start TX queue

  pm_runtime_get()
  Power present, transmit
  pm_runtime_put()              (auto-suspend)

The issue is that the power management (resume) activity and the
network transmit activity can occur concurrently, and there's a
chance the queue will be stopped *after* it has been started again.

  TX path                       Power Management
  -------                       ----------------
                                Resume underway
  pm_runtime_get(); no power         ...
                                Resume complete
                                Start TX queue
  Stop TX queue       <-- No more transmits after this
  pm_runtime_put()
  return NETDEV_TX_BUSY

We address this using a STARTED flag to indicate when the TX queue
has been started from the resume path, and a spinlock to make the
flag and queue updates happen atomically.

  TX path                       Power Management
  -------                       ----------------
                                Resume underway
  pm_runtime_get(); no power    Resume complete
                                start TX queue     \
  If STARTED flag is *not* set:                     > atomic
      Stop TX queue             set STARTED flag   /
  pm_runtime_put()
  return NETDEV_TX_BUSY

A second flag is used to address a different race that involves
another path requesting power.

  TX path            Other path              Power Management
  -------            ----------              ----------------
                     pm_runtime_get_sync()   Resume
                                             Start TX queue   \ atomic
                                             Set STARTED flag /
                     (do its thing)
                     pm_runtime_put()
                                             (auto-suspend)
  pm_runtime_get()                           Mark delayed resume
  STARTED *is* set, so
    do *not* stop TX queue  <-- Queue should be stopped here
  pm_runtime_put()
  return NETDEV_TX_BUSY                      Suspend done, resume
                                             Resume complete
  pm_runtime_get()
  Stop TX queue
    (STARTED is *not* set)                   Start TX queue   \ atomic
  pm_runtime_put()                           Set STARTED flag /
  return NETDEV_TX_BUSY

So a STOPPED flag is set in the transmit path when it has stopped
the TX queue, and this pair of operations is also protected by the
spinlock.  The resume path only restarts the TX queue if the STOPPED
flag is set.  This case isn't a major problem, but it avoids the
"non-trivial amount of useless work" done by the networking stack
when NETDEV_TX_BUSY is returned.

Fixes: 6b51f802d652b ("net: ipa: ensure hardware has power in ipa_start_xmit()")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'ocelot-vlan'
David S. Miller [Fri, 20 Aug 2021 13:39:52 +0000 (14:39 +0100)]
Merge branch 'ocelot-vlan'

Vladimir Oltean says:

====================
Small ocelot VLAN improvements

This small series propagates some VLAN restrictions via netlink extack
and creates some helper functions instead of open-coding VLAN table
manipulations from multiple places.

This is split from the larger "DSA FDB isolation" series, hence the v2
tag:
https://patchwork.kernel.org/project/netdevbpf/cover/20210818120150.892647-1-vladimir.oltean@nxp.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: use helpers for port VLAN membership
Vladimir Oltean [Thu, 19 Aug 2021 17:40:08 +0000 (20:40 +0300)]
net: mscc: ocelot: use helpers for port VLAN membership

This is a mostly cosmetic patch that creates some helpers for accessing
the VLAN table. These helpers are also a bit more careful in that they
do not modify the ocelot->vlan_mask unless the hardware operation
succeeded.

Not all callers check the return value (the init code doesn't), but anyway.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: transmit the VLAN filtering restrictions via extack
Vladimir Oltean [Thu, 19 Aug 2021 17:40:07 +0000 (20:40 +0300)]
net: mscc: ocelot: transmit the VLAN filtering restrictions via extack

We need to transmit more restrictions in future patches, convert this
one to netlink extack.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: transmit the "native VLAN" error via extack
Vladimir Oltean [Thu, 19 Aug 2021 17:40:06 +0000 (20:40 +0300)]
net: mscc: ocelot: transmit the "native VLAN" error via extack

We need to reject some more configurations in future patches, convert
the existing one to netlink extack.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'ocelot-phylink-fixes'
David S. Miller [Fri, 20 Aug 2021 13:36:42 +0000 (14:36 +0100)]
Merge branch 'ocelot-phylink-fixes'

Vladimir Oltean says:

====================
Ocelot phylink fixes

This series addresses a regression reported by Horatiu which introduced
by the ocelot conversion to phylink: there are broken device trees in
the wild, and the driver fails to probe the entire switch when a port
fails to probe, which it previously did not do.

Continue probing even when some ports fail to initialize properly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: allow probing to continue with ports that fail to register
Vladimir Oltean [Thu, 19 Aug 2021 16:49:58 +0000 (19:49 +0300)]
net: mscc: ocelot: allow probing to continue with ports that fail to register

The existing ocelot device trees, like ocelot_pcb123.dts for example,
have SERDES ports (ports 4 and higher) that do not have status = "disabled";
but on the other hand do not have a phy-handle or a fixed-link either.

So from the perspective of phylink, they have broken DT bindings.

Since the blamed commit, probing for the entire switch will fail when
such a device tree binding is encountered on a port. There used to be
this piece of code which skipped ports without a phy-handle:

phy_node = of_parse_phandle(portnp, "phy-handle", 0);
if (!phy_node)
continue;

but now it is gone.

Anyway, fixed-link setups are a thing which should work out of the box
with phylink, so it would not be in the best interest of the driver to
add that check back.

Instead, let's look at what other drivers do. Since commit 86f8b1c01a0a
("net: dsa: Do not make user port errors fatal"), DSA continues after a
switch port fails to register, and works only with the ports that
succeeded.

We can achieve the same behavior in ocelot by unregistering the devlink
port for ports where ocelot_port_phylink_create() failed (called via
ocelot_probe_port), and clear the bit in devlink_ports_registered for
that port. This will make the next iteration reconsider the port that
failed to probe as an unused port, and re-register a devlink port of
type UNUSED for it. No other cleanup should need to be performed, since
ocelot_probe_port() should be self-contained when it fails.

Fixes: e6e12df625f2 ("net: mscc: ocelot: convert to phylink")
Reported-and-tested-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: be able to reuse a devlink_port after teardown
Horatiu Vultur [Thu, 19 Aug 2021 16:49:57 +0000 (19:49 +0300)]
net: mscc: ocelot: be able to reuse a devlink_port after teardown

There are cases where we would like to continue probing the switch even
if one port has failed to probe. When that happens, we need to
unregister a devlink_port of type DEVLINK_PORT_FLAVOUR_PHYSICAL and
re-register it of type DEVLINK_PORT_FLAVOUR_UNUSED.

This is fine, except when calling devlink_port_attrs_set on a structure
on which devlink_port_register has been previously called, there is a
WARN_ON in devlink_port_attrs_set that devlink_port->devlink must be
NULL.

So don't assume that the memory behind dlp is clean when calling
ocelot_port_devlink_init, just zero-initialize it.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'dpaa2-switch-phylikn-fixes'
David S. Miller [Fri, 20 Aug 2021 13:34:49 +0000 (14:34 +0100)]
Merge branch 'dpaa2-switch-phylikn-fixes'

Vladimir Oltean says:

====================
dpaa2-switch phylink fixes

This is fixing two regressions introduced by the recent conversion of
the dpaa2-switch driver to phylink.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dpaa2-switch: call dpaa2_switch_port_disconnect_mac on probe error path
Vladimir Oltean [Thu, 19 Aug 2021 14:40:19 +0000 (17:40 +0300)]
net: dpaa2-switch: call dpaa2_switch_port_disconnect_mac on probe error path

Currently when probing returns an error, the netdev is freed but
phylink_disconnect is not called.

Create a common function between the unbind path and the error path,
call it the opposite of dpaa2_switch_probe_port: dpaa2_switch_remove_port,
and call it from both the unbind and the error path.

Fixes: 84cba72956fd ("dpaa2-switch: integrate the MAC endpoint support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dpaa2-switch: phylink_disconnect_phy needs rtnl_lock
Vladimir Oltean [Thu, 19 Aug 2021 14:40:18 +0000 (17:40 +0300)]
net: dpaa2-switch: phylink_disconnect_phy needs rtnl_lock

There is an ASSERT_RTNL in phylink_disconnect_phy which triggers
whenever dpaa2_switch_port_disconnect_mac is called.

To follow the pattern established by dpaa2_eth_disconnect_mac, take the
rtnl_mutex every time we call dpaa2_switch_port_disconnect_mac.

Fixes: 84cba72956fd ("dpaa2-switch: integrate the MAC endpoint support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'gmii2rgmii-loopback'
David S. Miller [Fri, 20 Aug 2021 13:31:46 +0000 (14:31 +0100)]
Merge branch 'gmii2rgmii-loopback'

Gerhard Engleder says:

====================
Add Xilinx GMII2RGMII loopback support

The Xilinx GMII2RGMII driver overrides PHY driver functions in order to
configure the device according to the link speed of the PHY attached to
it. This is implemented for a normal link but not for loopback.

Andrew told me to use phy_loopback and this changes make phy_loopback
work in combination with Xilinx GMII2RGMII.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: gmii2rgmii: Support PHY loopback
Gerhard Engleder [Thu, 19 Aug 2021 13:11:54 +0000 (15:11 +0200)]
net: phy: gmii2rgmii: Support PHY loopback

Configure speed if loopback is used. read_status is not called for
loopback.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: Uniform PHY driver access
Gerhard Engleder [Thu, 19 Aug 2021 13:11:53 +0000 (15:11 +0200)]
net: phy: Uniform PHY driver access

struct phy_device contains a pointer to the PHY driver and nearly
everywhere this pointer is used to access the PHY driver. Only
mdio_bus_phy_may_suspend() is still using to_phy_driver() instead of the
PHY driver pointer. Uniform PHY driver access by eliminating
to_phy_driver() use in mdio_bus_phy_may_suspend().

Only phy_bus_match() and phy_probe() are still using to_phy_driver(),
because PHY driver pointer is not available there.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: Support set_loopback override
Gerhard Engleder [Thu, 19 Aug 2021 13:11:52 +0000 (15:11 +0200)]
net: phy: Support set_loopback override

phy_read_status and various other PHY functions support PHY specific
overriding of driver functions by using a PHY specific pointer to the
PHY driver. Add support of PHY specific override to phy_loopback too.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'sparx5-dma'
David S. Miller [Fri, 20 Aug 2021 13:28:55 +0000 (14:28 +0100)]
Merge branch 'sparx5-dma'

Steen Hegelund says:

====================
Adding Frame DMA functionality to Sparx5

v2:
    Removed an unused variable (proc_ctrl) from sparx5_fdma_start.

This add frame DMA functionality to the Sparx5 platform.

Until now the Sparx5 SwitchDev driver has been using register based
injection and extraction when sending frames to/from the host CPU.

With this series the Frame DMA functionality now added.

The Frame DMA is only used if the Frame DMA interrupt is configured in the
device tree; otherwise the existing register based injection and extraction
is used.

The Sparx5 has two ports that can be used for sending and receiving frames,
but there are 8 channels that can be configured: 6 for injection and 2 for
extraction.

The additional channels can be used for more advanced scenarios e.g. where
virtual cores are used, but currently the driver only uses port 0 and
channel 0 and 6 respectively.

DCB (data control block) structures are passed to the Frame DMA with
suitable information about frame start/end etc, as well as pointers to DB
(data blocks) buffers.

The Frame DMA engine can use interrupts to signal back when the frames have
been injected or extracted.

There is a limitation on the DB alignment also for injection: Block must
start on 16byte boundaries, and this is why the driver currently copies the
data to into separate buffers.

The Sparx5 switch core needs a IFH (Internal Frame Header) to pass
information from the port to the switch core, and this header is added
before injection and stripped after extraction.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoarm64: dts: sparx5: Add the Sparx5 switch frame DMA support
Steen Hegelund [Thu, 19 Aug 2021 07:39:40 +0000 (09:39 +0200)]
arm64: dts: sparx5: Add the Sparx5 switch frame DMA support

This adds the interrupt for the Sparx5 Frame DMA.

If this configuration is present the Sparx5 SwitchDev driver will use the
Frame DMA feature, and if not it will use register based injection and
extraction for sending and receiving frames to the CPU.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sparx5: switchdev: adding frame DMA functionality
Steen Hegelund [Thu, 19 Aug 2021 07:39:39 +0000 (09:39 +0200)]
net: sparx5: switchdev: adding frame DMA functionality

This add frame DMA functionality to the Sparx5 platform.

Ethernet frames can be extracted or injected autonomously to or from the
device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list data
structures in memory are used for injecting or extracting Ethernet frames.
The FDMA generates interrupts when frame extraction or injection is done
and when the linked lists need updating.

The FDMA implements two extraction channels, one per switch core port
towards the VCore CPU system and a total of six injection channels.
Extraction channels are mapped one-to-one to the CPU ports, while injection
channels can be individually assigned to any CPU port.

- FDMA channel 0 through 5 corresponds to CPU port 0 injection direction
  FDMA_CH_CFG[channel].CH_INJ_PORT is set to 0.
- FDMA channel 0 through 5 corresponds to CPU port 1 injection direction when
  FDMA_CH_CFG[channel].CH_INJ_PORT is set to 1.
- FDMA channel 6 corresponds to CPU port 0 extraction direction.
- FDMA channel 7 corresponds to CPU port 1 extraction direction.

The FDMA implements a strict priority scheme among channels. Extraction
channels are prioritized over injection channels and secondarily channels
with higher channel number are prioritized over channels with lower number.
On the other hand, ports are being served on an equal-bandwidth principle
both on injection and extraction directions.  The equal-bandwidth principle
will not force an equal bandwidth. Instead, it ensures that the ports
perform at their best considering the operating conditions.

When more than one injection channel is enabled for injection on the same
CPU port, priority determines which channel can inject data. Ownership
is re-arbitrated on frame boundaries.

The FDMA processes linked lists of DMA Control Block Structures (DCBs). The
DCBs have the same basic structure for both injection and extraction. A DCB
must be placed on a 64-bit word-aligned address in memory. Each DCB has a
per-channel configurable amount of associated data blocks in memory, where
the frame data is stored.

The data blocks that are used by extraction channels must be placed on
64-bit word aligned addresses in memory, and their length must be a
multiple of 128 bytes.

A DCB carries the pointer to the next DCB of the linked list, the INFO word
which holds information for the DCB, and a pair of status word and memory
pointer for every data block that it is associated with.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'batadv-next-pullrequest-20210820' of git://git.open-mesh.org/linux-merge
David S. Miller [Fri, 20 Aug 2021 12:28:38 +0000 (13:28 +0100)]
Merge tag 'batadv-next-pullrequest-20210820' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This (updated) cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - update docs about move IRC channel away from freenode,
   by Sven Eckelmann (updated, added missing sign-off)

 - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann

 - Update NULL checks, by Sven Eckelmann (2 patches)

 - remove remaining skb-copy calls for broadcast packets,
   by Linus Lüssing
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'mlx5-updates-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Fri, 20 Aug 2021 11:49:04 +0000 (12:49 +0100)]
Merge tag 'mlx5-updates-2021-08-19' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-08-19

This series introduces the support for two new mlx5 features:

1) Sample offload for tunneled traffic
2) devlink rate objects support

1) From Chris Mi: Sample offload for tunneled traffic
=====================================================

Background and solution
-----------------------

Currently the sample offload actions send the encapsulated packet
to software. This series de-capsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.

If de-capsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.

Post action infrastructure
--------------------------

Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flow table.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.

Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution the tc action list.

This series also introduces post action infrastructure. Both ct and
sample use it.

Sample for tunnel in TC SW
--------------------------

tc filter add dev vxlan1 protocol ip parent ffff: prio 3 \
flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02 \
enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13 \
enc_dst_port 4789 enc_key_id 4 \
action sample rate 1 group 6 \
action tunnel_key unset \
action mirred egress redirect dev enp4s0f0_1

MLX5 sample HW offload
----------------------

For the following typical flow table:

+-------------------------------+
+       original flow table     +
+-------------------------------+
+         original match        +
+-------------------------------+
+ sample action + other actions +
+-------------------------------+

We translate the tc filter with sample action to the following HW model:

        +---------------------+
        + original flow table +
        +---------------------+
        +   original match    +
        +---------------------+
              | set fte_id (if reg_c preserve cap)
              | do decap
              v
+------------------------------------------------+
+                Flow Sampler Object             +
+------------------------------------------------+
+                    sample ratio                +
+------------------------------------------------+
+    sample table id    |    default table id    +
+------------------------------------------------+
           |                            |
           v                            v
+-----------------------------+  +-------------------+
+        sample table         +  +   default table   +
+-----------------------------+  +-------------------+
+ forward to management vport +             |
+-----------------------------+             |
                                    +-------+------+
                                    |              |reg_c preserve cap
                                    |              |or decap action
                                    v              v
                       +-----------------+   +-------------+
                       + per vport table +   + post action +
                       +-----------------+   +-------------+
                       + original match  +
                       +-----------------+
                       + other actions   +
                       +-----------------+

2) From Dmytro Linkin: devlink rate object support for mlx5_core driver
=======================================================================

HIGH-LEVEL OVERVIEW

Devlink leaf rate objects created per vport (VF/SF, and PF on BlueField)
in switchdev mode on devlink port registration.
Implement devlink ops callbacks to create/destroy rate groups, set TX
rate values of the vport/group, assign vport to the group.
Driver accepts TX rate values as fraction of 1Mbps.

Refactor existing eswitch QoS infrastructure to be accessible by legacy
NDO rate API and new devlink rate API. NDO rate API is not
removed/disabled in switchdev mode to not break existing users. Rate
values configured with NDO rate API are not visible for devlink
infrastructure, therefore APIs should not be used simultaneously.

IMPLEMENTATION DETAILS

Driver provide two level rate hierarchy to manage bandwidth - group
level and vport level. Initially each vport added to internal unlimited
group created by default. Each rate element (vport or group) receive
bandwidth relative to its parent element (for groups the parent is a
physical link itself) in a Round Robin manner, where element get
bandwidth value according to its weight. Example:

Created four rate groups with tx_share limits:

$ devlink port function rate add \
    pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_4 tx_share 10gbit

Weights created in HW for each group are relative to the bigest tx_share
value, which is 30gbit:

<group_1> 1.0
<group_2> 0.67
<group_3> 0.67
<group_4> 0.33

Assuming link speed is 50 Gbit/sec and each group can sustain such
amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33)
 = ~18.75 Gbit/sec. Normilized bandwidth values for groups:

<group_1> 18.75 * 1.0  = 18.75 Gbit/sec
<group_2> 18.75 * 0.67 = 12.5 Gbit/sec
<group_3> 18.75 * 0.67 = 12.5 Gbit/sec
<group_4> 18.75 * 0.33 = 6.25 Gbit/sec

If in example above group_1 doesn't produce any traffic, then maximum
bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec. Normalized
values:

<group_2> 30.0 * 0.67 = 20.0 Gbit/sec
<group_3> 30.0 * 0.67 = 20.0 Gbit/sec
<group_4> 30.0 * 0.33 = 10.0 Gbit/sec

Same normalization applied to each vport in the group.

Normalized values are internal, therefore driver provides QoS
tracepoints for next events:

* vport rate element creation/deletion:
* vport rate element configuration;
* group rate element creation/deletion;
* group rate element configuration.

PATCHES OVERVIEW

1 - Moving and isolation of eswitch QoS logic in separate file;

2 - Implement devlink leaf rate object support for vports;

3 - Implement rate groups creation/deletion;

4 - Implement TX rate management for the groups;

5 - Implement parent set for vports;

6 - Eswitch QoS tracepoints.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'for-net-next-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Fri, 20 Aug 2021 11:16:05 +0000 (12:16 +0100)]
Merge tag 'for-net-next-2021-08-19' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

 - Add support for Foxconn Mediatek Chip
 - Add support for LG LGSBWAC92/TWCM-K505D
 - hci_h5 flow control fixes and suspend support
 - Switch to use lock_sock for SCO and RFCOMM
 - Various fixes for extended advertising
 - Reword Intel's setup on btusb unifying the supported generations
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'batadv-next-pullrequest-20210819' of git://git.open-mesh.org/linux-merge
David S. Miller [Fri, 20 Aug 2021 11:04:50 +0000 (12:04 +0100)]
Merge tag 'batadv-next-pullrequest-20210819' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - update docs about move IRC channel away from freenode,
   by Sven Eckelmann

 - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann

 - Update NULL checks, by Sven Eckelmann (2 patches)

 - remove remaining skb-copy calls for broadcast packets,
   by Linus Lüssing
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agobatman-adv: bcast: remove remaining skb-copy calls
Linus Lüssing [Sun, 16 May 2021 22:33:09 +0000 (00:33 +0200)]
batman-adv: bcast: remove remaining skb-copy calls

We currently have two code paths for broadcast packets:

A) self-generated, via batadv_interface_tx()->
   batadv_send_bcast_packet().
B) received/forwarded, via batadv_recv_bcast_packet()->
   batadv_forw_bcast_packet().

For A), self-generated broadcast packets:

The only modifications to the skb data is the ethernet header which is
added/pushed to the skb in
batadv_send_broadcast_skb()->batadv_send_skb_packet(). However before
doing so, batadv_skb_head_push() is called which calls skb_cow_head() to
unshare the space for the to be pushed ethernet header. So for this
case, it is safe to use skb clones.

For B), received/forwarded packets:

The same applies as in A) for the to be forwarded packets. Only the
ethernet header is added. However after (queueing for) forwarding the
packet in batadv_recv_bcast_packet()->batadv_forw_bcast_packet(), a
packet is additionally decapsulated and is sent up the stack through
batadv_recv_bcast_packet()->batadv_interface_rx().

Protocols higher up the stack are already required to check if the
packet is shared and create a copy for further modifications. When the
next (protocol) layer works correctly, it cannot happen that it tries to
operate on the data behind the skb clone which is still queued up for
forwarding.

Co-authored-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2 years agobatman-adv: Drop NULL check before dropping references
Sven Eckelmann [Sun, 8 Aug 2021 17:11:08 +0000 (19:11 +0200)]
batman-adv: Drop NULL check before dropping references

The check if a batman-adv related object is NULL or not is now directly in
the batadv_*_put functions. It is not needed anymore to perform this check
outside these function:

The changes were generated using a coccinelle semantic patch:

  @@
  expression E;
  @@
  - if (likely(E != NULL))
  (
  batadv_backbone_gw_put
  |
  batadv_claim_put
  |
  batadv_dat_entry_put
  |
  batadv_gw_node_put
  |
  batadv_hardif_neigh_put
  |
  batadv_hardif_put
  |
  batadv_nc_node_put
  |
  batadv_nc_path_put
  |
  batadv_neigh_ifinfo_put
  |
  batadv_neigh_node_put
  |
  batadv_orig_ifinfo_put
  |
  batadv_orig_node_put
  |
  batadv_orig_node_vlan_put
  |
  batadv_softif_vlan_put
  |
  batadv_tp_vars_put
  |
  batadv_tt_global_entry_put
  |
  batadv_tt_local_entry_put
  |
  batadv_tt_orig_list_entry_put
  |
  batadv_tt_req_node_put
  |
  batadv_tvlv_container_put
  |
  batadv_tvlv_handler_put
  )(E);

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2 years agobatman-adv: Check ptr for NULL before reducing its refcnt
Sven Eckelmann [Sun, 8 Aug 2021 17:56:17 +0000 (19:56 +0200)]
batman-adv: Check ptr for NULL before reducing its refcnt

The commit b37a46683739 ("netdevice: add the case if dev is NULL") changed
the way how the NULL check for net_devices have to be handled when trying
to reduce its reference counter. Before this commit, it was the
responsibility of the caller to check whether the object is NULL or not.
But it was changed to behave more like kfree. Now the callee has to handle
the NULL-case.

The batman-adv code was scanned via cocinelle for similar places. These
were changed to use the paradigm

  @@
  identifier E, T, R, C;
  identifier put;
  @@
   void put(struct T *E)
   {
  + if (!E)
  + return;
   kref_put(&E->C, R);
   }

Functions which were used in other sources files were moved to the header
to allow the compiler to inline the NULL check and the kref_put call.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2 years agobatman-adv: Switch to kstrtox.h for kstrtou64
Sven Eckelmann [Fri, 23 Jul 2021 17:23:17 +0000 (19:23 +0200)]
batman-adv: Switch to kstrtox.h for kstrtou64

The commit 4c52729377ea ("kernel.h: split out kstrtox() and simple_strtox()
to a separate header") moved the kstrtou64 function to a new header called
linux/kstrtox.h.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2 years agobatman-adv: Move IRC channel to hackint.org
Sven Eckelmann [Sun, 13 Jun 2021 14:37:47 +0000 (16:37 +0200)]
batman-adv: Move IRC channel to hackint.org

Due to recent developments around the Freenode.org IRC network, the
opinions about the usage of this service shifted dramatically. The majority
of the still active users of the #batman channel prefers a move to the
hackint.org network.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2 years agonet/mlx5: E-switch, Add QoS tracepoints
Dmytro Linkin [Tue, 1 Jun 2021 10:24:00 +0000 (13:24 +0300)]
net/mlx5: E-switch, Add QoS tracepoints

Add tracepoints to log QoS enabling/disabling/configuration for vports
and rate groups.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: E-switch, Allow to add vports to rate groups
Dmytro Linkin [Tue, 1 Jun 2021 09:37:39 +0000 (12:37 +0300)]
net/mlx5: E-switch, Allow to add vports to rate groups

Implement eswitch API that allows updating rate groups. If group
pointer is NULL, then move the vport to internal unlimited group zero.

Implement devlink_ops->rate_parent_node_set() callback in the terms of
the new eswitch group update API.

Enable QoS for all group's elements if a group has allocated BW share.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups
Dmytro Linkin [Mon, 31 May 2021 15:19:50 +0000 (18:19 +0300)]
net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups

Provide eswitch API to allow controlling group rate limits. Use it to
implement devlink_ops->mlx5_devlink_rate_node_tx_{share|max}_set().

The share rate will create relative bandwidth share on the groups level
while within the group the user can set shared rate on the member vports
of that group and this rate will be relative to the group's share rate.
The group with the highest shared rate will get a BW share of 100 and
the rest of the groups will get a value that reflects the ratio between
their share rate and the maximum share rate.

Example:
Created four rate groups with tx_share limits:

$ devlink port function rate add \
    pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_4 tx_share 10gbit

Assuming link speed is 50 Gbit/sec ratio divider will be
50 / (30+20+20+10) = 0.625. Normalized rate values for the groups:

<group_1> 30 * 0.625 = 18.75 Gbit/sec
<group_2> 20 * 0.625 = 12.5 Gbit/sec
<group_3> 20 * 0.625 = 12.5 Gbit/sec
<group_4> 10 * 0.625 = 6.25 Gbit/sec

Rate group with unlimited tx_share rate will receive minimum BW value
(1Mbit/sec) if presented any group with tx_share rate limit. This allow
to not drop all packets in case of heavy traffic.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: E-switch, Introduce rate limiting groups API
Dmytro Linkin [Mon, 31 May 2021 14:08:14 +0000 (17:08 +0300)]
net/mlx5: E-switch, Introduce rate limiting groups API

Extend eswitch API with rate limiting groups:

- Define new struct mlx5_esw_rate_group that is used to hold all
  internal group data.

- Implement functions that allow creation, destruction and cleanup of
  groups.

- Assign all vports to internal unlimited zero group by default.

This commit lays the groundwork for group rate limiting by implementing
devlink_ops->rate_node_{new|del}() callbacks to support creating and
deleting groups through devlink rate node objects. APIs that allows
setting rates and adding/removing members are implemented in following
patches.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: E-switch, Enable devlink port tx_{share|max} rate control
Dmytro Linkin [Fri, 28 May 2021 14:42:15 +0000 (17:42 +0300)]
net/mlx5: E-switch, Enable devlink port tx_{share|max} rate control

Register devlink rate leaf object for every eswitch vport.
Implement devlink ops that enable setting shared and max tx rates
through devlink API.
Extract common eswitch code from existing tx rate set function that is
accessed through NDO to be reused for the devlink. Values configured
with NDO API are not visible for the devlink API, therefore shouldn't be
used simultaneously.

When normalizing the BW share value, dividing the desired minimum rate
by the common divider results in losing information since the quotient
is rounded down. This has a significant affect on configurations of low
rate where the round down eliminates a large percentage of the total
rate. To improve the formula, round up the division result to make sure
that the BW share is at least the value it was supposed to be and won't
lost a significant amount of the expected value.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: E-switch, Move QoS related code to dedicated file
Dmytro Linkin [Fri, 28 May 2021 14:30:22 +0000 (17:30 +0300)]
net/mlx5: E-switch, Move QoS related code to dedicated file

Move eswitch QoS related code into dedicated file. Provide eswitch API
to access this code meaning it is isolated and restricted to be used
only by eswitch.c. Exception is legacy NDO vf set rate, which moved to
esw/legacy.c.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: TC, Support sample offload action for tunneled traffic
Chris Mi [Mon, 21 Jun 2021 07:49:50 +0000 (10:49 +0300)]
net/mlx5e: TC, Support sample offload action for tunneled traffic

Currently the sample offload actions send the encapsulated packet
to software. This commit decapsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.

If decapsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: TC, Restore tunnel info for sample offload
Chris Mi [Fri, 30 Apr 2021 07:17:33 +0000 (10:17 +0300)]
net/mlx5e: TC, Restore tunnel info for sample offload

Currently the sample offload actions send the encapsulated packet
to software. sFlow expects tunneled packets to be decapsulated while
having the tunnel properties on the skb metadata fields.

Reuse the functions used by connection tracking to map the outer
header properties to a unique id. The next patch  will use that id
to restore the tunnel information of decapsulated packets onto the
skb.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: TC, Remove CONFIG_NET_TC_SKB_EXT dependency when restoring tunnel
Chris Mi [Fri, 30 Apr 2021 09:08:40 +0000 (12:08 +0300)]
net/mlx5e: TC, Remove CONFIG_NET_TC_SKB_EXT dependency when restoring tunnel

CONFIG_NET_TC_SKB_EXT controls the SKB extension support for
restoring chain ids. SKB extension is not required for tunnel
restoration.

Remove the CONFIG_NET_TC_SKB_EXT dependency as a pre-step for
using the tunnel restore methods for sample offload use cases.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Refactor ct to use post action infrastructure
Chris Mi [Tue, 17 Aug 2021 03:23:09 +0000 (11:23 +0800)]
net/mlx5e: Refactor ct to use post action infrastructure

Move post action table management to common library providing
add/del/get API. Refactor the ct action offload to use the common
API.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Introduce post action infrastructure
Chris Mi [Mon, 16 Aug 2021 14:00:30 +0000 (22:00 +0800)]
net/mlx5e: Introduce post action infrastructure

Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flowtable.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.

Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution the tc action list.

Currently the post-action design is implemented only by the ct
action. Introduce post action infrastructure as a pre-step for
reusing it with the sFlow offload feature. Init and destroy the
common post action table. Refactor the ct offload to use the
common post table infrastructure in the next patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: CT, Use xarray to manage fte ids
Chris Mi [Wed, 2 Jun 2021 03:23:08 +0000 (06:23 +0300)]
net/mlx5e: CT, Use xarray to manage fte ids

IDR is deprecated. Use xarray instead.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Move sample attribute to flow attribute
Chris Mi [Wed, 18 Aug 2021 12:18:57 +0000 (20:18 +0800)]
net/mlx5e: Move sample attribute to flow attribute

Currently it is in eswitch attribute. Move it to flow attribute to
reflect the change in previous patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Move esw/sample to en/tc/sample
Chris Mi [Wed, 18 Aug 2021 07:52:18 +0000 (15:52 +0800)]
net/mlx5e: Move esw/sample to en/tc/sample

Module sample belongs to en/tc instead of esw. Move it and rename
accordingly.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Remove mlx5e dependency from E-Switch sample
Saeed Mahameed [Mon, 16 Aug 2021 18:58:19 +0000 (11:58 -0700)]
net/mlx5e: Remove mlx5e dependency from E-Switch sample

mlx5/esw/sample.c doesn't really need mlx5e_priv object, we can remove
this redundant dependency by passing the eswitch object directly to
the sample object constructor.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Fri, 20 Aug 2021 01:09:18 +0000 (18:09 -0700)]
Merge git://git./linux/kernel/git/netdev/net

drivers/ptp/Kconfig:
  55c8fca1dae1 ("ptp_pch: Restore dependency on PCI")
  e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 19 Aug 2021 19:33:43 +0000 (12:33 -0700)]
Merge tag 'net-5.14-rc7' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from bpf, wireless and mac80211
  trees.

  Current release - regressions:

   - tipc: call tipc_wait_for_connect only when dlen is not 0

   - mac80211: fix locking in ieee80211_restart_work()

  Current release - new code bugs:

   - bpf: add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id()

   - ethernet: ice: fix perout start time rounding

   - wwan: iosm: prevent underflow in ipc_chnl_cfg_get()

  Previous releases - regressions:

   - bpf: clear zext_dst of dead insns

   - sch_cake: fix srchost/dsthost hashing mode

   - vrf: reset skb conntrack connection on VRF rcv

   - net/rds: dma_map_sg is entitled to merge entries

  Previous releases - always broken:

   - ethernet: bnxt: fix Tx path locking and races, add Rx path
     barriers"

* tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
  net: dpaa2-switch: disable the control interface on error path
  Revert "flow_offload: action should not be NULL when it is referenced"
  iavf: Fix ping is lost after untrusted VF had tried to change MAC
  i40e: Fix ATR queue selection
  r8152: fix the maximum number of PLA bp for RTL8153C
  r8152: fix writing USB_BP2_EN
  mptcp: full fully established support after ADD_ADDR
  mptcp: fix memory leak on address flush
  net/rds: dma_map_sg is entitled to merge entries
  net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port
  net: asix: fix uninit value bugs
  ovs: clear skb->tstamp in forwarding path
  net: mdio-mux: Handle -EPROBE_DEFER correctly
  net: mdio-mux: Don't ignore memory allocation errors
  net: mdio-mux: Delete unnecessary devm_kfree
  net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse
  sch_cake: fix srchost/dsthost hashing mode
  ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path
  net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
  mac80211: fix locking in ieee80211_restart_work()
  ...

2 years agoMerge tag 'platform-drivers-x86-v5.14-4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 19 Aug 2021 19:19:58 +0000 (12:19 -0700)]
Merge tag 'platform-drivers-x86-v5.14-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 - Enable SW_TABLET_MODE support for the TP200s

 - Enable WMI on two more Gigabyte motherboards

* tag 'platform-drivers-x86-v5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: gigabyte-wmi: add support for B450M S2H V2
  platform/x86: gigabyte-wmi: add support for X570 GAMING X
  platform/x86: asus-nb-wmi: Add tablet_mode_sw=lid-flip quirk for the TP200s
  platform/x86: asus-nb-wmi: Allow configuring SW_TABLET_MODE method with a module option

2 years agoocteontx2-af: remove redudant second error check on variable err
Colin Ian King [Wed, 18 Aug 2021 13:09:27 +0000 (14:09 +0100)]
octeontx2-af: remove redudant second error check on variable err

A recent change added error checking messages and failed to remove one
of the previous error checks. There are now two checks on variable err
so the second one is redundant dead code and can be removed.

Addresses-Coverity: ("Logically dead code")
Fixes: a83bdada06bf ("octeontx2-af: Add debug messages for failures")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210818130927.33895-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'linux-can-next-for-5.15-20210819' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Thu, 19 Aug 2021 18:51:13 +0000 (11:51 -0700)]
Merge tag 'linux-can-next-for-5.15-20210819' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
linux-can-next-for-5.15-20210819

The first patch is by me, for the mailmap file and maps the email
address of two former ESD employees to a newly created role account.

The next 3 patches are by Oleksij Rempel and add support for GPIO
based switchable CAN bus termination.

The next 3 patches are by Vincent Mailhol. The first one changes the
CAN netlink interface to not bail out if the user switched off
unsupported features. The next one adds Vincent as the maintainer of
the etas_es58x driver and the last one cleans up the documentation of
struct es58x_fd_tx_conf_msg.

The next patch is by me, for the mcp251xfd driver and marks some
instances of struct mcp251xfd_priv as const. Lad Prabhakar contributes
2 patches for the rcar_canfd driver, that add support for RZ/G2L
family.

The next 5 patches target the m_can/tcan45x5 driver. 2 are by me an
fix trivial checkpatch warnings. The remaining 3 patches are by Matt
Kline and improve the performance on the SPI based tcan4x5x chip by
batching FIFO reads and writes.

The last 7 patches are for the c_can driver. Dario Binacchi's patch
converts the DT bindings to yaml, 2 patches by me fix a typo and
rename a macro to properly represent the usage. The last 4 patches are
again by Dario Binacchi and provide a performance improvement for the
TX path by operating the TX mailboxes as a true FIFO.

* tag 'linux-can-next-for-5.15-20210819' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (22 commits)
  can: c_can: cache frames to operate as a true FIFO
  can: c_can: support tx ring algorithm
  can: c_can: exit c_can_do_tx() early if no frames have been sent
  can: c_can: remove struct c_can_priv::priv field
  can: c_can: rename IF_RX -> IF_NAPI
  can: c_can: c_can_do_tx(): fix typo in comment
  dt-bindings: net: can: c_can: convert to json-schema
  can: m_can: Batch FIFO writes during CAN transmit
  can: m_can: Batch FIFO reads during CAN receive
  can: m_can: Disable IRQs on FIFO bus errors
  can: m_can: fix block comment style
  can: tcan4x5x: cdev_to_priv(): remove stray empty line
  can: rcar_canfd: Add support for RZ/G2L family
  dt-bindings: net: can: renesas,rcar-canfd: Document RZ/G2L SoC
  can: mcp251xfd: mark some instances of struct mcp251xfd_priv as const
  can: etas_es58x: clean-up documentation of struct es58x_fd_tx_conf_msg
  MAINTAINERS: add Vincent MAILHOL as maintainer for the ETAS ES58X CAN/USB driver
  can: netlink: allow user to turn off unsupported features
  can: dev: provide optional GPIO based termination support
  dt-bindings: can: fsl,flexcan: enable termination-* bindings
  ...
====================

Link: https://lore.kernel.org/r/20210819133913.657715-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dpaa2-switch: disable the control interface on error path
Vladimir Oltean [Thu, 19 Aug 2021 14:17:55 +0000 (17:17 +0300)]
net: dpaa2-switch: disable the control interface on error path

Currently dpaa2_switch_takedown has a funny name and does not do the
opposite of dpaa2_switch_init, which makes probing fail when we need to
handle an -EPROBE_DEFER.

A sketch of what dpaa2_switch_init does:

dpsw_open

dpaa2_switch_detect_features

dpsw_reset

for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
dpsw_if_disable

dpsw_if_set_stp

dpsw_vlan_remove_if_untagged

dpsw_if_set_tci

dpsw_vlan_remove_if
}

dpsw_vlan_remove

alloc_ordered_workqueue

dpsw_fdb_remove

dpaa2_switch_ctrl_if_setup

When dpaa2_switch_takedown is called from the error path of
dpaa2_switch_probe(), the control interface, enabled by
dpaa2_switch_ctrl_if_setup from dpaa2_switch_init, remains enabled,
because dpaa2_switch_takedown does not call
dpaa2_switch_ctrl_if_teardown.

Since dpaa2_switch_probe might fail due to EPROBE_DEFER of a PHY, this
means that a second probe of the driver will happen with the control
interface directly enabled.

This will trigger a second error:

[   93.273528] fsl_dpaa2_switch dpsw.0: dpsw_ctrl_if_set_pools() failed
[   93.281966] fsl_dpaa2_switch dpsw.0: fsl_mc_driver_probe failed: -13
[   93.288323] fsl_dpaa2_switch: probe of dpsw.0 failed with error -13

Which if we investigate the /dev/dpaa2_mc_console log, we find out is
caused by:

[E, ctrl_if_set_pools:2211, DPMNG]  ctrl_if must be disabled

So make dpaa2_switch_takedown do the opposite of dpaa2_switch_init (in
reasonable limits, no reason to change STP state, re-add VLANs etc), and
rename it to something more conventional, like dpaa2_switch_teardown.

Fixes: 613c0a5810b7 ("staging: dpaa2-switch: enable the control interface")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210819141755.1931423-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoRevert "flow_offload: action should not be NULL when it is referenced"
Ido Schimmel [Thu, 19 Aug 2021 10:58:42 +0000 (13:58 +0300)]
Revert "flow_offload: action should not be NULL when it is referenced"

This reverts commit 9ea3e52c5bc8bb4a084938dc1e3160643438927a.

Cited commit added a check to make sure 'action' is not NULL, but
'action' is already dereferenced before the check, when calling
flow_offload_has_one_action().

Therefore, the check does not make any sense and results in a smatch
warning:

include/net/flow_offload.h:322 flow_action_mixed_hw_stats_check() warn:
variable dereferenced before check 'action' (see line 319)

Fix by reverting this commit.

Cc: gushengxian <gushengxian@yulong.com>
Fixes: 9ea3e52c5bc8 ("flow_offload: action should not be NULL when it is referenced")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20210819105842.1315705-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'intel-wired-lan-driver-updates-2021-08-18'
Jakub Kicinski [Thu, 19 Aug 2021 16:56:42 +0000 (09:56 -0700)]
Merge branch 'intel-wired-lan-driver-updates-2021-08-18'

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-08-18

This series contains updates to i40e and iavf drivers.

Arkadiusz fixes Flow Director not using the correct queue due to calling
the wrong pick Tx function for i40e.

Sylwester resolves traffic loss for iavf when it attempts to change its
MAC address when it does not have permissions to do so.
====================

Link: https://lore.kernel.org/r/20210818174217.4138922-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoiavf: Fix ping is lost after untrusted VF had tried to change MAC
Sylwester Dziedziuch [Wed, 18 Aug 2021 17:42:17 +0000 (10:42 -0700)]
iavf: Fix ping is lost after untrusted VF had tried to change MAC

Make changes to MAC address dependent on the response of PF.
Disallow changes to HW MAC address and MAC filter from untrusted
VF, thanks to that ping is not lost if VF tries to change MAC.
Add a new field in iavf_mac_filter, to indicate whether there
was response from PF for given filter. Based on this field pass
or discard the filter.
If untrusted VF tried to change it's address, it's not changed.
Still filter was changed, because of that ping couldn't go through.

Fixes: c5c922b3e09b ("iavf: fix MAC address setting for VFs when filter is rejected")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan G <Gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoi40e: Fix ATR queue selection
Arkadiusz Kubalewski [Wed, 18 Aug 2021 17:42:16 +0000 (10:42 -0700)]
i40e: Fix ATR queue selection

Without this patch, ATR does not work. Receive/transmit uses queue
selection based on SW DCB hashing method.

If traffic classes are not configured for PF, then use
netdev_pick_tx function for selecting queue for packet transmission.
Instead of calling i40e_swdcb_skb_tx_hash, call netdev_pick_tx,
which ensures that packet is transmitted/received from CPU that is
running the application.

Reproduction steps:
1. Load i40e driver
2. Map each MSI interrupt of i40e port for each CPU
3. Disable ntuple, enable ATR i.e.:
ethtool -K $interface ntuple off
ethtool --set-priv-flags $interface flow-director-atr
4. Run application that is generating traffic and is bound to a
single CPU, i.e.:
taskset -c 9 netperf -H 1.1.1.1 -t TCP_RR -l 10
5. Observe behavior:
Application's traffic should be restricted to the CPU provided in
taskset.

Fixes: 89ec1f0886c1 ("i40e: Fix queue-to-TC mapping on Tx")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 19 Aug 2021 15:58:16 +0000 (08:58 -0700)]
Merge https://git./linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2021-08-19

We've added 3 non-merge commits during the last 3 day(s) which contain
a total of 3 files changed, 29 insertions(+), 6 deletions(-).

The main changes are:

1) Fix to clear zext_dst for dead instructions which was causing invalid program
   rejections on JITs with bpf_jit_needs_zext such as s390x, from Ilya Leoshkevich.

2) Fix RCU splat in bpf_get_current_{ancestor_,}cgroup_id() helpers when they are
   invoked from sleepable programs, from Yonghong Song.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests, bpf: Test that dead ldx_w insns are accepted
  bpf: Clear zext_dst of dead insns
  bpf: Add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers
====================

Link: https://lore.kernel.org/r/20210819144904.20069-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoBluetooth: Fix return value in hci_dev_do_close()
Kangmin Park [Thu, 19 Aug 2021 15:27:18 +0000 (00:27 +0900)]
Bluetooth: Fix return value in hci_dev_do_close()

hci_error_reset() return without calling hci_dev_do_open() when
hci_dev_do_close() return error value which is not 0.

Also, hci_dev_close() return hci_dev_do_close() function's return
value.

But, hci_dev_do_close() return always 0 even if hdev->shutdown
return error value. So, fix hci_dev_do_close() to save and return
the return value of the hdev->shutdown when it is called.

Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2 years agoBluetooth: add timeout sanity check to hci_inquiry
Pavel Skripkin [Thu, 19 Aug 2021 15:15:21 +0000 (18:15 +0300)]
Bluetooth: add timeout sanity check to hci_inquiry

Syzbot hit "task hung" bug in hci_req_sync(). The problem was in
unreasonable huge inquiry timeout passed from userspace.
Fix it by adding sanity check for timeout value to hci_inquiry().

Since hci_inquiry() is the only user of hci_req_sync() with user
controlled timeout value, it makes sense to check timeout value in
hci_inquiry() and don't touch hci_req_sync().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: syzbot+be2baed593ea56c6a84c@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2 years agoBluetooth: btusb: Remove WAKEUP_DISABLE and add WAKEUP_AUTOSUSPEND for Realtek devices
Max Chou [Tue, 17 Aug 2021 03:03:12 +0000 (11:03 +0800)]
Bluetooth: btusb: Remove WAKEUP_DISABLE and add WAKEUP_AUTOSUSPEND for Realtek devices

For the commit of 9e45524a011107a73bc2cdde8370c61e82e93a4d, wakeup is
always disabled for Realtek devices. However, there's the capability
for Realtek devices to apply USB wakeup.

In this commit, remove WAKEUP_DISABLE feature for Realtek devices.
If users would switch wakeup, they should access
"/sys/bus/usb/.../power/wakeup"

In this commit, it also adds the feature as WAKEUP_AUTOSUSPEND
for Realtek devices because it should set do_remote_wakeup on autosuspend.

Signed-off-by: Max Chou <max.chou@realtek.com>
Tested-by: Hilda Wu <hildawu@realtek.com>
Reviewed-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2 years agoBluetooth: mgmt: Pessimize compile-time bounds-check
Kees Cook [Wed, 18 Aug 2021 04:39:12 +0000 (21:39 -0700)]
Bluetooth: mgmt: Pessimize compile-time bounds-check

After gaining __alloc_size hints, GCC thinks it can reach a memcpy()
with eir_len == 0 (since it can't see into the rewrite of status).
Instead, check eir_len == 0, avoiding this future warning:

In function 'eir_append_data',
    inlined from 'read_local_oob_ext_data_complete' at net/bluetooth/mgmt.c:7210:12:
./include/linux/fortify-string.h:54:29: warning: '__builtin_memcpy' offset 5 is out of the bounds [0, 3] [-Warray-bounds]
...
net/bluetooth/hci_request.h:133:2: note: in expansion of macro 'memcpy'
  133 |  memcpy(&eir[eir_len], data, data_len);
      |  ^~~~~~

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: linux-bluetooth@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2 years agocan: c_can: cache frames to operate as a true FIFO
Dario Binacchi [Sat, 7 Aug 2021 13:08:00 +0000 (15:08 +0200)]
can: c_can: cache frames to operate as a true FIFO

As reported by a comment in the c_can_start_xmit() this was not a FIFO.
C/D_CAN controller sends out the buffers prioritized so that the lowest
buffer number wins.

What did c_can_start_xmit() do if head was less tail in the tx ring ? It
waited until all the frames queued in the FIFO was actually transmitted
by the controller before accepting a new CAN frame to transmit, even if
the FIFO was not full, to ensure that the messages were transmitted in
the order in which they were loaded.

By storing the frames in the FIFO without requiring its transmission, we
will be able to use the full size of the FIFO even in cases such as the
one described above. The transmission interrupt will trigger their
transmission only when all the messages previously loaded but stored in
less priority positions of the buffers have been transmitted.

Link: https://lore.kernel.org/r/20210807130800.5246-5-dariobin@libero.it
Suggested-by: Gianluca Falavigna <gianluca.falavigna@inwind.it>
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: c_can: support tx ring algorithm
Dario Binacchi [Sat, 7 Aug 2021 13:07:59 +0000 (15:07 +0200)]
can: c_can: support tx ring algorithm

The algorithm is already used successfully by other CAN drivers
(e.g. mcp251xfd). Its implementation was kindly suggested to me by
Marc Kleine-Budde following a patch I had previously submitted. You can
find every detail at https://lore.kernel.org/patchwork/patch/1422929/.

The idea is that after this patch, it will be easier to patch the driver
to use the message object memory as a true FIFO.

Link: https://lore.kernel.org/r/20210807130800.5246-4-dariobin@libero.it
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: c_can: exit c_can_do_tx() early if no frames have been sent
Dario Binacchi [Sat, 7 Aug 2021 13:07:58 +0000 (15:07 +0200)]
can: c_can: exit c_can_do_tx() early if no frames have been sent

The c_can_poll() handles RX/TX events unconditionally. It may therefore
happen that c_can_do_tx() is called unnecessarily because the interrupt
was triggered by the reception of a frame. In these cases, we avoid to
execute unnecessary statements and exit immediately.

Link: https://lore.kernel.org/r/20210807130800.5246-3-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: c_can: remove struct c_can_priv::priv field
Dario Binacchi [Sat, 7 Aug 2021 13:07:57 +0000 (15:07 +0200)]
can: c_can: remove struct c_can_priv::priv field

It references the clock but it is never used. So let's remove it.

Link: https://lore.kernel.org/r/20210807130800.5246-2-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: c_can: rename IF_RX -> IF_NAPI
Marc Kleine-Budde [Mon, 9 Aug 2021 07:26:35 +0000 (09:26 +0200)]
can: c_can: rename IF_RX -> IF_NAPI

The C_CAN/D_CAN cores implement 2 interfaces to manage the message
objects. To avoid concurrency and the need for locking one interface
is used in the TX path (IF_TX). While the other one, named IF_RX is
used from NAPI context only. As this interface is not only used to
manage RX, but also TX message objects, this patch renames IF_RX to
IF_NAPI.

Link: https://lore.kernel.org/r/20210809080608.171545-1-mkl@pengutronix.de
Cc: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: c_can: c_can_do_tx(): fix typo in comment
Marc Kleine-Budde [Fri, 6 Aug 2021 08:49:52 +0000 (10:49 +0200)]
can: c_can: c_can_do_tx(): fix typo in comment

This patch fixes a typo in the comment in c_can_do_tx().

Fixes: eddf67115040 ("can: c_can: add a comment about IF_RX interface's use")
Link: https://lore.kernel.org/r/20210806105127.103302-1-mkl@pengutronix.de
Cc: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agodt-bindings: net: can: c_can: convert to json-schema
Dario Binacchi [Thu, 5 Aug 2021 19:27:50 +0000 (21:27 +0200)]
dt-bindings: net: can: c_can: convert to json-schema

Convert the Bosch C_CAN/D_CAN controller device tree binding
documentation to json-schema.

Document missing properties.
Remove "ti,hwmods" as it is no longer used in TI dts.
Make "clocks" required as it is used in all dts.
Update the examples.

Link: https://lore.kernel.org/r/20210805192750.9051-1-dariobin@libero.it
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: m_can: Batch FIFO writes during CAN transmit
Matt Kline [Tue, 17 Aug 2021 05:08:53 +0000 (22:08 -0700)]
can: m_can: Batch FIFO writes during CAN transmit

Give FIFO writes the same treatment as reads to avoid fixed costs of
individual transfers on a slow bus (e.g., tcan4x5x).

Link: https://lore.kernel.org/r/20210817050853.14875-4-matt@bitbashing.io
Signed-off-by: Matt Kline <matt@bitbashing.io>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: m_can: Batch FIFO reads during CAN receive
Matt Kline [Tue, 17 Aug 2021 05:08:52 +0000 (22:08 -0700)]
can: m_can: Batch FIFO reads during CAN receive

On peripherals communicating over a relatively slow SPI line
(e.g. tcan4x5x), individual transfers have high fixed costs.
This causes the driver to spend most of its time waiting between
transfers and severely limits throughput.

Reduce these overheads by reading more than one word at a time.
Writing could get a similar treatment in follow-on commits.

Link: https://lore.kernel.org/r/20210817050853.14875-3-matt@bitbashing.io
Signed-off-by: Matt Kline <matt@bitbashing.io>
[mkl: remove __packed from struct id_and_dlc]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: m_can: Disable IRQs on FIFO bus errors
Matt Kline [Tue, 17 Aug 2021 05:08:51 +0000 (22:08 -0700)]
can: m_can: Disable IRQs on FIFO bus errors

If FIFO reads or writes fail due to the underlying regmap (e.g., SPI)
I/O, propagate that up to the m_can driver, log an error, and disable
interrupts, similar to the mcp251xfd driver.

While reworking the FIFO functions to add this error handling,
add support for bulk reads and writes of multiple registers.

Link: https://lore.kernel.org/r/20210817050853.14875-2-matt@bitbashing.io
Signed-off-by: Matt Kline <matt@bitbashing.io>
[mkl: re-wrap long lines, remove WARN_ON, convert to netdev block comments]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: m_can: fix block comment style
Marc Kleine-Budde [Thu, 19 Aug 2021 10:20:03 +0000 (12:20 +0200)]
can: m_can: fix block comment style

This patch fixes the commenting style in the m_can driver.

Fixes: 1be37d3b0414 ("can: m_can: fix periph RX path: use rx-offload to ensure skbs are sent from softirq context")
Fixes: df06fd678260 ("can: m_can: m_can_chip_config(): enable and configure internal timestamps")
Link: https://lore.kernel.org/r/20210819111703.599686-2-mkl@pengutronix.de
Cc: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: tcan4x5x: cdev_to_priv(): remove stray empty line
Marc Kleine-Budde [Thu, 19 Aug 2021 09:48:45 +0000 (11:48 +0200)]
can: tcan4x5x: cdev_to_priv(): remove stray empty line

This patch removes a stray empty line in the cdev_to_priv() function.

Fixes: ac33ffd3e2b0 ("can: m_can: let m_can_class_allocate_dev() allocate driver specific private data")
Link: https://lore.kernel.org/r/20210819111703.599686-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: rcar_canfd: Add support for RZ/G2L family
Lad Prabhakar [Tue, 27 Jul 2021 13:30:21 +0000 (14:30 +0100)]
can: rcar_canfd: Add support for RZ/G2L family

CANFD block on RZ/G2L SoC is almost identical to one found on
R-Car Gen3 SoC's. On RZ/G2L SoC interrupt sources for each channel
are split into different sources and the IP doesn't divide (1/2)
CANFD clock within the IP.

This patch adds compatible string for RZ/G2L family and splits
the irq handlers to accommodate both RZ/G2L and R-Car Gen3 SoC's.

Link: https://lore.kernel.org/r/20210727133022.634-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
[mkl: fixed typo: recieve -> receive, thanks Geert]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agodt-bindings: net: can: renesas,rcar-canfd: Document RZ/G2L SoC
Lad Prabhakar [Tue, 27 Jul 2021 13:30:20 +0000 (14:30 +0100)]
dt-bindings: net: can: renesas,rcar-canfd: Document RZ/G2L SoC

Add CANFD binding documentation for Renesas RZ/G2L SoC.

Link: https://lore.kernel.org/r/20210727133022.634-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: mcp251xfd: mark some instances of struct mcp251xfd_priv as const
Marc Kleine-Budde [Mon, 9 Aug 2021 18:31:54 +0000 (20:31 +0200)]
can: mcp251xfd: mark some instances of struct mcp251xfd_priv as const

With the patch 07ff4aed015c ("time/timecounter: Mark 1st argument of
timecounter_cyc2time() as const") some instances of the struct
mcp251xfd_priv can be marked as const. This patch marks these as
const.

Link: https://lore.kernel.org/r/20210813091027.159379-1-mkl@pengutronix.de
Cc: Manivannan Sadhasivam <mani@kernel.org>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: etas_es58x: clean-up documentation of struct es58x_fd_tx_conf_msg
Vincent Mailhol [Sun, 15 Aug 2021 03:32:48 +0000 (12:32 +0900)]
can: etas_es58x: clean-up documentation of struct es58x_fd_tx_conf_msg

The documentation of struct es58x_fd_tx_conf_msg explains in details
the different TDC parameters. However, those description are redundant
with the documentation of struct can_tdc.

Remove most of the description.

Also, fixes a typo in the reference to the datasheet (E701 -> E70).

Link: https://lore.kernel.org/r/20210815033248.98111-8-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agoMAINTAINERS: add Vincent MAILHOL as maintainer for the ETAS ES58X CAN/USB driver
Vincent Mailhol [Sat, 14 Aug 2021 09:33:53 +0000 (18:33 +0900)]
MAINTAINERS: add Vincent MAILHOL as maintainer for the ETAS ES58X CAN/USB driver

Adding myself (Vincent Mailhol) as a maintainer for the ETAS ES58X
CAN/USB driver.

Link: https://lore.kernel.org/r/20210814093353.74391-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: netlink: allow user to turn off unsupported features
Vincent Mailhol [Sun, 15 Aug 2021 03:32:42 +0000 (12:32 +0900)]
can: netlink: allow user to turn off unsupported features

The sanity checks on the control modes will reject any request related
to an unsupported features, even turning it off.

Example on an interface which does not support CAN-FD:

$ ip link set can0 type can bitrate 500000 fd off
RTNETLINK answers: Operation not supported

This patch lets such command go through (but requests to turn on an
unsupported feature are, of course, still denied).

Link: https://lore.kernel.org/r/20210815033248.98111-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agocan: dev: provide optional GPIO based termination support
Oleksij Rempel [Wed, 18 Aug 2021 07:12:32 +0000 (09:12 +0200)]
can: dev: provide optional GPIO based termination support

For CAN buses to work, a termination resistor has to be present at both
ends of the bus. This resistor is usually 120 Ohms, other values may be
required for special bus topologies.

This patch adds support for a generic GPIO based CAN termination. The
resistor value has to be specified via device tree, and it can only be
attached to or detached from the bus. By default the termination is not
active.

Link: https://lore.kernel.org/r/20210818071232.20585-4-o.rempel@pengutronix.de
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agodt-bindings: can: fsl,flexcan: enable termination-* bindings
Oleksij Rempel [Wed, 18 Aug 2021 07:12:31 +0000 (09:12 +0200)]
dt-bindings: can: fsl,flexcan: enable termination-* bindings

Enable termination-* binding and provide validation example for it.

Link: https://lore.kernel.org/r/20210818071232.20585-3-o.rempel@pengutronix.de
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agodt-bindings: can-controller: add support for termination-gpios
Oleksij Rempel [Wed, 18 Aug 2021 07:12:30 +0000 (09:12 +0200)]
dt-bindings: can-controller: add support for termination-gpios

Some boards provide GPIO controllable termination resistor. Provide
binding to make use of it.

Link: https://lore.kernel.org/r/20210818071232.20585-2-o.rempel@pengutronix.de
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2 years agonet: ethernet: ti: cpsw: make array stpa static const, makes object smaller
Colin Ian King [Thu, 19 Aug 2021 12:04:43 +0000 (13:04 +0100)]
net: ethernet: ti: cpsw: make array stpa static const, makes object smaller

Don't populate the array stpa on the stack but instead it
static const. Makes the object code smaller by 81 bytes:

Before:
   text    data   bss    dec    hex filename
  54993   17248     0  72241  11a31 ./drivers/net/ethernet/ti/cpsw_new.o

After:
   text    data   bss    dec    hex filename
  54784   17376     0  72160  119e0 ./drivers/net/ethernet/ti/cpsw_new.o

(gcc version 10.3.0)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: make array spec_opcode static const, makes object smaller
Colin Ian King [Thu, 19 Aug 2021 11:58:13 +0000 (12:58 +0100)]
net: hns3: make array spec_opcode static const, makes object smaller

Don't populate the array spec_opcode on the stack but instead it
static const. Makes the object code smaller by 158 bytes:

Before:
   text   data   bss     dec    hex filename
  12271   3976   128   16375   3ff7 .../hisilicon/hns3/hns3pf/hclge_cmd.o

After:
   text   data   bss     dec    hex filename
  12017   4072   128   16217   3f59 .../hisilicon/hns3/hns3pf/hclge_cmd.o

(gcc version 10.3.0)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agohinic: make array speeds static const, makes object smaller
Colin Ian King [Thu, 19 Aug 2021 11:52:53 +0000 (12:52 +0100)]
hinic: make array speeds static const, makes object smaller

Don't populate the array speeds on the stack but instead it
static const. Makes the object code smaller by 17 bytes:

Before:
   text    data     bss     dec     hex filename
  39987   14200      64   54251    d3eb .../huawei/hinic/hinic_sriov.o

After:
   text    data     bss     dec     hex filename
  39906   14264      64   54234    d3da .../huawei/hinic/hinic_sriov.o

(gcc version 10.3.0)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'indirect-qdisc-order'
David S. Miller [Thu, 19 Aug 2021 12:19:30 +0000 (13:19 +0100)]
Merge branch 'indirect-qdisc-order'

Eli Cohen says:

====================
Indirect dev ingress qdisc creation order

The first patch is just a cleanup of the code.
The second patch is fixing the dependency in ingress qdisc creation
relative to offloading driver registration to filter configurations.

v1 -> v2:
Fix warning - variable set but not used
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: Fix offloading indirect devices dependency on qdisc order creation
Eli Cohen [Tue, 17 Aug 2021 17:05:18 +0000 (20:05 +0300)]
net: Fix offloading indirect devices dependency on qdisc order creation

Currently, when creating an ingress qdisc on an indirect device before
the driver registered for callbacks, the driver will not have a chance
to register its filter configuration callbacks.

To fix that, modify the code such that it keeps track of all the ingress
qdiscs that call flow_indr_dev_setup_offload(). When a driver calls
flow_indr_dev_register(),  go through the list of tracked ingress qdiscs
and call the driver callback entry point so as to give it a chance to
register its callback.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet/core: Remove unused field from struct flow_indr_dev
Eli Cohen [Tue, 17 Aug 2021 17:05:17 +0000 (20:05 +0300)]
net/core: Remove unused field from struct flow_indr_dev

rcu field is not used. Remove it.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mii: make mii_ethtool_gset() return void
Pavel Skripkin [Wed, 18 Aug 2021 15:07:09 +0000 (18:07 +0300)]
net: mii: make mii_ethtool_gset() return void

mii_ethtool_gset() does not return any errors. Since there are no users
of this function that rely on its return value, it can be
made void.

Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: pch_gbe: remove mii_ethtool_gset() error handling
Pavel Skripkin [Wed, 18 Aug 2021 15:06:30 +0000 (18:06 +0300)]
net: pch_gbe: remove mii_ethtool_gset() error handling

mii_ethtool_gset() does not return any errors, so error handling can be
omitted to make code more simple.

Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'r8152-bp-settings'
David S. Miller [Thu, 19 Aug 2021 11:19:30 +0000 (12:19 +0100)]
Merge branch 'r8152-bp-settings'

Hayes Wang says:

====================
r8152: fix bp settings

Fix the wrong bp settings of the firmware.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agor8152: fix the maximum number of PLA bp for RTL8153C
Hayes Wang [Thu, 19 Aug 2021 03:05:37 +0000 (11:05 +0800)]
r8152: fix the maximum number of PLA bp for RTL8153C

The maximum PLA bp number of RTL8153C is 16, not 8. That is, the
bp 0 ~ 15 are at 0xfc28 ~ 0xfc46, and the bp_en is at 0xfc48.

Fixes: 195aae321c82 ("r8152: support new chips")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agor8152: fix writing USB_BP2_EN
Hayes Wang [Thu, 19 Aug 2021 03:05:36 +0000 (11:05 +0800)]
r8152: fix writing USB_BP2_EN

The register of USB_BP2_EN is 16 bits, so we should use
ocp_write_word(), not ocp_write_byte().

Fixes: 9370f2d05a2a ("support request_firmware for RTL8153")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mptcp-fixes'
David S. Miller [Thu, 19 Aug 2021 11:17:05 +0000 (12:17 +0100)]
Merge branch 'mptcp-fixes'

Mat Martineau says:

====================
mptcp: Bug fixes

Here are two bug fixes for the net tree:

Patch 1 fixes a memory leak that could be encountered when clearing the
list of advertised MPTCP addresses.

Patch 2 fixes a protocol issue early in an MPTCP connection, to ensure
both peers correctly understand that the full MPTCP connection handshake
has completed even when the server side quickly sends an ADD_ADDR
option.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomptcp: full fully established support after ADD_ADDR
Matthieu Baerts [Wed, 18 Aug 2021 23:42:37 +0000 (16:42 -0700)]
mptcp: full fully established support after ADD_ADDR

If directly after an MP_CAPABLE 3WHS, the client receives an ADD_ADDR
with HMAC from the server, it is enough to switch to a "fully
established" mode because it has received more MPTCP options.

It was then OK to enable the "fully_established" flag on the MPTCP
socket. Still, best to check if the ADD_ADDR looks valid by looking if
it contains an HMAC (no 'echo' bit). If an ADD_ADDR echo is received
while we are not in "fully established" mode, it is strange and then
we should not switch to this mode now.

But that is not enough. On one hand, the path-manager has be notified
the state has changed. On the other hand, the "fully_established" flag
on the subflow socket should be turned on as well not to re-send the
MP_CAPABLE 3rd ACK content with the next ACK.

Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomptcp: fix memory leak on address flush
Paolo Abeni [Wed, 18 Aug 2021 23:42:36 +0000 (16:42 -0700)]
mptcp: fix memory leak on address flush

The endpoint cleanup path is prone to a memory leak, as reported
by syzkaller:

 BUG: memory leak
 unreferenced object 0xffff88810680ea00 (size 64):
   comm "syz-executor.6", pid 6191, jiffies 4295756280 (age 24.138s)
   hex dump (first 32 bytes):
     58 75 7d 3c 80 88 ff ff 22 01 00 00 00 00 ad de  Xu}<....".......
     01 00 02 00 00 00 00 00 ac 1e 00 07 00 00 00 00  ................
   backtrace:
     [<0000000072a9f72a>] kmalloc include/linux/slab.h:591 [inline]
     [<0000000072a9f72a>] mptcp_nl_cmd_add_addr+0x287/0x9f0 net/mptcp/pm_netlink.c:1170
     [<00000000f6e931bf>] genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731
     [<00000000f1504a2c>] genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
     [<00000000f1504a2c>] genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792
     [<0000000097e76f6a>] netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504
     [<00000000ceefa2b8>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
     [<000000008ff91aec>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
     [<000000008ff91aec>] netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340
     [<0000000041682c35>] netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929
     [<00000000df3aa8e7>] sock_sendmsg_nosec net/socket.c:704 [inline]
     [<00000000df3aa8e7>] sock_sendmsg+0x14e/0x190 net/socket.c:724
     [<000000002154c54c>] ____sys_sendmsg+0x709/0x870 net/socket.c:2403
     [<000000001aab01d7>] ___sys_sendmsg+0xff/0x170 net/socket.c:2457
     [<00000000fa3b1446>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
     [<00000000db2ee9c7>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     [<00000000db2ee9c7>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
     [<000000005873517d>] entry_SYSCALL_64_after_hwframe+0x44/0xae

We should not require an allocation to cleanup stuff.

Rework the code a bit so that the additional RCU work is no more needed.

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>