Geliang Tang [Sat, 9 Jan 2021 00:48:02 +0000 (16:48 -0800)]
selftests: mptcp: add the MP_PRIO testcases
This patch added the MP_PRIO testcases:
Add a new argument bkup for run_tests and do_transfer, it can be set as
"backup" or "nobackup", the default value is "".
Add a new function chk_prio_nr to check the MP_PRIO related MIB counters.
The output looks like this:
29 single subflow, backup syn[ ok ] - synack[ ok ] - ack[ ok ]
ptx[ ok ] - prx [ ok ]
30 single address, backup syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
ptx[ ok ] - prx [ ok ]
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Sat, 9 Jan 2021 00:48:01 +0000 (16:48 -0800)]
mptcp: add the mibs for MP_PRIO
This patch added the mibs for MP_PRIO, MPTCP_MIB_MPPRIOTX for transmitting
of the MP_PRIO suboption, and MPTCP_MIB_MPPRIORX for receiving of it.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Sat, 9 Jan 2021 00:48:00 +0000 (16:48 -0800)]
selftests: mptcp: add set_flags command in pm_nl_ctl
This patch added the set_flags command in pm_nl_ctl, currently we can only
set two flags: backup and nobackup. The set_flags command can be used like
this:
# pm_nl_ctl set 10.0.0.1 flags backup
# pm_nl_ctl set 10.0.0.1 flags nobackup
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Sat, 9 Jan 2021 00:47:59 +0000 (16:47 -0800)]
mptcp: add set_flags command in PM netlink
This patch added a new command MPTCP_PM_CMD_SET_FLAGS in PM netlink:
In mptcp_nl_cmd_set_flags, parse the input address, get the backup value
according to whether the address's FLAG_BACKUP flag is set from the
user-space. Then check whether this address had been added in the local
address list. If it had been, then call mptcp_nl_addr_backup to deal with
this address.
In mptcp_nl_addr_backup, traverse all the existing msk sockets to find
the relevant sockets, and call mptcp_pm_nl_mp_prio_send_ack to send out
a MP_PRIO ACK packet.
Finally in mptcp_nl_cmd_set_flags, set or clear the address's FLAG_BACKUP
flag.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Sat, 9 Jan 2021 00:47:58 +0000 (16:47 -0800)]
mptcp: add the incoming MP_PRIO support
This patch added the incoming MP_PRIO logic:
Added a flag named mp_prio in struct mptcp_options_received, to mark the
MP_PRIO is received, and save the priority value to struct
mptcp_options_received's backup member. Then invoke
mptcp_pm_mp_prio_received with the receiving subsocket and the backup
value.
In mptcp_pm_mp_prio_received, get the subflow context according the input
subsocket, and change the subflow's backup as the incoming priority value.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Sat, 9 Jan 2021 00:47:57 +0000 (16:47 -0800)]
mptcp: add the outgoing MP_PRIO support
This patch added the outgoing MP_PRIO logic:
In mptcp_pm_nl_mp_prio_send_ack, find the related subflow and subsocket
according to the input parameter addr. Save the input priority value to
suflow's backup, then set subflow's send_mp_prio flag to true, and save
the input priority value to suflow's request_bkup. Finally, send out a
pure ACK on the related subsocket.
In mptcp_established_options_mp_prio, check whether the subflow's
send_mp_prio is set. If it is, this is the packet for sending MP_PRIO.
So save subflow->request_bkup value to mptcp_out_options's backup, and
change the option type to OPTION_MPTCP_PRIO.
In mptcp_write_options, clear the send_mp_prio flag and send out the
MP_PRIO suboption with mptcp_out_options's backup value.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Sat, 9 Jan 2021 00:47:56 +0000 (16:47 -0800)]
selftests: mptcp: add testcases for setting the address ID
Since the address ID can be set from user-space, some of the tests in
pm_netlink.sh will fail. This patch fixed the failures, and add the
testcases for setting the address ID.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Sat, 9 Jan 2021 00:47:55 +0000 (16:47 -0800)]
mptcp: add the address ID assignment bitmap
Currently the address ID set by the netlink PM from user-space is
overridden by the kernel. This patch added the address ID assignment
bitmap to allow user-space to set the address ID.
Use a per netns bitmask id_bitmap (256 bits) to keep track of in-use IDs.
And use next_id to keep track of the highest ID currently in use. If the
user-space provides an ID at endpoint creation time, try to use it. If
already in use, endpoint creation fails. Otherwise pick the first ID
available after the highest currently in use, with wrap-around.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 10 Jan 2021 02:07:54 +0000 (18:07 -0800)]
Merge branch 'r8169-small-improvements'
Heiner Kallweit says:
====================
r8169: small improvements
This series includes a number of smaller improvements.
v2:
- return on WARN in patch 1
====================
Link: https://lore.kernel.org/r/938caef4-8a0b-bbbd-66aa-76f758ff877a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Fri, 8 Jan 2021 12:00:13 +0000 (13:00 +0100)]
r8169: don't wakeup-enable device on shutdown if WOL is disabled
If WOL isn't enabled, then there's no need to enable wakeup from D3
on system shutdown.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Fri, 8 Jan 2021 11:58:54 +0000 (12:58 +0100)]
r8169: improve rtl_ocp_reg_failure
Use WARN_ONCE here to get a call trace in case of a problem.
This facilitates finding the offending code part.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Fri, 8 Jan 2021 11:57:57 +0000 (12:57 +0100)]
r8169: replace BUG_ON with WARN in _rtl_eri_write
Use WARN here to avoid stopping the system. In addition print the addr
and mask values that triggered the warning.
v2:
- return on WARN to avoid an invalid register write
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 8 Jan 2021 23:30:54 +0000 (01:30 +0200)]
net: dsa: dsa_legacy_fdb_{add,del} can be static
Introduced in commit
37b8da1a3c68 ("net: dsa: Move FDB add/del
implementation inside DSA") in net/dsa/legacy.c, these functions were
moved again to slave.c as part of commit
2a93c1a3651f ("net: dsa: Allow
compiling out legacy support"), before actually deleting net/dsa/slave.c
in
93e86b3bc842 ("net: dsa: Remove legacy probing support"). Along with
that movement there should have been a deletion of the prototypes from
dsa_priv.h, they are not useful.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210108233054.1222278-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 10 Jan 2021 00:21:33 +0000 (16:21 -0800)]
Merge branch 'dpaa2-mac-various-updates'
Ioana Ciornei says:
====================
dpaa2-mac: various updates
The first two patches of this series extends the MAC statistics support
to also work for network interfaces which have their link status handled
by firmware (TYPE_FIXED).
The next two patches are fixing a sporadic problem which happens when
the connected DPMAC object is not yet discovered by the fsl-mc bus, thus
the dpaa2-eth is not able to get a reference to it. A referred probe
will be requested in this case.
Finally, the last two patches make some cosmetic changes, mostly
removing comments and unnecessary checks.
Changes in v2:
- replaced IS_ERR_OR_NULL() by IS_ERR() in patch 4/6
- reworded the commit message of patch 6/6
====================
Link: https://lore.kernel.org/r/20210108090727.866283-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ioana Ciornei [Fri, 8 Jan 2021 09:07:27 +0000 (11:07 +0200)]
dpaa2-mac: remove a comment regarding pause settings
The MC firmware takes these PAUSE/ASYM_PAUSE flags provided by the
driver, transforms them back into rx/tx pause enablement status and
applies them to hardware. We are not losing information by this
transformation, thus remove the comment.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ioana Ciornei [Fri, 8 Jan 2021 09:07:26 +0000 (11:07 +0200)]
dpaa2-mac: remove an unnecessary check
The dpaa2-eth driver has phylink integration only if the connected dpmac
object is in TYPE_PHY (aka the PCS/PHY etc link status is managed by
Linux instead of the firmware). The check is thus unnecessary because
the code path that reaches the .mac_link_up() callback is only with
TYPE_PHY dpmac objects.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ioana Ciornei [Fri, 8 Jan 2021 09:07:25 +0000 (11:07 +0200)]
dpaa2-eth: retry the probe when the MAC is not yet discovered on the bus
The fsl_mc_get_endpoint() function now returns -EPROBE_DEFER when the
dpmac device was not yet discovered by the fsl-mc bus. When this
happens, pass the error code up so that we can retry the probe at a
later time.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ioana Ciornei [Fri, 8 Jan 2021 09:07:24 +0000 (11:07 +0200)]
bus: fsl-mc: return -EPROBE_DEFER when a device is not yet discovered
The fsl_mc_get_endpoint() should return a pointer to the connected
fsl_mc device, if there is one. By interrogating the MC firmware, we
know if there is an endpoint or not so when the endpoint device is
actually searched on the fsl-mc bus and not found we are hitting the
case in which the device has not been yet discovered by the bus.
Return -EPROBE_DEFER so that callers can differentiate this case.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ioana Ciornei [Fri, 8 Jan 2021 09:07:23 +0000 (11:07 +0200)]
dpaa2-mac: export MAC counters even when in TYPE_FIXED
If the network interface object is connected to a MAC of TYPE_FIXED, the
link status management is handled exclusively by the firmware. This does
not mean that the driver cannot access the MAC counters and export them
in ethtool.
For this to happen, we open the attached dpmac device and keep a pointer
to it in priv->mac. Because of this, all the checks in the driver of the
following form 'if (priv->mac)' have to be updated to actually check
the dpmac attribute and not rely on the presence of a non-NULL value.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ioana Ciornei [Fri, 8 Jan 2021 09:07:22 +0000 (11:07 +0200)]
dpaa2-mac: split up initializing the MAC object from connecting to it
Split up the initialization phase of the dpmac object from actually
configuring the phylink instance, connecting to it and configuring the
MAC. This is done so that even though the dpni object is connected to a
dpmac which has link management handled by the firmware we are still
able to export the MAC counters.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 9 Jan 2021 22:24:29 +0000 (14:24 -0800)]
Merge branch 'net-gro-gro_drop-deprecation'
Eric Dumazet says:
====================
net-gro: GRO_DROP deprecation
GRO_DROP has no practical use and can be removed,
once ice driver is cleaned up.
This removes one useless conditional test in napi_gro_frags().
====================
Link: https://lore.kernel.org/r/20210108113903.3779510-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 8 Jan 2021 11:39:03 +0000 (03:39 -0800)]
net-gro: remove GRO_DROP
GRO_DROP can only be returned from napi_gro_frags()
if the skb has not been allocated by a prior napi_get_frags()
Since drivers must use napi_get_frags() and test its result
before populating the skb with metadata, we can safely remove
GRO_DROP since it offers no practical use.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 8 Jan 2021 11:39:02 +0000 (03:39 -0800)]
ice: drop dead code in ice_receive_skb()
napi_gro_receive() can never return GRO_DROP
GRO_DROP can only be returned from napi_gro_frags()
which is the other NAPI GRO entry point.
Followup patch will remove GRO_DROP, because drivers
are not supposed to call napi_gro_frags() if prior
napi_get_frags() has failed.
Note that I have left the gro_dropped variable. I leave to ice
maintainers the decision to further remove it from ethtool -S results.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Menglong Dong [Fri, 8 Jan 2021 02:53:32 +0000 (18:53 -0800)]
net: bridge: fix misspellings using codespell tool
Some typos are found out by codespell tool:
$ codespell ./net/bridge/
./net/bridge/br_stp.c:604: permanant ==> permanent
./net/bridge/br_stp.c:605: persistance ==> persistence
./net/bridge/br.c:125: underlaying ==> underlying
./net/bridge/br_input.c:43: modue ==> mode
./net/bridge/br_mrp.c:828: Determin ==> Determine
./net/bridge/br_mrp.c:848: Determin ==> Determine
./net/bridge/br_mrp.c:897: Determin ==> Determine
Fix typos found by codespell.
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210108025332.52480-1-dong.menglong@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 9 Jan 2021 21:51:39 +0000 (13:51 -0800)]
Merge branch 'net-ipa-support-compile_test'
Alex Elder says:
====================
net: ipa: support COMPILE_TEST
This series adds the IPA driver as a possible target when
the COMPILE_TEST configuration is enabled. Two small changes to
dependent subsystems needed to be made for this to work.
Version 2 of this series adds one more patch, which adds the
declation of struct page to "gsi_trans.h". The Intel kernel test
robot reported that this was a problem for the alpha build.
====================
Link: https://lore.kernel.org/r/20210107233404.17030-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Thu, 7 Jan 2021 23:34:04 +0000 (17:34 -0600)]
net: ipa: support COMPILE_TEST
Arrange for the IPA driver to be built when COMPILE_TEST is enabled.
Update the help text to reflect that we support two Qualcomm SoCs.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Thu, 7 Jan 2021 23:34:03 +0000 (17:34 -0600)]
net: ipa: declare the page pointer type in "gsi_trans.h"
The second argument to gsi_trans_page_add() is a page pointer.
That declaration is found in header files used by "gsi_trans.h" for
(at least) arm64 and x86 builds, but apparently not for alpha
builds.
Fix this by adding a declaration of struct page to the top of
"gsi_trans.h".
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Thu, 7 Jan 2021 23:34:02 +0000 (17:34 -0600)]
soc: qcom: mdt_loader: define stubs for COMPILE_TEST
Define stub functions for the exposed MDT functions in case
QCOM_MDT_LOADER is not configured. This allows users of these
functions to link correctly for COMPILE_TEST builds without
QCOM_SCM enabled.
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Thu, 7 Jan 2021 23:34:01 +0000 (17:34 -0600)]
remoteproc: qcom: expose types for COMPILE_TEST
Stub functions are defined for SSR notifier registration in case
QCOM_RPROC_COMMON is not configured. As a result, code that uses
these functions can link successfully even if the common remoteproc
code is not built.
Code that registers an SSR notifier function likely needs the
types defined in "qcom_rproc.h", but those are only exposed if
QCOM_RPROC_COMMON is enabled.
Rearrange the conditional definition so the qcom_ssr_notify_data
structure and qcom_ssr_notify_type enumerated type are defined
whether or not QCOM_RPROC_COMMON is enabled.
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lijun Pan [Wed, 6 Jan 2021 21:35:14 +0000 (15:35 -0600)]
ibmvnic: merge do_change_param_reset into do_reset
Commit
b27507bb59ed ("net/ibmvnic: unlock rtnl_lock in reset so
linkwatch_event can run") introduced do_change_param_reset function to
solve the rtnl lock issue. Majority of the code in do_change_param_reset
duplicates do_reset. Also, we can handle the rtnl lock issue in do_reset
itself. Hence merge do_change_param_reset back into do_reset to clean up
the code.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20210106213514.76027-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian Wiedmann [Thu, 7 Jan 2021 14:39:56 +0000 (15:39 +0100)]
ppp: clean up endianness conversions
sparse complains about some harmless endianness issues:
> drivers/net/ppp/pptp.c:281:21: warning: incorrect type in assignment (different base types)
> drivers/net/ppp/pptp.c:281:21: expected unsigned int [usertype] ack
> drivers/net/ppp/pptp.c:281:21: got restricted __be32
> drivers/net/ppp/pptp.c:283:23: warning: cast to restricted __be32
Here 'ack' is assigned a value in network-order, and then also the
byte-swapped value in host-order. Clean this up by doing the byte-swap
as part of the assignment.
> drivers/net/ppp/pptp.c:358:26: warning: cast from restricted __be16
> drivers/net/ppp/pptp.c:358:26: warning: incorrect type in argument 1 (different base types)
> drivers/net/ppp/pptp.c:358:26: expected unsigned short [usertype] call_id
> drivers/net/ppp/pptp.c:358:26: got restricted __be16 [usertype]
Here we use the wrong flavour of byte-swap. Use ntohs(), which of course
gives the same result.
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Link: https://lore.kernel.org/r/20210107143956.25549-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian Wiedmann [Thu, 7 Jan 2021 14:40:08 +0000 (15:40 +0100)]
net: ip_tunnel: clean up endianness conversions
sparse complains about some harmless endianness issues:
> net/ipv4/ip_tunnel_core.c:225:43: warning: cast to restricted __be16
> net/ipv4/ip_tunnel_core.c:225:43: warning: incorrect type in initializer (different base types)
> net/ipv4/ip_tunnel_core.c:225:43: expected restricted __be16 [usertype] mtu
> net/ipv4/ip_tunnel_core.c:225:43: got unsigned short [usertype]
iptunnel_pmtud_build_icmp() uses the wrong flavour of byte-order conversion
when storing the MTU into the ICMPv4 packet. Use htons(), just like
iptunnel_pmtud_build_icmpv6() does.
> net/ipv4/ip_tunnel_core.c:248:35: warning: cast from restricted __be16
> net/ipv4/ip_tunnel_core.c:248:35: warning: incorrect type in argument 3 (different base types)
> net/ipv4/ip_tunnel_core.c:248:35: expected unsigned short type
> net/ipv4/ip_tunnel_core.c:248:35: got restricted __be16 [usertype]
> net/ipv4/ip_tunnel_core.c:341:35: warning: cast from restricted __be16
> net/ipv4/ip_tunnel_core.c:341:35: warning: incorrect type in argument 3 (different base types)
> net/ipv4/ip_tunnel_core.c:341:35: expected unsigned short type
> net/ipv4/ip_tunnel_core.c:341:35: got restricted __be16 [usertype]
eth_header() wants the Ethertype in host-order, use the correct flavour of
byte-order conversion.
> net/ipv4/ip_tunnel_core.c:600:45: warning: restricted __be16 degrades to integer
> net/ipv4/ip_tunnel_core.c:609:30: warning: incorrect type in assignment (different base types)
> net/ipv4/ip_tunnel_core.c:609:30: expected int type
> net/ipv4/ip_tunnel_core.c:609:30: got restricted __be16 [usertype]
> net/ipv4/ip_tunnel_core.c:619:30: warning: incorrect type in assignment (different base types)
> net/ipv4/ip_tunnel_core.c:619:30: expected int type
> net/ipv4/ip_tunnel_core.c:619:30: got restricted __be16 [usertype]
> net/ipv4/ip_tunnel_core.c:629:30: warning: incorrect type in assignment (different base types)
> net/ipv4/ip_tunnel_core.c:629:30: expected int type
> net/ipv4/ip_tunnel_core.c:629:30: got restricted __be16 [usertype]
The TUNNEL_* types are big-endian, so adjust the type of the local
variable in ip_tun_parse_opts().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Link: https://lore.kernel.org/r/20210107144008.25777-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafał Miłecki [Thu, 7 Jan 2021 18:00:51 +0000 (19:00 +0100)]
MAINTAINERS: add bgmac section entry
This driver exists for years but was missing its MAINTAINERS entry.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210107180051.1542-3-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafał Miłecki [Thu, 7 Jan 2021 18:00:50 +0000 (19:00 +0100)]
net: broadcom: share header defining UniMAC registers
UniMAC is integrated into multiple Broadcom's Ethernet controllers so
use a shared header file for it and avoid some code duplication.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Link: https://lore.kernel.org/r/20210107180051.1542-2-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafał Miłecki [Thu, 7 Jan 2021 18:00:49 +0000 (19:00 +0100)]
bgmac: add bgmac_umac_*() helpers for accessing UniMAC registers
UniMAC is a hardware block commonly used in Broadcom Ethernet controllers
that should get its own header file. Not every controller has it mapped at
the 0x800 offset so add bgmac access helpers. They will allow using
shared register defines.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210107180051.1542-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 9 Jan 2021 02:38:33 +0000 (18:38 -0800)]
Merge branch 'update-register-bit-definitions-in-the-etheravb-driver'
Sergey Shtylyov says:
====================
Update register/bit definitions in the EtherAVB driver
Here are 2 patches against DaveM's 'net-next' repo.
I'm updating the driver to match the recent R-Car gen2/3 manuals.
====================
Link: https://lore.kernel.org/r/6aef8856-4bf5-1512-2ad4-62af05f00cc6@omprussia.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sergey Shtylyov [Wed, 6 Jan 2021 20:32:29 +0000 (23:32 +0300)]
ravb: update "undocumented" annotations
The "undocumented" annotations in the EtherAVB driver were done against
the R-Car gen2 manuals; most of these registers/bits were then described
in the R-Car gen3 manuals -- reflect this fact in the annotations (note
that ECSIPR.LCHNGIP was documented in the recent R-Car gen2 manual)...
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sergey Shtylyov [Wed, 6 Jan 2021 20:31:37 +0000 (23:31 +0300)]
ravb: remove APSR_DM
According to the R-Car Series, 3rd Generation User's Manual: Hardware,
Rev. 1.50, there's no APSR.DM field, instead there are 2 independent
RX/TX clock internal delay bits. Follow the suit: remove #define APSR_DM
and rename #define's APSR_DM_{R|T}DM to APSR_{R|T}DM.
While at it, do several more things to the declaration of *enum* APSR_BIT:
- remove superfluous indentation;
- annotate APSR_MEMS as undocumented;
- annotate APSR as R-Car Gen3 only.
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 8 Jan 2021 21:28:00 +0000 (13:28 -0800)]
Merge git://git./linux/kernel/git/netdev/net
Trivial conflict in CAN on file rename.
Conflicts:
drivers/net/can/m_can/tcan4x5x-core.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 8 Jan 2021 20:12:30 +0000 (12:12 -0800)]
Merge tag 'net-5.11-rc3-2' of git://git./linux/kernel/git/netdev/net
Pull more networking fixes from Jakub Kicinski:
"Slightly lighter pull request to get back into the Thursday cadence.
Current release - always broken:
- can: mcp251xfd: fix Tx/Rx ring buffer driver race conditions
- dsa: hellcreek: fix led_classdev build errors
Previous releases - regressions:
- ipv6: fib: flush exceptions when purging route to avoid netdev
reference leak
- ip_tunnels: fix pmtu check in nopmtudisc mode
- ip: always refragment ip defragmented packets to avoid MTU issues
when forwarding through tunnels, correct "packet too big" message
is prohibitively tricky to generate
- s390/qeth: fix locking for discipline setup / removal and during
recovery to prevent both deadlocks and races
- mlx5: Use port_num 1 instead of 0 when delete a RoCE address
Previous releases - always broken:
- cdc_ncm: correct overhead calculation in delayed_ndp_size to
prevent out of bound accesses with Huawei 909s-120 LTE module
- fix stmmac dwmac-sun8i suspend/resume:
- PHY being left powered off
- MAC syscon configuration being reset
- reference to the reset controller being improperly dropped
- qrtr: fix null-ptr-deref in qrtr_ns_remove
- can: tcan4x5x: fix bittiming const, use common bittiming from m_can
driver
- mlx5e: CT: Use per flow counter when CT flow accounting is enabled
- mlx5e: Fix SWP offsets when vlan inserted by driver
Misc:
- bpf: Fix a task_iter bug caused by a bpf -> net merge conflict
resolution
And the usual many fixes to various error paths"
* tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
s390/qeth: fix locking for discipline setup / removal
s390/qeth: fix deadlock during recovery
selftests: fib_nexthops: Fix wrong mausezahn invocation
nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
nexthop: Unlink nexthop group entry in error path
nexthop: Fix off-by-one error in error path
octeontx2-af: fix memory leak of lmac and lmac->name
chtls: Fix chtls resources release sequence
chtls: Added a check to avoid NULL pointer dereference
chtls: Replace skb_dequeue with skb_peek
chtls: Avoid unnecessary freeing of oreq pointer
chtls: Fix panic when route to peer not configured
chtls: Remove invalid set_tcb call
chtls: Fix hardware tid leak
net: ip: always refragment ip defragmented packets
net: fix pmtu check in nopmtudisc mode
selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking
docs: octeontx2: tune rst markup
...
Linus Torvalds [Fri, 8 Jan 2021 20:05:11 +0000 (12:05 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a functional bug in arm/chacha-neon as well as a potential
buffer overflow in ecdh"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ecdh - avoid buffer overflow in ecdh_set_secret()
crypto: arm/chacha-neon - add missing counter increment
Linus Torvalds [Thu, 7 Jan 2021 17:43:54 +0000 (09:43 -0800)]
poll: fix performance regression due to out-of-line __put_user()
The kernel test robot reported a -5.8% performance regression on the
"poll2" test of will-it-scale, and bisected it to commit
d55564cfc222
("x86: Make __put_user() generate an out-of-line call").
I didn't expect an out-of-line __put_user() to matter, because no normal
core code should use that non-checking legacy version of user access any
more. But I had overlooked the very odd poll() usage, which does a
__put_user() to update the 'revents' values of the poll array.
Now, Al Viro correctly points out that instead of updating just the
'revents' field, it would be much simpler to just copy the _whole_
pollfd entry, and then we could just use "copy_to_user()" on the whole
array of entries, the same way we use "copy_from_user()" a few lines
earlier to get the original values.
But that is not what we've traditionally done, and I worry that threaded
applications might be concurrently modifying the other fields of the
pollfd array. So while Al's suggestion is simpler - and perhaps worth
trying in the future - this instead keeps the "just update revents"
model.
To fix the performance regression, use the modern "unsafe_put_user()"
instead of __put_user(), with the proper "user_write_access_begin()"
guarding in place. This improves code generation enormously.
Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Laight <David.Laight@aculab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Petr Mladek [Fri, 8 Jan 2021 11:48:47 +0000 (12:48 +0100)]
Revert "init/console: Use ttynull as a fallback when there is no console"
This reverts commit
757055ae8dedf5333af17b3b5b4b70ba9bc9da4e.
The commit caused that ttynull was used as the default console
on several systems[1][2][3]. As a result, the console was
blank even when a better alternative existed.
It happened when there was no console configured
on the command line and ttynull_init() was the first initcall
calling register_console().
Or it happened when /dev/ did not exist when console_on_rootfs()
was called. It was not able to open /dev/console even though
a console driver was registered. It tried to add ttynull console
but it obviously did not help. But ttynull became the preferred
console and was used by /dev/console when it was available later.
The commit tried to fix a historical problem that have been there
for ages. The primary motivation was the commit
3cffa06aeef7ece30f6
("printk/console: Allow to disable console output by using console=""
or console=null"). It provided a clean solution for a workaround
that was widely used and worked only by chance.
This revert causes that the console="" or console=null command line
options will again work only by chance. These options will cause that
a particular console will be preferred and the default (tty) ones
will not get enabled. There will be no console registered at
all. As a result there won't be stdin, stdout, and stderr for
the init process. But it worked exactly this way even before.
The proper solution has to fulfill many conditions:
+ Register ttynull only when explicitly required or as
the ultimate fallback.
+ ttynull should get associated with /dev/console but it must
not become preferred console when used as a fallback.
Especially, it must still be possible to replace it
by a better console later.
Such a change requires clean up of the register_console() code.
Otherwise, it would be even harder to follow. Especially, the use
of has_preferred_console and CON_CONSDEV flag is tricky. The clean
up is risky. The ordering of consoles is not well defined. And
any changes tend to break existing user settings.
Do the revert at the least risky solution for now.
[1] https://lore.kernel.org/linux-kselftest/
20201221144302.GR4077@smile.fi.intel.com/
[2] https://lore.kernel.org/lkml/
d2a3b3c0-e548-7dd1-730f-
59bc5c04e191@synopsys.com/
[3] https://patchwork.ozlabs.org/project/linux-um/patch/
20210105120128.10854-1-thomas@m3y3r.de/
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Vineet Gupta <vgupta@synopsys.com>
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jakub Kicinski [Fri, 8 Jan 2021 03:13:29 +0000 (19:13 -0800)]
Merge tag 'mlx5-fixes-2021-01-07' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-01-07
* tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups
net/mlx5e: Fix two double free cases
net/mlx5: Release devlink object if adev fails
net/mlx5e: ethtool, Fix restriction of autoneg with 56G
net/mlx5e: In skb build skip setting mark in switchdev mode
net/mlx5: E-Switch, fix changing vf VLANID
net/mlx5e: Fix SWP offsets when vlan inserted by driver
net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled
net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
net/mlx5e: Add missing capability check for uplink follow
net/mlx5: Check if lag is supported before creating one
====================
Link: https://lore.kernel.org/r/20210107202845.470205-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aleksander Jan Bajkowski [Thu, 7 Jan 2021 19:58:18 +0000 (20:58 +0100)]
net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
Exclude RMII from modes that report 1 GbE support. Reduced MII supports
up to 100 MbE.
Fixes:
14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210107195818.3878-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 8 Jan 2021 02:54:08 +0000 (18:54 -0800)]
Merge branch 's390-qeth-fixes-2021-01-07'
Julian Wiedmann says:
====================
s390/qeth: fixes 2021-01-07
This brings two locking fixes for the device control path.
Also one fix for a path where our .ndo_features_check() attempts to
access a non-existent L2 header.
====================
Link: https://lore.kernel.org/r/20210107172442.1737-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian Wiedmann [Thu, 7 Jan 2021 17:24:42 +0000 (18:24 +0100)]
s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
ip_finish_output_gso() may call .ndo_features_check() even before the
skb has a L2 header. This conflicts with qeth_get_ip_version()'s attempt
to inspect the L2 header via vlan_eth_hdr().
Switch to vlan_get_protocol(), as already used further down in the
common qeth_features_check() path.
Fixes:
f13ade199391 ("s390/qeth: run non-offload L3 traffic over common xmit path")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian Wiedmann [Thu, 7 Jan 2021 17:24:41 +0000 (18:24 +0100)]
s390/qeth: fix locking for discipline setup / removal
Due to insufficient locking, qeth_core_set_online() and
qeth_dev_layer2_store() can run in parallel, both attempting to load &
setup the discipline (and stepping on each other toes along the way).
A similar race can also occur between qeth_core_remove_device() and
qeth_dev_layer2_store().
Access to .discipline is meant to be protected by the discipline_mutex,
so add/expand the locking in qeth_core_remove_device() and
qeth_core_set_online().
Adjust the locking in qeth_l*_remove_device() accordingly, as it's now
handled by the callers in a consistent manner.
Based on an initial patch by Ursula Braun.
Fixes:
9dc48ccc68b9 ("qeth: serialize sysfs-triggered device configurations")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian Wiedmann [Thu, 7 Jan 2021 17:24:40 +0000 (18:24 +0100)]
s390/qeth: fix deadlock during recovery
When qeth_dev_layer2_store() - holding the discipline_mutex - waits
inside qeth_l*_remove_device() for a qeth_do_reset() thread to complete,
we can hit a deadlock if qeth_do_reset() concurrently calls
qeth_set_online() and thus tries to aquire the discipline_mutex.
Move the discipline_mutex locking outside of qeth_set_online() and
qeth_set_offline(), and turn the discipline into a parameter so that
callers understand the dependency.
To fix the deadlock, we can now relax the locking:
As already established, qeth_l*_remove_device() waits for
qeth_do_reset() to complete. So qeth_do_reset() itself is under no risk
of having card->discipline ripped out while it's running, and thus
doesn't need to take the discipline_mutex.
Fixes:
9dc48ccc68b9 ("qeth: serialize sysfs-triggered device configurations")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 8 Jan 2021 02:47:21 +0000 (18:47 -0800)]
Merge branch 'nexthop-various-fixes'
Ido Schimmel says:
====================
nexthop: Various fixes
This series contains various fixes for the nexthop code. The bugs were
uncovered during the development of resilient nexthop groups.
Patches #1-#2 fix the error path of nexthop_create_group(). I was not
able to trigger these bugs with current code, but it is possible with
the upcoming resilient nexthop groups code which adds a user
controllable memory allocation further in the function.
Patch #3 fixes wrong validation of netlink attributes.
Patch #4 fixes wrong invocation of mausezahn in a selftest.
====================
Link: https://lore.kernel.org/r/20210107144824.1135691-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Thu, 7 Jan 2021 14:48:24 +0000 (16:48 +0200)]
selftests: fib_nexthops: Fix wrong mausezahn invocation
For IPv6 traffic, mausezahn needs to be invoked with '-6'. Otherwise an
error is returned:
# ip netns exec me mausezahn veth1 -B 2001:db8:101::2 -A 2001:db8:91::1 -c 0 -t tcp "dp=1-1023, flags=syn"
Failed to set source IPv4 address. Please check if source is set to a valid IPv4 address.
Invalid command line parameters!
Fixes:
7c741868ceab ("selftests: Add torture tests to nexthop tests")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Thu, 7 Jan 2021 14:48:23 +0000 (16:48 +0200)]
nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
The function nh_check_attr_group() is called to validate nexthop groups.
The intention of that code seems to have been to bounce all attributes
above NHA_GROUP_TYPE except for NHA_FDB. However instead it bounces all
these attributes except when NHA_FDB attribute is present--then it accepts
them.
NHA_FDB validation that takes place before, in rtm_to_nh_config(), already
bounces NHA_OIF, NHA_BLACKHOLE, NHA_ENCAP and NHA_ENCAP_TYPE. Yet further
back, NHA_GROUPS and NHA_MASTER are bounced unconditionally.
But that still leaves NHA_GATEWAY as an attribute that would be accepted in
FDB nexthop groups (with no meaning), so long as it keeps the address
family as unspecified:
# ip nexthop add id 1 fdb via 127.0.0.1
# ip nexthop add id 10 fdb via default group 1
The nexthop code is still relatively new and likely not used very broadly,
and the FDB bits are newer still. Even though there is a reproducer out
there, it relies on an improbable gateway arguments "via default", "via
all" or "via any". Given all this, I believe it is OK to reformulate the
condition to do the right thing and bounce NHA_GATEWAY.
Fixes:
38428d68719c ("nexthop: support for fdb ecmp nexthops")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Thu, 7 Jan 2021 14:48:22 +0000 (16:48 +0200)]
nexthop: Unlink nexthop group entry in error path
In case of error, remove the nexthop group entry from the list to which
it was previously added.
Fixes:
430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Thu, 7 Jan 2021 14:48:21 +0000 (16:48 +0200)]
nexthop: Fix off-by-one error in error path
A reference was not taken for the current nexthop entry, so do not try
to put it in the error path.
Fixes:
430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Thu, 7 Jan 2021 12:39:16 +0000 (12:39 +0000)]
octeontx2-af: fix memory leak of lmac and lmac->name
Currently the error return paths don't kfree lmac and lmac->name
leading to some memory leaks. Fix this by adding two error return
paths that kfree these objects
Addresses-Coverity: ("Resource leak")
Fixes:
1463f382f58d ("octeontx2-af: Add support for CGX link management")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210107123916.189748-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 8 Jan 2021 01:06:05 +0000 (17:06 -0800)]
Merge branch 'bug-fixes-for-chtls-driver'
Ayush Sawal says:
====================
Bug fixes for chtls driver
patch 1: Fix hardware tid leak.
patch 2: Remove invalid set_tcb call.
patch 3: Fix panic when route to peer not configured.
patch 4: Avoid unnecessary freeing of oreq pointer.
patch 5: Replace skb_dequeue with skb_peek.
patch 6: Added a check to avoid NULL pointer dereference patch.
patch 7: Fix chtls resources release sequence.
====================
Link: https://lore.kernel.org/r/20210106042912.23512-1-ayush.sawal@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ayush Sawal [Wed, 6 Jan 2021 04:29:12 +0000 (09:59 +0530)]
chtls: Fix chtls resources release sequence
CPL_ABORT_RPL is sent after releasing the resources by calling
chtls_release_resources(sk); and chtls_conn_done(sk);
eventually causing kernel panic. Fixing it by calling release
in appropriate order.
Fixes:
cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ayush Sawal [Wed, 6 Jan 2021 04:29:11 +0000 (09:59 +0530)]
chtls: Added a check to avoid NULL pointer dereference
In case of server removal lookup_stid() may return NULL pointer, which
is used as listen_ctx. So added a check before accessing this pointer.
Fixes:
cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ayush Sawal [Wed, 6 Jan 2021 04:29:10 +0000 (09:59 +0530)]
chtls: Replace skb_dequeue with skb_peek
The skb is unlinked twice, one in __skb_dequeue in function
chtls_reset_synq() and another in cleanup_syn_rcv_conn().
So in this patch using skb_peek() instead of __skb_dequeue(),
so that unlink will be handled only in cleanup_syn_rcv_conn().
Fixes:
cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ayush Sawal [Wed, 6 Jan 2021 04:29:09 +0000 (09:59 +0530)]
chtls: Avoid unnecessary freeing of oreq pointer
In chtls_pass_accept_request(), removing the chtls_reqsk_free()
call to avoid oreq freeing twice. Here oreq is the pointer to
struct request_sock.
Fixes:
cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ayush Sawal [Wed, 6 Jan 2021 04:29:08 +0000 (09:59 +0530)]
chtls: Fix panic when route to peer not configured
If route to peer is not configured, we might get non tls
devices from dst_neigh_lookup() which is invalid, adding a
check to avoid it.
Fixes:
cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ayush Sawal [Wed, 6 Jan 2021 04:29:07 +0000 (09:59 +0530)]
chtls: Remove invalid set_tcb call
At the time of SYN_RECV, connection information is not
initialized at FW, updating tcb flag over uninitialized
connection causes adapter crash. We don't need to
update the flag during SYN_RECV state, so avoid this.
Fixes:
cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ayush Sawal [Wed, 6 Jan 2021 04:29:06 +0000 (09:59 +0530)]
chtls: Fix hardware tid leak
send_abort_rpl() is not calculating cpl_abort_req_rss offset and
ends up sending wrong TID with abort_rpl WR causng tid leaks.
Replaced send_abort_rpl() with chtls_send_abort_rpl() as it is
redundant.
Fixes:
cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 8 Jan 2021 00:08:38 +0000 (16:08 -0800)]
Merge branch 'generic-zcopy_-functions'
Jonathan Lemon says:
====================
Generic zcopy_* functions
This is set of cleanup patches for zerocopy which are intended
to allow a introduction of a different zerocopy implementation.
The top level API will use the skb_zcopy_*() functions, while
the current TCP specific zerocopy ends up using msg_zerocopy_*()
calls.
There should be no functional changes from these patches.
====================
Link: https://lore.kernel.org/r/20210106221841.1880536-1-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:41 +0000 (14:18 -0800)]
skbuff: Rename skb_zcopy_{get|put} to net_zcopy_{get|put}
Unlike the rest of the skb_zcopy_ functions, these routines
operate on a 'struct ubuf', not a skb. Remove the 'skb_'
prefix from the naming to make things clearer.
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:40 +0000 (14:18 -0800)]
tap/tun: add skb_zcopy_init() helper for initialization.
Replace direct assignments with skb_zcopy_init() for zerocopy
cases where a new skb is initialized, without changing the
reference counts.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:39 +0000 (14:18 -0800)]
skbuff: add flags to ubuf_info for ubuf setup
Currently, when an ubuf is attached to a new skb, the shared
flags word is initialized to a fixed value. Instead of doing
this, set the default flags in the ubuf, and have new skbs
inherit from this default.
This is needed when setting up different zerocopy types.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:38 +0000 (14:18 -0800)]
net: group skb_shinfo zerocopy related bits together.
In preparation for expanded zerocopy (TX and RX), move
the zerocopy related bits out of tx_flags into their own
flag word.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:37 +0000 (14:18 -0800)]
skbuff: rename sock_zerocopy_* to msg_zerocopy_*
At Willem's suggestion, rename the sock_zerocopy_* functions
so that they match the MSG_ZEROCOPY flag, which makes it clear
they are specific to this zerocopy implementation.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:36 +0000 (14:18 -0800)]
skbuff: Call skb_zcopy_clear() before unref'ing fragments
RX zerocopy fragment pages which are not allocated from the
system page pool require special handling. Give the callback
in skb_zcopy_clear() a chance to process them first.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:35 +0000 (14:18 -0800)]
skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort
The sock_zerocopy_put_abort function contains logic which is
specific to the current zerocopy implementation. Add a wrapper
which checks the callback and dispatches apppropriately.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:34 +0000 (14:18 -0800)]
skbuff: Add skb parameter to the ubuf zerocopy callback
Add an optional skb parameter to the zerocopy callback parameter,
which is passed down from skb_zcopy_clear(). This gives access
to the original skb, which is needed for upcoming RX zero-copy
error handling.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:33 +0000 (14:18 -0800)]
skbuff: replace sock_zerocopy_get with skb_zcopy_get
Rename the get routines for consistency.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:32 +0000 (14:18 -0800)]
skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
Replace sock_zerocopy_put with the generic skb_zcopy_put()
function. Pass 'true' as the success argument, as this
is identical to no change.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:31 +0000 (14:18 -0800)]
skbuff: Push status and refcounts into sock_zerocopy_callback
Before this change, the caller of sock_zerocopy_callback would
need to save the zerocopy status, decrement and check the refcount,
and then call the callback function - the callback was only invoked
when the refcount reached zero.
Now, the caller just passes the status into the callback function,
which saves the status and handles its own refcounts.
This makes the behavior of the sock_zerocopy_callback identical
to the tpacket and vhost callbacks.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:30 +0000 (14:18 -0800)]
skbuff: simplify sock_zerocopy_put
All 'struct ubuf_info' users should have a callback defined
as of commit
0a4a060bb204 ("sock: fix zerocopy_success regression
with msg_zerocopy").
Remove the dead code path to consume_skb(), which makes
assumptions about how the structure was allocated.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Lemon [Wed, 6 Jan 2021 22:18:29 +0000 (14:18 -0800)]
skbuff: remove unused skb_zcopy_abort function
skb_zcopy_abort() has no in-tree consumers, remove it.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 8 Jan 2021 00:03:19 +0000 (16:03 -0800)]
Merge tag 'gcc-plugins-v5.11-rc3' of git://git./linux/kernel/git/kees/linux
Pull gcc-plugins fix from Kees Cook:
"Bump c++ standard version for latest GCC versions (Valdis Kletnieks)"
* tag 'gcc-plugins-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-plugins: fix gcc 11 indigestion with plugins...
Jakub Kicinski [Thu, 7 Jan 2021 23:58:35 +0000 (15:58 -0800)]
Merge branch 'dwmac-meson8b-picosecond-precision-rx-delay-support'
Martin Blumenstingl says:
====================
dwmac-meson8b: picosecond precision RX delay support
with the help of Jianxin Pan (many thanks!) the meaning of the "new"
PRG_ETH1[19:16] register bits on Amlogic Meson G12A, G12B and SM1 SoCs
are finally known. These SoCs allow fine-tuning the RGMII RX delay in
200ps steps (contrary to what I have thought in the past [0] these are
not some "calibration" values).
The vendor u-boot has code to automatically detect the best RX/TX delay
settings. For now we keep it simple and add a device-tree property with
200ps precision to select the "right" RX delay for each board.
While here, deprecate the "amlogic,rx-delay-ns" property as it's not
used on any upstream .dts (yet). The driver is backwards compatible.
I have tested this on an X96 Air 4GB board (not upstream yet). Testing
with iperf3 gives 938 Mbits/sec in both directions (RX and TX). The
following network settings were used in the .dts (2ns TX delay
generated by the PHY, 800ps RX delay generated by the MAC as the PHY
only supports 0ns or 2ns RX delays):
&ext_mdio {
external_phy: ethernet-phy@0 {
/* Realtek RTL8211F (0x001cc916) */
reg = <0>;
eee-broken-1000t;
reset-assert-us = <10000>;
reset-deassert-us = <30000>;
reset-gpios = <&gpio GPIOZ_15 (GPIO_ACTIVE_LOW |
GPIO_OPEN_DRAIN)>;
interrupt-parent = <&gpio_intc>;
/* MAC_INTR on GPIOZ_14 */
interrupts = <26 IRQ_TYPE_LEVEL_LOW>;
};
};
ðmac {
status = "okay";
pinctrl-0 = <ð_pins>, <ð_rgmii_pins>;
pinctrl-names = "default";
phy-mode = "rgmii-txid";
phy-handle = <&external_phy>;
amlogic,rgmii-rx-delay-ps = <800>;
};
To use the same settings from vendor u-boot (which in my case has broken
Ethernet) the following commands can be used:
mw.l 0xff634540 0x1621
mw.l 0xff634544 0x30000
phyreg w 0x0 0x1040
phyreg w 0x1f 0xd08
phyreg w 0x11 0x9
phyreg w 0x15 0x11
phyreg w 0x1f 0x0
phyreg w 0x0 0x9200
Also I have tested this on a X96 Max board without any .dts changes
to confirm that other boards with the same IP block still work fine
with these changes.
Changes since v3 at [3].
- added Florian's Reviewed-by to patch 1 (thank you!)
- rebased on top of net-next
Changes since v2 at [2]:
- use the generic property name "rx-internal-delay-ps" as suggested by
Rob (thanks!). This affects patches #1 and #3. The biggest change is
is in patch #1 which is why I didn't add Florian's and Andrew's
Reviewed-by
- added Andrew's and Florian's Reviewed-by to patches 2, 3, 4, 5 (many
thanks to both!). I decided to do this despite renaming the property
to the generic name "rx-internal-delay-ps" as it only affects the
patch description and one line of code
- updated patch description of patch #3 to explain why there's not a
lot of validation when parsing the old device-tree property (in
nanosecond precision)
- dropped RFC status
Changes since v1 at [1]:
- updated patch 1 by making it more clear when the RX delay is applied.
Thanks to Andrew for the suggestion!
- added a fix to enabling the timing-adjustment clock only when really
needed. Found by Andrew - thanks!
- added testing not about X96 Max
- v1 did not go to the netdev mailing list, v2 fixes this
[0] https://lore.kernel.org/netdev/CAFBinCATt4Hi9rigj52nMf3oygyFbnopZcsakGL=KyWnsjY3JA@mail.gmail.com/
[1] https://patchwork.kernel.org/project/linux-amlogic/list/?series=384279&state=%2A&archive=both
[2] https://patchwork.kernel.org/project/linux-amlogic/list/?series=384491&state=%2A&archive=both
[3] https://patchwork.kernel.org/project/linux-amlogic/list/?series=406005&state=%2A&archive=both
====================
Link: https://lore.kernel.org/r/20210106134251.45264-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Blumenstingl [Wed, 6 Jan 2021 13:42:51 +0000 (14:42 +0100)]
net: stmmac: dwmac-meson8b: add support for the RGMII RX delay on G12A
Amlogic Meson G12A (and newer: G12B, SM1) SoCs have a more advanced RX
delay logic. Instead of fine-tuning the delay in the nanoseconds range
it now allows tuning in 200 picosecond steps. This support comes with
new bits in the PRG_ETH1[19:16] register.
Add support for validating the RGMII RX delay as well as configuring the
register accordingly on these platforms.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Blumenstingl [Wed, 6 Jan 2021 13:42:50 +0000 (14:42 +0100)]
net: stmmac: dwmac-meson8b: move RGMII delays into a separate function
Newer SoCs starting with the Amlogic Meson G12A have more a precise
RGMII RX delay configuration register. This means more complexity in the
code. Extract the existing RGMII delay configuration code into a
separate function to make it easier to read/understand even when adding
more logic in the future.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Blumenstingl [Wed, 6 Jan 2021 13:42:49 +0000 (14:42 +0100)]
net: stmmac: dwmac-meson8b: use picoseconds for the RGMII RX delay
Amlogic Meson G12A, G12B and SM1 SoCs have a more advanced RGMII RX
delay register which allows picoseconds precision. Parse the new
"rx-internal-delay-ps" property or fall back to the value from the old
"amlogic,rx-delay-ns" property.
No upstream DTB uses the old "amlogic,rx-delay-ns" property (yet).
Only include minimalistic logic to fall back to the old property,
without any special validation (for example if the old and new
property are given at the same time).
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Blumenstingl [Wed, 6 Jan 2021 13:42:48 +0000 (14:42 +0100)]
net: stmmac: dwmac-meson8b: fix enabling the timing-adjustment clock
The timing-adjustment clock only has to be enabled when a) there is a
2ns RX delay configured using device-tree and b) the phy-mode indicates
that the RX delay should be enabled.
Only enable the RX delay if both are true, instead of (by accident) also
enabling it when there's the 2ns RX delay configured but the phy-mode
incicates that the RX delay is not used.
Fixes:
9308c47640d515 ("net: stmmac: dwmac-meson8b: add support for the RX delay configuration")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Blumenstingl [Wed, 6 Jan 2021 13:42:47 +0000 (14:42 +0100)]
dt-bindings: net: dwmac-meson: use picoseconds for the RGMII RX delay
Amlogic Meson G12A, G12B and SM1 SoCs have a more advanced RGMII RX
delay register which allows picoseconds precision. Deprecate the old
"amlogic,rx-delay-ns" in favour of the generic "rx-internal-delay-ps"
property.
For older SoCs the only known supported values were 0ns and 2ns. The new
SoCs have support for RGMII RX delays between 0ps and 3000ps in 200ps
steps.
Don't carry over the description for the "rx-internal-delay-ps" property
and inherit that from ethernet-controller.yaml instead.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 7 Jan 2021 23:42:09 +0000 (15:42 -0800)]
Merge branch 'reduce-coupling-between-dsa-and-broadcom-systemport-driver'
Vladimir Oltean says:
====================
Reduce coupling between DSA and Broadcom SYSTEMPORT driver
Upon a quick inspection, it seems that there is some code in the generic
DSA layer that is somehow specific to the Broadcom SYSTEMPORT driver.
The challenge there is that the hardware integration is very tight between
the switch and the DSA master interface. However this does not mean that
the drivers must also be as integrated as the hardware is. We can avoid
creating a DSA notifier just for the Broadcom SYSTEMPORT, and we can
move some Broadcom-specific queue mapping helpers outside of the common
include/net/dsa.h.
====================
Link: https://lore.kernel.org/r/20210107012403.1521114-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 7 Jan 2021 01:24:03 +0000 (03:24 +0200)]
net: dsa: remove the DSA specific notifiers
This effectively reverts commit
60724d4bae14 ("net: dsa: Add support for
DSA specific notifiers"). The reason is that since commit
2f1e8ea726e9
("net: dsa: link interfaces with the DSA master to get rid of lockdep
warnings"), it appears that there is a generic way to achieve the same
purpose. The only user thus far, the Broadcom SYSTEMPORT driver, was
converted to use the generic notifiers.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 7 Jan 2021 01:24:02 +0000 (03:24 +0200)]
net: systemport: use standard netdevice notifier to detect DSA presence
The SYSTEMPORT driver maps each port of the embedded Broadcom DSA switch
port to a certain queue of the master Ethernet controller. For that it
currently uses a dedicated notifier infrastructure which was added in
commit
60724d4bae14 ("net: dsa: Add support for DSA specific notifiers").
However, since commit
2f1e8ea726e9 ("net: dsa: link interfaces with the
DSA master to get rid of lockdep warnings"), DSA is actually an upper of
the Broadcom SYSTEMPORT as far as the netdevice adjacency lists are
concerned. So naturally, the plain NETDEV_CHANGEUPPER net device notifiers
are emitted. It looks like there is enough API exposed by DSA to the
outside world already to make the call_dsa_notifiers API redundant. So
let's convert its only user to plain netdev notifiers.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 7 Jan 2021 01:24:01 +0000 (03:24 +0200)]
net: dsa: export dsa_slave_dev_check
Using the NETDEV_CHANGEUPPER notifications, drivers can be aware when
they are enslaved to e.g. a bridge by calling netif_is_bridge_master().
Export this helper from DSA to get the equivalent functionality of
determining whether the upper interface of a CHANGEUPPER notifier is a
DSA switch interface or not.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 7 Jan 2021 01:24:00 +0000 (03:24 +0200)]
net: dsa: move the Broadcom tag information in a separate header file
It is a bit strange to see something as specific as Broadcom SYSTEMPORT
bits in the main DSA include file. Move these away into a separate
header, and have the tagger and the SYSTEMPORT driver include them.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 7 Jan 2021 23:34:48 +0000 (15:34 -0800)]
Merge branch 'offload-software-learnt-bridge-addresses-to-dsa'
Vladimir Oltean says:
====================
Offload software learnt bridge addresses to DSA
This series tries to make DSA behave a bit more sanely when bridged with
"foreign" (non-DSA) interfaces and source address learning is not
supported on the hardware CPU port (which would make things work more
seamlessly without software intervention). When a station A connected to
a DSA switch port needs to talk to another station B connected to a
non-DSA port through the Linux bridge, DSA must explicitly add a route
for station B towards its CPU port.
Initial RFC was posted here:
https://patchwork.ozlabs.org/project/netdev/cover/
20201108131953.
2462644-1-olteanv@gmail.com/
v2 was posted here:
https://patchwork.kernel.org/project/netdevbpf/cover/
20201213024018.772586-1-vladimir.oltean@nxp.com/
v3 was posted here:
https://patchwork.kernel.org/project/netdevbpf/cover/
20201213140710.
1198050-1-vladimir.oltean@nxp.com/
This is a resend of the previous v3 with some added Reviewed-by tags.
====================
Link: https://lore.kernel.org/r/20210106095136.224739-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 6 Jan 2021 09:51:36 +0000 (11:51 +0200)]
net: dsa: ocelot: request DSA to fix up lack of address learning on CPU port
Given the following setup:
ip link add br0 type bridge
ip link set eno0 master br0
ip link set swp0 master br0
ip link set swp1 master br0
ip link set swp2 master br0
ip link set swp3 master br0
Currently, packets received on a DSA slave interface (such as swp0)
which should be routed by the software bridge towards a non-switch port
(such as eno0) are also flooded towards the other switch ports (swp1,
swp2, swp3) because the destination is unknown to the hardware switch.
This patch addresses the issue by monitoring the addresses learnt by the
software bridge on eno0, and adding/deleting them as static FDB entries
on the CPU port accordingly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 6 Jan 2021 09:51:35 +0000 (11:51 +0200)]
net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors
Some DSA switches (and not only) cannot learn source MAC addresses from
packets injected from the CPU. They only perform hardware address
learning from inbound traffic.
This can be problematic when we have a bridge spanning some DSA switch
ports and some non-DSA ports (which we'll call "foreign interfaces" from
DSA's perspective).
There are 2 classes of problems created by the lack of learning on
CPU-injected traffic:
- excessive flooding, due to the fact that DSA treats those addresses as
unknown
- the risk of stale routes, which can lead to temporary packet loss
To illustrate the second class, consider the following situation, which
is common in production equipment (wireless access points, where there
is a WLAN interface and an Ethernet switch, and these form a single
bridging domain).
AP 1:
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
| ^ ^
| | |
| | |
| Client A Client B
|
|
|
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
AP 2
- br0 of AP 1 will know that Clients A and B are reachable via wlan0
- the hardware fdb of a DSA switch driver today is not kept in sync with
the software entries on other bridge ports, so it will not know that
clients A and B are reachable via the CPU port UNLESS the hardware
switch itself performs SA learning from traffic injected from the CPU.
Nonetheless, a substantial number of switches don't.
- the hardware fdb of the DSA switch on AP 2 may autonomously learn that
Client A and B are reachable through swp0. Therefore, the software br0
of AP 2 also may or may not learn this. In the example we're
illustrating, some Ethernet traffic has been going on, and br0 from AP
2 has indeed learnt that it can reach Client B through swp0.
One of the wireless clients, say Client B, disconnects from AP 1 and
roams to AP 2. The topology now looks like this:
AP 1:
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
| ^
| |
| Client A
|
|
| Client B
| |
| v
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
AP 2
- br0 of AP 1 still knows that Client A is reachable via wlan0 (no change)
- br0 of AP 1 will (possibly) know that Client B has left wlan0. There
are cases where it might never find out though. Either way, DSA today
does not process that notification in any way.
- the hardware FDB of the DSA switch on AP 1 may learn autonomously that
Client B can be reached via swp0, if it receives any packet with
Client 1's source MAC address over Ethernet.
- the hardware FDB of the DSA switch on AP 2 still thinks that Client B
can be reached via swp0. It does not know that it has roamed to wlan0,
because it doesn't perform SA learning from the CPU port.
Now Client A contacts Client B.
AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet
segment.
AP 2 sees a frame on swp0 and its fdb says that the destination is swp0.
Hairpinning is disabled => drop.
This problem comes from the fact that these switches have a 'blind spot'
for addresses coming from software bridging. The generic solution is not
to assume that hardware learning can be enabled somehow, but to listen
to more bridge learning events. It turns out that the bridge driver does
learn in software from all inbound frames, in __br_handle_local_finish.
A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the
addresses serviced by the bridge on 'foreign' interfaces. The software
bridge also does the right thing on migration, by notifying that the old
entry is deleted, so that does not need to be special-cased in DSA. When
it is deleted, we just need to delete our static FDB entry towards the
CPU too, and wait.
The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE
events received on its own interfaces, such as static FDB entries.
Luckily we can change that, and DSA can listen to all switchdev FDB
add/del events in the system and figure out if those events were emitted
by a bridge that spans at least one of DSA's own ports. In case that is
true, DSA will also offload that address towards its own CPU port, in
the eventuality that there might be bridge clients attached to the DSA
switch who want to talk to the station connected to the foreign
interface.
In terms of implementation, we need to keep the fdb_info->added_by_user
check for the case where the switchdev event was targeted directly at a
DSA switch port. But we don't need to look at that flag for snooped
events. So the check is currently too late, we need to move it earlier.
This also simplifies the code a bit, since we avoid uselessly allocating
and freeing switchdev_work.
We could probably do some improvements in the future. For example,
multi-bridge support is rudimentary at the moment. If there are two
bridges spanning a DSA switch's ports, and both of them need to service
the same MAC address, then what will happen is that the migration of one
of those stations will trigger the deletion of the FDB entry from the
CPU port while it is still used by other bridge. That could be improved
with reference counting but is left for another time.
This behavior needs to be enabled at driver level by setting
ds->assisted_learning_on_cpu_port = true. This is because we don't want
to inflict a potential performance penalty (accesses through
MDIO/I2C/SPI are expensive) to hardware that really doesn't need it
because address learning on the CPU port works there.
Reported-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 6 Jan 2021 09:51:34 +0000 (11:51 +0200)]
net: dsa: exit early in dsa_slave_switchdev_event if we can't program the FDB
Right now, the following would happen for a switch driver that does not
implement .port_fdb_add or .port_fdb_del.
dsa_slave_switchdev_event returns NOTIFY_OK and schedules:
-> dsa_slave_switchdev_event_work
-> dsa_port_fdb_add
-> dsa_port_notify(DSA_NOTIFIER_FDB_ADD)
-> dsa_switch_fdb_add
-> if (!ds->ops->port_fdb_add) return -EOPNOTSUPP;
-> an error is printed with dev_dbg, and
dsa_fdb_offload_notify(switchdev_work) is not called.
We can avoid scheduling the worker for nothing and say NOTIFY_DONE.
Because we don't call dsa_fdb_offload_notify, the static FDB entry will
remain just in the software bridge.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 6 Jan 2021 09:51:33 +0000 (11:51 +0200)]
net: dsa: move switchdev event implementation under the same switch/case statement
We'll need to start listening to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
events even for interfaces where dsa_slave_dev_check returns false, so
we need that check inside the switch-case statement for SWITCHDEV_FDB_*.
This movement also avoids a useless allocation / free of switchdev_work
on the untreated "default event" case.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 6 Jan 2021 09:51:32 +0000 (11:51 +0200)]
net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work
Currently DSA doesn't add FDB entries on the CPU port, because it only
does so through switchdev, which is associated with a net_device, and
there are none of those for the CPU port.
But actually FDB addresses on the CPU port have some use cases of their
own, if the switchdev operations are initiated from within the DSA
layer. There is just one problem with the existing code: it passes a
structure in dsa_switchdev_event_work which was retrieved directly from
switchdev, so it contains a net_device. We need to generalize the
contents to something that covers the CPU port as well: the "ds, port"
tuple is fine for that.
Note that the new procedure for notifying the successful FDB offload is
inspired from the rocker model.
Also, nothing was being done if added_by_user was false. Let's check for
that a lot earlier, and don't actually bother to schedule the worker
for nothing.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 6 Jan 2021 09:51:31 +0000 (11:51 +0200)]
net: dsa: be louder when a non-legacy FDB operation fails
The dev_close() call was added in commit
c9eb3e0f8701 ("net: dsa: Add
support for learning FDB through notification") "to indicate inconsistent
situation" when we could not delete an FDB entry from the port.
bridge fdb del d8:58:d7:00:ca:6d dev swp0 self master
It is a bit drastic and at the same time not helpful if the above fails
to only print with netdev_dbg log level, but on the other hand to bring
the interface down.
So increase the verbosity of the error message, and drop dev_close().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 6 Jan 2021 09:51:30 +0000 (11:51 +0200)]
net: bridge: notify switchdev of disappearance of old FDB entry upon migration
Currently the bridge emits atomic switchdev notifications for
dynamically learnt FDB entries. Monitoring these notifications works
wonders for switchdev drivers that want to keep their hardware FDB in
sync with the bridge's FDB.
For example station A wants to talk to station B in the diagram below,
and we are concerned with the behavior of the bridge on the DUT device:
DUT
+-------------------------------------+
| br0 |
| +------+ +------+ +------+ +------+ |
| | | | | | | | | |
| | swp0 | | swp1 | | swp2 | | eth0 | |
+-------------------------------------+
| | |
Station A | |
| |
+--+------+--+ +--+------+--+
| | | | | | | |
| | swp0 | | | | swp0 | |
Another | +------+ | | +------+ | Another
switch | br0 | | br0 | switch
| +------+ | | +------+ |
| | | | | | | |
| | swp1 | | | | swp1 | |
+--+------+--+ +--+------+--+
|
Station B
Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has
the following property: frames injected from its control interface bypass
the internal address analyzer logic, and therefore, this hardware does
not learn from the source address of packets transmitted by the network
stack through it. So, since bridging between eth0 (where Station B is
attached) and swp0 (where Station A is attached) is done in software,
the switchdev hardware will never learn the source address of Station B.
So the traffic towards that destination will be treated as unknown, i.e.
flooded.
This is where the bridge notifications come in handy. When br0 on the
DUT sees frames with Station B's MAC address on eth0, the switchdev
driver gets these notifications and can install a rule to send frames
towards Station B's address that are incoming from swp0, swp1, swp2,
only towards the control interface. This is all switchdev driver private
business, which the notification makes possible.
All is fine until someone unplugs Station B's cable and moves it to the
other switch:
DUT
+-------------------------------------+
| br0 |
| +------+ +------+ +------+ +------+ |
| | | | | | | | | |
| | swp0 | | swp1 | | swp2 | | eth0 | |
+-------------------------------------+
| | |
Station A | |
| |
+--+------+--+ +--+------+--+
| | | | | | | |
| | swp0 | | | | swp0 | |
Another | +------+ | | +------+ | Another
switch | br0 | | br0 | switch
| +------+ | | +------+ |
| | | | | | | |
| | swp1 | | | | swp1 | |
+--+------+--+ +--+------+--+
|
Station B
Luckily for the use cases we care about, Station B is noisy enough that
the DUT hears it (on swp1 this time). swp1 receives the frames and
delivers them to the bridge, who enters the unlikely path in br_fdb_update
of updating an existing entry. It moves the entry in the software bridge
to swp1 and emits an addition notification towards that.
As far as the switchdev driver is concerned, all that it needs to ensure
is that traffic between Station A and Station B is not forever broken.
If it does nothing, then the stale rule to send frames for Station B
towards the control interface remains in place. But Station B is no
longer reachable via the control interface, but via a port that can
offload the bridge port learning attribute. It's just that the port is
prevented from learning this address, since the rule overrides FDB
updates. So the rule needs to go. The question is via what mechanism.
It sure would be possible for this switchdev driver to keep track of all
addresses which are sent to the control interface, and then also listen
for bridge notifier events on its own ports, searching for the ones that
have a MAC address which was previously sent to the control interface.
But this is cumbersome and inefficient. Instead, with one small change,
the bridge could notify of the address deletion from the old port, in a
symmetrical manner with how it did for the insertion. Then the switchdev
driver would not be required to monitor learn/forget events for its own
ports. It could just delete the rule towards the control interface upon
bridge entry migration. This would make hardware address learning be
possible again. Then it would take a few more packets until the hardware
and software FDB would be in sync again.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 7 Jan 2021 23:10:26 +0000 (15:10 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
pull-request: bpf 2021-01-07
We've added 4 non-merge commits during the last 10 day(s) which contain
a total of 4 files changed, 14 insertions(+), 7 deletions(-).
The main changes are:
1) Fix task_iter bug caused by the merge conflict resolution, from Yonghong.
2) Fix resolve_btfids for multiple type hierarchies, from Jiri.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpftool: Fix compilation failure for net.o with older glibc
tools/resolve_btfids: Warn when having multiple IDs for single type
bpf: Fix a task_iter bug caused by a merge conflict resolution
selftests/bpf: Fix a compile error for BPF_F_BPRM_SECUREEXEC
====================
Link: https://lore.kernel.org/r/20210107221555.64959-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 7 Jan 2021 22:56:35 +0000 (14:56 -0800)]
Merge branch 'r8169-improve-rtl8168g-phy-suspend-quirk'
Heiner Kallweit says:
====================
r8169: improve RTL8168g PHY suspend quirk
According to Realtek the ERI register 0x1a8 quirk is needed to work
around a hw issue with the PHY on RTL8168g. The register needs to be
changed before powering down the PHY. Currently we don't meet this
requirement, however I'm not aware of any problems caused by this.
Therefore I see the change as an improvement.
The PHY driver has no means to access the chip ERI registers,
therefore we have to intercept MDIO writes to the BMCR register.
If the BMCR_PDOWN bit is going to be set, then let's apply the
quirk before actually powering down the PHY.
====================
Link: https://lore.kernel.org/r/9303c2cf-c521-beea-c09f-63b5dfa91b9c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 6 Jan 2021 10:49:50 +0000 (11:49 +0100)]
r8169: improve RTL8168g PHY suspend quirk
According to Realtek the ERI register 0x1a8 quirk is needed to work
around a hw issue with the PHY on RTL8168g. The register needs to be
changed before powering down the PHY. Currently we don't meet this
requirement, however I'm not aware of any problems caused by this.
Therefore I see the change as an improvement.
The PHY driver has no means to access the chip ERI registers,
therefore we have to intercept MDIO writes to BMCR register.
If the BMCR_PDOWN bit is going to be set, then let's apply the
quirk before actually powering down the PHY.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>