Parav Pandit [Mon, 1 Mar 2021 12:12:13 +0000 (14:12 +0200)]
net/mlx5: E-Switch, Initialize eswitch acls ns when eswitch is enabled
Currently eswitch flow steering (FS) namespace of vport's ingress and
egress ACL are enabled when FS layer is initialized. This is done even
when eswitch is diabled. This demands that total eswitch ports to be
known to FS layer without eswitch in use.
Given the FS core is not dependent on eswitch, make namespace init and
cleanup routines as helper routines to be invoked only when eswitch is
needed.
With this change, ingress and egress ACL namespaces are created only
when eswitch legacy/offloads mode is enabled.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Fri, 26 Feb 2021 12:30:50 +0000 (14:30 +0200)]
net/mlx5: E-Switch, Move legacy code to a individual file
Currently eswitch offers two modes. Legacy and offloads.
Offloads code is already in its own file eswitch_offloads.c
However eswitch.c contains the eswitch legacy code and common
infrastructure code.
To enable future extensions and to better manage generic common eswitch
infrastructure code, move the legacy code to its own legacy.c file.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Fri, 26 Feb 2021 12:36:22 +0000 (14:36 +0200)]
net/mlx5: E-Switch, Convert a macro to a helper routine
Convert ESW_ALLOWED macro to a helper routine so that it can be used in
other eswitch files.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Tue, 2 Mar 2021 19:39:01 +0000 (21:39 +0200)]
net/mlx5: E-Switch Make cleanup sequence mirror of init
Make cleanup sequence mirror of init sequence for cleaning up reps
and freeing vports.
Also when reps initialization fails, there is no need to perform reps
cleanup.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Tue, 2 Mar 2021 19:27:47 +0000 (21:27 +0200)]
net/mlx5: E-Switch, Make vport number u16
Vport number is 16-bit field in hardware. Make it u16.
Move location of vport in the structure so that it reduces a hole
in the structure.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Mon, 8 Mar 2021 09:50:40 +0000 (11:50 +0200)]
net/mlx5: E-Switch, Skip querying SF enabled bits
With vhca events, SF state is queried through the VHCA events. Device no
longer expects SF bitmap in the query eswitch functions command.
Hence, remove it to simplify the code.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Parav Pandit [Fri, 30 Oct 2020 20:44:16 +0000 (22:44 +0200)]
net/mlx5: E-Switch, let user to enable disable metadata
Currently each packet inserted in eswitch is tagged with a internal
metadata to indicate source vport. Metadata tagging is not always
needed. Metadata insertion is needed for multi-port RoCE, failover
between representors and stacked devices. In many other cases,
metadata enablement is not needed.
Metadata insertion slows down the packet processing rate of the E-switch
when it is in switchdev mode.
Below table show performance gain with metadata disabled for VXLAN
offload rules in both SMFS and DMFS steering mode on ConnectX-5 device.
----------------------------------------------
| steering | metadata | pkt size | rx pps |
| mode | | | (million) |
----------------------------------------------
| smfs | disabled | 128Bytes | 42 |
----------------------------------------------
| smfs | enabled | 128Bytes | 36 |
----------------------------------------------
| dmfs | disabled | 128Bytes | 42 |
----------------------------------------------
| dmfs | enabled | 128Bytes | 36 |
----------------------------------------------
Hence, allow user to disable metadata using driver specific devlink
parameter. Metadata setting of the eswitch is applicable only for the
switchdev mode.
Example to show and disable metadata before changing eswitch mode:
$ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
pci/0000:06:00.0:
name esw_port_metadata type driver-specific
values:
cmode runtime value true
$ devlink dev param set pci/0000:06:00.0 \
name esw_port_metadata value false cmode runtime
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
changelog:
v1->v2:
- added performance numbers in commit log
- updated commit log and documentation for switchdev mode
- added explicit note on when user can disable metadata in
documentation
Adam Ford [Mon, 12 Apr 2021 13:26:19 +0000 (08:26 -0500)]
net: ethernet: ravb: Enable optional refclk
For devices that use a programmable clock for the AVB reference clock,
the driver may need to enable them. Add code to find the optional clock
and enable it when available.
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adam Ford [Mon, 12 Apr 2021 13:26:18 +0000 (08:26 -0500)]
dt-bindings: net: renesas,etheravb: Add additional clocks
The AVB driver assumes there is an external crystal, but it could
be clocked by other means. In order to enable a programmable
clock, it needs to be added to the clocks list and enabled in the
driver. Since there currently only one clock, there is no
clock-names list either.
Update bindings to add the additional optional clock, and explicitly
name both of them.
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 12 Apr 2021 20:34:21 +0000 (13:34 -0700)]
Merge branch 'enetc-ptp'
Yangbo Lu says:
====================
enetc: support PTP Sync packet one-step timestamping
This patch-set is to add support for PTP Sync packet one-step timestamping.
Since ENETC single-step register has to be configured dynamically per
packet for correctionField offeset and UDP checksum update, current
one-step timestamping packet has to be sent only when the last one
completes transmitting on hardware. So, on the TX, this patch handles
one-step timestamping packet as below:
- Trasmit packet immediately if no other one in transfer, or queue to
skb queue if there is already one in transfer.
The test_and_set_bit_lock() is used here to lock and check state.
- Start a work when complete transfer on hardware, to release the bit
lock and to send one skb in skb queue if has.
Changes for v2:
- Rebased.
- Fixed issues from patchwork checks.
- netif_tx_lock for one-step timestamping packet sending.
Changes for v3:
- Used system workqueue.
- Set bit lock when transmitted one-step packet, and scheduled
work when completed. The worker cleared the bit lock, and
transmitted one skb in skb queue if has, instead of a loop.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Mon, 12 Apr 2021 09:03:27 +0000 (17:03 +0800)]
enetc: support PTP Sync packet one-step timestamping
This patch is to add support for PTP Sync packet one-step timestamping.
Since ENETC single-step register has to be configured dynamically per
packet for correctionField offeset and UDP checksum update, current
one-step timestamping packet has to be sent only when the last one
completes transmitting on hardware. So, on the TX, this patch handles
one-step timestamping packet as below:
- Trasmit packet immediately if no other one in transfer, or queue to
skb queue if there is already one in transfer.
The test_and_set_bit_lock() is used here to lock and check state.
- Start a work when complete transfer on hardware, to release the bit
lock and to send one skb in skb queue if has.
And the configuration for one-step timestamping on ENETC before
transmitting is,
- Set one-step timestamping flag in extension BD.
- Write 30 bits current timestamp in tstamp field of extension BD.
- Update PTP Sync packet originTimestamp field with current timestamp.
- Configure single-step register for correctionField offeset and UDP
checksum update.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Mon, 12 Apr 2021 09:03:26 +0000 (17:03 +0800)]
enetc: mark TX timestamp type per skb
Mark TX timestamp type per skb on skb->cb[0], instead of
global variable for all skbs. This is a preparation for
one step timestamp support.
For one-step timestamping enablement, there will be both
one-step and two-step PTP messages to transfer. And a skb
queue is needed for one-step PTP messages making sure
start to send current message only after the last one
completed on hardware. (ENETC single-step register has to
be dynamically configured per message.) So, marking TX
timestamp type per skb is required.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 12 Apr 2021 20:31:27 +0000 (13:31 -0700)]
Merge branch 'ibmvnic-errors'
Lijun Pan says:
====================
ibmvnic: improve error printing
Patch 1 prints reset reason as a string.
Patch 2 prints adapter state as a string.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lijun Pan [Mon, 12 Apr 2021 07:41:28 +0000 (02:41 -0500)]
ibmvnic: print adapter state as a string
The adapter state can be added or deleted over different versions
of the source code. Print a string instead of a number.
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lijun Pan [Mon, 12 Apr 2021 07:41:27 +0000 (02:41 -0500)]
ibmvnic: print reset reason as a string
The reset reason can be added or deleted over different versions
of the source code. Print a string instead of a number.
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lijun Pan [Mon, 12 Apr 2021 07:40:59 +0000 (02:40 -0500)]
ibmvnic: clean up the remaining debugfs data structures
Commit
e704f0434ea6 ("ibmvnic: Remove debugfs support") did not
clean up everything. Remove the remaining code.
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 12 Apr 2021 20:27:11 +0000 (13:27 -0700)]
Merge branch 'netns-sysctl-isolation'
Jonathon Reinhart says:
====================
Ensuring net sysctl isolation
This patchset is the result of an audit of /proc/sys/net to prove that
it is safe to be mouted read-write in a container when a net namespace
is in use. See [1].
The first commit adds code to detect sysctls which are not netns-safe,
and can "leak" changes to other net namespaces.
My manual audit found, and the above feature confirmed, that there are
two nf_conntrack sysctls which are in fact not netns-safe.
I considered sending the latter to netfilter-devel, but I think it's
better to have both together on net-next: Adding only the former causes
undesirable warnings in the kernel log.
[1]: https://github.com/opencontainers/runc/issues/2826
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathon Reinhart [Mon, 12 Apr 2021 04:24:53 +0000 (00:24 -0400)]
netfilter: conntrack: Make global sysctls readonly in non-init netns
These sysctls point to global variables:
- NF_SYSCTL_CT_MAX (&nf_conntrack_max)
- NF_SYSCTL_CT_EXPECT_MAX (&nf_ct_expect_max)
- NF_SYSCTL_CT_BUCKETS (&nf_conntrack_htable_size_user)
Because their data pointers are not updated to point to per-netns
structures, they must be marked read-only in a non-init_net ns.
Otherwise, changes in any net namespace are reflected in (leaked into)
all other net namespaces. This problem has existed since the
introduction of net namespaces.
The current logic marks them read-only only if the net namespace is
owned by an unprivileged user (other than init_user_ns).
Commit
d0febd81ae77 ("netfilter: conntrack: re-visit sysctls in
unprivileged namespaces") "exposes all sysctls even if the namespace is
unpriviliged." Since we need to mark them readonly in any case, we can
forego the unprivileged user check altogether.
Fixes:
d0febd81ae77 ("netfilter: conntrack: re-visit sysctls in unprivileged namespaces")
Signed-off-by: Jonathon Reinhart <Jonathon.Reinhart@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathon Reinhart [Mon, 12 Apr 2021 04:24:52 +0000 (00:24 -0400)]
net: Ensure net namespace isolation of sysctls
This adds an ensure_safe_net_sysctl() check during register_net_sysctl()
to validate that sysctl table entries for a non-init_net netns are
sufficiently isolated. To be netns-safe, an entry must adhere to at
least (and usually exactly) one of these rules:
1. It is marked read-only inside the netns.
2. Its data pointer does not point to kernel/module global data.
An entry which fails both of these checks is indicative of a bug,
whereby a child netns can affect global net sysctl values.
If such an entry is found, this code will issue a warning to the kernel
log, and force the entry to be read-only to prevent a leak.
To test, simply create a new netns:
$ sudo ip netns add dummy
As it sits now, this patch will WARN for two sysctls which will be
addressed in a subsequent patch:
- /proc/sys/net/netfilter/nf_conntrack_max
- /proc/sys/net/netfilter/nf_conntrack_expect_max
Signed-off-by: Jonathon Reinhart <Jonathon.Reinhart@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
wengjianfeng [Mon, 12 Apr 2021 02:20:06 +0000 (10:20 +0800)]
nfc: pn533: remove redundant assignment
In many places,first assign a value to a variable and then return
the variable. which is redundant, we should directly return the value.
in pn533_rf_field funciton,return rc also in the if statement, so we
use return 0 to replace the last return rc.
Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 12 Apr 2021 20:20:38 +0000 (13:20 -0700)]
Merge branch 'bnxt_en-error-recovery'
Michael Chan says:
====================
bnxt_en: Error recovery fixes.
This series adds some fixes and enhancements to the error recovery
logic. The health register logic is improved and we also add missing
code to free and re-create VF representors in the firmware after
error recovery.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sriharsha Basavapatna [Mon, 12 Apr 2021 00:18:15 +0000 (20:18 -0400)]
bnxt_en: Free and allocate VF-Reps during error recovery.
During firmware recovery, VF-Rep configuration in the firmware is lost.
Fix it by freeing and (re)allocating VF-Reps in FW at relevant points
during the error recovery process.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 12 Apr 2021 00:18:14 +0000 (20:18 -0400)]
bnxt_en: Refactor __bnxt_vf_reps_destroy().
Add a new helper function __bnxt_free_one_vf_rep() to free one VF rep.
We also reintialize the VF rep fields to proper initial values so that
the function can be used without freeing the VF rep data structure. This
will be used in subsequent patches to free and recreate VF reps after
error recovery.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sriharsha Basavapatna [Mon, 12 Apr 2021 00:18:13 +0000 (20:18 -0400)]
bnxt_en: Refactor bnxt_vf_reps_create().
Add a new function bnxt_alloc_vf_rep() to allocate a VF representor.
This function will be needed in subsequent patches to recreate the
VF reps after error recovery.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Mon, 12 Apr 2021 00:18:12 +0000 (20:18 -0400)]
bnxt_en: Invalidate health register mapping at the end of probe.
After probe is successful, interface may not be bought up in all
the cases and health register mapping could be invalid if firmware
undergoes reset. Fix it by invalidating the health register at the
end of probe. It will be remapped during ifup.
Fixes:
43a440c4007b ("bnxt_en: Improve the status_reliable flag in bp->fw_health.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 12 Apr 2021 00:18:11 +0000 (20:18 -0400)]
bnxt_en: Treat health register value 0 as valid in bnxt_try_reover_fw().
The retry loop in bnxt_try_recover_fw() should not abort when the
health register value is 0. It is a valid value that indicates the
firmware is booting up.
Fixes:
861aae786f2f ("bnxt_en: Enhance retry of the first message to the firmware.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Mayer [Sat, 10 Apr 2021 17:46:14 +0000 (19:46 +0200)]
net: seg6: trivial fix of a spelling mistake in comment
There is a comment spelling mistake "interfarence" -> "interference" in
function parse_nla_action(). Fix it.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 9 Apr 2021 16:37:26 +0000 (17:37 +0100)]
net: hns3: Fix potential null pointer defererence of null ae_dev
The reset_prepare and reset_done calls have a null pointer check
on ae_dev however ae_dev is being dereferenced via the call to
ns3_is_phys_func with the ae->pdev argument. Fix this by performing
a null pointer check on ae_dev and hence short-circuiting the
dereference to ae_dev on the call to ns3_is_phys_func.
Addresses-Coverity: ("Dereference before null check")
Fixes:
715c58e94f0d ("net: hns3: add suspend and resume pm_ops")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 9 Apr 2021 13:07:26 +0000 (14:07 +0100)]
net: thunderx: Fix unintentional sign extension issue
The shifting of the u8 integers rq->caching by 26 bits to
the left will be promoted to a 32 bit signed int and then
sign-extended to a u64. In the event that rq->caching is
greater than 0x1f then all then all the upper 32 bits of
the u64 end up as also being set because of the int
sign-extension. Fix this by casting the u8 values to a
u64 before the 26 bit left shift.
Addresses-Coverity: ("Unintended sign extension")
Fixes:
4863dea3fab0 ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 9 Apr 2021 11:08:57 +0000 (12:08 +0100)]
cxgb4: Fix unintentional sign extension issues
The shifting of the u8 integers f->fs.nat_lip[] by 24 bits to
the left will be promoted to a 32 bit signed int and then
sign-extended to a u64. In the event that the top bit of the u8
is set then all then all the upper 32 bits of the u64 end up as
also being set because of the sign-extension. Fix this by
casting the u8 values to a u64 before the 24 bit left shift.
Addresses-Coverity: ("Unintended sign extension")
Fixes:
12b276fbf6e0 ("cxgb4: add support to create hash filters")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 11 Apr 2021 23:49:08 +0000 (16:49 -0700)]
Merge branch 'ipa-next'
Alex Elder says:
====================
net: ipa: support two more platforms
This series adds IPA support for two more Qualcomm SoCs.
The first patch updates the DT binding to add compatible strings.
The second temporarily disables checksum offload support for IPA
version 4.5 and above. Changes are required to the RMNet driver
to support the "inline" checksum offload used for IPA v4.5+, and
once those are present this capability will be enabled for IPA.
The third and fourth patches add configuration data for IPA versions
4.5 (used for the SDX55 SoC) and 4.11 (used for the SD7280 SoC).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Fri, 9 Apr 2021 20:40:24 +0000 (15:40 -0500)]
net: ipa: add IPA v4.11 configuration data
Add support for the SC7280 SoC, which includes IPA version 4.11.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Fri, 9 Apr 2021 20:40:23 +0000 (15:40 -0500)]
net: ipa: add IPA v4.5 configuration data
Add support for the SDX55 SoC, which includes IPA version 4.5.
Starting with IPA v4.5, a few of the memory regions have a different
number of "canary" values; update comments in the where the region
identifers are defined to accurately reflect that.
I'll note three differences in SDX55 versus the other two existing
platforms (SDM845 and SC7180):
- SDX55 uses a 32-bit Linux kernel
- SDX55 has four interconnects rather than three
- SDX55 uses IPA v4.5, which uses inline checksum offload
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Fri, 9 Apr 2021 20:40:22 +0000 (15:40 -0500)]
net: ipa: disable checksum offload for IPA v4.5+
Checksum offload for IPA v4.5+ is implemented differently, using
"inline" offload (which uses a common header format for both upload
and download offload).
The IPA hardware must be programmed to enable MAP checksum offload,
but the RMNet driver is responsible for interpreting checksum
metadata supplied with messages.
Currently, the RMNet driver does not support inline checksum offload.
This support is imminent, but until it is available, do not allow
newer versions of IPA to specify checksum offload for endpoints.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Fri, 9 Apr 2021 20:40:21 +0000 (15:40 -0500)]
dt-bindings: net: qcom,ipa: add some compatible strings
Add existing supported platform "qcom,sc7180-ipa" to the set of IPA
compatible strings. Also add newly-supported "qcom,sdx55-ipa",
"qcom,sc7280-ipa".
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qiheng Lin [Fri, 9 Apr 2021 11:09:11 +0000 (19:09 +0800)]
ehea: add missing MODULE_DEVICE_TABLE
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 11 Apr 2021 23:39:28 +0000 (16:39 -0700)]
Merge branch 'veth-gro'
Paolo Abeni says:
====================
veth: allow GRO even without XDP
This series allows the user-space to enable GRO/NAPI on a veth
device even without attaching an XDP program.
It does not change the default veth behavior (no NAPI, no GRO),
except that the GRO feature bit on top of this series will be
effectively off by default on veth devices. Note that currently
the GRO bit is on by default, but GRO never takes place in
absence of XDP.
On top of this series, setting the GRO feature bit enables NAPI
and allows the GRO to take place. The TSO features on the peer
device are preserved.
The main goal is improving UDP forwarding performances for
containers in a typical virtual network setup:
(container) veth -> veth peer -> bridge/ovs -> vxlan -> NIC
Enabling the NAPI threaded mode, GRO the NETIF_F_GRO_UDP_FWD
feature on the veth peer improves the UDP stream performance
with not void netfilter configuration by 2x factor with no
measurable overhead for TCP traffic: some heuristic ensures
that TCP will not go through the additional NAPI/GRO layer.
Some self-tests are added to check the expected behavior in
the default configuration, with XDP and with plain GRO enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 9 Apr 2021 11:04:40 +0000 (13:04 +0200)]
self-tests: add veth tests
Add some basic veth tests, that verify the expected flags and
aggregation with different setups (default, xdp, etc...)
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 9 Apr 2021 11:04:39 +0000 (13:04 +0200)]
veth: refine napi usage
After the previous patch, when enabling GRO, locally generated
TCP traffic experiences some measurable overhead, as it traverses
the GRO engine without any chance of aggregation.
This change refine the NAPI receive path admission test, to avoid
unnecessary GRO overhead in most scenarios, when GRO is enabled
on a veth peer.
Only skbs that are eligible for aggregation enter the GRO layer,
the others will go through the traditional receive path.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 9 Apr 2021 11:04:38 +0000 (13:04 +0200)]
veth: allow enabling NAPI even without XDP
Currently the veth device has the GRO feature bit set, even if
no GRO aggregation is possible with the default configuration,
as the veth device does not hook into the GRO engine.
Flipping the GRO feature bit from user-space is a no-op, unless
XDP is enabled. In such scenario GRO could actually take place, but
TSO is forced to off on the peer device.
This change allow user-space to really control the GRO feature, with
no need for an XDP program.
The GRO feature bit is now cleared by default - so that there are no
user-visible behavior changes with the default configuration.
When the GRO bit is set, the per-queue NAPI instances are initialized
and registered. On xmit, when napi instances are available, we try
to use them.
Some additional checks are in place to ensure we initialize/delete NAPIs
only when needed in case of overlapping XDP and GRO configuration
changes.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 9 Apr 2021 11:04:37 +0000 (13:04 +0200)]
veth: use skb_orphan_partial instead of skb_orphan
As described by commit
9c4c325252c5 ("skbuff: preserve sock
reference when scrubbing the skb."), orphaning a skb
in the TX path will cause OoO.
Let's use skb_orphan_partial() instead of skb_orphan(), so
that we keep the sk around for queue's selection sake and we
still avoid the problem fixed with commit
4bf9ffa0fb57 ("veth:
Orphan skb before GRO")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 11 Apr 2021 23:34:56 +0000 (16:34 -0700)]
Merge branch 'ethtool-eeprom'
Moshe Shemesh says:
====================
ethtool: Extend module EEPROM dump API
Ethtool supports module EEPROM dumps via the `ethtool -m <dev>` command.
But in current state its functionality is limited - offset and length
parameters, which are used to specify a linear desired region of EEPROM
data to dump, is not enough, considering emergence of complex module
EEPROM layouts such as CMIS 4.0.
Moreover, CMIS 4.0 extends the amount of pages that may be accessible by
introducing another parameter for page addressing - banks.
Besides, currently module EEPROM is represented as a chunk of
concatenated pages, where lower 128 bytes of all pages, except page 00h,
are omitted. Offset and length are used to address parts of this fake
linear memory. But in practice drivers, which implement
get_module_info() and get_module_eeprom() ethtool ops still calculate
page number and set I2C address on their own.
This series tackles these issues by adding ethtool op, which allows to
pass page number, bank number and I2C address in addition to offset and
length parameters to the driver, adds corresponding netlink
infrastructure and implements the new interface in mlx5 driver.
This allows to extend userspace 'ethtool -m' CLI by adding new
parameters - page, bank and i2c. New command line format:
ethtool -m <dev> [hex on|off] [raw on|off] [offset N] [length N] [page N] [bank N] [i2c N]
The consequence of this series is a possibility to dump arbitrary EEPROM
page at a time, in contrast to dumps of concatenated pages. Therefore,
offset and length change their semantics and may be used only to specify
a part of data within half page boundary, which size is currently limited
to 128 bytes.
As for drivers that support legacy get_module_info() and
get_module_eeprom() pair, the series addresses it by implementing a
fallback mechanism. As mentioned earlier, such drivers derive a page
number from 'global' offset, so this can be done vice versa without
their involvement thanks to standardization. If kernel netlink handler
of 'ethtool -m' command detects that new ethtool op is not supported by
the driver, it calculates offset from given page number and page offset
and calls old ndos, if they are available.
====================
\Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Fri, 9 Apr 2021 08:06:41 +0000 (11:06 +0300)]
ethtool: wire in generic SFP module access
If the device has a sfp bus attached, call its
sfp_get_module_eeprom_by_page() function, otherwise use the ethtool op
for the device. This follows how the IOCTL works.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Fri, 9 Apr 2021 08:06:40 +0000 (11:06 +0300)]
phy: sfp: add netlink SFP support to generic SFP code
The new netlink API for reading SFP data requires a new op to be
implemented. The idea of the new netlink SFP code is that userspace is
responsible to parsing the EEPROM data and requesting pages, rather
than have the kernel decide what pages are interesting and returning
them. This allows greater flexibility for newer formats.
Currently the generic SFP code only supports simple SFPs. Allow i2c
address 0x50 and 0x51 to be accessed with page and bank must always be
0. This interface will later be extended when for example QSFP support
is added.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladyslav Tarasiuk [Fri, 9 Apr 2021 08:06:39 +0000 (11:06 +0300)]
ethtool: Add fallback to get_module_eeprom from netlink command
In case netlink get_module_eeprom_by_page() callback is not implemented
by the driver, try to call old get_module_info() and get_module_eeprom()
pair. Recalculate parameters to get_module_eeprom() offset and len using
page number and their sizes. Return error if this can't be done.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Fri, 9 Apr 2021 08:06:38 +0000 (11:06 +0300)]
net: ethtool: Export helpers for getting EEPROM info
There are two ways to retrieve information from SFP EEPROMs. Many
devices make use of the common code, and assign the sfp_bus pointer in
the netdev to point to the bus holding the SFP device. Some MAC
drivers directly implement ops in there ethool structure.
Export within net/ethtool the two helpers used to call these methods,
so that they can also be used in the new netlink code.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladyslav Tarasiuk [Fri, 9 Apr 2021 08:06:37 +0000 (11:06 +0300)]
net/mlx5: Add support for DSFP module EEPROM dumps
Allow the driver to recognise DSFP transceiver module ID and therefore
allow its EEPROM dumps using ethtool.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladyslav Tarasiuk [Fri, 9 Apr 2021 08:06:36 +0000 (11:06 +0300)]
net/mlx5: Implement get_module_eeprom_by_page()
Implement ethtool_ops::get_module_eeprom_by_page() to enable
support of new SFP standards.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladyslav Tarasiuk [Fri, 9 Apr 2021 08:06:35 +0000 (11:06 +0300)]
net/mlx5: Refactor module EEPROM query
Prepare for ethtool_ops::get_module_eeprom_data() implementation by
extracting common part of mlx5_query_module_eeprom() into a separate
function.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladyslav Tarasiuk [Fri, 9 Apr 2021 08:06:34 +0000 (11:06 +0300)]
ethtool: Allow network drivers to dump arbitrary EEPROM data
Define get_module_eeprom_by_page() ethtool callback and implement
netlink infrastructure.
get_module_eeprom_by_page() allows network drivers to dump a part of
module's EEPROM specified by page and bank numbers along with offset and
length. It is effectively a netlink replacement for get_module_info()
and get_module_eeprom() pair, which is needed due to emergence of
complex non-linear EEPROM layouts.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 10 Apr 2021 03:57:28 +0000 (20:57 -0700)]
Merge branch 'net-ipa-a-few-small-fixes'
Alex Elder says:
====================
net: ipa: a few small fixes
This series implements some minor bug fixes or improvements.
The first patch removes an apparently unnecessary restriction, which
results in an error on a 32-bit ARM build.
The second makes a definition used for SDM845 match what is used in
the downstream code.
The third just ensures two netdev pointers are only non-null when
valid.
The fourth simplifies a little code, knowing that a called function
never returns an error.
The fifth and sixth just remove some empty/place holder functions.
And the last patch fixes a comment, makes a function private, and
removes an unnecessary double-negation of a Boolean variable. This
patch produces a warning from checkpatch, indicating that a pair of
parentheses is unnecessary. I agree with that advice, but it
conflicts with a suggestion from the compiler. I left the "problem"
in place to avoid the compiler warning.
====================
Link: https://lore.kernel.org/r/20210409180722.1176868-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Fri, 9 Apr 2021 18:07:22 +0000 (13:07 -0500)]
net: ipa: three small fixes
Some time ago changes were made to stop referring to clearing the
hardware pipeline as a "tag process." Fix a comment to use the
newer terminology.
Get rid of a pointless double-negation of the Boolean toward_ipa
flag in ipa_endpoint_config().
make ipa_endpoint_exit_one() private; it's only referenced inside
"ipa_endpoint.c".
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Fri, 9 Apr 2021 18:07:21 +0000 (13:07 -0500)]
net: ipa: get rid of empty GSI functions
There are place holder functions in the GSI code that do nothing.
Remove these, knowing we can add something back in their place if
they're really needed someday.
Some of these are inverse functions (such as teardown to match setup).
Explicitly comment that there is no inverse in these cases.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Fri, 9 Apr 2021 18:07:20 +0000 (13:07 -0500)]
net: ipa: get rid of empty IPA functions
There are place holder functions in the IPA code that do nothing.
For the most part these are inverse functions, for example, once the
routing or filter tables are set up there is no need to perform any
matching teardown activity at shutdown, or in the case of an error.
These can be safely removed, resulting in some code simplification.
Add comments in these spots making it explicit that there is no
inverse.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Fri, 9 Apr 2021 18:07:19 +0000 (13:07 -0500)]
net: ipa: ipa_stop() does not return an error
In ipa_modem_stop(), if the modem netdev pointer is non-null we call
ipa_stop(). We check for an error and if one is returned we handle
it. But ipa_stop() never returns an error, so this extra handling
is unnecessary. Simplify the code in ipa_modem_stop() based on the
knowledge no error handling is needed at this spot.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Fri, 9 Apr 2021 18:07:18 +0000 (13:07 -0500)]
net: ipa: only set endpoint netdev pointer when in use
In ipa_modem_start(), we set endpoint netdev pointers before the
network device is registered. If registration fails, we don't undo
those assignments. Instead, wait to assign the netdev pointer until
after registration succeeds.
Set these endpoint netdev pointers to NULL in ipa_modem_stop()
before unregistering the network device.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Fri, 9 Apr 2021 18:07:17 +0000 (13:07 -0500)]
net: ipa: update sequence type for modem TX endpoint
On IPA v3.5.1, the sequencer type for the modem TX endpoint does not
define the replication portion in the same way the downstream code
does. This difference doesn't affect the behavior of the upstream
code, but I'd prefer the two code bases use the same configuration
value here.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Fri, 9 Apr 2021 18:07:16 +0000 (13:07 -0500)]
net: ipa: relax pool entry size requirement
I no longer know why a validation check ensured the size of an entry
passed to gsi_trans_pool_init() was restricted to be a multiple of 8.
For 32-bit builds, this condition doesn't always hold, and for DMA
pools, the size is rounded up to a power of 2 anyway.
Remove this restriction.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 10 Apr 2021 03:46:01 +0000 (20:46 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Claudiu Manoil [Fri, 9 Apr 2021 07:16:13 +0000 (10:16 +0300)]
enetc: Use generic rule to map Tx rings to interrupt vectors
Even if the current mapping is correct for the 1 CPU and 2 CPU cases
(currently enetc is included in SoCs with up to 2 CPUs only), better
use a generic rule for the mapping to cover all possible cases.
The number of CPUs is the same as the number of interrupt vectors:
Per device Tx rings -
device_tx_ring[idx], where idx = 0..n_rings_total-1
Per interrupt vector Tx rings -
int_vector[i].ring[j], where i = 0..n_int_vects-1
j = 0..n_rings_per_v-1
Mapping rule -
n_rings_per_v = n_rings_total / n_int_vects
for i = 0..n_int_vects - 1:
for j = 0..n_rings_per_v - 1:
idx = n_int_vects * j + i
int_vector[i].ring[j] <- device_tx_ring[idx]
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210409071613.28912-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 9 Apr 2021 19:27:59 +0000 (22:27 +0300)]
net: enetc: fix TX ring interrupt storm
The blamed commit introduced a bit in the TX software buffer descriptor
structure for determining whether a BD is final or not; we rearm the TX
interrupt vector for every frame (hence final BD) transmitted.
But there is a problem with the patch: it replaced a condition whose
expression is a bool which was evaluated at the beginning of the "while"
loop with a bool expression that is evaluated on the spot: tx_swbd->is_eof.
The problem with the latter expression is that the tx_swbd has already
been incremented at that stage, so the tx_swbd->is_eof check is in fact
with the _next_ software BD. Which is _not_ final.
The effect is that the CPU is in 100% load with ksoftirqd because it
does not acknowledge the TX interrupt, so the handler keeps getting
called again and again.
The fix is to restore the code structure, and keep the local bool is_eof
variable, just to assign it the tx_swbd->is_eof value instead of
!!tx_swbd->skb.
Fixes:
d504498d2eb3 ("net: enetc: add a dedicated is_eof bit in the TX software BD")
Reported-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20210409192759.3895104-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 10 Apr 2021 01:07:20 +0000 (18:07 -0700)]
Merge branch 'mlx5-next' of git://git./linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-next 2021-04-09
This pr contains changes from mlx5-next branch,
already reviewed on netdev and rdma mailing lists, links below.
1) From Leon, Dynamically assign MSI-X vectors count
Already Acked by Bjorn Helgaas.
https://patchwork.kernel.org/project/netdevbpf/cover/
20210314124256.70253-1-leon@kernel.org/
2) Cleanup series:
https://patchwork.kernel.org/project/netdevbpf/cover/
20210311070915.321814-1-saeed@kernel.org/
From Mark, E-Switch cleanups and refactoring, and the addition
of single FDB mode needed HW bits.
From Mikhael, Remove unused struct field
From Saeed, Cleanup W=1 prototype warning
From Zheng, Esw related cleanup
From Tariq, User order-0 page allocation for EQs
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks
net/mlx5: Dynamically assign MSI-X vectors count
net/mlx5: Add dynamic MSI-X capabilities bits
PCI/IOV: Add sysfs MSI-X vector assignment interface
net/mlx5: Use order-0 allocations for EQs
net/mlx5: Add IFC bits needed for single FDB mode
net/mlx5: E-Switch, Refactor send to vport to be more generic
RDMA/mlx5: Use representor E-Switch when getting netdev and metadata
net/mlx5: E-Switch, Add eswitch pointer to each representor
net/mlx5: E-Switch, Add match on vhca id to default send rules
net/mlx5: Remove unused mlx5_core_health member recover_work
net/mlx5: simplify the return expression of mlx5_esw_offloads_pair()
net/mlx5: Cleanup prototype warning
====================
Link: https://lore.kernel.org/r/20210409200704.10886-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dan Carpenter [Fri, 9 Apr 2021 12:24:28 +0000 (15:24 +0300)]
net: enetc: fix array underflow in error handling code
This loop will try to unmap enetc_unmap_tx_buff[-1] and crash.
Fixes:
9d2b68cc108d ("net: enetc: add support for XDP_REDIRECT")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/YHBHfCY/yv3EnM9z@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Qiheng Lin [Fri, 9 Apr 2021 11:53:39 +0000 (19:53 +0800)]
cxgb4: remove unneeded if-null-free check
Eliminate the following coccicheck warning:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c:529:3-9: WARNING:
NULL check before some freeing functions is not needed.
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c:533:2-8: WARNING:
NULL check before some freeing functions is not needed.
drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c:161:2-7: WARNING:
NULL check before some freeing functions is not needed.
drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c:327:3-9: WARNING:
NULL check before some freeing functions is not needed.
Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Link: https://lore.kernel.org/r/20210409115339.4598-1-linqiheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 9 Apr 2021 23:37:08 +0000 (16:37 -0700)]
Merge branch 'net-make-phy-pm-ops-a-no-op-if-mac-driver-manages-phy-pm'
Heiner Kallweit says:
====================
net: make PHY PM ops a no-op if MAC driver manages PHY PM
Resume callback of the PHY driver is called after the one for the MAC
driver. The PHY driver resume callback calls phy_init_hw(), and this is
potentially problematic if the MAC driver calls phy_start() in its resume
callback. One issue was reported with the fec driver and a KSZ8081 PHY
which seems to become unstable if a soft reset is triggered during aneg.
The new flag allows MAC drivers to indicate that they take care of
suspending/resuming the PHY. Then the MAC PM callbacks can handle
any dependency between MAC and PHY PM.
====================
Link: https://lore.kernel.org/r/9e695411-ab1d-34fe-8b90-3e8192ab84f6@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 7 Apr 2021 15:53:52 +0000 (17:53 +0200)]
r8169: use mac-managed PHY PM
Use the new mac_managed_pm flag to indicate that the driver takes care
of PHY power management.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 7 Apr 2021 15:52:45 +0000 (17:52 +0200)]
net: fec: use mac-managed PHY PM
Use the new mac_managed_pm flag to work around an issue with KSZ8081 PHY
that becomes unstable when a soft reset is triggered during aneg.
Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Tested-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 7 Apr 2021 15:51:56 +0000 (17:51 +0200)]
net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM
Resume callback of the PHY driver is called after the one for the MAC
driver. The PHY driver resume callback calls phy_init_hw(), and this is
potentially problematic if the MAC driver calls phy_start() in its resume
callback. One issue was reported with the fec driver and a KSZ8081 PHY
which seems to become unstable if a soft reset is triggered during aneg.
The new flag allows MAC drivers to indicate that they take care of
suspending/resuming the PHY. Then the MAC PM callbacks can handle
any dependency between MAC and PHY PM.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 9 Apr 2021 17:02:37 +0000 (10:02 -0700)]
Revert "tcp: Reset tcp connections in SYN-SENT state"
This reverts commit
e880f8b3a24a73704731a7227ed5fee14bd90192.
1) Patch has not been properly tested, and is wrong [1]
2) Patch submission did not include TCP maintainer (this is me)
[1]
divide error: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 8426 Comm: syz-executor478 Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__tcp_select_window+0x56d/0xad0 net/ipv4/tcp_output.c:3015
Code: 44 89 ff e8 d5 cd f0 f9 45 39 e7 0f 8d 20 ff ff ff e8 f7 c7 f0 f9 44 89 e3 e9 13 ff ff ff e8 ea c7 f0 f9 44 89 e0 44 89 e3 99 <f7> 7c 24 04 29 d3 e9 fc fe ff ff e8 d3 c7 f0 f9 41 f7 dc bf 1f 00
RSP: 0018:
ffffc9000184fac0 EFLAGS:
00010293
RAX:
0000000000000000 RBX:
0000000000000000 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
ffffffff87832e76 RDI:
0000000000000003
RBP:
0000000000000000 R08:
0000000000000000 R09:
0000000000000000
R10:
ffffffff87832e14 R11:
0000000000000000 R12:
0000000000000000
R13:
1ffff92000309f5c R14:
0000000000000000 R15:
0000000000000000
FS:
00000000023eb300(0000) GS:
ffff8880b9c00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007fc2b5f426c0 CR3:
000000001c5cf000 CR4:
00000000001506f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
tcp_select_window net/ipv4/tcp_output.c:264 [inline]
__tcp_transmit_skb+0xa82/0x38f0 net/ipv4/tcp_output.c:1351
tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline]
tcp_send_active_reset+0x475/0x8e0 net/ipv4/tcp_output.c:3449
tcp_disconnect+0x15a9/0x1e60 net/ipv4/tcp.c:2955
inet_shutdown+0x260/0x430 net/ipv4/af_inet.c:905
__sys_shutdown_sock net/socket.c:2189 [inline]
__sys_shutdown_sock net/socket.c:2183 [inline]
__sys_shutdown+0xf1/0x1b0 net/socket.c:2201
__do_sys_shutdown net/socket.c:2209 [inline]
__se_sys_shutdown net/socket.c:2207 [inline]
__x64_sys_shutdown+0x50/0x70 net/socket.c:2207
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes:
e880f8b3a24a ("tcp: Reset tcp connections in SYN-SENT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Manoj Basapathi <manojbm@codeaurora.org>
Cc: Sauvik Saha <ssaha@codeaurora.org>
Link: https://lore.kernel.org/r/20210409170237.274904-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Thu, 8 Apr 2021 17:45:02 +0000 (19:45 +0200)]
net: dccp: use net_generic storage
DCCP is virtually never used, so no need to use space in struct net for it.
Put the pernet ipv4/v6 socket in the dccp ipv4/ipv6 modules instead.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20210408174502.1625-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 9 Apr 2021 22:26:51 +0000 (15:26 -0700)]
Merge tag 'net-5.12-rc7' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.12-rc7, including fixes from can, ipsec,
mac80211, wireless, and bpf trees.
No scary regressions here or in the works, but small fixes for 5.12
changes keep coming.
Current release - regressions:
- virtio: do not pull payload in skb->head
- virtio: ensure mac header is set in virtio_net_hdr_to_skb()
- Revert "net: correct sk_acceptq_is_full()"
- mptcp: revert "mptcp: provide subflow aware release function"
- ethernet: lan743x: fix ethernet frame cutoff issue
- dsa: fix type was not set for devlink port
- ethtool: remove link_mode param and derive link params from driver
- sched: htb: fix null pointer dereference on a null new_q
- wireless: iwlwifi: Fix softirq/hardirq disabling in
iwl_pcie_enqueue_hcmd()
- wireless: iwlwifi: fw: fix notification wait locking
- wireless: brcmfmac: p2p: Fix deadlock introduced by avoiding the
rtnl dependency
Current release - new code bugs:
- napi: fix hangup on napi_disable for threaded napi
- bpf: take module reference for trampoline in module
- wireless: mt76: mt7921: fix airtime reporting and related tx hangs
- wireless: iwlwifi: mvm: rfi: don't lock mvm->mutex when sending
config command
Previous releases - regressions:
- rfkill: revert back to old userspace API by default
- nfc: fix infinite loop, refcount & memory leaks in LLCP sockets
- let skb_orphan_partial wake-up waiters
- xfrm/compat: Cleanup WARN()s that can be user-triggered
- vxlan, geneve: do not modify the shared tunnel info when PMTU
triggers an ICMP reply
- can: fix msg_namelen values depending on CAN_REQUIRED_SIZE
- can: uapi: mark union inside struct can_frame packed
- sched: cls: fix action overwrite reference counting
- sched: cls: fix err handler in tcf_action_init()
- ethernet: mlxsw: fix ECN marking in tunnel decapsulation
- ethernet: nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx
- ethernet: i40e: fix receiving of single packets in xsk zero-copy
mode
- ethernet: cxgb4: avoid collecting SGE_QBASE regs during traffic
Previous releases - always broken:
- bpf: Refuse non-O_RDWR flags in BPF_OBJ_GET
- bpf: Refcount task stack in bpf_get_task_stack
- bpf, x86: Validate computation of branch displacements
- ieee802154: fix many similar syzbot-found bugs
- fix NULL dereferences in netlink attribute handling
- reject unsupported operations on monitor interfaces
- fix error handling in llsec_key_alloc()
- xfrm: make ipv4 pmtu check honor ip header df
- xfrm: make hash generation lock per network namespace
- xfrm: esp: delete NETIF_F_SCTP_CRC bit from features for esp
offload
- ethtool: fix incorrect datatype in set_eee ops
- xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory
model
- openvswitch: fix send of uninitialized stack memory in ct limit
reply
Misc:
- udp: add get handling for UDP_GRO sockopt"
* tag 'net-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (182 commits)
net: fix hangup on napi_disable for threaded napi
net: hns3: Trivial spell fix in hns3 driver
lan743x: fix ethernet frame cutoff issue
net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits
net: dsa: lantiq_gswip: Don't use PHY auto polling
net: sched: sch_teql: fix null-pointer dereference
ipv6: report errors for iftoken via netlink extack
net: sched: fix err handler in tcf_action_init()
net: sched: fix action overwrite reference counting
Revert "net: sched: bump refcount for new action in ACT replace mode"
ice: fix memory leak of aRFS after resuming from suspend
i40e: Fix sparse warning: missing error code 'err'
i40e: Fix sparse error: 'vsi->netdev' could be null
i40e: Fix sparse error: uninitialized symbol 'ring'
i40e: Fix sparse errors in i40e_txrx.c
i40e: Fix parameters in aq_get_phy_register()
nl80211: fix beacon head validation
bpf, x86: Validate computation of branch displacements for x86-32
bpf, x86: Validate computation of branch displacements for x86-64
...
Linus Torvalds [Fri, 9 Apr 2021 22:06:52 +0000 (15:06 -0700)]
Merge tag 'io_uring-5.12-2021-04-09' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two minor fixups for the reissue logic, and one for making sure that
unbounded work is canceled on io-wq exit"
* tag 'io_uring-5.12-2021-04-09' of git://git.kernel.dk/linux-block:
io-wq: cancel unbounded works on io-wq destroy
io_uring: fix rw req completion
io_uring: clear F_REISSUE right after getting it
Linus Torvalds [Fri, 9 Apr 2021 20:01:48 +0000 (13:01 -0700)]
Merge tag 'devicetree-fixes-for-5.12-2' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix fw_devlink failure with ".*,nr-gpios" properties
- Doc link reference fixes from Mauro
- Fixes for unaligned FDT handling found on OpenRisc. First, avoid
crash with better error handling when unflattening an unaligned FDT.
Second, fix memory allocations for FDTs to ensure alignment.
* tag 'devicetree-fixes-for-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: property: fw_devlink: do not link ".*,nr-gpios"
dt-bindings:iio:adc: update motorola,cpcap-adc.yaml reference
dt-bindings: fix references for iio-bindings.txt
dt-bindings: don't use ../dir for doc references
of: unittest: overlay: ensure proper alignment of copied FDT
of: properly check for error returned by fdt_get_name()
Linus Torvalds [Fri, 9 Apr 2021 19:56:10 +0000 (12:56 -0700)]
Merge tag 'drm-fixes-2021-04-10' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Was relatively quiet this week, but still a few pulls came in, pretty
much small fixes across the board, a couple of regression fixes in the
amdgpu/radeon code, msm has a few minor fixes across the board, a
panel regression fix also.
amdgpu:
- DCN3 fix
- Fix CAC setting regression for TOPAZ
- Fix ttm regression
radeon:
- Fix ttm regression
msm:
- a5xx/a6xx timestamp fix
- microcode version check
- fail path fix
- block programming fix
- error removal fix
i915:
- Fix invalid access to ACPI _DSM objects
xen:
- Fix use-after-free in xen
- minor duplicate defintion cleanup
vc4:
- Reduce fifo threshold on hvs4 to fix a fifo full error
- minor redunantant assignment cleanup
panel:
- Disable TE support for Droid4 and N950"
* tag 'drm-fixes-2021-04-10' of git://anongit.freedesktop.org/drm/drm:
drm/vc4: crtc: Reduce PV fifo threshold on hvs4
drm/vc4: plane: Remove redundant assignment
drm/amdgpu/smu7: fix CAC setting on TOPAZ
drm/radeon: Fix size overflow
drm/amdgpu: Fix size overflow
drm/i915: Fix invalid access to ACPI _DSM objects
drm/amd/display: Add missing mask for DCN3
drm/panel: panel-dsi-cm: disable TE for now
drm/msm/disp/dpu1: program 3d_merge only if block is attached
drm/msm: a6xx: fix version check for the A650 SQE microcode
drm/msm: Fix a5xx/a6xx timestamps
drm/msm: Fix removal of valid error case when checking speed_bin
drm/msm: Set drvdata to NULL when msm_drm_init() fails
drivers: gpu: drm: xen_drm_front_drm_info is declared twice
gpu/xen: Fix a use after free in xen_drm_drv_init
Paolo Abeni [Fri, 9 Apr 2021 15:24:17 +0000 (17:24 +0200)]
net: fix hangup on napi_disable for threaded napi
napi_disable() is subject to an hangup, when the threaded
mode is enabled and the napi is under heavy traffic.
If the relevant napi has been scheduled and the napi_disable()
kicks in before the next napi_threaded_wait() completes - so
that the latter quits due to the napi_disable_pending() condition,
the existing code leaves the NAPI_STATE_SCHED bit set and the
napi_disable() loop waiting for such bit will hang.
This patch addresses the issue by dropping the NAPI_STATE_DISABLE
bit test in napi_thread_wait(). The later napi_threaded_poll()
iteration will take care of clearing the NAPI_STATE_SCHED.
This also addresses a related problem reported by Jakub:
before this patch a napi_disable()/napi_enable() pair killed
the napi thread, effectively disabling the threaded mode.
On the patched kernel napi_disable() simply stops scheduling
the relevant thread.
v1 -> v2:
- let the main napi_thread_poll() loop clear the SCHED bit
Reported-by: Jakub Kicinski <kuba@kernel.org>
Fixes:
29863d41bb6e ("net: implement threaded-able napi poll loop support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/883923fa22745a9589e8610962b7dc59df09fb1f.1617981844.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Salil Mehta [Fri, 9 Apr 2021 07:42:23 +0000 (08:42 +0100)]
net: hns3: Trivial spell fix in hns3 driver
Some trivial spelling mistakes which caught my eye during the
review of the code.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Link: https://lore.kernel.org/r/20210409074223.32480-1-salil.mehta@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sven Van Asbroeck [Fri, 9 Apr 2021 00:39:04 +0000 (20:39 -0400)]
lan743x: fix ethernet frame cutoff issue
The ethernet frame length is calculated incorrectly. Depending on
the value of RX_HEAD_PADDING, this may result in ethernet frames
that are too short (cut off at the end), or too long (garbage added
to the end).
Fix by calculating the ethernet frame length correctly. For added
clarity, use the ETH_FCS_LEN constant in the calculation.
Many thanks to Heiner Kallweit for suggesting this solution.
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Fixes:
3e21a10fdea3 ("lan743x: trim all 4 bytes of the FCS; not just 2")
Link: https://lore.kernel.org/lkml/20210408172353.21143-1-TheSven73@gmail.com/
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: George McCollister <george.mccollister@gmail.com>
Tested-by: George McCollister <george.mccollister@gmail.com>
Link: https://lore.kernel.org/r/20210409003904.8957-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ilya Lipnitskiy [Mon, 5 Apr 2021 22:25:40 +0000 (15:25 -0700)]
of: property: fw_devlink: do not link ".*,nr-gpios"
[<vendor>,]nr-gpios property is used by some GPIO drivers[0] to indicate
the number of GPIOs present on a system, not define a GPIO. nr-gpios is
not configured by #gpio-cells and can't be parsed along with other
"*-gpios" properties.
nr-gpios without the "<vendor>," prefix is not allowed by the DT
spec[1], so only add exception for the ",nr-gpios" suffix and let the
error message continue being printed for non-compliant implementations.
[0] nr-gpios is referenced in Documentation/devicetree/bindings/gpio:
- gpio-adnp.txt
- gpio-xgene-sb.txt
- gpio-xlp.txt
- snps,dw-apb-gpio.yaml
Link: https://github.com/devicetree-org/dt-schema/blob/cb53a16a1eb3e2169ce170c071e47940845ec26e/schemas/gpio/gpio-consumer.yaml#L20
Fixes errors such as:
OF: /palmbus@300000/gpio@600: could not find phandle
Fixes:
7f00be96f125 ("of: property: Add device link support for interrupt-parent, dmas and -gpio(s)")
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: Saravana Kannan <saravanak@google.com>
Cc: stable@vger.kernel.org # v5.5+
Link: https://lore.kernel.org/r/20210405222540.18145-1-ilya.lipnitskiy@gmail.com
Signed-off-by: Rob Herring <robh@kernel.org>
Mauro Carvalho Chehab [Fri, 9 Apr 2021 12:47:47 +0000 (14:47 +0200)]
dt-bindings:iio:adc: update motorola,cpcap-adc.yaml reference
Changeset
1ca9d1b1342d ("dt-bindings:iio:adc:motorola,cpcap-adc yaml conversion")
renamed: Documentation/devicetree/bindings/iio/adc/cpcap-adc.txt
to: Documentation/devicetree/bindings/iio/adc/motorola,cpcap-adc.yaml.
Update its cross-reference accordingly.
Fixes:
1ca9d1b1342d ("dt-bindings:iio:adc:motorola,cpcap-adc yaml conversion")
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/3e205e5fa701e4bc15d39d6ac1f57717df2bb4c6.1617972339.git.mchehab+huawei@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Mauro Carvalho Chehab [Fri, 9 Apr 2021 12:47:46 +0000 (14:47 +0200)]
dt-bindings: fix references for iio-bindings.txt
The iio-bindings.txt was converted into two files and merged
at the dt-schema git tree at:
https://github.com/devicetree-org/dt-schema
Yet, some documents still refer to the old file. Fix their
references, in order to point to the right URL.
Fixes:
dba91f82d580 ("dt-bindings:iio:iio-binding.txt Drop file as content now in dt-schema")
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/4efd81eca266ca0875d3bf9d1672097444146c69.1617972339.git.mchehab+huawei@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Mauro Carvalho Chehab [Fri, 9 Apr 2021 12:47:45 +0000 (14:47 +0200)]
dt-bindings: don't use ../dir for doc references
As documents have been renamed and moved around, their
references will break, but this will be unnoticed, as the
script which checks for it won't handle "../" references.
So, replace them by the full patch.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/68d3a1244119d1f2829c375b0ef554cf348bc89f.1617972339.git.mchehab+huawei@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Dave Airlie [Fri, 9 Apr 2021 19:18:31 +0000 (05:18 +1000)]
Merge tag 'drm-intel-fixes-2021-04-09' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix invalid access to ACPI _DSM objects (Takashi)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YHAW6NInrybUoat6@intel.com
Dave Airlie [Fri, 9 Apr 2021 19:15:35 +0000 (05:15 +1000)]
Merge tag 'drm-misc-fixes-2021-04-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.12-rc7:
- Fix use-after-free in xen.
- Reduce fifo threshold on hvs4 to fix a fifo full error.
- Disable TE support for Droid4 and N950.
- Small compiler fixes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e7647dd9-60c3-9dfd-a377-89d717212e13@linux.intel.com
Linus Torvalds [Fri, 9 Apr 2021 18:51:06 +0000 (11:51 -0700)]
Merge tag 'selinux-pr-
20210409' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
"Three SELinux fixes.
These fix known problems relating to (re)loading SELinux policy or
changing the policy booleans, and pass our test suite without problem"
* tag 'selinux-pr-
20210409' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix race between old and new sidtab
selinux: fix cond_list corruption when changing booleans
selinux: make nslot handling in avtab more robust
Linus Torvalds [Fri, 9 Apr 2021 17:09:51 +0000 (10:09 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull vdpa/mlx5 fixes from Michael Tsirkin:
"Last minute fixes.
These all look like something we are better off having
than not ..."
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Fix suspend/resume index restoration
vdpa/mlx5: Fix wrong use of bit numbers
vdpa/mlx5: Retrieve BAR address suitable any function
vdpa/mlx5: Use the correct dma device when registering memory
vdpa/mlx5: should exclude header length and fcs from mtu
Linus Torvalds [Fri, 9 Apr 2021 17:05:25 +0000 (10:05 -0700)]
Merge tag 'rproc-v5.12-fixes' of git://git./linux/kernel/git/andersson/remoteproc
Pull remoteproc fixes from Bjorn Andersson:
"This fixes an issue with firmware loading on the TI K3 PRU, fixes
compatibility with GNU binutils for the same and resolves link error
due to a 64-bit division in the Qualcomm PIL info.
It also recognizes Mathieu Poirier as co-maintainer of the remoteproc
and rpmsg subsystems"
* tag 'rproc-v5.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
remoteproc: pru: Fix firmware loading crashes on K3 SoCs
remoteproc: pru: Fix loading of GNU Binutils ELF
MAINTAINERS: Add co-maintainer for remoteproc/RPMSG subsystems
remoteproc: qcom: pil_info: avoid 64-bit division
Linus Torvalds [Fri, 9 Apr 2021 16:58:42 +0000 (09:58 -0700)]
Merge tag 'for-linus-5.12b-rc7-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"A single fix of a 5.12 patch for the rather uncommon problem of
running as a Xen guest with a real time kernel config"
* tag 'for-linus-5.12b-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/evtchn: Change irq_info lock to raw_spinlock_t
Linus Torvalds [Fri, 9 Apr 2021 16:25:31 +0000 (09:25 -0700)]
Merge tag 'acpi-5.12-rc7' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a build issue introduced by a previous fix in the ACPI processor
driver (Vitaly Kuznetsov)"
* tag 'acpi-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Fix build when CONFIG_ACPI_PROCESSOR=m
Eli Cohen [Thu, 8 Apr 2021 09:10:47 +0000 (12:10 +0300)]
vdpa/mlx5: Fix suspend/resume index restoration
When we suspend the VM, the VDPA interface will be reset. When the VM is
resumed again, clear_virtqueues() will clear the available and used
indices resulting in hardware virqtqueue objects becoming out of sync.
We can avoid this function alltogether since qemu will clear them if
required, e.g. when the VM went through a reboot.
Moreover, since the hw available and used indices should always be
identical on query and should be restored to the same value same value
for virtqueues that complete in order, we set the single value provided
by set_vq_state(). In get_vq_state() we return the value of hardware
used index.
Fixes:
b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map")
Fixes:
1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210408091047.4269-6-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Eli Cohen [Thu, 8 Apr 2021 09:10:46 +0000 (12:10 +0300)]
vdpa/mlx5: Fix wrong use of bit numbers
VIRTIO_F_VERSION_1 is a bit number. Use BIT_ULL() with mask
conditionals.
Also, in mlx5_vdpa_is_little_endian() use BIT_ULL for consistency with
the rest of the code.
Fixes:
1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210408091047.4269-5-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Eli Cohen [Thu, 8 Apr 2021 09:10:45 +0000 (12:10 +0300)]
vdpa/mlx5: Retrieve BAR address suitable any function
struct mlx5_core_dev has a bar_addr field that contains the correct bar
address for the function regardless of whether it is pci function or sub
function. Use it.
Fixes:
1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20210408091047.4269-4-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Eli Cohen [Thu, 8 Apr 2021 09:10:44 +0000 (12:10 +0300)]
vdpa/mlx5: Use the correct dma device when registering memory
In cases where the vdpa instance uses a SF (sub function), the DMA
device is the parent device. Use a function to retrieve the correct DMA
device.
Fixes:
1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20210408091047.4269-3-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Si-Wei Liu [Thu, 8 Apr 2021 09:10:43 +0000 (12:10 +0300)]
vdpa/mlx5: should exclude header length and fcs from mtu
When feature VIRTIO_NET_F_MTU is negotiated on mlx5_vdpa,
22 extra bytes worth of MTU length is shown in guest.
This is because the mlx5_query_port_max_mtu API returns
the "hardware" MTU value, which does not just contain the
Ethernet payload, but includes extra lengths starting
from the Ethernet header up to the FCS altogether.
Fix the MTU so packets won't get dropped silently.
Fixes:
1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210408091047.4269-2-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Hans de Goede [Fri, 9 Apr 2021 13:58:50 +0000 (15:58 +0200)]
Bluetooth: btusb: Revert Fix the autosuspend enable and disable
drivers/usb/core/hub.c: usb_new_device() contains the following:
/* By default, forbid autosuspend for all devices. It will be
* allowed for hubs during binding.
*/
usb_disable_autosuspend(udev);
So for anything which is not a hub, such as btusb devices, autosuspend is
disabled by default and we must call usb_enable_autosuspend(udev) to
enable it.
This means that the "Fix the autosuspend enable and disable" commit,
which drops the usb_enable_autosuspend() call when the enable_autosuspend
module option is true, is completely wrong, revert it.
This reverts commit
7bd9fb058d77213130e4b3e594115c028b708e7e.
Cc: Hui Wang <hui.wang@canonical.com>
Fixes:
7bd9fb058d77 ("Bluetooth: btusb: Fix the autosuspend enable and disable")
Acked-by: Hui Wang <hui.wang@canonical.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 9 Apr 2021 01:57:47 +0000 (18:57 -0700)]
Merge tag '5.12-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three cifs/smb3 fixes, two for stable: a reconnect fix and a fix for
display of devnames with special characters"
* tag '5.12-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: escape spaces in share names
fs: cifs: Remove unnecessary struct declaration
cifs: On cifs_reconnect, resolve the hostname again.
Dave Airlie [Fri, 9 Apr 2021 00:33:11 +0000 (10:33 +1000)]
Merge tag 'drm-msm-fixes-2021-04-02' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
some more minor fixes:
- a5xx/a6xx timestamp fix
- microcode version check
- fail path fix
- block programming fix
- error removal fix.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGsMj7Nv3vVaVWMxPy8Y=Z_SnZmVKhKgKDxDYTr9rGN_+w@mail.gmail.com
Muhammad Usama Anjum [Thu, 8 Apr 2021 22:01:29 +0000 (03:01 +0500)]
net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
nlh is being checked for validtity two times when it is dereferenced in
this function. Check for validity again when updating the flags through
nlh pointer to make the dereferencing safe.
CC: <stable@vger.kernel.org>
Addresses-Coverity: ("NULL pointer dereference")
Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 8 Apr 2021 23:38:23 +0000 (16:38 -0700)]
Merge branch 'lantiq-GSWIP-fixes'
Martin Blumenstingl says:
====================
lantiq: GSWIP: two more fixes
after my last patch got accepted and is now in net as commit
3e6fdeb28f4c33 ("net: dsa: lantiq_gswip: Let GSWIP automatically set
the xMII clock") [0] some more people from the OpenWrt community
(many thanks to everyone involved) helped test the GSWIP driver: [1]
It turns out that the previous fix does not work for all boards.
There's no regression, but it doesn't fix as many problems as I
thought. This is why two more fixes are needed:
- the first one solves many (four known but probably there are
a few extra hidden ones) reported bugs with the GSWIP where no
traffic would flow. Not all circumstances are fully understood
but testing shows that switching away from PHY auto polling
solves all of them
- while investigating the different problems which are addressed
by the first patch some small issues with the existing code were
found. These are addressed by the second patch
Changes since v1 at [0]:
- Don't configure the link parameters in gswip_phylink_mac_config
(as we're using the "modern" way in gswip_phylink_mac_link_up).
Thanks to Andrew for the hint with the phylink documentation.
- Clarify that GSWIP_MII_CFG_RMII_CLK is ignored by the hardware in
the description of the second patch as suggested by Hauke
- Don't set GSWIP_MII_CFG_RGMII_IBS in the second patch as we don't
have any hardware available for testing this. The patch
description now also reflects this.
- Added Andrew's Reviewed-by to the first patch (thank you!)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Thu, 8 Apr 2021 18:38:28 +0000 (20:38 +0200)]
net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits
There are a few more bits in the GSWIP_MII_CFG register for which we
did rely on the boot-loader (or the hardware defaults) to set them up
properly.
For some external RMII PHYs we need to select the GSWIP_MII_CFG_RMII_CLK
bit and also we should un-set it for non-RMII PHYs. The
GSWIP_MII_CFG_RMII_CLK bit is ignored for other PHY connection modes.
The GSWIP IP also supports in-band auto-negotiation for RGMII PHYs when
the GSWIP_MII_CFG_RGMII_IBS bit is set. Clear this bit always as there's
no known hardware which uses this (so it is not tested yet).
Clear the xMII isolation bit when set at initialization time if it was
previously set by the bootloader. Not doing so could lead to no traffic
(neither RX nor TX) on a port with this bit set.
While here, also add the GSWIP_MII_CFG_RESET bit. We don't need to
manage it because this bit is self-clearning when set. We still add it
here to get a better overview of the GSWIP_MII_CFG register.
Fixes:
14fceff4771e51 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Cc: stable@vger.kernel.org
Suggested-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Thu, 8 Apr 2021 18:38:27 +0000 (20:38 +0200)]
net: dsa: lantiq_gswip: Don't use PHY auto polling
PHY auto polling on the GSWIP hardware can be used so link changes
(speed, link up/down, etc.) can be detected automatically. Internally
GSWIP reads the PHY's registers for this functionality. Based on this
automatic detection GSWIP can also automatically re-configure it's port
settings. Unfortunately this auto polling (and configuration) mechanism
seems to cause various issues observed by different people on different
devices:
- FritzBox 7360v2: the two Gbit/s ports (connected to the two internal
PHY11G instances) are working fine but the two Fast Ethernet ports
(using an AR8030 RMII PHY) are completely dead (neither RX nor TX are
received). It turns out that the AR8030 PHY sets the BMSR_ESTATEN bit
as well as the ESTATUS_1000_TFULL and ESTATUS_1000_XFULL bits. This
makes the PHY auto polling state machine (rightfully?) think that the
established link speed (when the other side is Gbit/s capable) is
1Gbit/s.
- None of the Ethernet ports on the Zyxel P-2812HNU-F1 (two are
connected to the internal PHY11G GPHYs while the other three are
external RGMII PHYs) are working. Neither RX nor TX traffic was
observed. It is not clear which part of the PHY auto polling state-
machine caused this.
- FritzBox 7412 (only one LAN port which is connected to one of the
internal GPHYs running in PHY22F / Fast Ethernet mode) was seeing
random disconnects (link down events could be seen). Sometimes all
traffic would stop after such disconnect. It is not clear which part
of the PHY auto polling state-machine cauased this.
- TP-Link TD-W9980 (two ports are connected to the internal GPHYs
running in PHY11G / Gbit/s mode, the other two are external RGMII
PHYs) was affected by similar issues as the FritzBox 7412 just without
the "link down" events
Switch to software based configuration instead of PHY auto polling (and
letting the GSWIP hardware configure the ports automatically) for the
following link parameters:
- link up/down
- link speed
- full/half duplex
- flow control (RX / TX pause)
After a big round of manual testing by various people (who helped test
this on OpenWrt) it turns out that this fixes all reported issues.
Additionally it can be considered more future proof because any
"quirk" which is implemented for a PHY on the driver side can now be
used with the GSWIP hardware as well because Linux is in control of the
link parameters.
As a nice side-effect this also solves a problem where fixed-links were
not supported previously because we were relying on the PHY auto polling
mechanism, which cannot work for fixed-links as there's no PHY from
where it can read the registers. Configuring the link settings on the
GSWIP ports means that we now use the settings from device-tree also for
ports with fixed-links.
Fixes:
14fceff4771e51 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Fixes:
3e6fdeb28f4c33 ("net: dsa: lantiq_gswip: Let GSWIP automatically set the xMII clock")
Cc: stable@vger.kernel.org
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>