git.monstr.eu Git - linux-2.6-microblaze.git/log

qlcnic: Use list_for_each_entry() to simplify code in qlcnic_main.c

Convert list_for_each() to list_for_each_entry() where
applicable. This simplifies the code.
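The conversion can be sketched in userspace with minimal stand-ins for the kernel's list macros. These are simplified re-implementations for illustration, not the real <linux/list.h>, and `struct mac_entry` is a hypothetical element type rather than a qlcnic structure.

```c
#include <stddef.h>

/* Simplified stand-ins for <linux/list.h>; not the real kernel header. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Walks raw list nodes. */
#define list_for_each(pos, head) \
	for ((pos) = (head)->next; (pos) != (head); (pos) = (pos)->next)

/* Walks the containing structures directly. */
#define list_for_each_entry(pos, head, member)                             \
	for ((pos) = list_entry((head)->next, __typeof__(*(pos)), member); \
	     &(pos)->member != (head);                                     \
	     (pos) = list_entry((pos)->member.next, __typeof__(*(pos)), member))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical element type. */
struct mac_entry {
	int id;
	struct list_head list;
};

/* Before: list_for_each() plus an explicit list_entry() on every node. */
static int sum_ids_old(struct list_head *head)
{
	struct list_head *pos;
	int sum = 0;

	list_for_each(pos, head) {
		struct mac_entry *e = list_entry(pos, struct mac_entry, list);

		sum += e->id;
	}
	return sum;
}

/* After: list_for_each_entry() hands back the containing struct. */
static int sum_ids_new(struct list_head *head)
{
	struct mac_entry *e;
	int sum = 0;

	list_for_each_entry(e, head, list)
		sum += e->id;
	return sum;
}
```

Both walks visit the same nodes. The second form simply drops the temporary `struct list_head *` cursor and the per-iteration `list_entry()` call.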

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: add a stricter length check

There have been a few errors in the ethtool reply size calculations.
Most of those are hard to trigger during basic testing because of
skb size rounding and netdev names being shorter than the maximum.
Add a more precise check.

This change will affect the payload length value displayed in the
-EMSGSIZE case, but that should be okay: "payload length" isn't
a well-defined term here.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet_diag: add support for tw_mark

Timewait sockets have included mark since approx 4.18.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Jon Maxwell <jmaxwell37@gmail.com>
Fixes: 00483690552c ("tcp: Add mark for TIMEWAIT sockets")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mhi_net: make mhi_wwan_ops static

This symbol is not used outside of net.c, so mark it static.

Fix the following sparse warning:

drivers/net/mhi/net.c:385:23: warning: symbol 'mhi_wwan_ops' was not
declared. Should it be static?

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-next'

Guangbin Huang says:

====================
net: hns3: updates for -next

This series includes some optimizations in the IO path for the HNS3
ethernet driver.
====================

Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: use bounce buffer when rx page can not be reused

Currently the rx page is reused to receive future packets when
the stack releases the previous skb quickly. If the old page
cannot be reused, a new page is allocated and mapped,
which consumes a lot of CPU when the IOMMU is in strict mode,
especially when the application and irq/NAPI happen to run on
the same CPU.

So allocate a new frag and memcpy the data into it to avoid the costly
IOMMU unmapping/mapping operation, and add "frag_alloc_err"
and "frag_alloc" stats to the "ethtool -S ethX" output.

Throughput improves by more than 50% when running a single iperf
TCP thread with the IOMMU in strict mode and iperf sharing the
same CPU with irq/NAPI (rx_copybreak = 2048 and mtu = 1500).

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: optimize the rx page reuse handling process

Currently the rx page offset is only reset to zero when all of the
following conditions are satisfied:
1. the rx page is only owned by the driver.
2. the rx page is reusable.
3. the page offset that is about to be given to the stack has
reached the end of the page.

If the page offset is beyond hns3_buf_size(), the buffer below
that offset is usable whenever conditions 1 & 2 above are
satisfied, so the page offset can be reset to zero instead of
being increased. We may then be able to always reuse the first
4K buffer of a 64K page, which limits the hot buffer size as
much as possible.
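The difference between the old and new reset rules can be modeled as a pure function. This is a hypothetical sketch of the decision described above, not the hns3 implementation: `driver_only` stands for condition 1 (the stack holds no references to earlier buffers of the page), `reusable` for condition 2, and the model assumes a page holds at least two buffers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not hns3 code. 'offset' is the offset of the buffer
 * about to be handed to the stack. Returns the offset the driver fills
 * next, or -1 when a new page has to be allocated.
 */
static long next_offset_old(unsigned long offset, unsigned long buf_size,
			    unsigned long page_size, bool driver_only,
			    bool reusable)
{
	unsigned long next = offset + buf_size;

	assert(page_size >= 2 * buf_size);
	if (!reusable)
		return -1;
	/* Old rule: wrap to zero only once the end of the page is reached. */
	if (next + buf_size > page_size)
		return driver_only ? 0 : -1;
	return (long)next;
}

static long next_offset_new(unsigned long offset, unsigned long buf_size,
			    unsigned long page_size, bool driver_only,
			    bool reusable)
{
	unsigned long next = offset + buf_size;

	assert(page_size >= 2 * buf_size);
	if (!reusable)
		return -1;
	/* New rule: everything below 'offset' is free again, so go back
	 * to the start of the page as early as possible. */
	if (driver_only && offset >= buf_size)
		return 0;
	if (next + buf_size > page_size)
		return -1;
	return (long)next;
}
```

With 4K buffers in a 64K page and a stack that frees skbs quickly, the new rule keeps bouncing between offsets 0 and 4096, while the old rule walks the whole page before wrapping. That matches the "hot buffer" argument above.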

This optimization is a side effect of refactoring the rx page
reuse handling to support rx copybreak.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: support dma_map_sg() for multi frags skb

Using the queue based tx buffer, it is also possible to allocate an
sgl buffer and use skb_to_sgvec() to convert the skb to an sgvec,
so that dma_map_sg() can be used to decrease the overhead of
IOMMU mapping and unmapping.

Firstly, it reduces the number of buffers. For example, a TCP skb
may have a 66-byte header and 3 fragments of 4328, 32768, and 28064
bytes. With this patch, dma_map_sg() will combine them into two
buffers, the 66-byte header and one 65160-byte fragment, by using the IOMMU.
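The merge in this example can be reproduced with a small sketch that coalesces segments whose IOVA ranges are adjacent. It assumes the IOMMU has placed the three payload fragments back to back in IOVA space, and it ignores the maximum segment size and boundary limits that the real DMA layer enforces. `struct seg` and `merge_contiguous()` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical DMA segment: IOVA start and length in bytes. */
struct seg {
	unsigned long iova;
	unsigned long len;
};

/*
 * Merge segments in place when one starts exactly where the previous one
 * ends. Returns the number of segments left.
 */
static size_t merge_contiguous(struct seg *s, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out && s[out - 1].iova + s[out - 1].len == s[i].iova)
			s[out - 1].len += s[i].len;	/* extend previous */
		else
			s[out++] = s[i];		/* start a new one */
	}
	return out;
}
```

Feeding it the 66-byte header plus the 4328, 32768 and 28064 byte fragments yields two segments, the second one 4328 + 32768 + 28064 = 65160 bytes long.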

Secondly, it reduces the number of DMA mapping and unmapping
operations. All 4 original buffers are mapped once rather than 4 times.

Throughput improves by more than 10% when running a single iperf
TCP thread with the IOMMU in strict mode.

Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add support to query tx spare buffer size for pf

Add support to query tx spare buffer size from configuration
file, and use this info to do spare buffer initialization when
the module parameter 'tx_spare_buf_size' is not specified.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: use tx bounce buffer for small packets

When the packet or frag size is small, it causes both security and
performance issues. As DMA can't map a sub-page region, some extra
kernel data is visible to devices. On the other hand, the overhead
of DMA map and unmap is huge when the IOMMU is on.

So add a queue-based shared tx bounce buffer and memcpy small
packets into it when the length of the transmitted skb is below
tx_copybreak. Add a tx_spare_buf_size module param to set the size of
the tx spare buffer, and add set/get_tunable to set or query tx_copybreak.
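The transmit-path decision can be sketched with a simple per-queue spare buffer. This is a hypothetical model: `struct tx_spare`, `tx_prepare()` and the bump-style allocator that is reclaimed all at once are illustrative simplifications, not the hns3 ring logic. Using `<=` at the tx_copybreak boundary is also an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-queue spare buffer; size would come from tx_spare_buf_size. */
struct tx_spare {
	unsigned char *buf;
	size_t size;
	size_t used;	/* bytes owned by in-flight descriptors */
};

static unsigned char *tx_spare_alloc(struct tx_spare *s, size_t len)
{
	unsigned char *p;

	if (s->size - s->used < len)
		return NULL;	/* spare buffer exhausted */
	p = s->buf + s->used;
	s->used += len;
	return p;
}

/* Called once the hardware has completed every copied packet. */
static void tx_spare_reclaim(struct tx_spare *s)
{
	s->used = 0;
}

enum tx_path { TX_PATH_BOUNCE, TX_PATH_MAP };

/*
 * Small packets are copied into the already-mapped spare buffer, so no
 * per-packet dma map/unmap is needed. Everything else (or anything that
 * does not fit) falls back to mapping the original buffer.
 */
static enum tx_path tx_prepare(struct tx_spare *s, const void *data,
			       size_t len, size_t tx_copybreak,
			       unsigned char **dma_src)
{
	unsigned char *dst;

	if (len <= tx_copybreak && (dst = tx_spare_alloc(s, len)) != NULL) {
		memcpy(dst, data, len);
		*dma_src = dst;
		return TX_PATH_BOUNCE;
	}
	*dma_src = NULL;	/* caller would dma-map 'data' itself */
	return TX_PATH_MAP;
}
```

Copying also addresses the security point above: only the packet bytes land in the device-visible spare buffer, not the rest of the kernel page that held them.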

Throughput improves from 30 Gbps to 90+ Gbps when running 16
netperf threads with a 32KB UDP message size with the IOMMU in
strict mode (tx_copybreak = 2000 and mtu = 1500).

Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor for hns3_fill_desc() function

Factor out hns3_fill_desc() so that it can be reused by the
tx bounce buffer support.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: minor refactor related to desc_cb handling

desc_cb is used to store mapping and freeing info for the
corresponding desc, which is used in the cleaning process.
More desc_cb types are coming when supporting the tx bounce
buffer, so change the desc_cb type to a bit-wise value in order
to reduce the desc_cb type checking in the data path.
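The benefit of bit-wise types is that a group of types can be tested with a single AND. The flag names below are hypothetical, and so is the choice of which types "need unmap". The sketch only shows the one-mask check next to the equivalent chain of comparisons.

```c
#include <stdbool.h>

/* Hypothetical bit-wise desc_cb types; names are illustrative. */
enum desc_type {
	DESC_TYPE_UNKNOWN      = 0,
	DESC_TYPE_SKB          = 1 << 0,
	DESC_TYPE_FRAGLIST_SKB = 1 << 1,
	DESC_TYPE_PAGE         = 1 << 2,
	DESC_TYPE_BOUNCE       = 1 << 3,
};

/* Hypothetical group of types whose cleanup must unmap DMA. */
#define DESC_TYPE_NEEDS_UNMAP \
	(DESC_TYPE_SKB | DESC_TYPE_FRAGLIST_SKB | DESC_TYPE_PAGE)

/* One AND covers the whole group. */
static bool desc_needs_unmap(unsigned int type)
{
	return type & DESC_TYPE_NEEDS_UNMAP;
}

/* The same test written as a comparison chain, as with plain enum values. */
static bool desc_needs_unmap_eq(unsigned int type)
{
	return type == DESC_TYPE_SKB || type == DESC_TYPE_FRAGLIST_SKB ||
	       type == DESC_TYPE_PAGE;
}
```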

Also move the desc_cb type definition to hns3_enet.h because it
is only used in hns3_enet.c, and declare a local variable desc_cb
in hns3_clear_desc() to reduce lines of code.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ti: add pp skb recycling support

As already done for mvneta and mvpp2, enable skb recycling for ti
ethernet drivers.

ti driver on net-next:
----------------------
[perf top]
47.15%  [kernel]     [k] _raw_spin_unlock_irqrestore
11.77%  [kernel]     [k] __cpdma_chan_free
  3.16%  [kernel]     [k] ___bpf_prog_run
  2.52%  [kernel]     [k] cpsw_rx_vlan_encap
  2.34%  [kernel]     [k] __netif_receive_skb_core
  2.27%  [kernel]     [k] free_unref_page
  2.26%  [kernel]     [k] kmem_cache_free
  2.24%  [kernel]     [k] kmem_cache_alloc
  1.69%  [kernel]     [k] __softirqentry_text_start
  1.61%  [kernel]     [k] cpsw_rx_handler
  1.19%  [kernel]     [k] page_pool_release_page
  1.19%  [kernel]     [k] clear_bits_ll
  1.15%  [kernel]     [k] page_frag_free
  1.06%  [kernel]     [k] __dma_page_dev_to_cpu
  0.99%  [kernel]     [k] memset
  0.94%  [kernel]     [k] __alloc_pages_bulk
  0.92%  [kernel]     [k] kfree_skb
  0.85%  [kernel]     [k] packet_rcv
  0.78%  [kernel]     [k] page_address
  0.75%  [kernel]     [k] v7_dma_inv_range
  0.71%  [kernel]     [k] __lock_text_start

[iperf3 tcp]
[  5]   0.00-10.00  sec   873 MBytes   732 Mbits/sec    0   sender
[  5]   0.00-10.01  sec   866 MBytes   726 Mbits/sec        receiver

ti + skb recycling:
-------------------
[perf top]
40.58%  [kernel]    [k] _raw_spin_unlock_irqrestore
16.18%  [kernel]    [k] __softirqentry_text_start
10.33%  [kernel]    [k] __cpdma_chan_free
  2.62%  [kernel]    [k] ___bpf_prog_run
  2.05%  [kernel]    [k] cpsw_rx_vlan_encap
  2.00%  [kernel]    [k] kmem_cache_alloc
  1.86%  [kernel]    [k] __netif_receive_skb_core
  1.80%  [kernel]    [k] kmem_cache_free
  1.63%  [kernel]    [k] cpsw_rx_handler
  1.12%  [kernel]    [k] cpsw_rx_mq_poll
  1.11%  [kernel]    [k] page_pool_put_page
  1.04%  [kernel]    [k] _raw_spin_unlock
  0.97%  [kernel]    [k] clear_bits_ll
  0.90%  [kernel]    [k] packet_rcv
  0.88%  [kernel]    [k] __dma_page_dev_to_cpu
  0.85%  [kernel]    [k] kfree_skb
  0.80%  [kernel]    [k] memset
  0.71%  [kernel]    [k] __lock_text_start
  0.66%  [kernel]    [k] v7_dma_inv_range
  0.64%  [kernel]    [k] gen_pool_free_owner

[iperf3 tcp]
[  5]   0.00-10.00  sec   884 MBytes   742 Mbits/sec    0   sender
[  5]   0.00-10.01  sec   878 MBytes   735 Mbits/sec        receiver

Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wwan: iosm: Fix htmldocs warnings

Fixes .rst file warnings seen on linux-next build.

Fixes: f7af616c632e ("net: iosm: infrastructure")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Fix spelling mistake "morethan" -> "more than"

There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: b53: remove redundant null check on dev

The pointer dev can never be null, so the null check is redundant
and can be removed. This cleans up a static analysis warning that
dev is dereferenced to set priv before dev is null checked.

Addresses-Coverity: ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bonding: Use per-cpu rr_tx_counter

The round-robin rr_tx_counter was shared across CPUs leading to
significant cache thrashing at high packet rates. This patch switches
the round-robin packet counter to use a per-cpu variable to decide
the destination slave.
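A userspace analogue of the per-cpu counter gives each CPU its own cache-line-sized slot, so concurrent senders never write to the same line. This is a sketch under assumptions: `NR_CPUS_SKETCH`, the 64-byte line size and `rr_next_slave()` are hypothetical. In the kernel the same idea is expressed with the per-cpu API (for example alloc_percpu() and this_cpu_inc_return()).

```c
#include <assert.h>

#define NR_CPUS_SKETCH 64	/* hypothetical CPU count */
#define CACHELINE 64		/* assumed cache line size */

/* Each counter sits on its own cache line. */
struct rr_counter {
	_Alignas(CACHELINE) unsigned int value;
};

static struct rr_counter rr_tx_counter[NR_CPUS_SKETCH];

/* Pick the slave index for the next packet sent from 'cpu'. */
static unsigned int rr_next_slave(unsigned int cpu, unsigned int nslaves)
{
	assert(cpu < NR_CPUS_SKETCH && nslaves > 0);
	return rr_tx_counter[cpu].value++ % nslaves;
}
```

One design consequence: packets are spread round-robin per CPU rather than globally. Each CPU cycles through the slaves on its own, which is the trade-off that removes the shared write.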

On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh
(-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after
this patch.

"perf top -e cache_misses" before:
    12.31%  [bonding]       [k] bond_xmit_roundrobin_slave_get
    10.59%  [sch_fq_codel]  [k] fq_codel_dequeue
     9.34%  [kernel]        [k] skb_release_data
after:
    15.42%  [sch_fq_codel]  [k] fq_codel_dequeue
    10.06%  [kernel]        [k] __memset
     9.12%  [kernel]        [k] skb_release_data

Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlabel: Fix memory leak in netlbl_mgmt_add_common

Hulk Robot reported a memory leak in netlbl_mgmt_add_common.
The problem is that map is not freed when netlbl_domhsh_add() fails.

BUG: memory leak
unreferenced object 0xffff888100ab7080 (size 96):
  comm "syz-executor537", pid 360, jiffies 4294862456 (age 22.678s)
  hex dump (first 32 bytes):
    05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01  ................
  backtrace:
    [<0000000008b40026>] netlbl_mgmt_add_common.isra.0+0xb2a/0x1b40
    [<000000003be10950>] netlbl_mgmt_add+0x271/0x3c0
    [<00000000c70487ed>] genl_family_rcv_msg_doit.isra.0+0x20e/0x320
    [<000000001f2ff614>] genl_rcv_msg+0x2bf/0x4f0
    [<0000000089045792>] netlink_rcv_skb+0x134/0x3d0
    [<0000000020e96fdd>] genl_rcv+0x24/0x40
    [<0000000042810c66>] netlink_unicast+0x4a0/0x6a0
    [<000000002e1659f0>] netlink_sendmsg+0x789/0xc70
    [<000000006e43415f>] sock_sendmsg+0x139/0x170
    [<00000000680a73d7>] ____sys_sendmsg+0x658/0x7d0
    [<0000000065cbb8af>] ___sys_sendmsg+0xf8/0x170
    [<0000000019932b6c>] __sys_sendmsg+0xd3/0x190
    [<00000000643ac172>] do_syscall_64+0x37/0x90
    [<000000009b79d6dc>] entry_SYSCALL_64_after_hwframe+0x44/0xae
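The shape of the fix is an error path that releases the map when the hash-table insert fails. Everything below is a hypothetical stand-in, not the NetLabel code: `struct addr_map`, `domhsh_add()` and the `live_maps` counter exist only to make the leak observable.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the NetLabel API. */
struct addr_map { int dummy; };
struct dom_entry { struct addr_map *map; };

static int live_maps;	/* outstanding allocations, for observing leaks */

static struct addr_map *map_alloc(void)
{
	struct addr_map *m = calloc(1, sizeof(*m));

	if (m)
		live_maps++;
	return m;
}

static void map_free(struct addr_map *m)
{
	if (m) {
		live_maps--;
		free(m);
	}
}

/* Stand-in for netlbl_domhsh_add(); fails on request. */
static int domhsh_add(struct dom_entry *e, int fail)
{
	(void)e;
	return fail ? -EEXIST : 0;
}

static int mgmt_add_common(struct dom_entry *e, int fail)
{
	int ret;

	e->map = map_alloc();
	if (!e->map)
		return -ENOMEM;
	ret = domhsh_add(e, fail);
	if (ret) {
		/* The fix: the map is not owned by anyone yet, so free it. */
		map_free(e->map);
		e->map = NULL;
	}
	return ret;
}
```

On success the map stays attached to the entry (in the real code, ownership passes to the domain hash table). On failure nothing is left behind.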

Fixes: 63c416887437 ("netlabel: Add network address selectors to the NetLabel/LSM domain mapping")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2021-06-14' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-06-14

1) Trivial Lag refactoring in preparation for the upcoming Single FDB lag feature
- First 3 patches

2) Scalable IRQ distribution for Sub-functions

A subfunction (SF) is a lightweight function that has a parent PCI
function (PF) on which it is deployed.

Currently, mlx5 subfunctions share the IRQs (MSI-X) with their
parent PCI function.

Before this series, the PF allocates enough IRQs to cover
all the cores in a system, and newly created SFs re-use all the IRQs
that the PF has allocated for itself.
Hence, the more SFs are created, the more EQs there are per IRQ. Therefore,
whenever we handle an interrupt, we need to poll all SF EQs and PF EQs
instead of only the PF EQs, as on a system without SFs. This has a severe
impact on the performance of both SFs and PF.

For example, on machine with:
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
PCI Express 3 with BW of 126 Gb/s.
ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16.

test case: iperf TX BW single CPU, affinity of app and IRQ are the same.
PF only: no SFs on the system, 56 IRQs.
SF (before), 250 SFs sharing the same 56 IRQs.
SF (now),    250 SFs + 255 available IRQs for the NIC (see the IRQ spread scheme below).

            affinity:
            application  SF-IRQ   channel   BW (Gb/sec)   interrupts/sec
            (iperf TX)
PF only     cpu={0}      cpu={0}  cpu={0}   79            8200
SF (before) cpu={0}      cpu={0}  cpu={0}   51.3 (-35%)   9500
SF (now)    cpu={0}      cpu={0}  cpu={0}   78 (-2%)      8200

command:
$ taskset -c 0 iperf -c 11.1.1.1 -P 3 -i 6 -t 30 | grep SUM

The difference between the SF examples is that before this series we
allocated num_cpus (56) IRQs, and all of them were shared among the PF
and the SFs. After this series, we allocate 255 IRQs and spread
the SFs among them. This has significantly decreased the load
on each IRQ, and the number of EQs per IRQ is down by 95% (251->11).

In this patchset, the proposed solution is a dedicated IRQ pool
for SFs. The pool allocates a large number of IRQs
for SFs to grab from, in order to minimize IRQ sharing between
different SFs.
IRQs are not requested from the OS until they are first requested by
an SF consumer, and are eventually released when the last SF consumer
releases them.

For the detailed IRQ spread and allocation scheme, please see the last patch:
("net/mlx5: Round-Robin EQs over IRQs")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'occteontx2-rate-limit-offload'

Subbaraya Sundeep says:

====================
octeontx2: Add ingress ratelimit offload

This patchset adds ingress rate limiting hardware
offload support for CN10K silicons. Police actions
are added for TC matchall and flower filters.
CN10K has an ingress rate limiting feature where
a receive queue is mapped to a bandwidth profile
and the profile is configured with rate and burst
parameters by software. CN10K hardware supports
three levels of ingress policing or ratelimiting.
Multiple leaf profiles can point to a single mid
level profile and multiple mid level profiles can
point to a single top level one. Only leaf level
profiles are used for configuring rate limiting.

Patch 1 adds the new bandwidth profile contexts
in AF driver similar to other hardware contexts
Patch 2 adds the debugfs changes to dump bandwidth
profile contexts
Patch 3 adds support for police action with TC matchall filter
Patch 4 uses NL_SET_ERR_MSG_MOD for tc code
Patch 5 adds support for police action with TC flower filter
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Add police action for TC flower

Added police action for ingress TC flower
hardware offload. With this, rate limiting can be
done per flow. Since rate limiting is tied to
RQs in hardware, the number of TC flower filters
with police as the action is limited to the number
of receive queues of the interface. Both bps
and pps modes are supported.

Examples to rate limit a flow:
$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip \
  flower ip_proto udp dst_port 80 action \
  police rate 100Mbit burst 32Kbit

$ tc filter add dev eth0 parent ffff: \
  protocol ip flower dst_mac 5e:b2:34:ee:29:49 \
  action police pkts_rate 5000 pkts_burst 2048

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Use NL_SET_ERR_MSG_MOD for TC

This patch converts netdev_err messages in the
tc code to NL_SET_ERR_MSG_MOD. NL_SET_ERR_MSG_MOD
does not support format specifiers yet, hence only
netdev_err messages with plain strings are converted.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: TC_MATCHALL ingress ratelimiting offload

Add TC_MATCHALL ingress ratelimiting offload support with POLICE
action for entire traffic coming into the interface.

E.g., to ratelimit ingress traffic to 100Mbps:

$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 ingress matchall skip_sw \
action police rate 100Mbit burst 32Kbit

To support this, a leaf level bandwidth profile is allocated and all
RQs' contexts used by this interface are updated to point to it.
And the leaf level bandwidth profile is configured with user specified
rate and burst sizes.
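To make the "rate" and "burst" parameters concrete, here is a generic single-rate token-bucket policer. This is a swapped-in illustration of the concept. The commit does not describe how the CN10K hardware implements policing, and `struct policer` and its functions are hypothetical. Partial refills lost to integer truncation are ignored for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Generic token bucket; not the CN10K bandwidth-profile algorithm. */
struct policer {
	uint64_t rate_bps;	/* refill rate, bits per second */
	uint64_t burst_bits;	/* bucket depth */
	uint64_t tokens;	/* available tokens, in bits */
	uint64_t last_ns;	/* time of the last refill */
};

static void policer_init(struct policer *p, uint64_t rate_bps,
			 uint64_t burst_bits)
{
	p->rate_bps = rate_bps;
	p->burst_bits = burst_bits;
	p->tokens = burst_bits;	/* start with a full bucket */
	p->last_ns = 0;
}

/* Returns true if the packet conforms (pass), false if it exceeds (drop). */
static bool policer_conform(struct policer *p, uint64_t now_ns,
			    uint64_t pkt_bytes)
{
	uint64_t bits = pkt_bytes * 8;
	uint64_t refill = (now_ns - p->last_ns) * p->rate_bps / 1000000000ull;

	/* Refill at 'rate', but never beyond 'burst'. */
	p->tokens = p->tokens + refill > p->burst_bits ?
		    p->burst_bits : p->tokens + refill;
	p->last_ns = now_ns;
	if (p->tokens < bits)
		return false;
	p->tokens -= bits;
	return true;
}
```

"burst" bounds how much traffic may pass back to back, while "rate" bounds the long-term average. With a 32000-bit bucket, two 1500-byte packets pass instantly, the third is dropped, and the bucket is full again about 1 ms later at 100 Mbit/s.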

Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10k: Debugfs support for bandwidth profiles

Added support for dumping current resource status of bandwidth
profiles and contexts of allocated profiles via debugfs.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: cn10k: Bandwidth profiles config support

CN10K silicons support hierarchical ingress packet ratelimiting.
There are 3 levels of profilers supported: leaf, mid and top.
Ratelimiting is done after the packet forwarding decision is taken
and a NIXLF's RQ is identified to DMA the packet. The RQ's context
points to a leaf bandwidth profile which can be configured
to achieve the desired ratelimit.

This patch adds logic for managing these bandwidth profiles,
i.e. profile alloc, free, context update etc.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'pci200syn-cleanups'

Peng Li says:

====================
net: pci200syn: clean up some code style issues

This patchset cleans up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: pci200syn: fix the comments style issue

Networking block comments don't use an empty /* line,
use /* Comment...

This patch fixes the comments style issues.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pci200syn: add necessary () to macro argument

Macro argument 'card' may be better as '(card)' to
avoid precedence issues.
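The precedence problem is easiest to see with an expression argument. The macros below are hypothetical and are not the pci200syn ones, but the same rule applies to `card`.

```c
/* Hypothetical register-offset macros; not the pci200syn ones. */
#define PORT_BASE_BAD(port)  port * 0x10	/* argument not parenthesized */
#define PORT_BASE_GOOD(port) ((port) * 0x10)	/* argument wrapped in () */

/*
 * With an expression argument the two expand differently:
 *   PORT_BASE_BAD(1 + 1)  ->  1 + 1 * 0x10      == 0x11
 *   PORT_BASE_GOOD(1 + 1) ->  ((1 + 1) * 0x10)  == 0x20
 */
static int next_port_base(int port)
{
	return PORT_BASE_GOOD(port + 1);
}
```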

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pci200syn: add some required spaces

Add spaces required after that close brace '}'.
Add spaces required before the open parenthesis '('.
Add spaces required after that ','.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pci200syn: replace comparison to NULL with "!card"

According to checkpatch.pl, comparison to NULL could
be written "!card".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pci200syn: add blank line after declarations

This patch fixes the checkpatch error about a missing blank line
after declarations.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pci200syn: remove redundant blank lines

This patch removes some redundant blank lines.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'z85230-cleanups'

Peng Li says:

====================
net: z85230: clean up some code style issues

This patchset cleans up some code style issues.

---
Change Log:
V1 -> V2:
1. Address Andrew's comments: add a commit message to [patch 04/11]
about removing volatile.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: z85230: remove unnecessary out of memory message

This patch removes unnecessary out of memory message,
to fix the following checkpatch.pl warning:
"WARNING: Possible unnecessary 'out of memory' message"

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: z85230: fix the code style issue about open brace {

This patch fixes the code style issue according to checkpatch.pl error:
"ERROR: that open brace { should be on the previous line".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: z85230: add some required spaces

Add space required before the open parenthesis '(' and '{'.
Add space required after that close brace '}' and ','
Add spaces required around that '=' , '&', '*', '|', '+', '/' and '-'.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: z85230: remove trailing whitespaces

This patch removes trailing whitespaces.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: z85230: fix the code style issue about "if..else.."

According to checkpatch.pl, else should follow close brace '}',
braces {} should be used on all arms of this statement.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: z85230: fix the comments style issue

Networking block comments don't use an empty /* line,
use /* Comment...

Block comments use * on subsequent lines.
Block comments use a trailing */ on a separate line.

This patch fixes the comments style issues.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: z85230: replace comparison to NULL with "!skb"

According to checkpatch.pl, comparison to NULL could
be written "!skb".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: z85230: fix the code style issue about EXPORT_SYMBOL(foo)

According to checkpatch.pl,
EXPORT_SYMBOL(foo); should immediately follow its function/variable.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: z85230: add blank line after declarations

This patch fixes the checkpatch error about a missing blank line
after declarations.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: z85230: remove redundant blank lines

This patch removes some redundant blank lines.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: Remove match on n_proto

The following flower filters fail to match packets:

tc filter add dev eth0 ingress protocol 0x8864 flower \
action simple sdata hi64
tc filter add dev eth0 ingress protocol 802.1q flower \
vlan_ethtype 0x8864 action simple sdata "hi vlan"

The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is
dissected by __skb_flow_dissect and its inner protocol is
set as key->basic.n_proto. IOW, the existence of the ETH_P_PPP_SES
tunnel is transparent to the callers of __skb_flow_dissect.

OTOH, in the filters above, cls_flower configures its key->basic.n_proto
to the ETH_P_PPP_SES value configured by the user. Matching on this key
fails because of __skb_flow_dissect "transparency" mentioned above.
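The mismatch can be modeled in a few lines. This is a simplified, hypothetical model: `dissect_n_proto()` stands in for the flow dissector, reporting the encapsulated protocol for a PPPoE session frame. The tc-level check is reduced to comparing the filter protocol with the outer ethertype. ETH_P_IP (0x0800) and ETH_P_PPP_SES (0x8864) are the standard ethertype values.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_IP      0x0800
#define ETH_P_PPP_SES 0x8864

/* Simplified dissector: the PPPoE session tunnel is "transparent". */
static uint16_t dissect_n_proto(uint16_t outer, uint16_t inner)
{
	return outer == ETH_P_PPP_SES ? inner : outer;
}

/* Old flower behavior: tc protocol check AND an n_proto key match. */
static bool flower_match_old(uint16_t tc_protocol, uint16_t outer,
			     uint16_t inner)
{
	bool tc_ok = tc_protocol == outer;	/* generic tc protocol field */

	return tc_ok && dissect_n_proto(outer, inner) == tc_protocol;
}

/* Without the redundant n_proto match, only the tc-level check remains. */
static bool flower_match_new(uint16_t tc_protocol, uint16_t outer,
			     uint16_t inner)
{
	(void)inner;
	return tc_protocol == outer;
}
```

For a PPPoE frame carrying IPv4, the old check compares 0x0800 against the configured 0x8864 and can never succeed.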

In the following, I argue that the problem lies with cls_flower
unnecessarily attempting a key->basic.n_proto match.

There are 3 nearby places in fl_set_key in cls_flower that set up
mask->basic.n_proto. They are (in reverse order of appearance in the
code) due to:

(a) No vlan is given: use TCA_FLOWER_KEY_ETH_TYPE parameter
(b) One vlan tag is given: use TCA_FLOWER_KEY_VLAN_ETH_TYPE
(c) Two vlans are given: use TCA_FLOWER_KEY_CVLAN_ETH_TYPE

The match in case (a) is unneeded because flower has no eth_type
parameter of its own. It was removed by Jamal Hadi Salim in commit
488b41d020fb06428b90289f70a41210718f52b7 in iproute2. For
TCA_FLOWER_KEY_ETH_TYPE, userspace uses the generic tc filter
protocol field. Therefore the match for case (a) is done by tc
itself.

The matches in cases (b) and (c) are unneeded because the protocol will
appear in, and will be matched by, flow_dissector_key_vlan.vlan_tpid.
Therefore, at best, the key->basic.n_proto match merely repeats the vlan
key match.

The below patch removes mask->basic.n_proto setting and resets it to 0
in case (c).

Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: align RX buffers

On RX an SKB is allocated and the received buffer is copied into it.
But on some architectures, the memcpy() needs the source and destination
buffers to have the same alignment to be efficient.

This is not our case, because SKB data pointer is misaligned by two bytes
to compensate the ethernet header.
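The alignment argument can be stated as arithmetic. A word-at-a-time copy can use aligned accesses on both sides only when source and destination have the same offset within a word. `NET_IP_ALIGN_SKETCH` mirrors the two-byte SKB offset mentioned above. The helper names and addresses are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NET_IP_ALIGN_SKETCH 2	/* two-byte SKB data offset, as described above */

/* Source and destination can both be word-aligned only if they share the
 * same offset within a word. */
static bool copy_can_use_aligned_words(uintptr_t src, uintptr_t dst,
				       uintptr_t word)
{
	return (src % word) == (dst % word);
}

/* Hypothetical: where the received frame starts inside the RX buffer. */
static uintptr_t rx_copy_src(uintptr_t rx_buf, bool align_like_skb)
{
	return rx_buf + (align_like_skb ? NET_IP_ALIGN_SKETCH : 0);
}
```

With the frame at the start of an aligned RX buffer and the SKB data at a +2 offset, the two never line up on 8-byte words. Shifting the RX buffer by the same two bytes makes them match.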

Align the RX buffer the same way as the SKB one, so the copy is faster.
An iperf3 RX test gives a decent improvement on a RISC-V machine:

before:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
[  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver

after:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

And the memcpy() overhead during the RX drops dramatically.

before:
Overhead  Shared O  Symbol
  43.35%  [kernel]  [k] memcpy
  33.77%  [kernel]  [k] __asm_copy_to_user
   3.64%  [kernel]  [k] sifive_l2_flush64_range

after:
Overhead  Shared O  Symbol
  45.40%  [kernel]  [k] __asm_copy_to_user
  28.09%  [kernel]  [k] memcpy
   4.27%  [kernel]  [k] sifive_l2_flush64_range

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Round-Robin EQs over IRQs

Whenever a user provides an affinity for an EQ creation request, map the
EQ to a matching IRQ.
A matching IRQ is an IRQ with the same affinity and type (completion/control)
as the EQ being created.

This mapping is done with an aggressive dedicated IRQ allocation scheme,
which is described below.

First, we check whether there is a matching IRQ whose min threshold
is not exhausted.
- min_eqs_threshold = 3 for control EQ.
- min_eqs_threshold = 1 for completion EQ.
In case no matching IRQ was found, try to request a new IRQ.
In case we can't request a new IRQ, reuse least-used matching IRQ.
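The three-step selection above can be sketched over a fixed array of IRQ slots. This is a hypothetical model: `struct irq_sketch` and `irq_pick()` are illustrative, affinity is reduced to a single CPU number, and "cannot request a new IRQ" is modeled as "no free slot left". Only the threshold values come from the description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IRQ slot; affinity simplified to one CPU. */
struct irq_sketch {
	unsigned int cpu;
	bool ctrl;		/* control vs completion */
	unsigned int refcount;	/* EQs mapped to this IRQ */
	bool in_use;		/* already requested from the OS */
};

/* min_eqs_threshold values from the description above. */
static unsigned int min_eqs_threshold(bool ctrl)
{
	return ctrl ? 3 : 1;
}

/* Returns the slot index used for the new EQ, or -1 if none fits. */
static int irq_pick(struct irq_sketch *pool, size_t n, unsigned int cpu,
		    bool ctrl)
{
	int least = -1, free_slot = -1;

	for (size_t i = 0; i < n; i++) {
		struct irq_sketch *irq = &pool[i];

		if (!irq->in_use) {
			if (free_slot < 0)
				free_slot = (int)i;
			continue;
		}
		if (irq->cpu != cpu || irq->ctrl != ctrl)
			continue;
		/* 1. Matching IRQ whose min threshold is not exhausted. */
		if (irq->refcount < min_eqs_threshold(ctrl)) {
			irq->refcount++;
			return (int)i;
		}
		if (least < 0 || irq->refcount < pool[least].refcount)
			least = (int)i;
	}
	/* 2. Otherwise request a new IRQ. */
	if (free_slot >= 0) {
		pool[free_slot] = (struct irq_sketch){
			.cpu = cpu, .ctrl = ctrl, .refcount = 1, .in_use = true,
		};
		return free_slot;
	}
	/* 3. Otherwise reuse the least-used matching IRQ. */
	if (least >= 0)
		pool[least].refcount++;
	return least;
}
```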

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Separate between public and private API of sf.h

Move mlx5_sf_max_functions() and friends from the private sf/sf.h
to the public lib/sf.h. This is done in order to have one-directional
include paths.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Enlarge interrupt field in CREATE_EQ

FW now supports more than 256 MSI-X vectors per PF (up to 2K).
Hence, enlarge the interrupt field in CREATE_EQ to make use of the new
MSI-X vectors.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Allocating a pool of MSI-X vectors for SFs

SFs (Sub Functions) currently use IRQs from the global IRQ table their
parent Physical Function has. In order to scale better, we need to
allocate more IRQs and share them between different SFs.

Driver will maintain 3 separate irq pools:
1. A pool that serves the PF consumers (PF's netdev, rdma stacks), similar
to what the driver had before this patch, i.e. this pool will share irqs
between rdma and netdev, and will keep the irq indexes and allocation
order. The latter is important for PF netdev rmap (aRFS).

2. A pool of control IRQs for SFs. The size of this pool is the number
of SFs that can be created divided by SFS_PER_IRQ. This pool will serve
the control path EQs of the SFs.

3. A pool of completion data path IRQs for SFs transport queues. The
size of this pool is:
num_irqs_allocated - pf_pool_size - sf_ctrl_pool_size.
This pool will serve the netdev and rdma stacks. Moreover, rmap is not
supported on SFs.
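The pool sizes follow from the formulas above. In this sketch every input is hypothetical: the actual value of SFS_PER_IRQ is not given here, so it is a parameter, and truncating integer division is an assumption.

```c
#include <assert.h>

/* Sizes of the three IRQ pools described above. */
struct irq_pools {
	unsigned int pf;	/* PF consumers */
	unsigned int sf_ctrl;	/* SF control-path EQs */
	unsigned int sf_comp;	/* SF completion EQs */
};

static struct irq_pools split_irqs(unsigned int num_irqs_allocated,
				   unsigned int pf_pool_size,
				   unsigned int max_sfs,
				   unsigned int sfs_per_irq)
{
	struct irq_pools p;

	assert(sfs_per_irq > 0);
	p.pf = pf_pool_size;
	p.sf_ctrl = max_sfs / sfs_per_irq;	/* rounding is an assumption */
	assert(p.pf + p.sf_ctrl <= num_irqs_allocated);
	/* num_irqs_allocated - pf_pool_size - sf_ctrl_pool_size */
	p.sf_comp = num_irqs_allocated - p.pf - p.sf_ctrl;
	return p;
}
```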

Sharing methodology of the SFs pools is explained in the next patch.

Important note: rmap is not supported on SFs because rmap mapping cannot
function correctly for IRQs that are shared for different core/netdev RX
rings.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Change IRQ storage logic from static to dynamic

Store newly created IRQs in the xarray DB instead of a static array,
so we will be able to store only IRQs which are being used.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Moving rmap logic to EQs

IRQs are being simplified in order to ease their sharing, and any
feature-specific object will be moved to the upper layer.
Hence we move the rmap object into eq_table.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Extend mlx5_irq_request to request IRQ from the kernel

Extend mlx5_irq_request so that IRQs will be requested upon EQ creation,
and not on driver boot.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Removing rmap per IRQ

In the next patches, IRQs will be requested on demand, instead of
statically on driver boot.
Also, rmap is currently managed by the IRQ layer. rmap management will
move out of the IRQ layer in future patches.

Therefore, we want to remove an IRQ from the rmap when that IRQ is destroyed,
instead of removing all the IRQs from the rmap when irq_table is destroyed.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Clean license text in eq.[c|h] files

The eq.[c|h] files are under a major rewrite, so use this opportunity to
update their copyright and license texts.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Provide cpumask at EQ creation phase

The users of EQ are running their code on different CPUs and with
various affinity patterns. Move the cpumask setting close to their
actual usage.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Introduce API for request and release IRQs

Introduce a new API that will allow IRQ users to hold a pointer to
mlx5_irq.
At the end of this series, IRQs will be allocated on demand. Hence,
this will allow us to properly manage and use IRQs.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Delay IRQ destruction till all users are gone

Shared IRQs are consumed by multiple EQ users, and in order to properly
initialize and later release such IRQs, we add kref counting to the IRQ
structure.
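The "last user frees" rule can be sketched in plain userspace C. This is an illustration under stated assumptions: struct shared_irq and the helper names are hypothetical, and the plain int counter is not thread-safe. The kernel's struct kref wraps an atomic refcount, and kref_put() takes a release callback that runs when the count drops to zero.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct mlx5_irq; not the driver's layout. */
struct shared_irq {
	int refcount;   /* number of EQs currently using this IRQ */
	int vector;
};

/* Counts destroyed IRQs so the sketch's behaviour is observable. */
static int irqs_released;

static struct shared_irq *shared_irq_create(int vector)
{
	struct shared_irq *irq = malloc(sizeof(*irq));

	if (!irq)
		return NULL;
	irq->refcount = 1;      /* the creating EQ holds the first reference */
	irq->vector = vector;
	return irq;
}

/* Another EQ starts sharing the IRQ. */
static void shared_irq_get(struct shared_irq *irq)
{
	irq->refcount++;
}

/* An EQ is done with the IRQ; only the last user actually frees it. */
static void shared_irq_put(struct shared_irq *irq)
{
	if (--irq->refcount == 0) {
		irqs_released++;
		free(irq);
	}
}
```

Two EQs sharing one IRQ each call shared_irq_put() once; the IRQ survives the first put and is destroyed only on the second.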

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Change ownership model for lag

Lag is used to combine two PCI functions of the same HCA into a single
logical unit. This is a core functionality and as such should be managed by
the core driver. Currently this isn't the case. While we store the lag
software structure inside the lower device, its lifetime (creation /
destruction) is dictated by the mlx5e part. Change the ownership model so
lag is tied to the lifetime of the lower level driver instead of the
mlx5e part.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, Don't rescan if the device is going down

If MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV is set it means the device is going
down and mlx5_rescan_drivers_locked() shouldn't be called.
With this patch and the previous one in the series, unbinding a PCI
function when its netdev is part of a bond works and leaves the system in a
working state.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, refactor disable flow

When a net device is removed (which can happen if the PCI function is unbound
from the system), it's not enough to destroy the hardware lag. The system
should recreate the original devices that were present before the lag.
As the same flow is done when a net device is removed from the bond,
refactor and reuse the code.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net: wwan: Fix WWAN config symbols

There is no strong reason to have both WWAN and WWAN_CORE symbols.
Let's build the WWAN core framework when WWAN is selected, in the
same way as for other subsystems.

This fixes an issue with mhi_net selecting WWAN_CORE without WWAN,
reported by kernel test robot:

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for WWAN_CORE
   Depends on NETDEVICES && WWAN
   Selected by
   - MHI_NET && NETDEVICES && NET_CORE && MHI_BUS

Fixes: 9a44c1cc6388 ("net: Add a WWAN subsystem")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: flow_dissector: fix RPS on DSA masters

After the blamed patch, __skb_flow_dissect() on the DSA master stopped
adjusting for the length of the DSA headers. This is because it was told
to adjust only if the needed_headroom is zero, aka if there is no DSA
header. Of course, the adjustment should be done only if there _is_ a
DSA header.

Modify the comment too so it is clearer.
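A simplified userspace sketch of the corrected condition follows. The helper name is hypothetical, and it assumes for illustration that the number of bytes to skip equals needed_headroom; in the real dissector the offset is obtained from the tagger. The point is only the polarity: the adjustment applies when a DSA header exists (needed_headroom != 0), not when it is zero.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: bytes the dissector should skip on a DSA master
 * before parsing the EtherType. needed_headroom is the tag length a
 * header-based tagger prepends; it is zero for trailer-only taggers.
 */
static int dsa_header_adjustment(bool is_dsa_master, unsigned int needed_headroom)
{
	/*
	 * The blamed patch adjusted only when needed_headroom was zero,
	 * i.e. exactly when there was no header to skip. The fix inverts it.
	 */
	if (is_dsa_master && needed_headroom)
		return (int)needed_headroom;
	return 0;
}
```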

Fixes: 4e50025129ef ("net: dsa: generalize overhead for taggers that use both headers and trailers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: constify the sja1105_regs structures

The struct sja1105_regs tables are not modified during the runtime of
the driver, so they can be made constant. In fact, struct sja1105_info
already holds a const pointer to these.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tja1103-improvewmentsa'

Vladimir Oltean says:

====================
Fixes and improvements to TJA1103 PHY driver

This series contains:
- an erratum workaround for the TJA1103 PHY integrated in SJA1110
- an adaptation of the driver so it prints less unnecessary information
when probing on SJA1110
- a PTP RX timestamping bug fix and a clarification patch

Targeting net-next since the PHY support is currently in net-next only.

Changes in v3:
Added one more patch which improves the readability of
nxp_c45_reconstruct_ts.

Changes in v2:
Added a comment to the hardware workaround procedure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: nxp-c45-tja11xx: enable MDIO write access to the master/slave registers

The SJA1110 switch integrates TJA1103 PHYs, but in SJA1110 switch rev B
silicon, there is a bug in that the registers for selecting the 100base-T1
autoneg master/slave roles are not writable.

To enable write access to the master/slave registers, these additional
PHY writes are necessary during initialization.

The issue has been corrected in later SJA1110 silicon versions and is
not present in the standalone PHY variants, but applying the workaround
unconditionally in the driver should not do any harm.

Suggested-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: nxp-c45-tja11xx: fix potential RX timestamp wraparound

The reconstruction procedure for partial timestamps reads the current
PTP time and fills in the low 2 bits of the second portion, as well as
the nanoseconds portion, from the actual hardware packet timestamp.
Critically, the reconstruction procedure works because it assumes that
the current PTP time is strictly larger than the hardware timestamp was:
it detects a 2-bit wraparound of the 'seconds' portion by checking whether
the 'seconds' portion of the partial hardware timestamp is larger than
the 'seconds' portion of the current time. That can only happen if the
hardware timestamp was captured by the PHY during the last phase of a
'modulo 4 seconds' interval, and the current PTP time was read by the
driver during the initial phase of the next 'modulo 4 seconds' interval.

The partial RX timestamps are added to priv->rx_queue in
nxp_c45_rxtstamp() and they are processed potentially in parallel by the
aux worker thread in nxp_c45_do_aux_work(). This means that it is
possible for nxp_c45_do_aux_work() to process more than one RX timestamp
during the same schedule.

There is one premature optimization that will cause issues: for RX
timestamping, the driver reads the current time only once, and it uses
that to reconstruct all PTP RX timestamps in the queue. For the second
and later timestamps, this will be an issue if we are processing two RX
timestamps which are to the left and to the right, respectively, of a
2-bit wraparound of the 'seconds' portion of the PTP time, and the
current PTP time is also pre-wraparound.

0.000000000        4.000000000        8.000000000        12.000000000
|..................|..................|..................|............>
                 ^ ^ ^ ^                                            time
                 | | | |
                 | | | process hwts 1 and hwts 2
                 | | |
                 | | hwts 2
                 | |
                 | read current PTP time
                 |
                 hwts 1

What will happen in that case is that hwts 2 (post-wraparound) will use
a stale current PTP time that is pre-wraparound.
But nxp_c45_reconstruct_ts will not detect this condition, because it is
not coded up for it, so it will reconstruct hwts 2 with a current time
from the previous 4-second interval (i.e. 0.something instead of
4.something).

This is solvable by making sure that the full 64-bit current time is
always read after the PHY has taken the partial RX timestamp. We do this
by reading the current PTP time for every timestamp in the RX queue.
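A minimal userspace sketch of the reconstruction step makes the failure concrete. It assumes TS_SEC_MASK covers the low 2 bits of seconds; the structs and reconstruct_ts() are hypothetical simplifications, not the driver's nxp_c45_reconstruct_ts(). Using the scenario from the diagram (hwts 1 at 3.9 s, current time read at 3.95 s, hwts 2 at 4.05 s, processing at 4.1 s), it shows why hwts 2 must be paired with a time read after it was captured.

```c
#include <stdint.h>

/* Assumed: the partial timestamp carries 2 bits of 'seconds'. */
#define TS_SEC_MASK 0x3u

struct ts {                /* simplified stand-in for struct timespec64 */
	int64_t sec;
	int64_t nsec;
};

struct partial_hwts {      /* hypothetical: what the PHY reports */
	uint32_t sec;      /* only the low 2 bits are meaningful */
	uint32_t nsec;
};

/*
 * Combine a partial hardware timestamp with a full PTP time @now,
 * which must have been read *after* the hardware captured @hw.
 */
static struct ts reconstruct_ts(struct ts now, struct partial_hwts hw)
{
	struct ts out = now;

	out.nsec = hw.nsec;
	/* Low bits of @now below those of @hw: the 4-second window wrapped. */
	if ((now.sec & TS_SEC_MASK) < (hw.sec & TS_SEC_MASK))
		out.sec -= TS_SEC_MASK + 1;
	out.sec = (out.sec & ~(int64_t)TS_SEC_MASK) | (hw.sec & TS_SEC_MASK);
	return out;
}
```

With the stale time (3.95 s), hwts 2 reconstructs to 0.05 s. With a time read after it (4.1 s), it reconstructs to 4.05 s, and hwts 1 still correctly comes out as 3.9 s through the wraparound correction.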

Fixes: 514def5dd339 ("phy: nxp-c45-tja11xx: add timestamping support")
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: nxp-c45-tja11xx: express timestamp wraparound interval in terms of TS_SEC_MASK

nxp_c45_reconstruct_ts() takes a partial hardware timestamp in @hwts,
with 2 bits of the 'seconds' portion, and a full PTP time in @ts.

It patches in the lower bits of @hwts into @ts, and to ensure that the
reconstructed timestamp is correct, it checks whether the lower 2 bits
of @hwts are not in fact higher than the lower 2 bits of @ts. This is
not logically possible because, according to the calling convention, @ts
was collected later in time than @hwts, but due to two's complement
arithmetic it can actually happen, because the current PTP time might
have wrapped around between when @hwts was collected and when @ts was,
yielding the lower 2 bits of @ts smaller than those of @hwts.

To correct for that situation which is expected to happen under normal
conditions, the driver subtracts exactly one wraparound interval from
the reconstructed timestamp, since the upper bits of that need to
correspond to what the upper bits of @hwts were, not to what the upper
bits of @ts were.

Readers might be confused because the driver denotes the number of bits
that the partial hardware timestamp has to offer as TS_SEC_MASK
(timestamp mask for seconds). But it subtracts a seemingly unrelated
BIT(2), which is in fact more subtle: if the hardware timestamp provides
2 bits of partial 'seconds' timestamp, then the wraparound interval is
2^2 == BIT(2).

Nonetheless, it is better to express the wraparound interval in
terms of a definition we already have, so replace BIT(2) with
1 + TS_SEC_MASK (TS_SEC_MASK being GENMASK(1, 0)), which produces the
same result but is clearer.
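The equivalence can be checked with simplified userspace stand-ins for the kernel's BIT() and GENMASK() macros. These are assumptions for illustration only: the real macros in <linux/bits.h> operate on unsigned long and are defined differently. The general rule is that a mask of n low bits plus one equals 2^n, the wraparound interval of an n-bit counter.

```c
/*
 * Simplified 32-bit stand-ins for the kernel's BIT() and GENMASK();
 * not the <linux/bits.h> definitions.
 */
#define BIT(n)          (1u << (n))
#define GENMASK(h, l)   (((~0u) << (l)) & (~0u >> (31 - (h))))

/* Assumed definition: the low 2 bits of the seconds field. */
#define TS_SEC_MASK     GENMASK(1, 0)

/* One full wraparound of the 2-bit partial 'seconds' counter. */
#define TS_SEC_WRAP     (TS_SEC_MASK + 1)
```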

Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: nxp-c45-tja11xx: demote the "no PTP support" message to debug

The SJA1110 switch integrates these PHYs, and they do not have support
for timestamping. This message becomes quite overwhelming:

[   10.056596] NXP C45 TJA1103 spi1.0-base-t1:01: the phy does not support PTP
[   10.112625] NXP C45 TJA1103 spi1.0-base-t1:02: the phy does not support PTP
[   10.167461] NXP C45 TJA1103 spi1.0-base-t1:03: the phy does not support PTP
[   10.223510] NXP C45 TJA1103 spi1.0-base-t1:04: the phy does not support PTP
[   10.278239] NXP C45 TJA1103 spi1.0-base-t1:05: the phy does not support PTP
[   10.332663] NXP C45 TJA1103 spi1.0-base-t1:06: the phy does not support PTP
[   15.390828] NXP C45 TJA1103 spi1.2-base-t1:01: the phy does not support PTP
[   15.445224] NXP C45 TJA1103 spi1.2-base-t1:02: the phy does not support PTP
[   15.499673] NXP C45 TJA1103 spi1.2-base-t1:03: the phy does not support PTP
[   15.554074] NXP C45 TJA1103 spi1.2-base-t1:04: the phy does not support PTP
[   15.608516] NXP C45 TJA1103 spi1.2-base-t1:05: the phy does not support PTP
[   15.662996] NXP C45 TJA1103 spi1.2-base-t1:06: the phy does not support PTP

So reduce its log level to debug.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Ingenic-SOC-mac-support'

Zhou Yanjie says:

====================
Add Ingenic SoCs MAC support.

v2->v3:
1. Add "ingenic,mac.yaml" for Ingenic SoCs.
2. Change tx clk delay and rx clk delay from hardware value to ps.
3. Return -EINVAL when an unsupported value is encountered when
   parsing the binding.
4. Simplify the code of the RGMII part of X2000 SoC according to
   Andrew Lunn’s suggestion.
5. Follow the example of "dwmac-mediatek.c" to improve the code
   that handles delays according to Andrew Lunn’s suggestion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Add Ingenic SoCs MAC support.

Add Ingenic SoC MAC glue layer support for the stmmac
device driver. This driver is used for the MAC Ethernet controller
found in the JZ4775 SoC, the X1000 SoC, the X1600 SoC, the X1830 SoC,
and the X2000 SoC.

Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: dwmac: Add bindings for new Ingenic SoCs.

Add the dwmac bindings for the JZ4775 SoC, the X1000 SoC,
the X1600 SoC, the X1830 SoC and the X2000 SoC from Ingenic.

Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'marvell-prestera-devlink'

Oleksandr Mazur says:

====================
Marvell Prestera driver implementation of devlink functionality.

This patch series implements devlink traps for the Prestera Switchdev driver,
registered within the driver, and extends the current devlink
functionality by adding a new hard drop statistics counter that can be
retrieved on demand: the counter shows the number of packets that have been
dropped by the underlying device and haven't been passed to the devlink
subsystem.

The core prestera-devlink functionality is implemented in the prestera_devlink.c.

The patch series also extends the existing devlink kernel API:
- devlink: add trap_drop_counter_get callback for driver to register - make it possible
   to keep track of how many packets have been dropped (hard) by the switch device, before
   the packets even made it to the devlink subsystem (e.g. dropped due to RXDMA buffer
   overflow).

The core features that extend current functionality of prestera Switchdev driver:
- add logic for driver traps and drops registration (also traps with DROP action).
- add documentation for prestera driver traps and drops group.

PATCH v2:
1) Rebase whole series on top of latest master;
2) Remove storm control-related patches, as they're out of devlink
    scope;
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

documentation: networking: devlink: add prestera switched driver Documentation

Add documentation for the devlink features the prestera switchdev driver supports:
add a description of the driver-specific devlink traps
(including both traps with action TRAP and action DROP).

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: prestera: devlink: add traps with DROP action

Add traps whose init_action is set to DROP.
Add 'trap_drop_counter_get' (devlink API) callback implementation,
which is used to get the number of packets that have been dropped by the HW
(traps with action 'DROP').
Add new FW command CPU_CODE_COUNTERS_GET.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: prestera: devlink: add traps/groups implementation

Add devlink traps registration (with corresponding groups) for
all the traffic types that the driver traps to the CPU.
prestera_rxtx: report each packet trapped to the CPU (RX) to
prestera_devlink.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

testing: selftests: drivers: net: netdevsim: devlink: add test case for hard drop statistics

Add hard drop counter check testcase, to make sure netdevsim driver
properly handles the devlink hard drop counters get/set callbacks.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: netdevsim: add devlink trap_drop_counter_get implementation

Whenever statistics are queried for a trap with DROP action, the
devlink subsystem also fills in the statistics 'dropped' field.
If the device driver didn't register a callback for hard drop
statistics querying, the 'dropped' field will be omitted and not filled.
Add trap_drop_counter_get callback implementation to the netdevsim.
Add new test cases for netdevsim, to test both the callback
functionality, as well as drop statistics alteration check.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

testing: selftests: net: forwarding: add devlink-required functionality to test (hard) dropped stats field

Add devlink_trap_drop_packets_get function, as well as tests that are
used to verify that the devlink (hard) dropped stats functionality works.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: devlink: add dropped stats traps field

Whenever statistics are queried for a trap, the devlink subsystem
also fills in the statistics 'dropped' field. This field indicates
the number of packets the HW dropped and failed to report to the device
driver, and thus to the devlink subsystem itself.
If the device driver didn't register a callback for hard drop
statistics querying, the 'dropped' field will be omitted and not filled.
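The "omit rather than report zero" rule for an optional callback can be sketched as follows. The ops table, reply struct and fill_trap_stats() are hypothetical, and the callback signature is simplified; the real devlink callback also receives the devlink instance and the trap descriptor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver ops table with a simplified callback signature. */
struct trap_driver_ops {
	/* May be NULL if the driver cannot report hard drops. */
	int (*trap_drop_counter_get)(int trap_id, uint64_t *p_drops);
};

struct trap_stats_reply {
	uint64_t rx_packets;
	bool has_dropped;     /* whether the 'dropped' attribute is emitted */
	uint64_t dropped;
};

static int fill_trap_stats(const struct trap_driver_ops *ops, int trap_id,
			   uint64_t rx_packets, struct trap_stats_reply *reply)
{
	int err;

	reply->rx_packets = rx_packets;
	reply->has_dropped = false;
	reply->dropped = 0;

	/* No callback registered: omit 'dropped' instead of reporting 0. */
	if (!ops->trap_drop_counter_get)
		return 0;

	err = ops->trap_drop_counter_get(trap_id, &reply->dropped);
	if (err)
		return err;
	reply->has_dropped = true;
	return 0;
}
```

Distinguishing "absent" from "zero" lets user space tell a device that dropped nothing apart from a driver that cannot count hard drops at all.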

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wwan: iosm: Remove DEBUG flag

The author forgot to remove that flag.

Fixes: f7af616c632e ("net: iosm: infrastructure")
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: fix send_request_map incompatible argument

The third argument is u32 in the function definition, while it is __be32
in the function declaration.

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ksz886x-cable-test'

Oleksij Rempel says:

====================
provide cable test support for the ksz886x switch

changes v5:
- drop resume() patch
- add Reviewed-by tags.
- rework dsa_slave_phy_connect() patch

changes v4:
- use fallthrough;
- use EOPNOTSUPP instead of ENOTSUPP
- drop flags variable in dsa_slave_phy_connect patch
- extend description for the "net: phy: micrel: apply resume errat"
patch
- fix "use consistent alignments" patch

changes v3:
- remove RFC tag

changes v2:
- use generic MII_* defines where possible
- rework phylink validate
- remove phylink get state function
- reorder cabletest patches to make PHY flag patch in the right order
- fix MDI-X detection

These patches provide support for cable testing on the ksz886x switches.
Since it has one special port, we needed to add phylink with validation
and an extra quirk for the PHY to signal that one port will not provide
valid cable testing reports.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: ksz886x/ksz8081: add cabletest support

This patch adds cable test support for the ksz886x switches and the
ksz8081 PHY.

The patch was tested on a KSZ8873RLL switch with the following results:

- port 1:
  - provides invalid values, thus return -ENOTSUPP
    (Errata: DS80000830A: "LinkMD does not work on Port 1",
     http://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8873-Errata-DS80000830A.pdf)

- port 2:
  - can detect distance
  - can detect open on each wire of pair A (wire 1 and 2)
  - can detect open only on one wire of pair B (only wire 3)
  - can detect short between wires of a pair (wires 1 + 2 or 3 + 6)
  - short between pairs is detected as open.
    For example short between wires 2 + 3 is detected as open.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: dsa_slave_phy_connect(): extend phy's flags with port specific phy flags

The current get_phy_flags() is only processed when we connect to a PHY
via a designated phy-handle property via phylink_of_phy_connect(), but if
we fall back on the internal MDIO bus created by a switch and take the
dsa_slave_phy_connect() path, then we would not be processing that flag
and using it at PHY connection time.
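The intent of the fix is that both connection paths merge the port-specific flags into the PHY's flags. Below is a hypothetical, trimmed-down sketch of that merge step. The struct and helper names are illustrative, and the real get_phy_flags() op also receives the switch instance.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of the switch ops. */
struct switch_ops {
	uint32_t (*get_phy_flags)(int port);   /* optional */
};

struct phy_dev {
	uint32_t dev_flags;
};

/*
 * Merge the port-specific flags into the PHY before connecting, so the
 * internal-MDIO fallback path behaves like the phy-handle path.
 */
static void apply_port_phy_flags(const struct switch_ops *ops, int port,
				 struct phy_dev *phy)
{
	if (ops->get_phy_flags)
		phy->dev_flags |= ops->get_phy_flags(port);
}
```

OR-ing (rather than assigning) keeps any flags already set on the PHY, which matches the "extend phy's flags" wording in the subject.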

Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: ksz8795: add LINK_MD register support

Add mapping for LINK_MD register to enable cable testing functionality.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: ksz8081 add MDI-X support

Add support for MDI-X status and configuration

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy/dsa micrel/ksz886x add MDI-X support

Add support for MDI-X status and configuration

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: use consistent alignments

This patch changes the alignments to one space between "#define" and the
macro.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: ksz8795: add phylink support

This patch adds the phylink support to the ksz8795 driver to provide
configuration exceptions on quirky KSZ8863 and KSZ8873 ports.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: move phy reg offsets to common header

Some micrel devices share the same PHY register defines. This patch
moves them to one common header so other drivers can reuse them,
and reuses generic MII_* defines where possible.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mld: avoid unnecessary high order page allocation in mld_newpack()

If the link MTU is too big, mld_newpack() allocates a high-order page.
But most MLD packets don't need a high-order page,
so this may waste pages.
To avoid this, make mld_newpack() try to allocate an order-0 page.
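The idea of capping the request so it fits in a single page can be sketched as below. This is not the patch itself. It assumes 4 KiB pages, ignores the skb_shared_info overhead that a real skb allocation adds, and uses a hypothetical helper name. It also assumes headroom plus tailroom is smaller than a page.

```c
#include <stddef.h>

/* Assumed 4 KiB pages; the kernel's PAGE_SIZE depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: choose the linear buffer size for a new MLD report.
 * Instead of sizing the buffer for the full link MTU (which can need a
 * high-order, multi-page allocation), cap the payload so that it fits in a
 * single order-0 page together with the reserved head and tail room.
 */
static size_t mld_alloc_size(size_t mtu, size_t headroom, size_t tailroom)
{
	size_t max_payload = SKETCH_PAGE_SIZE - headroom - tailroom;
	size_t payload = mtu < max_payload ? mtu : max_payload;

	return payload + headroom + tailroom;
}
```

With a 1500-byte MTU the size is unchanged, while a 9000-byte jumbo MTU is capped at one page.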

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qualcomm: rmnet: always expose a few functions

A recent change tidied up some conditional code, avoiding the use of
some #ifdefs. Unfortunately, if CONFIG_IPV6 was not enabled, it
meant that two functions were referenced but never defined.

The easiest fix is to just define stubs for these functions if
CONFIG_IPV6 is not defined. This will soon be simplified further
by some other development in the works...
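The stub pattern described above looks roughly like this. The function and struct names are hypothetical placeholders, not the two rmnet functions, and the stub's -1 return value is an assumed convention. The point is that callers compile and link whether or not CONFIG_IPV6 is defined.

```c
/* Hypothetical stand-ins: build with or without -DCONFIG_IPV6. */
struct packet {
	int version;
};

#ifdef CONFIG_IPV6
static inline int ipv6_csum_check(const struct packet *pkt)
{
	return pkt->version == 6 ? 0 : -1;
}
#else
/* Stub so callers still compile and link when IPv6 support is compiled out. */
static inline int ipv6_csum_check(const struct packet *pkt)
{
	(void)pkt;
	return -1;   /* assumed convention: IPv6 unsupported means failure */
}
#endif
```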

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 75db5b07f8c39 ("net: qualcomm: rmnet: eliminate some ifdefs")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fib6: remove redundant initialization of variable err

The variable err is being initialized with a value that is never read; the
assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: remove redundant assignment to pointer of_node

The pointer of_node is being initialized with a value that is never
read and it is being updated later with a new value inside a do-while
loop. The initialization is redundant and can be removed, and the
pointer dev is no longer required and can be removed too.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Cleanup flow rule management

The current MCAM allocation scheme allocates a single lot of
MCAM entries for ntuple filters, unicast filters and VF VLAN
rules. This patch cleans up this logic by segregating
MCAM rule allocation and management for ntuple rules from unicast and
VF VLAN rules. This segregation will allow reusing most of
the logic for supporting ntuple filters for VF devices.

Also added debug messages for MCAM entry allocation failures.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nexthops: Add selftests for cleanup of known bad route add

Test cleanup path for routes using nexthop objects before the
reference is taken on the nexthop. Specifically, bad metric for
ipv4 and ipv6 and source routing for ipv6.

Selftests that correspond to the recent bug fix:
821bbf79fe46 ("ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions")

Signed-off-by: David Ahern <dsahern@kernel.org>
Cc: Coco Li <lixiaoyan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'iosm-driver'

M Chetan Kumar says:

====================
net: iosm: PCIe Driver for Intel M.2 Modem

The IOSM (IPC over Shared Memory) driver is a PCIe host driver implemented
for Linux or Chrome platforms for data exchange over the PCIe interface between
the host platform and the Intel M.2 Modem. The driver exposes an interface
conforming to the MBIM protocol. Any front-end application (e.g. Modem Manager)
can easily manage the MBIM interface to enable data communication towards WWAN.

The Intel M.2 modem uses 2 BAR regions. The first region is dedicated to the
Doorbell register for IRQs, and the second region is used as a scratchpad area
for bookkeeping modem execution stage details along with host system shared
memory region context details. The upper edge of the driver exposes the control
and data channels for user space application interaction. At the lower edge these
data and control channels are associated with pipes. The pipes are the lowest
level interfaces used over PCIe as a logical channel for message exchange. A
single channel maps to a UL and a DL pipe, which are initialized on device open.

On the UL path, the driver copies data sent by the application into SKBs,
associates each with a transfer descriptor and puts it onto a ring buffer for
DMA transfer. Once the information has been updated in the shared memory region,
the host rings a Doorbell to the modem to perform DMA, and the modem uses MSI to
communicate back to the host.
For receiving data on the DL path, SKBs are pre-allocated during pipe open and
transfer descriptors are given to the modem for DMA transfer.
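The UL flow (fill a transfer descriptor, advance the ring, ring the doorbell, let the device consume it) can be illustrated with a generic single-producer ring. This is a hypothetical userspace simulation, not the IOSM data structures. RING_SIZE, the descriptor layout and the doorbell counter are assumptions. Real hardware would see a doorbell register write and answer with an MSI.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u   /* hypothetical; a power of two keeps index wrap correct */

/* Hypothetical transfer descriptor: where the buffer is and how long it is. */
struct transfer_desc {
	const void *buf;
	uint32_t len;
};

struct ul_ring {
	struct transfer_desc td[RING_SIZE];
	uint32_t head;        /* next slot the host fills */
	uint32_t tail;        /* next slot the modem completes */
	uint32_t doorbells;   /* count of simulated doorbell writes */
};

/* Host side: queue one buffer, then ring the doorbell. */
static bool ul_ring_submit(struct ul_ring *r, const void *buf, uint32_t len)
{
	if (r->head - r->tail == RING_SIZE)
		return false;                 /* ring full */
	r->td[r->head % RING_SIZE] = (struct transfer_desc){ buf, len };
	r->head++;
	/* Real hardware: write the new head index to a doorbell register. */
	r->doorbells++;
	return true;
}

/* Modem side (simulated): consume one descriptor after "DMA". */
static bool ul_ring_complete(struct ul_ring *r, struct transfer_desc *out)
{
	if (r->head == r->tail)
		return false;                 /* nothing pending */
	*out = r->td[r->tail % RING_SIZE];
	r->tail++;
	/* Real hardware: the modem would raise an MSI here. */
	return true;
}
```

Free-running head and tail counters with unsigned subtraction distinguish a full ring from an empty one without wasting a slot.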

The driver exposes two types of ports, namely "wwan0mbim0", a char device node
which is used for MBIM control operation, and "wwan0-x" (x = 0,1,2..7), network
interfaces for IP data communication.
1) MBIM Control Interface:
This node exposes an interface between modem and application using char device
exposed by "IOSM" driver to establish and manage the MBIM data communication
with PCIe based Intel M.2 Modems.

2) MBIM Data Interface:
The IOSM driver exposes IP link interface "wwan0-x" of type "wwan" for IP traffic.
Iproute network utility is used for creating "wwan0-x" network interface and for
associating it with an MBIM IP session. The driver supports up to 8 IP sessions
for simultaneous IP communication.

This applies on top of WWAN core rtnetlink series posted here:
https://lore.kernel.org/netdev/1623486057-13075-1-git-send-email-loic.poulain@linaro.org/

The driver has also been compiled and tested on top of the netdev net-next tree.
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: iosm: infrastructure

1) Kconfig & Makefile changes for IOSM Driver compilation.
2) Add IOSM Driver documentation.
3) Modify MAINTAINERS file for IOSM Driver addition.

Signed-off-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: iosm: net driver

1) Create net device & implement net operations for data/IP communication.
2) Bind IP Link to mux IP session for simultaneous IP traffic.

Signed-off-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: iosm: uevent support

Report modem status via uevent.

Signed-off-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>