linux-2.6-microblaze.git
6 years agoMerge branch 'act_csum-spinlock-remove'
David S. Miller [Wed, 24 Jan 2018 00:51:46 +0000 (19:51 -0500)]
Merge branch 'act_csum-spinlock-remove'

Davide Caratti says:

====================
net/sched: remove spinlock from 'csum' action

Similarly to what has been done earlier with other actions [1][2], this
series tries to improve the performance of 'csum' tc action, removing a
spinlock in the data path. Patch 1 lets act_csum use per-CPU counters;
patch 2 removes spin_{,un}lock_bh() calls from the act() method.

test procedure (using pktgen from https://github.com/netoptimizer):

 # ip link add name eth1 type dummy
 # ip link set dev eth1 up
 # tc qdisc add dev eth1 root handle 1: prio
 # for a in pass drop; do
 > tc filter del dev eth1 parent 1: pref 10 matchall action csum udp
 > tc filter add dev eth1 parent 1: pref 10 matchall action csum udp $a
 > for n in 2 4; do
 > ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $n -n 1000000 -i eth1
 > done
 > done

test results:

      |    |  before patch   |   after patch
  $a  | $n | avg. pps/thread | avg. pps/thread
 -----+----+-----------------+----------------
 pass |  2 |    1671463 ± 4% |    1920789 ± 3%
 pass |  4 |     648797 ± 1% |     738190 ± 1%
 drop |  2 |    3212692 ± 2% |    3719811 ± 2%
 drop |  4 |    1078824 ± 1% |    1328099 ± 1%

references:

[1] https://www.spinics.net/lists/netdev/msg334760.html
[2] https://www.spinics.net/lists/netdev/msg465862.html

v3 changes:
 - use rtnl_dereference() in place of rcu_dereference() in tcf_csum_dump()

v2 changes:
 - add 'drop' test, it produces more contentions
 - use RCU-protected struct to store 'action' and 'update_flags', to avoid
   reading the values from subsequent configurations
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sched: act_csum: don't use spinlock in the fast path
Davide Caratti [Mon, 22 Jan 2018 17:14:32 +0000 (18:14 +0100)]
net/sched: act_csum: don't use spinlock in the fast path

use RCU instead of spin_{,unlock}_bh() to protect concurrent read/write on
act_csum configuration, to reduce the effects of contention in the data
path when multiple readers are present.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sched: act_csum: use per-core statistics
Davide Caratti [Mon, 22 Jan 2018 17:14:31 +0000 (18:14 +0100)]
net/sched: act_csum: use per-core statistics

use per-CPU counters, like other TC actions do, instead of maintaining one
set of stats across all cores. This allows updating act_csum stats without
the need of protecting them using spin_{,un}lock_bh() invocations.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: link_watch: mark bonding link events urgent
Roopa Prabhu [Mon, 22 Jan 2018 16:07:19 +0000 (08:07 -0800)]
net: link_watch: mark bonding link events urgent

It takes 1sec for bond link down notification to hit user-space
when all slaves of the bond go down. 1sec is too long for
protocol daemons in user-space relying on bond notification
to recover (eg: multichassis lag implementations in user-space).
Since the link event code already marks team device port link events
 as urgent, this patch moves the code to cover all lag ports and master.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 24 Jan 2018 00:29:26 +0000 (19:29 -0500)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2018-01-23

This series contains updates to ixgbe only.

Shannon Nelson provides an implementation of the ipsec hardware offload
feature for the ixgbe driver for these devices: x540, x550, 82599.

The ixgbe NICs support ipsec offload for 1024 Rx and 1024 Tx Security
Associations (SAs), using up to 128 inbound IP addresses, and using the
rfc4106(gcm(aes)) encryption.  This code does not yet support checksum
offload, or TSO in conjunction with the ipsec offload - those will be
added in the future.

This code shows improvements in both packet throughput and CPU utilization.
For example, here are some quicky numbers that show the magnitude of the
performance gain on a single run of "iperf -c <dest>" with the ipsec
offload on both ends of a point-to-point connection:

9.4 Gbps - normal case
7.6 Gbps - ipsec with offload
343 Mbps - ipsec no offload
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'GEHC-Bx50-Switch-Support'
David S. Miller [Wed, 24 Jan 2018 00:22:39 +0000 (19:22 -0500)]
Merge branch 'GEHC-Bx50-Switch-Support'

Sebastian Reichel says:

====================
GEHC Bx50 Switch Support

This adds support for the internal switch found in GE Healthcare
B450v3, B650v3 and B850v3. All devices use a GPIO bitbanged MDIO
bus to communicate with the switch and a PCIe based network card
for exchanging network data. The cpu network data link requires,
that the switch's internal phy interface is enabled, so support
for that is added by the first patch in this series.

The patch series is based on v4.15-rc8.

Changes since PATCHv4:
 * Introduce dsa_port_link_(un)register_of and mark the fixed
   variant static.
 * Update patch description to describe the phy<->phy connection
   from i210 to the Marvell switch
Changes since PATCHv3:
 * Enable the phy in dsa_port_setup() instead of abusing the
   fixed link setup function
Changes since PATCHv2:
 * Add phy nodes to switch in bx50.dtsi and reference them
   from switch ports
 * Enable cpu-port's phy based on 'phy-handle' instead of 'phy-mode'
Changes since PATCHv1:
 * Use 'marvell,mv88e6085' instead of introducing compatible
   string for mv88e6240.
 * Fix indention of DT nodes
 * Only enable 'cpu' phy, if explicitly set to "internal".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoARM: dts: imx6q-b450v3: Add switch port configuration
Sebastian Reichel [Tue, 23 Jan 2018 15:03:50 +0000 (16:03 +0100)]
ARM: dts: imx6q-b450v3: Add switch port configuration

This adds support for the Marvell switch and names the network
ports according to the labels, that can be found next to the
connectors. The switch is connected to the host system using a
PCI based network card.

The PCI bus configuration has been written using the following
information:

root@b450v3# lspci -tv
-[0000:00]---00.0-[01]----00.0  Intel Corporation I210 Gigabit Network Connection
root@b450v3# lspci -nn
00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01)
01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoARM: dts: imx6q-b650v3: Add switch port configuration
Sebastian Reichel [Tue, 23 Jan 2018 15:03:49 +0000 (16:03 +0100)]
ARM: dts: imx6q-b650v3: Add switch port configuration

This adds support for the Marvell switch and names the network
ports according to the labels, that can be found next to the
connectors. The switch is connected to the host system using a
PCI based network card.

The PCI bus configuration has been written using the following
information:

root@b650v3# lspci -tv
-[0000:00]---00.0-[01]----00.0  Intel Corporation I210 Gigabit Network Connection
root@b650v3# lspci -nn
00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01)
01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoARM: dts: imx6q-b850v3: Add switch port configuration
Sebastian Reichel [Tue, 23 Jan 2018 15:03:48 +0000 (16:03 +0100)]
ARM: dts: imx6q-b850v3: Add switch port configuration

This adds support for the Marvell switch and names the network
ports according to the labels, that can be found next to the
connectors ("ID", "IX", "ePort 1", "ePort 2"). The switch is
connected to the host system using a PCI based network card.

The PCI bus configuration has been written using the following
information:

root@b850v3# lspci -tv
-[0000:00]---00.0-[01]----00.0-[02-05]--+-01.0-[03]----00.0  Intel Corporation I210 Gigabit Network Connection
                                        +-02.0-[04]----00.0  Intel Corporation I210 Gigabit Network Connection
                                        \-03.0-[05]--
root@b850v3# lspci -nn
00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01)
01:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab)
02:01.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab)
02:02.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab)
02:03.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab)
03:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
04:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoARM: dts: imx6q-bx50v3: Add internal switch
Sebastian Reichel [Tue, 23 Jan 2018 15:03:47 +0000 (16:03 +0100)]
ARM: dts: imx6q-bx50v3: Add internal switch

B850v3, B650v3 and B450v3 all have a GPIO bit banged MDIO bus to
communicate with a Marvell switch. On all devices the switch is
connected to a PCI based network card, which needs to be referenced
by DT, so this also adds the common PCI root node.

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: Support internal phy on 'cpu' port
Sebastian Reichel [Tue, 23 Jan 2018 15:03:46 +0000 (16:03 +0100)]
net: dsa: Support internal phy on 'cpu' port

This adds support for enabling the internal PHY for a 'cpu' port.
It has been tested on GE B850v3,  B650v3 and B450v3, which have a
built-in MV88E6240 switch hardwired to a PCIe based network card.
On these machines the internal PHY of the i210 network card and
the Marvell switch are connected to each other and must be enabled
for properly using the switch. While the i210 PHY will be enabled
when the network interface is enabled, the switch's port is not
exposed as network interface. Additionally the mv88e6xxx driver
resets the chip during probe, so the PHY is disabled without this
patch.

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Tue, 23 Jan 2018 18:49:06 +0000 (13:49 -0500)]
Merge git://git./linux/kernel/git/davem/net

en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in
'net'.

The esp{4,6}_offload.c conflicts were overlapping changes.
The 'out' label is removed so we just return ERR_PTR(-EINVAL)
directly.

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoixgbe: register ipsec offload with the xfrm subsystem
Shannon Nelson [Wed, 20 Dec 2017 00:00:02 +0000 (16:00 -0800)]
ixgbe: register ipsec offload with the xfrm subsystem

With all the support code in place we can now link in the ipsec
offload operations and set the ESP feature flag for the XFRM
subsystem to see.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbe: ipsec offload stats
Shannon Nelson [Wed, 20 Dec 2017 00:00:01 +0000 (16:00 -0800)]
ixgbe: ipsec offload stats

Add a simple statistic to count the ipsec offloads.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbe: process the Tx ipsec offload
Shannon Nelson [Wed, 20 Dec 2017 00:00:00 +0000 (16:00 -0800)]
ixgbe: process the Tx ipsec offload

If the skb has a security association referenced in the skb, then
set up the Tx descriptor with the ipsec offload bits.  While we're
here, we fix an oddly named field in the context descriptor struct.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbe: process the Rx ipsec offload
Shannon Nelson [Tue, 19 Dec 2017 23:59:59 +0000 (15:59 -0800)]
ixgbe: process the Rx ipsec offload

If the chip sees and decrypts an ipsec offload, set up the skb
sp pointer with the ralated SA info.  Since the chip is rude
enough to keep to itself the table index it used for the
decryption, we have to do our own table lookup, using the
hash for speed.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbe: restore offloaded SAs after a reset
Shannon Nelson [Tue, 19 Dec 2017 23:59:58 +0000 (15:59 -0800)]
ixgbe: restore offloaded SAs after a reset

On a chip reset most of the table contents are lost, so must be
restored.  This scans the driver's ipsec tables and restores both
the filled and empty table slots to their pre-reset values.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbe: add ipsec offload add and remove SA
Shannon Nelson [Tue, 19 Dec 2017 23:59:57 +0000 (15:59 -0800)]
ixgbe: add ipsec offload add and remove SA

Add the functions for setting up and removing offloaded SAs (Security
Associations) with the x540 hardware.  We set up the callback structure
but we don't yet set the hardware feature bit to be sure the XFRM service
won't actually try to use us for an offload yet.

The software tables are made up to mimic the hardware tables to make it
easier to track what's in the hardware, and the SA table index is used
for the XFRM offload handle.  However, there is a hashing field in the
Rx SA tracking that will be used to facilitate faster table searches in
the Rx fast path.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbe: add ipsec data structures
Shannon Nelson [Tue, 19 Dec 2017 23:59:56 +0000 (15:59 -0800)]
ixgbe: add ipsec data structures

Set up the data structures to be used by the ipsec offload.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbe: add ipsec engine start and stop routines
Shannon Nelson [Tue, 19 Dec 2017 23:59:55 +0000 (15:59 -0800)]
ixgbe: add ipsec engine start and stop routines

Add in the code for running and stopping the hardware ipsec
encryption/decryption engine.  It is good to keep the engine
off when not in use in order to save on the power draw.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbe: add ipsec register access routines
Shannon Nelson [Tue, 19 Dec 2017 23:59:54 +0000 (15:59 -0800)]
ixgbe: add ipsec register access routines

Add a few routines to make access to the ipsec registers just a little
easier, and throw in the beginnings of an initialization.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Tue, 23 Jan 2018 16:52:55 +0000 (08:52 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix divide by zero in mlx5, from Talut Batheesh.

 2) Guard against invalid GSO packets coming from untrusted guests and
    arriving in qdisc_pkt_len_init(), from Eric Dumazet.

 3) Similarly add such protection to the various protocol GSO handlers.
    From Willem de Bruijn.

 4) Fix regression added to IGMP source address checking for IGMPv3
    reports, from Felix Feitkau.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  tls: Correct length of scatterlist in tls_sw_sendpage
  be2net: restore properly promisc mode after queues reconfiguration
  net: igmp: fix source address check for IGMPv3 reports
  gso: validate gso_type in GSO handlers
  net: qdisc_pkt_len_init() should be more robust
  ibmvnic: Allocate and request vpd in init_resources
  ibmvnic: Revert to previous mtu when unsupported value requested
  ibmvnic: Modify buffer size and number of queues on failover
  rds: tcp: compute m_ack_seq as offset from ->write_seq
  usbnet: silence an unnecessary warning
  cxgb4: fix endianness for vlan value in cxgb4_tc_flower
  cxgb4: set filter type to 1 for ETH_P_IPV6
  net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare

6 years agoixgbe: clean up ipsec defines
Shannon Nelson [Tue, 19 Dec 2017 23:59:53 +0000 (15:59 -0800)]
ixgbe: clean up ipsec defines

Clean up the ipsec/macsec descriptor bit definitions to match the rest
of the defines and file organization.  Also recognise the bit-definition
overlap in the error mask macro.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agosctp: reset ret in again path in sctp_for_each_transport
Xin Long [Tue, 23 Jan 2018 10:22:25 +0000 (18:22 +0800)]
sctp: reset ret in again path in sctp_for_each_transport

Commit 97a6ec4ac021 ("rhashtable: Change rhashtable_walk_start to
return void") only initialized ret for the first time, when going
to again path, the next tsp could be NULL. Without resetting ret,
cb_done would be called with tsp as NULL.

A kernel crash was caused by this when running sctpdiag testcase
in sctp-tests.

Note that this issue doesn't affect net.git yet.

Fixes: 97a6ec4ac021 ("rhashtable: Change rhashtable_walk_start to return void")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnx2: remove redundant initializations of pointers txr and rxr
Colin Ian King [Tue, 23 Jan 2018 09:44:08 +0000 (09:44 +0000)]
bnx2: remove redundant initializations of pointers txr and rxr

Pointers txr and rxr are being initialized and a few statements later
are being assigned new values without the original values ever being
read. The initialized values are therefore redundant and can be
removed.

Cleans up clang warnings:
drivers/net/ethernet/broadcom/bnx2.c:5821:28: warning: Value stored to
'txr' during its initialization is never read
drivers/net/ethernet/broadcom/bnx2.c:5822:28: warning: Value stored to
'rxr' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoforcedeth: remove duplicate structure member in rx
Zhu Yanjun [Tue, 23 Jan 2018 07:03:37 +0000 (02:03 -0500)]
forcedeth: remove duplicate structure member in rx

Since both first_rx_ctx and rx_skb are the head of rx ctx, it not
necessary to use two structure members to statically indicate
the head of rx ctx. So first_rx_ctx is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'Kernel-doc-fixes-for-networking'
David S. Miller [Tue, 23 Jan 2018 16:06:51 +0000 (11:06 -0500)]
Merge branch 'Kernel-doc-fixes-for-networking'

Florian Fainelli says:

====================
Kernel doc fixes for networking

This patch series fixes kernel doc warnings found while running make htmldocs
pertaining to the networking subsystem. There is a finaly set of warnings due
to PHYLINK which I have not been able to resolve yet.

The last patch could thereoteically be applied to 'net' since the commit
referenced by the Fixes: tag is present in v4.15-rcX.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: core: Fix kernel-doc for netdev_upper_link()
Florian Fainelli [Tue, 23 Jan 2018 03:14:28 +0000 (19:14 -0800)]
net: core: Fix kernel-doc for netdev_upper_link()

Fixes the following warnings:
./net/core/dev.c:6438: warning: No description found for parameter 'extack'
./net/core/dev.c:6461: warning: No description found for parameter 'extack'

Fixes: 42ab19ee9029 ("net: Add extack to upper device linking")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: core: Fix kernel-doc for call_netdevice_notifiers_info()
Florian Fainelli [Tue, 23 Jan 2018 03:14:27 +0000 (19:14 -0800)]
net: core: Fix kernel-doc for call_netdevice_notifiers_info()

Remove the @dev comment, since we do not have a net_device argument, fixes the
following kernel doc warning: /net/core/dev.c:1707: warning: Excess function
parameter 'dev' description in 'call_netdevice_notifiers_info'

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: sfp: Fix kernel doc warning
Florian Fainelli [Tue, 23 Jan 2018 03:14:26 +0000 (19:14 -0800)]
net: phy: sfp: Fix kernel doc warning

We forgot to update the kernel doc header above sfp_register_upstream()

Fixes: c19bb00070dd ("sfp: convert to fwnode")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: core: Fix kernel-doc for carrier_* attributes
Florian Fainelli [Tue, 23 Jan 2018 03:14:25 +0000 (19:14 -0800)]
net: core: Fix kernel-doc for carrier_* attributes

Fix the documentation warning:

include/linux/netdevice.h:1939: warning: Excess struct member 'carrier_changes' description in 'net_device'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: b2d3bcfa26a7 ("net: core: Expose number of link up/down transitions")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: make symbol hw_atl_boards static
Wei Yongjun [Tue, 23 Jan 2018 02:10:38 +0000 (02:10 +0000)]
net: aquantia: make symbol hw_atl_boards static

Fixes the following sparse warning:

drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c:50:34: warning:
 symbol 'hw_atl_boards' was not declared. Should it be static?

Fixes: 4948293ff963 ("net: aquantia: Introduce new AQC devices and capabilities")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: aquantia: Fix error return code in aq_pci_probe()
Wei Yongjun [Tue, 23 Jan 2018 02:10:46 +0000 (02:10 +0000)]
net: aquantia: Fix error return code in aq_pci_probe()

Fix to return error code -ENOMEM from the aq_ndev_alloc() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 23ee07ad3c2f ("net: aquantia: Cleanup pci functions module")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: fix error return code in nfp_pci_probe()
Wei Yongjun [Tue, 23 Jan 2018 02:10:27 +0000 (02:10 +0000)]
nfp: fix error return code in nfp_pci_probe()

Fix to return error code -EINVAL instead of 0 when num_vfs above
limit_vfs, as done elsewhere in this function.

Fixes: 0dc786219186 ("nfp: handle SR-IOV already enabled when driver is probing")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: fix fw dump handling of absolute rtsym size
Carl Heymann [Tue, 23 Jan 2018 01:29:43 +0000 (17:29 -0800)]
nfp: fix fw dump handling of absolute rtsym size

Fix bug that causes _absolute_ rtsym sizes of > 8 bytes (as per symbol
table) to result in incorrect space used during a TLV-based debug dump.

Detail: The size calculation stage calculates the correct size (size of
the rtsym address field == 8), while the dump uses the size in the table
to calculate the TLV size to reserve. Symbols with size <= 8 are handled
OK due to aligning sizes to 8, but including any absolute symbol with
listed size > 8 leads to an ENOSPC error during the dump.

Fixes: da762863edd9 ("nfp: fix absolute rtsym handling in debug dump")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfsd: auth: Fix gid sorting when rootsquash enabled
Ben Hutchings [Mon, 22 Jan 2018 20:11:06 +0000 (20:11 +0000)]
nfsd: auth: Fix gid sorting when rootsquash enabled

Commit bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility
group_info allocators") appears to break nfsd rootsquash in a pretty
major way.

It adds a call to groups_sort() inside the loop that copies/squashes
gids, which means the valid gids are sorted along with the following
garbage.  The net result is that the highest numbered valid gids are
replaced with any lower-valued garbage gids, possibly including 0.

We should sort only once, after filling in all the gids.

Fixes: bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agotun: avoid calling xdp_rxq_info_unreg() twice
Cong Wang [Mon, 22 Jan 2018 21:49:27 +0000 (13:49 -0800)]
tun: avoid calling xdp_rxq_info_unreg() twice

Similarly to tx ring, xdp_rxq_info is only registered
when !tfile->detached, so we need to avoid calling
xdp_rxq_info_unreg() twice too. The helper tun_cleanup_tx_ring()
already checks for this properly, so it is correct to put
xdp_rxq_info_unreg() just inside there.

Reported-by: syzbot+1c788d7ce0f0888f1d7f@syzkaller.appspotmail.com
Fixes: 8565d26bcb2f ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoorangefs: initialize op on loop restart in orangefs_devreq_read
Martin Brandenburg [Mon, 22 Jan 2018 20:44:52 +0000 (15:44 -0500)]
orangefs: initialize op on loop restart in orangefs_devreq_read

In orangefs_devreq_read, there is a loop which picks an op off the list
of pending ops.  If the loop fails to find an op, there is nothing to
read, and it returns EAGAIN.  If the op has been given up on, the loop
is restarted via a goto.  The bug is that the variable which the found
op is written to is not reinitialized, so if there are no more eligible
ops on the list, the code runs again on the already handled op.

This is triggered by interrupting a process while the op is being copied
to the client-core.  It's a fairly small window, but it's there.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoorangefs: use list_for_each_entry_safe in purge_waiting_ops
Martin Brandenburg [Mon, 22 Jan 2018 20:44:51 +0000 (15:44 -0500)]
orangefs: use list_for_each_entry_safe in purge_waiting_ops

set_op_state_purged can delete the op.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoMerge branch 'net-sched-add-extack-support-for-cls-offloads'
David S. Miller [Mon, 22 Jan 2018 21:30:30 +0000 (16:30 -0500)]
Merge branch 'net-sched-add-extack-support-for-cls-offloads'

Jakub Kicinski says:

====================
net: sched: add extack support for cls offloads

I've dropped the tests from the series because test_offloads.py changes
will conflict with bpf-next patches.  I will send four more patches with
tests once bpf-next is merged back, hopefully still making it into 4.16 :)

v4:
 - rebase on top of Alex's changes.

---

Quentin says:

This series tries to improve user experience when eBPF hardware offload
hits error paths at load time. In particular, it introduces netlink
extended ack support in the nfp driver.

To that aim, transmission of the pointer to the extack object is piped
through the `change()` operation of the existing classifiers (patch 1 to
6). Then it is used for TC offload in the nfp driver (patch 8) and in
netdevsim (patch 9, selftest in patch 10). Patch 7 adds a helper to handle
extack messages in the core when TC offload is disabled on the net device.

For completeness extack is propagated for classifiers other than cls_bpf,
but it's up to the drivers to make use of it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: bpf: use extack support to improve debugging
Quentin Monnet [Sat, 20 Jan 2018 01:44:50 +0000 (17:44 -0800)]
nfp: bpf: use extack support to improve debugging

Use the recently added extack support for eBPF offload in the driver.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: bpf: plumb extack into functions related to XDP offload
Quentin Monnet [Sat, 20 Jan 2018 01:44:49 +0000 (17:44 -0800)]
nfp: bpf: plumb extack into functions related to XDP offload

Pass a pointer to an extack object to nfp_app_xdp_offload() in order to
prepare for extack usage in the nfp driver. Next step will be to forward
this extack pointer to nfp_net_bpf_offload(), once this function is able
to use it for printing error messages.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: create tc_can_offload_extack() wrapper
Quentin Monnet [Sat, 20 Jan 2018 01:44:48 +0000 (17:44 -0800)]
net: sched: create tc_can_offload_extack() wrapper

Create a wrapper around tc_can_offload() that takes an additional
extack pointer argument in order to output an error message if TC
offload is disabled on the device.

In this way, the error message is handled by the core and can be the
same for all drivers.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: add extack support for offload via tc_cls_common_offload
Quentin Monnet [Sat, 20 Jan 2018 01:44:47 +0000 (17:44 -0800)]
net: sched: add extack support for offload via tc_cls_common_offload

Add extack support for hardware offload of classifiers. In order
to achieve this, a pointer to a struct netlink_ext_ack is added to the
struct tc_cls_common_offload that is passed to the callback for setting
up the classifier. Function tc_cls_common_offload_init() is updated to
support initialization of this new attribute.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: cls_bpf: plumb extack support in filter for hardware offload
Quentin Monnet [Sat, 20 Jan 2018 01:44:46 +0000 (17:44 -0800)]
net: sched: cls_bpf: plumb extack support in filter for hardware offload

Pass the extack pointer obtained in the `->change()` filter operation to
cls_bpf_offload() and then to cls_bpf_offload_cmd(). This makes it
possible to use this extack pointer in drivers offloading BPF programs
in a future patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: cls_u32: propagate extack support for filter offload
Quentin Monnet [Sat, 20 Jan 2018 01:44:45 +0000 (17:44 -0800)]
net: sched: cls_u32: propagate extack support for filter offload

Propagate the extack pointer from the `->change()` classifier operation
to the function used for filter replacement in cls_u32. This makes it
possible to use netlink extack messages in the future at replacement
time for this filter, although it is not used at this point.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: cls_matchall: propagate extack support for filter offload
Quentin Monnet [Sat, 20 Jan 2018 01:44:44 +0000 (17:44 -0800)]
net: sched: cls_matchall: propagate extack support for filter offload

Propagate the extack pointer from the `->change()` classifier operation
to the function used for filter replacement in cls_matchall. This makes
it possible to use netlink extack messages in the future at replacement
time for this filter, although it is not used at this point.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: cls_flower: propagate extack support for filter offload
Quentin Monnet [Sat, 20 Jan 2018 01:44:43 +0000 (17:44 -0800)]
net: sched: cls_flower: propagate extack support for filter offload

Propagate the extack pointer from the `->change()` classifier operation
to the function used for filter replacement in cls_flower. This makes it
possible to use netlink extack messages in the future at replacement
time for this filter, although it is not used at this point.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotls: Correct length of scatterlist in tls_sw_sendpage
Dave Watson [Fri, 19 Jan 2018 20:30:13 +0000 (12:30 -0800)]
tls: Correct length of scatterlist in tls_sw_sendpage

The scatterlist is reused by both sendmsg and sendfile.
If a sendmsg of smaller number of pages is followed by a sendfile
of larger number of pages, the scatterlist may be too short, resulting
in a crash in gcm_encrypt.

Add sg_unmark_end to make the list the correct length.

tls_sw_sendmsg already calls sg_unmark_end correctly when it allocates
memory in alloc_sg, or in zerocopy_from_iter.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agohv_netvsc: Use the num_online_cpus() for channel limit
Haiyang Zhang [Fri, 19 Jan 2018 20:26:43 +0000 (13:26 -0700)]
hv_netvsc: Use the num_online_cpus() for channel limit

Since we no longer localize channel/CPU affiliation within one NUMA
node, num_online_cpus() is used as the number of channel cap, instead of
the number of processors in a NUMA node.

This patch allows a bigger range for tuning the number of channels.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobe2net: restore properly promisc mode after queues reconfiguration
Ivan Vecera [Fri, 19 Jan 2018 19:23:50 +0000 (20:23 +0100)]
be2net: restore properly promisc mode after queues reconfiguration

The commit 622190669403 ("be2net: Request RSS capability of Rx interface
depending on number of Rx rings") modified be_update_queues() so the
IFACE (HW representation of the netdevice) is destroyed and then
re-created. This causes a regression because potential promiscuous mode
is not restored properly during be_open() because the driver thinks
that the HW has promiscuous mode already enabled.

Note that Lancer is not affected by this bug because RX-filter flags are
disabled during be_close() for this chipset.

Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Fixes: 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: converting spaces into tabs to avoid checkpatch.pl warning
Salil Mehta [Fri, 19 Jan 2018 15:20:53 +0000 (15:20 +0000)]
net: hns3: converting spaces into tabs to avoid checkpatch.pl warning

Spaces were mistakenly used instead of tabs in some of the code related
to reset functionality, which caused checkpatch.pl errors. These were
missed earlier so fixing them now.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: igmp: fix source address check for IGMPv3 reports
Felix Fietkau [Fri, 19 Jan 2018 10:50:46 +0000 (11:50 +0100)]
net: igmp: fix source address check for IGMPv3 reports

Commit "net: igmp: Use correct source address on IGMPv3 reports"
introduced a check to validate the source address of locally generated
IGMPv3 packets.
Instead of checking the local interface address directly, it uses
inet_ifa_match(fl4->saddr, ifa), which checks if the address is on the
local subnet (or equal to the point-to-point address if used).

This breaks for point-to-point interfaces, so check against
ifa->ifa_local directly.

Cc: Kevin Cernekee <cernekee@chromium.org>
Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb3: assign port id to net_device->dev_port
Arjun Vynipadath [Fri, 19 Jan 2018 09:41:48 +0000 (15:11 +0530)]
cxgb3: assign port id to net_device->dev_port

T3 devices have different ports on same PCI function,
so using dev_port to identify ports.

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobridge: return boolean instead of integer in br_multicast_is_router
Gustavo A. R. Silva [Thu, 18 Jan 2018 23:37:45 +0000 (17:37 -0600)]
bridge: return boolean instead of integer in br_multicast_is_router

Return statements in functions returning bool should use
true/false instead of 1/0.

This issue was detected with the help of Coccinelle.

Fixes: 85b352693264 ("bridge: Fix build error when IGMP_SNOOPING is not enabled")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: Fix reception of Broadcom switches tags
Florian Fainelli [Thu, 18 Jan 2018 23:12:21 +0000 (15:12 -0800)]
net: stmmac: Fix reception of Broadcom switches tags

Broadcom tags inserted by Broadcom switches put a 4 byte header after
the MAC SA and before the EtherType, which may look like some sort of 0
length LLC/SNAP packet (tcpdump and wireshark do think that way). With
ACS enabled in stmmac the packets were truncated to 8 bytes on
reception, whereas clearing this bit allowed normal reception to occur.

In order to make that possible, we need to pass a net_device argument to
the different core_init() functions and we are dependent on the Broadcom
tagger padding packets correctly (which it now does). To be as little
invasive as possible, this is only done for gmac1000 when the network
device is DSA-enabled (netdev_uses_dsa() returns true).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'hns3-new-features'
David S. Miller [Mon, 22 Jan 2018 21:05:50 +0000 (16:05 -0500)]
Merge branch 'hns3-new-features'

Peng Li says:

====================
add some features to hns3 driver

This patchset adds some features to hns3 driver, include the support
for ethtool command -d, -p and support for manager table.

[Patch 1/4] adds support for ethtool command -d, its ops is get_regs.
driver will send command to command queue, and get regs number and
regs value from command queue.
[Patch 2/4] adds manager table initialization for hardware.
[Patch 3/4] adds support for ethtool command -p. For fiber ports, driver
sends command to command queue, and IMP will write SGPIO regs to control
leds.
[Patch 4/4] adds support for net status led for fiber ports. Net status
include  port speed, total rx/tx packets and link status. Driver send
the status to command queue, and IMP will write SGPIO to control leds.

---
Change log:
V1 -> V2:
1, fix comments from Andrew Lunn, remove the patch "net: hns3: add
ethtool -p support for phy device".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: add net status led support for fiber port
Jian Shen [Fri, 19 Jan 2018 06:41:12 +0000 (14:41 +0800)]
net: hns3: add net status led support for fiber port

Check the net status per second, include port speed, total rx/tx packets
and link status. Updating the led status for fiber port.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: add ethtool -p support for fiber port
Jian Shen [Fri, 19 Jan 2018 06:41:11 +0000 (14:41 +0800)]
net: hns3: add ethtool -p support for fiber port

Add led location support for fiber port. The led will keep blinking
when locating.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: add manager table initialization for hardware
Fuyun Liang [Fri, 19 Jan 2018 06:41:10 +0000 (14:41 +0800)]
net: hns3: add manager table initialization for hardware

The manager table is empty by default. If it is not initialized, the
management pkgs like LLDP will be dropped by hardware. Default entries
need to be added to manager table.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: add support for get_regs
Fuyun Liang [Fri, 19 Jan 2018 06:41:09 +0000 (14:41 +0800)]
net: hns3: add support for get_regs

This patch adds get_regs support for ethtool cmd.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agogso: validate gso_type in GSO handlers
Willem de Bruijn [Fri, 19 Jan 2018 14:29:18 +0000 (09:29 -0500)]
gso: validate gso_type in GSO handlers

Validate gso_type during segmentation as SKB_GSO_DODGY sources
may pass packets where the gso_type does not match the contents.

Syzkaller was able to enter the SCTP gso handler with a packet of
gso_type SKB_GSO_TCPV4.

On entry of transport layer gso handlers, verify that the gso_type
matches the transport protocol.

Fixes: 90017accff61 ("sctp: Add GSO support")
Link: http://lkml.kernel.org/r/<001a1137452496ffc305617e5fe0@google.com>
Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: qdisc_pkt_len_init() should be more robust
Eric Dumazet [Fri, 19 Jan 2018 03:59:19 +0000 (19:59 -0800)]
net: qdisc_pkt_len_init() should be more robust

Without proper validation of DODGY packets, we might very well
feed qdisc_pkt_len_init() with invalid GSO packets.

tcp_hdrlen() might access out-of-bound data, so let's use
skb_header_pointer() and proper checks.

Whole story is described in commit d0c081b49137 ("flow_dissector:
properly cap thoff field")

We have the goal of validating DODGY packets earlier in the stack,
so we might very well revert this fix in the future.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ibmvnic-reset-behavior-fixes'
David S. Miller [Mon, 22 Jan 2018 20:46:56 +0000 (15:46 -0500)]
Merge branch 'ibmvnic-reset-behavior-fixes'

John Allen says:

====================
ibmvnic: Reset behavior fixes

This patchset fixes a number of issues related to ibmvnic reset uncovered
from testing new Power9 machines with Everglades adapters and the new
functionality to change mtu and other parameters in the driver.

Changes since v1:
-In patch 1/3, added the line to free the long term buffers before
allocating a new one. This change inadvertently uncovered the problem
that the number of queues can change after a failover as well. To fix
this, we check whether or not the number of queues has changed in
do_reset and if they have, we do a full release and init of the queues.
-In patch 1/3, added variables to the adapter struct to track how
many rx/tx pools have actually been allocated and modify the release
pools routines to use these values rather than the possibly incorrect
req_rx/tx_queues values.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Allocate and request vpd in init_resources
John Allen [Thu, 18 Jan 2018 22:27:58 +0000 (16:27 -0600)]
ibmvnic: Allocate and request vpd in init_resources

In reset events in which our memory allocations need to be reallocated,
VPD data is being freed, but never reallocated. This can cause issues if
we later attempt to access that memory or reset and attempt to free the
memory. This patch moves the allocation of the VPD data to init_resources
so that it will be symmetrically freed during release resources.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Revert to previous mtu when unsupported value requested
John Allen [Thu, 18 Jan 2018 22:27:12 +0000 (16:27 -0600)]
ibmvnic: Revert to previous mtu when unsupported value requested

If we request an unsupported mtu value, the vnic server will suggest a
different value. Currently we take the suggested value without question
and login with that value. However, the behavior doesn't seem completely
sane as attempting to change the mtu to some specific value will change
the mtu to some completely different value most of the time. This patch
fixes the issue by logging in with the previously used mtu value and
printing an error message saying that the given mtu is unsupported.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Modify buffer size and number of queues on failover
John Allen [Thu, 18 Jan 2018 22:26:31 +0000 (16:26 -0600)]
ibmvnic: Modify buffer size and number of queues on failover

Using newer backing devices can cause the required padding at the end of
buffer as well as the number of queues to change after a failover.
Since we currently assume that these values never change, after a
failover to a backing device with different capabilities, we can get
errors from the vnic server, attempt to free long term buffers that are
no longer there, or not free long term buffers that should be freed.

This patch resolves the issue by checking whether any of these values
change, and if so perform the necessary re-allocations.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agords: tcp: compute m_ack_seq as offset from ->write_seq
Sowmini Varadhan [Thu, 18 Jan 2018 21:11:07 +0000 (13:11 -0800)]
rds: tcp: compute m_ack_seq as offset from ->write_seq

rds-tcp uses m_ack_seq to track the tcp ack# that indicates
that the peer has received a rds_message. The m_ack_seq is
used in rds_tcp_is_acked() to figure out when it is safe to
drop the rds_message from the RDS retransmit queue.

The m_ack_seq must be calculated as an offset from the right
edge of the in-flight tcp buffer, i.e., it should be based on
the ->write_seq, not the ->snd_nxt.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: core: Expose number of link up/down transitions
David Decotigny [Thu, 18 Jan 2018 17:59:13 +0000 (09:59 -0800)]
net: core: Expose number of link up/down transitions

Expose the number of times the link has been going UP or DOWN, and
update the "carrier_changes" counter to be the sum of these two events.
While at it, also update the sysfs-class-net documentation to cover:
carrier_changes (3.15), carrier_up_count (4.16) and carrier_down_count
(4.16)

Signed-off-by: David Decotigny <decot@googlers.com>
[Florian:
* rebase
* add documentation
* merge carrier_changes with up/down counters]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomacsec: restore uAPI after addition of GCM-AES-256
Sabrina Dubroca [Thu, 18 Jan 2018 16:48:18 +0000 (17:48 +0100)]
macsec: restore uAPI after addition of GCM-AES-256

Commit ccfdec908922 ("macsec: Add support for GCM-AES-256 cipher suite")
changed a few values in the uapi headers for MACsec.

Because of existing userspace implementations, we need to preserve the
value of MACSEC_DEFAULT_CIPHER_ID. Not doing that resulted in
wpa_supplicant segfaults when a secure channel was created using the
default cipher. Thus, swap MACSEC_DEFAULT_CIPHER_{ID,ALT} back to their
original values.

Changing the maximum length of the MACSEC_SA_ATTR_KEY attribute is
unnecessary, as the previous value (MACSEC_MAX_KEY_LEN, which was 128B)
is large enough to carry 32-bytes keys. This patch reverts
MACSEC_MAX_KEY_LEN to 128B and restores the old length check on
MACSEC_SA_ATTR_KEY.

Fixes: ccfdec908922 ("macsec: Add support for GCM-AES-256 cipher suite")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns: Fix for variable may be used uninitialized warnings
Huazhong Tan [Thu, 18 Jan 2018 02:37:34 +0000 (10:37 +0800)]
net: hns: Fix for variable may be used uninitialized warnings

When !CONFIG_REGMAP hns throws compiler warnings since
dsaf_read_syscon ignores the return result from regmap_read,
which allows val to be uninitialized.

Fixes: 86897c960b49 ("net: hns: add syscon operation for dsaf")
Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: fec: add necessary defines to work on ARM64
Lucas Stach [Wed, 17 Jan 2018 18:30:55 +0000 (19:30 +0100)]
net: fec: add necessary defines to work on ARM64

The i.MX8 is a ARMv8 based SoC, that uses the same FEC IP as the
earlier, ARMv7 based, i.MX SoCs. Allow the driver to work on ARM64.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agousbnet: silence an unnecessary warning
Oliver Neukum [Wed, 17 Jan 2018 15:33:54 +0000 (16:33 +0100)]
usbnet: silence an unnecessary warning

That a kevent could not be scheduled is not an error.
Such handlers must be able to deal with multiple events anyway.
As the successful scheduling of a work is a debug event, make
the failure debug priority, too.

V2: coding style

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: Cristian Caravena <caravena@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'cxgb4-tc-flower-offload-fixes'
David S. Miller [Mon, 22 Jan 2018 20:26:57 +0000 (15:26 -0500)]
Merge branch 'cxgb4-tc-flower-offload-fixes'

Daniel Borkmann says:

====================
pull-request: bpf 2018-01-18

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a divide by zero due to wrong if (src_reg == 0) check in
   64-bit mode. Properly handle this in interpreter and mask it
   also generically in verifier to guard against similar checks
   in JITs, from Eric and Alexei.

2) Fix a bug in arm64 JIT when tail calls are involved and progs
   have different stack sizes, from Daniel.

3) Reject stores into BPF context that are not expected BPF_STX |
   BPF_MEM variant, from Daniel.

4) Mark dst reg as unknown on {s,u}bounds adjustments when the
   src reg has derived bounds from dead branches, from Daniel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: fix endianness for vlan value in cxgb4_tc_flower
Kumar Sanghvi [Wed, 17 Jan 2018 06:43:34 +0000 (12:13 +0530)]
cxgb4: fix endianness for vlan value in cxgb4_tc_flower

Don't change endianness when assigning vlan value in cxgb4_tc_flower
code when processing flow match parameters. The value gets converted
to network order as part of filtering code in set_filter_wr.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: set filter type to 1 for ETH_P_IPV6
Kumar Sanghvi [Wed, 17 Jan 2018 06:43:33 +0000 (12:13 +0530)]
cxgb4: set filter type to 1 for ETH_P_IPV6

For ethtype_key = ETH_P_IPV6, set filter type as 1 in cxgb4_tc_flower
code when processing flow match parameters.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agovirtio_net: Add ethtool stats
Toshiaki Makita [Wed, 17 Jan 2018 06:38:25 +0000 (15:38 +0900)]
virtio_net: Add ethtool stats

The main purpose of this patch is adding a way of checking per-queue stats.
It's useful to debug performance problems on multiqueue environment.

$ ethtool -S ens10
NIC statistics:
     rx_queue_0_packets: 2090408
     rx_queue_0_bytes: 3164825094
     rx_queue_1_packets: 2082531
     rx_queue_1_bytes: 3152932314
     tx_queue_0_packets: 2770841
     tx_queue_0_bytes: 4194955474
     tx_queue_1_packets: 3084697
     tx_queue_1_bytes: 4670196372

This change converts existing per-cpu stats structure into per-queue one.
This should not impact on performance since each queue counter is not
updated concurrently by multiple cpus.

Performance numbers:
 - Guest has 2 vcpus and 2 queues
 - Guest runs netserver
 - Host runs 100-flow super_netperf

                     Before      After       Diff
UDP_STREAM 18byte        86.22       87.00   +0.90%
UDP_STREAM 1472byte    4055.27     4042.18   -0.32%
TCP_STREAM            16956.32    16890.63   -0.39%
UDP_RR               178667.11   185862.70   +4.03%
TCP_RR               128473.04   124985.81   -2.71%

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomm, page_vma_mapped: Introduce pfn_in_hpage()
Kirill A. Shutemov [Mon, 22 Jan 2018 09:22:30 +0000 (12:22 +0300)]
mm, page_vma_mapped: Introduce pfn_in_hpage()

The new helper would check if the pfn belongs to the page. For huge
pages it checks if the PFN is within range covered by the huge page.

The helper is used in check_pte(). The original code the helper replaces
had two call to page_to_pfn(). page_to_pfn() is relatively costly.

Although current GCC is able to optimize code to have one call, it's
better to do this explicitly.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoMerge branch 'mvpp2-Armada-7k-8k-PP2-ACPI-support'
David S. Miller [Mon, 22 Jan 2018 15:57:05 +0000 (10:57 -0500)]
Merge branch 'mvpp2-Armada-7k-8k-PP2-ACPI-support'

Marcin Wojtas says:

====================
Armada 7k/8k PP2 ACPI support

I quickly resend the series, thanks to Antoine Tenart's remark,
who spotted !CONFIG_ACPI compilation issue after introducing
the new fwnode_irq_get() routine. Please see the details in the changelog
below and the 3/7 commit log.

mvpp2 driver can work with the ACPI representation, as exposed
on a public branch:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/commits/marvell-armada-wip
It was compiled together with the most recent Tianocore EDK2 revision.
Please refer to the firmware build instruction on MacchiatoBin board:
http://wiki.macchiatobin.net/tiki-index.php?page=Build+from+source+-+UEFI+EDK+II

ACPI representation of PP2 controllers (withouth PHY support) can
be viewed in the github:
* MacchiatoBin:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/71ae395da1661374b0f07d1602afb1eee56e9794/Platforms/Marvell/Armada/AcpiTables/Armada80x0McBin/Dsdt.asl#L201

* Armada 7040 DB:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/71ae395da1661374b0f07d1602afb1eee56e9794/Platforms/Marvell/Armada/AcpiTables/Armada70x0/Dsdt.asl#L131

I will appreciate any comments or remarks.

Best regards,
Marcin

Changelog:
v3 -> v4:
* 3/7
    - add new macro (ACPI_HANDLE_FWNODE) and fix
      compilation with !CONFIG_ACPI
    - extend commit log and mention usability of fwnode_irq_get
      for the child nodes as well

v2 -> v3:
* 1/7, 2/7
    - Add Rafael's Acked-by's
* 3/7, 4/7
    - New patches
* 6/7, 7/7
    - Update driver with new helper routines usage
    - Improve commit log.

v1 -> v2:
* Remove MDIO patches
* Use PP2 ports only with link interrupts
* Release second region resources in mvpp2 driver (code moved from
  mvmdio), as explained in details in 5/5 commit message.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvpp2: enable ACPI support in the driver
Marcin Wojtas [Thu, 18 Jan 2018 12:31:44 +0000 (13:31 +0100)]
net: mvpp2: enable ACPI support in the driver

This patch introduces an alternative way of obtaining resources - via
ACPI tables provided by firmware. Enabling coexistence with the DT
support, in addition to the OF_*->device_*/fwnode_* API replacement,
required following steps to be taken:

* Add mvpp2_acpi_match table
* Omit clock configuration and obtain tclk from the property - in ACPI
  world, the firmware is responsible for clock maintenance.
* Disable comphy and syscon handling as they are not available for ACPI.
* Modify way of obtaining interrupts - use newly introduced
  fwnode_irq_get() routine
* Until proper MDIO bus and PHY handling with ACPI is established in the
  kernel, use only link interrupts feature in the driver. For the RGMII
  port it results in depending on GMAC settings done during firmware
  stage.
* When booting with ACPI MVPP2_QDIST_MULTI_MODE is picked by
  default, as there is no need to keep any kind of the backward
  compatibility.

Moreover, a memory region used by mvmdio driver is usually placed in
the middle of the address space of the PP2 network controller.
The MDIO base address is obtained without requesting memory region
(by devm_ioremap() call) in mvmdio.c, later overlapping resources are
requested by the network driver, which is responsible for avoiding
a concurrent access.

In case the MDIO memory region is declared in the ACPI, it can
already appear as 'in-use' in the OS. Because it is overlapped by second
region of the network controller, make sure it is released, before
requesting it again. The care is taken by mvpp2 driver to avoid
concurrent access to this memory region.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvpp2: use device_*/fwnode_* APIs instead of of_*
Marcin Wojtas [Thu, 18 Jan 2018 12:31:43 +0000 (13:31 +0100)]
net: mvpp2: use device_*/fwnode_* APIs instead of of_*

OF functions can be used only for the driver using DT.
As a preparation for introducing ACPI support in mvpp2
driver, use struct fwnode_handle in order to obtain
properties from the hardware description.

This patch replaces of_* function with device_*/fwnode_*
where possible in the mvpp2.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvpp2: simplify maintaining enabled ports' list
Marcin Wojtas [Thu, 18 Jan 2018 12:31:42 +0000 (13:31 +0100)]
net: mvpp2: simplify maintaining enabled ports' list

'port_count' field of the mvpp2 structure holds an overall amount
of available ports, based on DT nodes status. In order to be prepared
to support other HW description, obtain the value by incrementing it
upon each successful port initialization. This allowed for simplifying
port indexing in the controller's private array, whose size is now not
dynamically allocated, but fixed to MVPP2_MAX_PORTS.

This patch simplifies creating and filling list of enabled ports and
is a part of the preparation for adding ACPI support in the mvpp2 driver.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevice property: Allow iterating over available child fwnodes
Marcin Wojtas [Thu, 18 Jan 2018 12:31:41 +0000 (13:31 +0100)]
device property: Allow iterating over available child fwnodes

Implement a new helper function fwnode_get_next_available_child_node(),
which enables obtaining next enabled child fwnode, which
works on a similar basis to OF's of_get_next_available_child().

This commit also introduces a macro, thanks to which it is
possible to iterate over the available fwnodes, using the
new function described above.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevice property: Introduce fwnode_irq_get()
Marcin Wojtas [Thu, 18 Jan 2018 12:31:40 +0000 (13:31 +0100)]
device property: Introduce fwnode_irq_get()

Until now there were two very similar functions allowing
to get Linux IRQ number from ACPI handle (acpi_irq_get())
and OF node (of_irq_get()). The first one appeared to be used
only as a subroutine of platform_irq_get(), which (in the generic
code) limited IRQ obtaining from _CRS method only to nodes
associated to kernel's struct platform_device.

This patch introduces a new helper routine - fwnode_irq_get(),
which allows to get the IRQ number directly from the fwnode
to be used as common for OF/ACPI worlds. It is usable not
only for the parents fwnodes, but also for the child nodes
comprising their own _CRS methods with interrupts description.

In order to be able o satisfy compilation with !CONFIG_ACPI
and also simplify the new code, introduce a helper macro
(ACPI_HANDLE_FWNODE), with which it is possible to reach
an ACPI handle directly from its fwnode.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevice property: Introduce fwnode_get_phy_mode()
Marcin Wojtas [Thu, 18 Jan 2018 12:31:39 +0000 (13:31 +0100)]
device property: Introduce fwnode_get_phy_mode()

Until now there were two almost identical functions for
obtaining network PHY mode - of_get_phy_mode() and,
more generic, device_get_phy_mode(). However it is not uncommon,
that the network interface is represented as a child
of the actual controller, hence it is not associated
directly to any struct device, required by the latter
routine.

This commit allows for getting the PHY mode for
children nodes in the ACPI world by introducing a new function -
fwnode_get_phy_mode(). This commit also changes
device_get_phy_mode() routine to be its wrapper, in order
to prevent unnecessary duplication.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevice property: Introduce fwnode_get_mac_address()
Marcin Wojtas [Thu, 18 Jan 2018 12:31:38 +0000 (13:31 +0100)]
device property: Introduce fwnode_get_mac_address()

Until now there were two almost identical functions for
obtaining MAC address - of_get_mac_address() and, more generic,
device_get_mac_address(). However it is not uncommon,
that the network interface is represented as a child
of the actual controller, hence it is not associated
directly to any struct device, required by the latter
routine.

This commit allows for getting the MAC address for
children nodes in the ACPI world by introducing a new function -
fwnode_get_mac_address(). This commit also changes
device_get_mac_address() routine to be its wrapper, in order
to prevent unnecessary duplication.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: caif: remove redundant re-assignment of pointer pfrm
Colin Ian King [Mon, 22 Jan 2018 15:25:43 +0000 (15:25 +0000)]
net: caif: remove redundant re-assignment of pointer pfrm

The pointer pfrm is initialized and then later re-assigned the same
value and hence the second assignment is redundant and can be removed.

Cleans up clang warning:
drivers/net/caif/caif_hsi.c:222:6: warning: Value stored to 'pfrm'
during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: add geneve offload support for T6
Ganesh Goudar [Mon, 22 Jan 2018 13:18:26 +0000 (18:48 +0530)]
cxgb4: add geneve offload support for T6

Add geneve segmentation offload support of T6 cards.

Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'mac80211-next-for-davem-2018-01-22' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Mon, 22 Jan 2018 14:36:37 +0000 (09:36 -0500)]
Merge tag 'mac80211-next-for-davem-2018-01-22' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Less than a handful of changes:
 * possible memory leak fix in hwsim
 * speed up hwsim
 * add hwsim userspace rate control API
 * code cleanups
====================

A conflict was resolved in mac80211_hwsim.c, mostly of
the simple overlapping changes category.  One adding
a rhashtable and another adding a workqueue.

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodevlink: fix memory leak on 'resource'
Colin Ian King [Mon, 22 Jan 2018 10:31:19 +0000 (10:31 +0000)]
devlink: fix memory leak on 'resource'

Currently, if the call to devlink_resource_find returns null then
the error exit path does not free the devlink_resource 'resource'
and a memory leak occurs. Fix this by kfree'ing resource on the
error exit path.

Detected by CoverityScan, CID#1464184 ("Resource leak")

Fixes: d9f9b9a4d05f ("devlink: Add support for resource abstraction")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mlxsw-spectrum_router-Optimize-LPM-trees'
David S. Miller [Mon, 22 Jan 2018 14:22:11 +0000 (09:22 -0500)]
Merge branch 'mlxsw-spectrum_router-Optimize-LPM-trees'

Jiri Pirko says:

====================
mlxsw: spectrum_router: Optimize LPM trees

Ido says:

This set tries to optimize the structure of the LPM trees used for route
lookup by avoiding lookups that are guaranteed not to return a result.
This is done by making sure only used prefix lengths are present in the
tree.

First two patches are small preparatory steps towards the actual change
in the last patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_router: Remove unnecessary prefix lengths from LPM tree
Ido Schimmel [Mon, 22 Jan 2018 08:17:42 +0000 (09:17 +0100)]
mlxsw: spectrum_router: Remove unnecessary prefix lengths from LPM tree

In commit fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I tried to make sure only used prefix lengths are
present in the LPM tree shared between all virtual routers.

However, this optimization had to be removed in commit a69518cf0b4c
("mlxsw: spectrum_router: Avoid expensive lookup during route removal"),
since determining the used prefix lengths required us to traverse all
the active virtual routers, which could result in a hung task depending
on the number of VRFs and whether routes were removed due to abort or
not.

Re-introduce the optimization by moving the prefix usage accounting from
the virtual routers to the LPM tree, as this accounting is only used in
order to determine the tree's structure.

To make the sharing of the trees more explicit, the two trees (for IPv4
and IPv6) are stored in the shared router struct and upon the creation
of a virtual router it is immediately bound to both.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_router: Pass FIB node to LPM tree unlink function
Ido Schimmel [Mon, 22 Jan 2018 08:17:41 +0000 (09:17 +0100)]
mlxsw: spectrum_router: Pass FIB node to LPM tree unlink function

Next patch will try to optimize the LPM tree and make sure only used
prefix lengths are present, to avoid unnecessary look-ups.

Pass the currently removed FIB node to the unlinking function as its
associated prefix length is a potential candidate for removal from the
tree.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_router: Use the nodes list as indication for empty FIB
Ido Schimmel [Mon, 22 Jan 2018 08:17:40 +0000 (09:17 +0100)]
mlxsw: spectrum_router: Use the nodes list as indication for empty FIB

Currently, each FIB (IPv4 / IPv6) in a virtual router holds a prefix
usage that is used to choose a matching LPM tree, but also to check if
the FIB is empty, so that the LPM tree could be unbound.

Next patches will remove the reliance on the per-FIB prefix usage for
LPM tree matching. Keeping it only to check if the FIB is empty is a
waste, since we can use the nodes ({Prefix, Length}) list instead.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotun: add missing rcu annotation
Jason Wang [Mon, 22 Jan 2018 02:55:38 +0000 (10:55 +0800)]
tun: add missing rcu annotation

This patch fixes the following sparse warnings:

drivers/net/tun.c:2241:15: error: incompatible types in comparison expression (different address spaces)

Fixes: cd5681d7d890 ("tuntap: rename struct tun_steering_prog to struct tun_prog")
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()
weiyongjun (A) [Thu, 18 Jan 2018 02:23:34 +0000 (02:23 +0000)]
mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()

'hwname' is malloced in hwsim_new_radio_nl() and should be freed
before leaving from the error handling cases, otherwise it will cause
memory leak.

Fixes: ff4dd73dd2b4 ("mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agodebugfs_sta: Remove unneeded semicolons
Christopher Díaz Riveros [Wed, 17 Jan 2018 21:03:00 +0000 (16:03 -0500)]
debugfs_sta: Remove unneeded semicolons

Trivial fix removes unneeded semicolons after switch blocks.

Signed-off-by: Christopher Díaz Riveros <chrisadr@gentoo.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomm, page_vma_mapped: Drop faulty pointer arithmetics in check_pte()
Kirill A. Shutemov [Fri, 19 Jan 2018 12:49:24 +0000 (15:49 +0300)]
mm, page_vma_mapped: Drop faulty pointer arithmetics in check_pte()

Tetsuo reported random crashes under memory pressure on 32-bit x86
system and tracked down to change that introduced
page_vma_mapped_walk().

The root cause of the issue is the faulty pointer math in check_pte().
As ->pte may point to an arbitrary page we have to check that they are
belong to the section before doing math. Otherwise it may lead to weird
results.

It wasn't noticed until now as mem_map[] is virtually contiguous on
flatmem or vmemmap sparsemem. Pointer arithmetic just works against all
'struct page' pointers. But with classic sparsemem, it doesn't because
each section memap is allocated separately and so consecutive pfns
crossing two sections might have struct pages at completely unrelated
addresses.

Let's restructure code a bit and replace pointer arithmetic with
operations on pfns.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoMerge branch 'mlxsw-Add-support-for-mirror-action-with-flower'
David S. Miller [Sun, 21 Jan 2018 23:21:31 +0000 (18:21 -0500)]
Merge branch 'mlxsw-Add-support-for-mirror-action-with-flower'

Jiri Pirko says:

====================
mlxsw: Add support for mirror action with flower

Arkadi says:

Add support for mirror action with flower classifier. The first 3 patches
introduce a generic per-block resource infra. The last 4 patches add
support for flow based span.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_acl: Add support for mirror action
Arkadi Sharshevsky [Fri, 19 Jan 2018 08:24:52 +0000 (09:24 +0100)]
mlxsw: spectrum_acl: Add support for mirror action

Add support for mirror action. Only one mirror action can be set per rule.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>