linux-2.6-microblaze.git
5 years agonet: mvpp2: cls: Remove unnessesary check in mvpp2_ethtool_cls_rule_ins
YueHaibing [Wed, 29 May 2019 02:59:06 +0000 (10:59 +0800)]
net: mvpp2: cls: Remove unnessesary check in mvpp2_ethtool_cls_rule_ins

Fix smatch warning:

drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c:1236
 mvpp2_ethtool_cls_rule_ins() warn: unsigned 'info->fs.location' is never less than zero.

'info->fs.location' is u32 type, never less than zero.
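
As a minimal userspace illustration of why the check is redundant: a
range check on an unsigned index only needs the upper bound. The
function name and the MAX_RULES_SKETCH limit below are hypothetical and
are not taken from the mvpp2 driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound, chosen only for this sketch. */
#define MAX_RULES_SKETCH 256u

/*
 * A uint32_t can never be negative, so "loc < 0" is always false and
 * only the upper bound has to be tested.
 */
bool location_is_valid(uint32_t loc)
{
    return loc < MAX_RULES_SKETCH;
}
```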

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: Switch to devm_alloc_etherdev_mqs
Jisheng Zhang [Wed, 29 May 2019 02:26:07 +0000 (02:26 +0000)]
net: stmmac: Switch to devm_alloc_etherdev_mqs

Make use of devm_alloc_etherdev_mqs() to simplify the code.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotua6100: Avoid build warnings.
David S. Miller [Thu, 30 May 2019 18:36:15 +0000 (11:36 -0700)]
tua6100: Avoid build warnings.

Rename _P to _P_VAL and _R to _R_VAL to avoid global
namespace conflicts:

drivers/media/dvb-frontends/tua6100.c: In function ‘tua6100_set_params’:
drivers/media/dvb-frontends/tua6100.c:79: warning: "_P" redefined
 #define _P 32

In file included from ./include/acpi/platform/aclinux.h:54,
                 from ./include/acpi/platform/acenv.h:152,
                 from ./include/acpi/acpi.h:22,
                 from ./include/linux/acpi.h:34,
                 from ./include/linux/i2c.h:17,
                 from drivers/media/dvb-frontends/tua6100.h:30,
                 from drivers/media/dvb-frontends/tua6100.c:32:
./include/linux/ctype.h:14: note: this is the location of the previous definition
 #define _P 0x10 /* punct */

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Enable-SFP-on-ACPI-based-systems'
David S. Miller [Thu, 30 May 2019 18:27:47 +0000 (11:27 -0700)]
Merge branch 'Enable-SFP-on-ACPI-based-systems'

Ruslan Babayev says:

====================
Enable SFP on ACPI based systems

Changes:
v2:
- more descriptive commit body
v3:
- made 'i2c_acpi_find_adapter_by_handle' static inline
v4:
- don't initialize i2c_adapter to NULL. Instead see below...
- handle the case of neither DT nor ACPI present as invalid.
- alphabetical includes.
- use has_acpi_companion().
- use the same argument name in i2c_acpi_find_adapter_by_handle()
  in both stubbed and non-stubbed cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: sfp: enable i2c-bus detection on ACPI based systems
Ruslan Babayev [Tue, 28 May 2019 23:02:33 +0000 (16:02 -0700)]
net: phy: sfp: enable i2c-bus detection on ACPI based systems

Look up the I2C adapter using the "i2c-bus" device property on ACPI based
systems, similar to how it's done with DT.

An example DSD describing an SFP on an ACPI based system:

Device (SFP0)
{
    Name (_HID, "PRP0001")
    Name (_CRS, ResourceTemplate()
    {
        GpioIo(Exclusive, PullDefault, 0, 0, IoRestrictionNone,
               "\\_SB.PCI0.RP01.GPIO", 0, ResourceConsumer)
            { 0, 1, 2, 3, 4 }
    })
    Name (_DSD, Package ()
    {
        ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
        Package () {
            Package () { "compatible", "sff,sfp" },
            Package () { "i2c-bus", \_SB.PCI0.RP01.I2C.MUX.CH0 },
            Package () { "maximum-power-milliwatt", 1000 },
            Package () { "tx-disable-gpios", Package () { ^SFP0, 0, 0, 1} },
            Package () { "reset-gpio",       Package () { ^SFP0, 0, 1, 1} },
            Package () { "mod-def0-gpios",   Package () { ^SFP0, 0, 2, 1} },
            Package () { "tx-fault-gpios",   Package () { ^SFP0, 0, 3, 0} },
            Package () { "los-gpios",        Package () { ^SFP0, 0, 4, 1} },
        },
    })
}

Device (PHY0)
{
    Name (_HID, "PRP0001")
    Name (_DSD, Package ()
    {
        ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
        Package () {
            Package () { "compatible", "ethernet-phy-ieee802.3-c45" },
            Package () { "sfp", \_SB.PCI0.RP01.SFP0 },
            Package () { "managed", "in-band-status" },
            Package () { "phy-mode", "sgmii" },
        },
    })
}

Signed-off-by: Ruslan Babayev <ruslan@babayev.com>
Cc: xe-linux-external@cisco.com
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoi2c: acpi: export i2c_acpi_find_adapter_by_handle
Ruslan Babayev [Tue, 28 May 2019 23:02:32 +0000 (16:02 -0700)]
i2c: acpi: export i2c_acpi_find_adapter_by_handle

This allows drivers to look up i2c adapters on ACPI based systems, similar
to of_get_i2c_adapter_by_node() on DT based systems.

Signed-off-by: Ruslan Babayev <ruslan@babayev.com>
Cc: xe-linux-external@cisco.com
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: tja11xx: Switch to HWMON_CHANNEL_INFO()
Marek Vasut [Tue, 28 May 2019 18:15:41 +0000 (20:15 +0200)]
net: phy: tja11xx: Switch to HWMON_CHANNEL_INFO()

The HWMON_CHANNEL_INFO macro simplifies the code, reduces the likelihood
of errors, and makes the code easier to read.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: linux-hwmon@vger.kernel.org
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: correct .ndo_open error path
Ivan Khoronzhuk [Tue, 28 May 2019 17:45:19 +0000 (20:45 +0300)]
net: ethernet: ti: cpsw: correct .ndo_open error path

This was found during review and probably never happens in practice, but
the real number of queues is set per device, so the error path should
also be per device. Split the error path based on usage_count.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Decoupling-PHYLINK-from-struct-net_device'
David S. Miller [Thu, 30 May 2019 04:48:54 +0000 (21:48 -0700)]
Merge branch 'Decoupling-PHYLINK-from-struct-net_device'

Ioana Ciornei says:

====================
Decoupling PHYLINK from struct net_device

Following two separate discussion threads in:
  https://www.spinics.net/lists/netdev/msg569087.html
and:
  https://www.spinics.net/lists/netdev/msg570450.html

Previous RFC patch set: https://www.spinics.net/lists/netdev/msg571995.html

PHYLINK was reworked in order to accept multiple operation types,
PHYLINK_NETDEV and PHYLINK_DEV, passed through a phylink_config
structure alongside the corresponding struct device.

One of the main concerns expressed in the RFC was that using notifiers
to signal the corresponding phylink_mac_ops would break PHYLINK's API
unity and that it would become harder to grep for its users.
Using the current approach, we maintain a common API for all users.
Also, printing useful information in PHYLINK, when decoupled from a
net_device, is achieved using dev_err() and friends on the struct device
received (in DSA's case, this is the device corresponding to the
dsa_switch).

PHYLIB (which PHYLINK uses) was reworked to the extent that it does not
crash when connecting to a PHY and the net_device pointer is NULL.

Lastly, DSA has been reworked in the way it handles PHYs for ports
that lack a net_device (CPU and DSA ports).  For these, it was
previously using PHYLIB and is now using the PHYLINK_DEV operation type.
Previously, a driver that wanted to support PHY operations on CPU/DSA
ports had to implement .adjust_link(). This patch set not only gives
drivers the option to use PHYLINK uniformly but also urges them to
convert to it. For compatibility, the old code is kept but it will be
removed once all drivers switch over.

The patchset was tested on the NXP LS1021A-TSN board having the
following Ethernet layout:
  https://lkml.org/lkml/2019/5/5/279
The CPU port was moved from the internal RGMII fixed-link (enet2 ->
switch port 4) to an external loopback Cat5 cable between the enet1 port
and the front-facing swp2 SJA1105 port. In this mode, both the master
and the CPU port have an attached PHY which detects link change events:

[   49.105426] fsl-gianfar soc:ethernet@2d50000 eth1: Link is Down
[   50.305486] sja1105 spi0.1: Link is Down
[   53.265596] fsl-gianfar soc:ethernet@2d50000 eth1: Link is Up - 1Gbps/Full - flow control off
[   54.466304] sja1105 spi0.1: Link is Up - 1Gbps/Full - flow control off

Changes in v2:
  - fixed sparse warnings
  - updated 'Documentation/ABI/testing/sysfs-class-net-phydev'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: sja1105: Fix broken fixed-link interfaces on user ports
Vladimir Oltean [Tue, 28 May 2019 17:38:17 +0000 (20:38 +0300)]
net: dsa: sja1105: Fix broken fixed-link interfaces on user ports

PHYLIB and PHYLINK handle fixed-link interfaces differently. PHYLIB
wraps them in a software PHY ("pseudo fixed link") phydev construct such
that .adjust_link driver callbacks see a unified API, whereas PHYLINK
simply creates a phylink_link_state structure and passes it to
.mac_config.

At the time the driver was introduced, DSA was using PHYLIB for the
CPU/cascade ports (the ones with no net devices) and PHYLINK for
everything else.

As explained below:

commit aab9c4067d2389d0adfc9c53806437df7b0fe3d5
Author: Florian Fainelli <f.fainelli@gmail.com>
Date:   Thu May 10 13:17:36 2018 -0700

  net: dsa: Plug in PHYLINK support

  Drivers that utilize fixed links for user-facing ports (e.g: bcm_sf2)
  will need to implement phylink_mac_ops from now on to preserve
  functionality, since PHYLINK *does not* create a phy_device instance
  for fixed links.

In the above patch, DSA guards the .phylink_mac_config callback against
a NULL phydev pointer.  Therefore, .adjust_link is not called in case of
a fixed-link user port.

This patch fixes the situation by converting the driver from using
.adjust_link to .phylink_mac_config.  This can be done now in a unified
fashion for both slave and CPU/cascade ports because DSA now uses
PHYLINK for all ports.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: Use PHYLINK for the CPU/DSA ports
Ioana Ciornei [Tue, 28 May 2019 17:38:16 +0000 (20:38 +0300)]
net: dsa: Use PHYLINK for the CPU/DSA ports

For DSA switches that do not have an .adjust_link callback, i.e. those
that have fully transitioned to the PHYLINK-compliant API, use PHYLINK
to drive the CPU/DSA ports.

The PHYLIB usage and .adjust_link are kept but deprecated, and users are
asked to transition away from them.  We can't do anything for them
because PHYLINK does not wrap the fixed-link state behind a phydev
object, so we cannot wrap .phylink_mac_config into .adjust_link unless
we fabricate a phy_device structure.

For these ports, the newly introduced PHYLINK_DEV operation type is
used and the dsa_switch device structure is passed to PHYLINK for
printing purposes.  The handling of the PHYLINK_NETDEV and PHYLINK_DEV
PHYLINK instances is common from the perspective of the driver.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: Move the phylink driver calls into port.c
Ioana Ciornei [Tue, 28 May 2019 17:38:15 +0000 (20:38 +0300)]
net: dsa: Move the phylink driver calls into port.c

In order to have a common handling of PHYLINK for the slave and non-user
ports, the DSA core glue logic (between PHYLINK and the driver) must use
an API that does not rely on a struct net_device.

These will also be called by the CPU-port-handling code in a further
patch.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: Add phylink_{printk, err, warn, info, dbg} macros
Ioana Ciornei [Tue, 28 May 2019 17:38:14 +0000 (20:38 +0300)]
net: phylink: Add phylink_{printk, err, warn, info, dbg} macros

With the latest addition to the PHYLINK infrastructure, we are faced
with a decision on when to print necessary info using the struct
net_device and when with the struct device.

Add a series of macros that encapsulate this decision and replace all
uses of netdev_err&co with phylink_err.
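
A rough userspace sketch of how a macro can hide this decision from its
callers is given below. It is not the kernel implementation: the
*_sketch structures and names are simplified stand-ins, and keying the
choice on the operation type is an assumption made for illustration.

```c
#include <stdio.h>

/* Operation types named in this series. */
enum phylink_op_type_sketch { PHYLINK_NETDEV, PHYLINK_DEV };

/* Simplified stand-ins for the kernel structures (assumptions). */
struct device_sketch { const char *name; };
struct net_device_sketch { const char *name; };

struct phylink_sketch {
    enum phylink_op_type_sketch type;
    struct net_device_sketch *netdev;  /* valid for PHYLINK_NETDEV */
    struct device_sketch *dev;         /* valid for PHYLINK_DEV */
};

/* Pick the name used to prefix log messages, based on the type. */
const char *phylink_log_name(const struct phylink_sketch *pl)
{
    if (pl->type == PHYLINK_NETDEV)
        return pl->netdev->name;
    return pl->dev->name;
}

/* One macro for all callers, so none of them has to choose a device. */
#define phylink_err_sketch(pl, ...)                        \
    do {                                                   \
        fprintf(stderr, "%s: ", phylink_log_name(pl));     \
        fprintf(stderr, __VA_ARGS__);                      \
    } while (0)
```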

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: Add PHYLINK_DEV operation type
Ioana Ciornei [Tue, 28 May 2019 17:38:13 +0000 (20:38 +0300)]
net: phylink: Add PHYLINK_DEV operation type

In the PHYLINK_DEV operation type, the PHYLINK infrastructure can work
without an attached net_device. For printing purposes, a struct
device * should instead be passed to PHYLINK using the phylink_config
structure.

Also, netif_carrier_* calls are guarded by the presence of a valid
net_device. When using the PHYLINK_DEV operation type, we cannot check
link status using the netif_carrier_ok() API so instead, keep an
internal state of the MAC and call mac_link_{down,up} only when the link
changed.
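
The idea of tracking the link state internally can be sketched in
userspace as follows. The structure, its fields and the counting
callbacks are hypothetical stand-ins, not the PHYLINK code; they only
show that the up/down callbacks fire on transitions and not on every
resolve.

```c
#include <stdbool.h>

/* Hypothetical tracker; the counters stand in for driver callbacks. */
struct link_tracker {
    bool old_link_state;  /* last state reported to the MAC */
    int up_calls;
    int down_calls;
};

static void mac_link_up_sketch(struct link_tracker *t)
{
    t->up_calls++;
}

static void mac_link_down_sketch(struct link_tracker *t)
{
    t->down_calls++;
}

/* Notify the MAC only when the link state differs from the stored one. */
void resolve_link(struct link_tracker *t, bool link_up)
{
    if (link_up == t->old_link_state)
        return;

    t->old_link_state = link_up;
    if (link_up)
        mac_link_up_sketch(t);
    else
        mac_link_down_sketch(t);
}
```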

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: Add struct phylink_config to PHYLINK API
Ioana Ciornei [Tue, 28 May 2019 17:38:12 +0000 (20:38 +0300)]
net: phylink: Add struct phylink_config to PHYLINK API

The phylink_config structure will encapsulate a pointer to a struct
device and the operation type requested for this instance of PHYLINK.
This patch does not make any functional changes, it just transitions the
PHYLINK internals and all its users to the new API.

A pointer to a phylink_config structure will be passed to
phylink_create() instead of the net_device directly. Also, the same
phylink_config pointer will be passed back to all phylink_mac_ops
callbacks instead of the net_device. Using this mechanism, a PHYLINK
user can get the original net_device using a structure such as
'to_net_dev(config->dev)' or directly the structure containing the
phylink_config using a container_of call.
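
The container_of approach can be illustrated with a small userspace
sketch. The macro below is a simplified stand-in for the kernel's
container_of(), and struct my_port and struct phylink_config_sketch are
hypothetical; the real phylink_config has different members.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct phylink_config (assumption). */
struct phylink_config_sketch {
    int type;
};

/* Hypothetical driver-private structure embedding the config. */
struct my_port {
    int index;
    struct phylink_config_sketch config;
};

/* Recover the enclosing structure from the pointer a callback receives. */
struct my_port *port_from_config(struct phylink_config_sketch *config)
{
    return container_of_sketch(config, struct my_port, config);
}
```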

At the moment, only the PHYLINK_NETDEV is defined as a valid operation
type for PHYLINK. In this mode, a valid reference to a struct device
linked to the original net_device should be passed to PHYLINK through
the phylink_config structure.

This API change is mainly driven by the necessity of adding a new
operation type in PHYLINK that disconnects the phy_device from the
net_device and also works when the net_device is lacking.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: Add phylink_mac_link_{up, down} wrapper functions
Ioana Ciornei [Tue, 28 May 2019 17:38:11 +0000 (20:38 +0300)]
net: phylink: Add phylink_mac_link_{up, down} wrapper functions

This is a cosmetic patch that reduces the clutter in phylink_resolve
around calling the .mac_link_up/.mac_link_down driver callbacks.  In a
further patch this logic will be extended to emit notifications in case
a net device does not exist.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: Add phy_standalone sysfs entry
Ioana Ciornei [Tue, 28 May 2019 17:38:10 +0000 (20:38 +0300)]
net: phy: Add phy_standalone sysfs entry

Export a phy_standalone device attribute that is meant to give the
indication that this PHY lacks an attached_dev and its corresponding
sysfs link. The attribute will be created only when the
phy_attach_direct() function is called with a NULL net_device.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: Check against net_device being NULL
Ioana Ciornei [Tue, 28 May 2019 17:38:09 +0000 (20:38 +0300)]
net: phy: Check against net_device being NULL

In general, we don't want MAC drivers calling phy_attach_direct with the
net_device being NULL. Add checks against this in all the functions
calling it: phy_attach() and phy_connect_direct().

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: Guard against the presence of a netdev
Ioana Ciornei [Tue, 28 May 2019 17:38:08 +0000 (20:38 +0300)]
net: phy: Guard against the presence of a netdev

A prerequisite for PHYLIB to work in the absence of a struct net_device
is to not access pointers to it.

Changes are needed in the following areas:

 - Printing: In some places netdev_err was replaced with phydev_err.

 - Incrementing reference count to the parent MDIO bus driver: If there
   is no net device, then the reference count should definitely be
   incremented since there is no chance that it was an Ethernet driver
   who registered the MDIO bus.

 - Sysfs links are not created in case there is no attached_dev.

 - No netif_carrier_off is done if there is no attached_dev.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: Add phy_sysfs_create_links helper function
Vladimir Oltean [Tue, 28 May 2019 17:38:07 +0000 (20:38 +0300)]
net: phy: Add phy_sysfs_create_links helper function

This is a cosmetic patch that wraps the operation of creating sysfs
links between the netdev->phydev and the phydev->attached_dev.

This is needed to keep the indentation level in check in a follow-up
patch where this function will be guarded against the existence of a
phydev->attached_dev.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: Introduce act_ctinfo action
Kevin 'ldir' Darbyshire-Bryant [Tue, 28 May 2019 17:03:50 +0000 (17:03 +0000)]
net: sched: Introduce act_ctinfo action

ctinfo is a new tc filter action module.  It is designed to restore
information contained in firewall conntrack marks to other packet fields
and is typically used on packet ingress paths.  At present it has two
independent sub-functions or operating modes, DSCP restoration mode &
skb mark restoration mode.

The DSCP restore mode:

This mode copies DSCP values that have been placed in the firewall
conntrack mark back into the IPv4/v6 diffserv fields of relevant
packets.

DSCP restoration is intended for, and has been found useful for,
restoring ingress classifications based on egress classifications across
links that bleach or otherwise change DSCP, typically home ISP Internet
links.  Restoring DSCP on ingress on the WAN link allows qdiscs such as
(but by no means limited to) CAKE to shape inbound packets according to
policies that are easier to set & mark on egress.

Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway.  Thus marking the connection in some
manner on egress for later restoration of classification on ingress is
easier to implement.

Parameters related to DSCP restore mode:

dscpmask - a 32 bit mask of 6 contiguous bits indicating which bits of
the conntrack mark field contain the DSCP value to be restored.

statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by dscpmask.  This represents a conditional operation flag
whereby the DSCP is only restored if the flag is set.  This is useful to
implement a 'one shot' iptables based classification where the
'complicated' iptables rules are only run once to classify the
connection on initial (egress) packet and subsequent packets are all
marked/restored with the same DSCP.  A mask of zero disables the
conditional behaviour, i.e. the conntrack mark DSCP bits are always
restored to the ip diffserv field (assuming the conntrack entry is found
& the skb is an ipv4/ipv6 type).

e.g. dscpmask 0xfc000000 statemask 0x01000000

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|
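
As a worked example of the masks above, the following userspace sketch
extracts the DSCP from a conntrack mark and places it in a diffserv
byte. It is not the act_ctinfo code: the function names are
hypothetical, and preserving the two low-order (ECN) bits of the
diffserv byte is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Position of the lowest set bit of the mask; returns 0 for a zero mask. */
static unsigned int mask_shift(uint32_t mask)
{
    unsigned int shift = 0;

    if (mask == 0)
        return 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        shift++;
    }
    return shift;
}

/*
 * Returns true and stores the DSCP if restoration should happen.
 * A statemask of zero disables the conditional flag check.
 */
bool dscp_from_mark(uint32_t mark, uint32_t dscpmask, uint32_t statemask,
                    uint8_t *dscp)
{
    if (statemask && !(mark & statemask))
        return false;

    *dscp = (uint8_t)((mark & dscpmask) >> mask_shift(dscpmask));
    return true;
}

/* Write a 6-bit DSCP into a diffserv byte, keeping the low two bits. */
uint8_t apply_dscp(uint8_t dsfield, uint8_t dscp)
{
    return (uint8_t)((dsfield & 0x03u) | ((dscp & 0x3fu) << 2));
}
```

With dscpmask 0xfc000000 and statemask 0x01000000, a mark of 0xb9000000
has the flag in bit 24 set and carries DSCP 46 in bits 31-26.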

The skb mark restore mode (cpmark):

This mode copies the firewall conntrack mark to the skb's mark field.
It is functionally equivalent to the existing act_connmark action, with
the additional feature of being able to apply a mask to the restored
value.

Parameters related to skb mark restore mode:

mask - a 32 bit mask applied to the firewall conntrack mark to mask out
bits unwanted for restoration.  This can be useful where the conntrack
mark is being used for different purposes by different applications.  If
not specified, the whole mark field is copied (i.e. the default mask is
0xffffffff).

e.g. mask 0x00ffffff to mask out the top 8 bits being used by the
aforementioned DSCP restore mode.

|----0x00----conntrack mark----ffffff---|
| Bits 31-24 |                          |
| DSCP & flag|      some value here     |
|---------------------------------------|
|
|
v
|------------skb mark-------------------|
|            |                          |
|  zeroed    |                          |
|---------------------------------------|
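
The masking step in the diagram above amounts to a single bitwise AND.
The helper below is a hypothetical userspace sketch of that arithmetic,
not the act_ctinfo implementation.

```c
#include <stdint.h>

/* Copy the conntrack mark to the skb mark, keeping only the masked bits. */
uint32_t restore_skb_mark(uint32_t ct_mark, uint32_t mask)
{
    return ct_mark & mask;
}
```

For example, a conntrack mark of 0xb9123456 with mask 0x00ffffff yields
an skb mark of 0x00123456, and the default mask of 0xffffffff copies the
mark unchanged.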

Overall parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>)

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: remove 1000/Half from supported modes
Heiner Kallweit [Tue, 28 May 2019 16:43:46 +0000 (18:43 +0200)]
r8169: remove 1000/Half from supported modes

MAC on the GBit versions supports 1000/Full only, however the PHY
partially claims to support 1000/Half. So let's explicitly remove
this mode.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mscc: ocelot: Implement port policers via tc command
Joergen Andreasen [Tue, 28 May 2019 12:49:17 +0000 (14:49 +0200)]
net: mscc: ocelot: Implement port policers via tc command

Hardware offload of matchall classifier and police action are now
supported via the tc command.
Supported police parameters are: rate and burst.

Example:

Add:
tc qdisc add dev eth3 handle ffff: ingress
tc filter add dev eth3 parent ffff: prio 1 handle 2 \
matchall skip_sw \
action police rate 100Mbit burst 10000

Show:
tc -s -d qdisc show dev eth3
tc -s -d filter show dev eth3 ingress

Delete:
tc filter del dev eth3 parent ffff: prio 1
tc qdisc del dev eth3 handle ffff: ingress

Signed-off-by: Joergen Andreasen <joergen.andreasen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 29 May 2019 21:51:23 +0000 (14:51 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-05-29

This series contains updates to ice driver only.

Bruce cleans up white space issues and fixes complaints about using
bitop assignments using operands of different sizes.

Anirudh cleans up code that is no longer needed now that the firmware
supports the functionality.  Adds support for ethtool selftest to the
ice driver, which includes testing link, interrupts, eeprom, registers
and packet loopback.  Also cleans up duplicate code.

Tony implements support for toggling receive VLAN filter via ethtool.

Brett bumps up the minimum receive descriptor count per queue to resolve
dropped packets.  Refactored the interrupt tracking for the ice driver
to resolve issues seen with the co-existence of features and SR-IOV, so
instead of having a hardware IRQ tracker and a software IRQ tracker,
simply use one tracker.  Also adds a helper function to trigger software
interrupts.

Mitch changes how Malicious Driver Detection (MDD) events are handled,
to ensure all VFs are checked for MDD events and to just log the event
instead of disabling the VF, which was preventing proper release of
resources if the VF is rebooted or the VF driver reloaded.

Dave cleans up a redundant call to register LLDP MIB change events.

Dan adds support to retrieve the current setting of firmware logging
from the hardware to properly initialize the hardware structure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: Fix build error without CONFIG_INET
YueHaibing [Tue, 28 May 2019 09:10:40 +0000 (17:10 +0800)]
net: stmmac: Fix build error without CONFIG_INET

Fix gcc build error while CONFIG_INET is not set

drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.o: In function `__stmmac_test_loopback':
stmmac_selftests.c:(.text+0x8ec): undefined reference to `ip_send_check'
stmmac_selftests.c:(.text+0xacc): undefined reference to `udp4_hwcsum'

Add CONFIG_INET dependency to fix this.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 091810dbded9 ("net: stmmac: Introduce selftests support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agorhashtable: Add rht_ptr_rcu and improve rht_ptr
Herbert Xu [Tue, 28 May 2019 07:02:31 +0000 (15:02 +0800)]
rhashtable: Add rht_ptr_rcu and improve rht_ptr

This patch moves common code between rht_ptr and rht_ptr_exclusive
into __rht_ptr.  It also adds a new helper rht_ptr_rcu exclusively
for the RCU case.  This way rht_ptr becomes a lock-only construct
so we can use the lighter rcu_dereference_protected primitive.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: use dev_info() before netdev is registered
Jisheng Zhang [Tue, 28 May 2019 07:02:07 +0000 (07:02 +0000)]
net: stmmac: use dev_info() before netdev is registered

Before the netdev is registered, calling netdev_info() emits something
like "(unnamed net_device) (uninitialized)", which looks confusing.

Before this patch:
[    3.155028] stmmaceth f7b60000.ethernet (unnamed net_device) (uninitialized): device MAC address 52:1a:55:18:9e:9d

After this patch:
[    3.155028] stmmaceth f7b60000.ethernet: device MAC address 52:1a:55:18:9e:9d

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: fix spelling mistake "inculde" -> "include"
Colin Ian King [Tue, 28 May 2019 06:52:17 +0000 (07:52 +0100)]
qed: fix spelling mistake "inculde" -> "include"

There is a spelling mistake in a DP_INFO message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoice: Add a helper to trigger software interrupt
Brett Creeley [Tue, 16 Apr 2019 17:30:51 +0000 (10:30 -0700)]
ice: Add a helper to trigger software interrupt

Add a new function ice_trigger_sw_intr to trigger interrupts.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Configure RSS LUT key only if RSS is enabled
Md Fahad Iqbal Polash [Tue, 16 Apr 2019 17:30:50 +0000 (10:30 -0700)]
ice: Configure RSS LUT key only if RSS is enabled

Call ice_vsi_cfg_rss_lut_key only if RSS is enabled.

Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Add ice_get_fw_log_cfg to init FW logging
Dan Nowlin [Tue, 16 Apr 2019 17:30:49 +0000 (10:30 -0700)]
ice: Add ice_get_fw_log_cfg to init FW logging

In order to initialize the current status of the FW logging,
this patch adds ice_get_fw_log_cfg. The function retrieves
the current setting of the FW logging from HW and updates the
ice_hw structure accordingly.

Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Minor cleanup in ice_switch.h
Anirudh Venkataramanan [Tue, 16 Apr 2019 17:30:48 +0000 (10:30 -0700)]
ice: Minor cleanup in ice_switch.h

Remove duplicate define for ICE_INVAL_Q_HANDLE. Move defines to the
top of the file.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Remove redundant and premature event config
Dave Ertman [Tue, 16 Apr 2019 17:30:47 +0000 (10:30 -0700)]
ice: Remove redundant and premature event config

In the path for re-enabling FW LLDP engine, there is
a call to register for LLDP MIB change events.  This
call is redundant, in that the call to ice_pf_dcb_cfg
will already register the driver for these events.  Also,
the call as it stands now is too early in the flow, before
DCB is configured.

Remove the redundant call.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Change message level
Mitch Williams [Tue, 16 Apr 2019 17:30:46 +0000 (10:30 -0700)]
ice: Change message level

Change the message level of the MTU change log message from debug to
info.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Check all VFs for MDD activity, don't disable
Mitch Williams [Tue, 16 Apr 2019 17:30:45 +0000 (10:30 -0700)]
ice: Check all VFs for MDD activity, don't disable

Don't use the mdd_detected variable as an exit condition for this loop;
the first VF to NOT have an MDD event will cause the loop to terminate.

Instead, just look at all of the VFs, but don't disable them. Disabling
them prevents proper release of resources if the VFs are rebooted or
the VF driver is reloaded. Just log a message and call out repeat
offenders.

To make it clear what we are doing, use a differently-named variable in
the loop.
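
The loop change described above can be sketched in user-space C. This is only an illustration under stated assumptions: struct vf_state, its fields, and check_all_vfs() are hypothetical names, not the driver's own structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VF state; the real driver's structures differ. */
struct vf_state {
	bool mdd_event_pending;	/* hardware flagged an MDD event */
	unsigned int mdd_count;	/* events seen so far for this VF */
};

/*
 * Visit every VF. The per-VF flag lives in its own local variable
 * (vf_mdd_detected), so a VF without an event cannot end the loop early.
 * Returns the number of VFs that had an event on this pass.
 */
static unsigned int check_all_vfs(struct vf_state *vfs, size_t num_vfs)
{
	unsigned int flagged = 0;

	assert(vfs != NULL || num_vfs == 0);

	for (size_t i = 0; i < num_vfs; i++) {
		bool vf_mdd_detected = vfs[i].mdd_event_pending;

		if (!vf_mdd_detected)
			continue;	/* keep scanning the remaining VFs */

		vfs[i].mdd_event_pending = false;
		vfs[i].mdd_count++;	/* a count above 1 marks a repeat offender */
		flagged++;		/* log only; the VF stays enabled */
	}
	return flagged;
}
```

The point of the sketch is that the loop bound is the VF count alone, and no path through the body disables a VF.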

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Refactor interrupt tracking
Brett Creeley [Tue, 16 Apr 2019 17:30:44 +0000 (10:30 -0700)]
ice: Refactor interrupt tracking

Currently we have two MSI-x (IRQ) trackers, one for OS requested MSI-x
entries (sw_irq_tracker) and one for hardware MSI-x vectors
(hw_irq_tracker). Generally the sw_irq_tracker has fewer entries than the
hw_irq_tracker because the hw_irq_tracker has entries equal to the max
allowed MSI-x per PF and the sw_irq_tracker is mainly the minimum (non
SR-IOV portion of the vectors, kernel granted IRQs). All of the non
SR-IOV portions of the driver (i.e. LAN queues, RDMA queues, OICR, etc.)
take at least one of each type of tracker resource. SR-IOV only grabs
entries from the hw_irq_tracker. There are a few issues with this approach
that can be seen when doing any kind of device reconfiguration (i.e.
ethtool -L, SR-IOV, etc.). One of them being, any time the driver creates
an ice_q_vector and associates it to a LAN queue pair it will grab and
use one entry from the hw_irq_tracker and one from the sw_irq_tracker.
If the indices on these do not match it will cause a Tx timeout, which
will cause a reset and then the indices will match up again and traffic
will resume. The mismatched indices come from the trackers not being the
same size and/or the search_hint in the two trackers not being equal.
Another reason for the refactor is the co-existence of features with
SR-IOV. If SR-IOV is enabled and the interrupts are taken from the end
of the sw_irq_tracker then other features can no longer use this space
because the hardware has now given the remaining interrupts to SR-IOV.

This patch reworks how we track MSI-x vectors by removing the
hw_irq_tracker completely and instead MSI-x resources needed for SR-IOV
are determined all at once instead of per VF. This can be done because
when creating VFs we know how many are wanted and how many MSI-x vectors
each VF needs. This also allows us to start using MSI-x resources from
the end of the PF's allowed MSI-x vectors so we are less likely to use
entries needed for other features (i.e. RDMA, L2 Offload, etc).

This patch also reworks the ice_res_tracker structure by removing the
search_hint and adding a new member - "end". Instead of having a
search_hint we will always search from 0. The new member, "end", will be
used to manipulate the end of the ice_res_tracker (specifically
sw_irq_tracker) during runtime based on MSI-x vectors needed by SR-IOV.
In the normal case, the end of ice_res_tracker will be equal to the
ice_res_tracker's num_entries.
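
A minimal user-space sketch of a tracker with an "end" member follows. It assumes a simple boolean slot array; struct res_tracker, TRACKER_MAX, and tracker_get() are hypothetical stand-ins and not the driver's ice_res_tracker code.

```c
#include <assert.h>
#include <stdbool.h>

#define TRACKER_MAX 16

/* Hypothetical simplified tracker: num_entries slots, searchable up to end. */
struct res_tracker {
	unsigned int num_entries;
	unsigned int end;	/* normally equal to num_entries */
	bool used[TRACKER_MAX];
};

/*
 * Reserve `needed` contiguous free slots, always scanning from index 0
 * and never looking at or past `end`. Returns the start index or -1.
 */
static int tracker_get(struct res_tracker *t, unsigned int needed)
{
	unsigned int run = 0;

	assert(t->end <= t->num_entries && t->num_entries <= TRACKER_MAX);
	if (needed == 0)
		return -1;

	for (unsigned int i = 0; i < t->end; i++) {
		run = t->used[i] ? 0 : run + 1;
		if (run == needed) {
			unsigned int start = i + 1 - needed;

			for (unsigned int j = start; j <= i; j++)
				t->used[j] = true;
			return (int)start;
		}
	}
	return -1;
}
```

Lowering `end` hides the tail of the tracker from ordinary allocations, which is how the text describes SR-IOV taking vectors at runtime; restoring it to num_entries makes the tail available again.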

The sriov_base_vector member was added to the PF structure. It is used
to represent the starting MSI-x index of all the needed MSI-x vectors
for all SR-IOV VFs. Depending on how many MSI-x are needed, SR-IOV may
have to take resources from the sw_irq_tracker. This is done by setting
the sw_irq_tracker->end equal to the pf->sriov_base_vector. When all
SR-IOV VFs are removed then the sw_irq_tracker->end is reset back to
sw_irq_tracker->num_entries. The sriov_base_vector, along with the VF's
number of MSI-x (pf->num_vf_msix), vf_id, and the base MSI-x index on
the PF (pf->hw.func_caps.common_cap.msix_vector_first_id), is used to
calculate the first HW absolute MSI-x index for each VF, which is used
to write to the VPINT_ALLOC[_PCI] and GLINT_VECT2FUNC registers to
program the VFs MSI-x PCI configuration bits. Also, the sriov_base_vector
is used along with VF's num_vf_msix, vf_id, and q_vector->v_idx to
determine the MSI-x register index (used for writing to GLINT_DYN_CTL)
within the PF's space.
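
The index arithmetic can be illustrated as below. This is a sketch that assumes VFs are laid out back to back starting at sriov_base_vector and that the SR-IOV block is carved from the end of the PF's vectors; struct pf_msix and both helper functions are hypothetical, and only the field names are taken from the text above.

```c
#include <assert.h>

/* Hypothetical subset of the PF fields named in the commit message. */
struct pf_msix {
	unsigned int num_msix;			/* MSI-x vectors allowed for this PF */
	unsigned int msix_vector_first_id;	/* PF's first vector in device space */
	unsigned int num_vf_msix;		/* vectors needed by each VF */
	unsigned int sriov_base_vector;		/* PF-relative start of the SR-IOV block */
};

/* Reserve vectors for all VFs at once, taken from the end of the PF's range. */
static void sriov_reserve(struct pf_msix *pf, unsigned int num_vfs)
{
	unsigned int needed = num_vfs * pf->num_vf_msix;

	assert(needed <= pf->num_msix);
	pf->sriov_base_vector = pf->num_msix - needed;
}

/* First device-absolute MSI-x index for a VF (assumed contiguous layout). */
static unsigned int vf_first_msix_abs(const struct pf_msix *pf,
				      unsigned int vf_id)
{
	return pf->sriov_base_vector + vf_id * pf->num_vf_msix +
	       pf->msix_vector_first_id;
}
```

With 64 PF vectors, 4 VFs, and 5 vectors per VF, the SR-IOV block starts at PF-relative index 44, so the tracker's end would be set to 44 while the VFs exist.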

Interrupt changes removed any references to hw_base_vector, hw_oicr_idx,
and hw_irq_tracker. Only sw_base_vector, sw_oicr_idx, and sw_irq_tracker
variables remain. Change all of these by removing the "sw_" prefix to
help avoid confusion with these variables and their use.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Add handler for ethtool selftest
Anirudh Venkataramanan [Tue, 16 Apr 2019 17:30:43 +0000 (10:30 -0700)]
ice: Add handler for ethtool selftest

This patch adds a handler for ethtool selftest. Selftest includes
testing link, interrupts, eeprom, registers and packet loopback.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Don't call ice_cfg_itr() for SR-IOV
Brett Creeley [Tue, 16 Apr 2019 17:30:42 +0000 (10:30 -0700)]
ice: Don't call ice_cfg_itr() for SR-IOV

ice_cfg_itr() sets the ITR granularity and default ITR values for the
PF's interrupt vectors. For VF's this will be done in the AVF driver
flow. Fix this by not calling ice_cfg_itr() for SR-IOV.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Set minimum default Rx descriptor count to 512
Brett Creeley [Tue, 16 Apr 2019 17:30:41 +0000 (10:30 -0700)]
ice: Set minimum default Rx descriptor count to 512

Currently we set the default number of Rx descriptors per
queue to the system's page size divided by the number of bytes per
descriptor. For 4K page size systems this is resulting in 128 Rx
descriptors per queue. This is causing more dropped packets than desired
in the default configuration. Fix this by setting the minimum default
Rx descriptor count per queue to 512.
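
As a worked example of the arithmetic, the sketch below assumes a 32-byte descriptor, which is what the 4K-page figure of 128 implies. The macro and function names are hypothetical.

```c
#include <assert.h>

#define MIN_DEFAULT_RX_DESC 512u	/* minimum named in the commit message */

/* Default Rx descriptors per queue: one page worth, but never below 512. */
static unsigned int default_rx_desc_count(unsigned int page_size,
					  unsigned int desc_size)
{
	unsigned int per_page;

	assert(desc_size > 0);
	per_page = page_size / desc_size;
	return per_page > MIN_DEFAULT_RX_DESC ? per_page : MIN_DEFAULT_RX_DESC;
}
```

On a 4K-page system the page-derived value (4096 / 32 = 128) is raised to 512, while a 64K-page system keeps its larger page-derived value.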

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Resolve static analysis warning
Bruce Allan [Tue, 16 Apr 2019 17:30:40 +0000 (10:30 -0700)]
ice: Resolve static analysis warning

Some static analysis tools can complain when doing a bitop assignment using
operands of different sizes. Fix that.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Implement toggling ethtool rx-vlan-filter
Tony Nguyen [Tue, 16 Apr 2019 17:30:39 +0000 (10:30 -0700)]
ice: Implement toggling ethtool rx-vlan-filter

Implement the toggling of rx-vlan-filter; enable|disable VLAN
pruning based on on|off, respectively.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Remove direct write for GLLAN_RCTL_0
Anirudh Venkataramanan [Tue, 16 Apr 2019 17:30:38 +0000 (10:30 -0700)]
ice: Remove direct write for GLLAN_RCTL_0

Clear PXE mode AQ call (opcode 0x0110) is now supported in FW. So
remove the direct register write to GLLAN_RCTL_0.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Fix LINE_SPACING style issue
Bruce Allan [Tue, 16 Apr 2019 17:24:39 +0000 (10:24 -0700)]
ice: Fix LINE_SPACING style issue

Fix a checkpatch "LINE_SPACING: Please don't use multiple blank lines"
issue that has snuck in to the code.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoMerge branch 'qed-Fix-inifinite-spinning-of-PTP-poll-thread'
David S. Miller [Wed, 29 May 2019 07:01:30 +0000 (00:01 -0700)]
Merge branch 'qed-Fix-inifinite-spinning-of-PTP-poll-thread'

Sudarsana Reddy Kalluru says:

====================
qed*: Fix infinite spinning of PTP poll thread.

The patch series addresses an error scenario in the PTP Tx implementation.

Please consider applying it to net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqede: Handle infinite driver spinning for Tx timestamp.
Sudarsana Reddy Kalluru [Tue, 28 May 2019 03:21:33 +0000 (20:21 -0700)]
qede: Handle infinite driver spinning for Tx timestamp.

In PTP Tx implementation, driver kept scheduling a poll thread until the
timestamp is available. In the error scenarios (e.g. app requesting the
timestamp for non-ptp packet), this thread kept waiting for the timestamp
forever.  This patch adds changes to report such a scenario as an error and
terminate the thread. Added a timeout of 2 seconds i.e., max time to wait
for Tx timestamp. Added a stat value ptp_skip_txts for reporting the number
of packets for which Tx timestamping is skipped.
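
The bounded polling can be sketched as a single poll step with a deadline. This is a user-space illustration, not the qede code: struct ptp_tx_state, ptp_poll_once(), and the millisecond clock are assumptions, and counting a timeout in ptp_skip_txts is this sketch's reading of the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TXTS_TIMEOUT_MS 2000u	/* 2-second limit from the commit message */

/* Hypothetical state for one outstanding Tx timestamp request. */
struct ptp_tx_state {
	uint64_t start_ms;	/* time the request was made */
	bool pending;		/* a timestamp is still awaited */
	uint64_t ptp_skip_txts;	/* packets whose Tx timestamp was given up on */
};

enum poll_result { POLL_DONE, POLL_AGAIN, POLL_TIMED_OUT };

/* One iteration of the poll thread: finish, reschedule, or give up. */
static enum poll_result ptp_poll_once(struct ptp_tx_state *st, bool ts_ready,
				      uint64_t now_ms)
{
	assert(st->pending);

	if (ts_ready) {
		st->pending = false;
		return POLL_DONE;
	}
	if (now_ms - st->start_ms >= TXTS_TIMEOUT_MS) {
		st->pending = false;	/* stop rescheduling the thread */
		st->ptp_skip_txts++;
		return POLL_TIMED_OUT;
	}
	return POLL_AGAIN;
}
```

A caller would reschedule only on POLL_AGAIN, so a request for a non-PTP packet ends after at most two seconds instead of spinning forever.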

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Reduce the severity of ptp debug message.
Sudarsana Reddy Kalluru [Tue, 28 May 2019 03:21:32 +0000 (20:21 -0700)]
qed: Reduce the severity of ptp debug message.

PTP Tx implementation continuously polls for the availability of timestamp.
Reducing the severity of a debug message in this path to avoid filling up
the syslog buffer with this message, especially in the error scenarios.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomacvlan: Replace strncpy() by strscpy()
Gustavo A. R. Silva [Mon, 27 May 2019 18:38:55 +0000 (13:38 -0500)]
macvlan: Replace strncpy() by strscpy()

The strncpy() function is being deprecated. Replace it by the safer
strscpy() and fix the following Coverity warning:

"Calling strncpy with a maximum size argument of 16 bytes on destination
array ifrr.ifr_ifrn.ifrn_name of size 16 bytes might leave the destination
string unterminated."

Notice that, unlike strncpy(), strscpy() always null-terminates the
destination string.
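
strscpy() is a kernel-only helper, so the sketch below uses a hypothetical user-space stand-in, bounded_copy(), that follows the documented behaviour: it always NUL-terminates a non-empty destination and returns -E2BIG on truncation. It is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in modelled on strscpy() semantics: copy at most
 * size - 1 bytes, always terminate, and report truncation with -E2BIG.
 * Returns the number of bytes copied (excluding the NUL) on success.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -E2BIG;

	len = strlen(src);
	if (len >= size) {
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';	/* terminated even when truncated */
		return -E2BIG;
	}
	memcpy(dst, src, len + 1);	/* includes the terminating NUL */
	return (long)len;
}
```

With a 16-byte destination such as an interface-name buffer, a source of 16 or more characters is cut to 15 characters plus the terminator, whereas strncpy() would fill all 16 bytes and leave no terminator.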

Addresses-Coverity-ID: 1445537 ("Buffer not null terminated")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 29 May 2019 06:24:44 +0000 (23:24 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2019-05-28

This series contains updates to e1000e, igb and igc.

Feng adds additional information on a warning message when a read of a
hardware register fails.

Gustavo A. R. Silva fixes up two "fall through" code comments so that
the checkers can actually determine that we did comment that the case
statement is falling through to the next case.

Sasha does some cleanup on the igc driver by removing duplicate
white space and removing an unneeded workaround for igc.  Adds support for
flow control to the igc driver.

Konstantin Khlebnikov reverts a previous fix which was causing a false
positive for a hardware hang.  Provides a fix so that when link is lost
the packets in the transmit queue are flushed and wakes the transmit
queue when the NIC is ready to send packets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-API-and-initial-implementation-for-nexthop-objects'
David S. Miller [Wed, 29 May 2019 04:37:30 +0000 (21:37 -0700)]
Merge branch 'net-API-and-initial-implementation-for-nexthop-objects'

David Ahern says:

====================
net: API and initial implementation for nexthop objects

This set contains the API and initial implementation for nexthops as
standalone objects.

Patch 1 contains the UAPI and updates to selinux struct.

Patch 2 contains the barebones code for nexthop commands, rbtree
maintenance and notifications.

Patch 3 then adds support for IPv4 gateways along with handling of
netdev events.

Patch 4 adds support for IPv6 gateways.

Patch 5 has the implementation of the encap attributes.

Patch 6 adds support for nexthop groups.

At the end of this set, nexthop objects can be created and deleted and
userspace can monitor nexthop events, but ipv4 and ipv6 routes can not
use them yet. Once the nexthop struct is defined, follow on sets add it
to fib{6}_info and handle it within the respective code before routes
can be inserted using them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonexthop: Add support for nexthop groups
David Ahern [Fri, 24 May 2019 21:43:08 +0000 (14:43 -0700)]
nexthop: Add support for nexthop groups

Allow the creation of nexthop groups which reference other nexthop
objects to create multipath routes:

                      +--------------+
   +------------+   +--------------+ |
   | nh  nh_grp --->| nh_grp_entry |-+
   +------------+   +---------|----+
     ^                |       |    +------------+
     +----------------+       +--->| nh, weight |
        nh_parent                  +------------+

A group entry points to a nexthop with a weight for that hop within the
group. The nexthop has a list_head, grp_list, for tracking which groups
it is a member of and the group entry has a reference back to the parent.
The grp_list is used when a nexthop is deleted - to efficiently remove
it from groups using it.

If a nexthop group spec is given, no other attributes can be set. Each
nexthop id in a group spec must already exist.

Similar to single nexthops, the specification of a nexthop group can be
updated so that data is managed with rcu locking.

Add path selection function to account for multiple paths and add
ipv{4,6}_good_nh helpers to know that if a neighbor entry exists it is
in a good state.
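
Weighted path selection over a group can be sketched as follows. This is an assumption-laden illustration, not the kernel's selection function: it maps a flow hash onto cumulative weights, and struct grp_entry, its `good` field (a stand-in for the ipv{4,6}_good_nh check), and select_path() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical group entry: a nexthop id plus its weight in the group. */
struct grp_entry {
	uint32_t nh_id;
	uint8_t weight;
	bool good;	/* stand-in for the neighbor-state check */
};

/*
 * Pick an entry in proportion to its weight, skipping entries whose
 * neighbor is not in a good state. Returns NULL if none is usable.
 */
static const struct grp_entry *select_path(const struct grp_entry *e,
					   size_t n, uint32_t hash)
{
	uint32_t total = 0, acc = 0, slot;

	assert(e != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (e[i].good)
			total += e[i].weight;
	if (total == 0)
		return NULL;

	slot = hash % total;
	for (size_t i = 0; i < n; i++) {
		if (!e[i].good)
			continue;
		acc += e[i].weight;
		if (slot < acc)
			return &e[i];
	}
	return NULL;	/* not reached when total > 0 */
}
```

Because the total is recomputed from the usable entries, removing or invalidating a member redistributes its share across the remaining paths, which is the rebalancing effect the text refers to.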

Update NETDEV event handling to rebalance multipath nexthop groups if
a nexthop is deleted due to a link event (down or unregister).

When a nexthop is removed any groups using it are updated. Groups using a
nexthop are tracked via a grp_list.

Nexthop dumps can be limited to groups only by adding NHA_GROUPS to the
request.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonexthop: Add support for lwt encaps
David Ahern [Fri, 24 May 2019 21:43:07 +0000 (14:43 -0700)]
nexthop: Add support for lwt encaps

Add support for NHA_ENCAP and NHA_ENCAP_TYPE. Leverages the existing code
for lwtunnel within fib_nh_common, so the only change needed is handling
the attributes in the nexthop code.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonexthop: Add support for IPv6 gateways
David Ahern [Fri, 24 May 2019 21:43:06 +0000 (14:43 -0700)]
nexthop: Add support for IPv6 gateways

Handle IPv6 gateway in a nexthop spec. If nh_family is set to AF_INET6,
NHA_GATEWAY is expected to be an IPv6 address. Add ipv6 option to gw in
nh_config to hold the address, add fib6_nh to nh_info to leverage the
ipv6 initialization and cleanup code. Update nh_fill_node to dump the v6
address.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonexthop: Add support for IPv4 nexthops
David Ahern [Fri, 24 May 2019 21:43:05 +0000 (14:43 -0700)]
nexthop: Add support for IPv4 nexthops

Add support for IPv4 nexthops. If nh_family is set to AF_INET, then
NHA_GATEWAY is expected to be an IPv4 address.

Register for netdev events to be notified of admin up/down changes as
well as deletes. A hash table is used to track nexthop per devices to
quickly convert device events to the affected nexthops.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Initial nexthop code
David Ahern [Fri, 24 May 2019 21:43:04 +0000 (14:43 -0700)]
net: Initial nexthop code

Barebones start point for nexthops. Implementation for RTM commands,
notifications, management of rbtree for holding nexthops by id, and
kernel side data structures for nexthops and nexthop config.

Nexthops are maintained in an rbtree sorted by id. Similar to routes,
nexthops are configured per namespace using netns_nexthop struct added
to struct net.

Nexthop notifications are sent when a nexthop is added or deleted,
but NOT if the delete is due to a device event or network namespace
teardown (which also involves device events). Applications are
expected to use the device down event to flush nexthops and any
routes used by the nexthops.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: nexthop uapi
David Ahern [Fri, 24 May 2019 21:43:03 +0000 (14:43 -0700)]
net: nexthop uapi

New UAPI for nexthops as standalone objects:
- defines netlink ancillary header, struct nhmsg
- RTM commands for nexthop objects, RTM_*NEXTHOP,
- RTNLGRP for nexthop notifications, RTNLGRP_NEXTHOP,
- Attributes for creating nexthops, NHA_*
- Attribute for route specs to specify a nexthop by id, RTA_NH_ID.

The nexthop attributes and semantics follow the route and RTA ones for
device, gateway and lwt encap. Unique to nexthop objects are a blackhole
and a group which contains references to other nexthop objects. With the
exception of blackhole and group, nexthop objects MUST contain a device.
Gateway and encap are optional. Nexthop groups can only reference other
pre-existing nexthops by id. If the NHA_ID attribute is present that id
is used for the nexthop. If not specified, one is auto assigned.

Dump requests can include attributes:
- NHA_GROUPS to return only nexthop groups,
- NHA_MASTER to limit dumps to nexthops with devices enslaved to the
  given master (e.g., VRF)
- NHA_OIF to limit dumps to nexthops using given device

nlmsg_route_perms in selinux code is updated for the new RTM commands.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'hns3-next'
David S. Miller [Wed, 29 May 2019 00:39:01 +0000 (17:39 -0700)]
Merge branch 'hns3-next'

Huazhong Tan says:

====================
code optimizations & bugfixes for HNS3 driver

This patch-set includes code optimizations and bugfixes for the HNS3
ethernet controller driver.

[patch 1/12] fixes a compile warning reported by kbuild test robot.

[patch 2/12] fixes HNS3_RXD_GRO_SIZE_M macro definition error.

[patch 3/12] adds a debugfs command to dump firmware information.

[patch 4/12 - 10/12] adds some code optimizations and cleanups for
reset and driver unloading.

[patch 11/12 - 12/12] adds two bugfixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix a memory leak issue for hclge_map_unmap_ring_to_vf_vector
Huazhong Tan [Tue, 28 May 2019 09:03:02 +0000 (17:03 +0800)]
net: hns3: fix a memory leak issue for hclge_map_unmap_ring_to_vf_vector

When hclge_bind_ring_with_vector() fails,
hclge_map_unmap_ring_to_vf_vector() returns the error
directly, so nobody will free the memory allocated by
hclge_get_ring_chain_from_mbx().

So hclge_free_vector_ring_chain() should be called whether or not
hclge_bind_ring_with_vector() fails.
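
The shape of the fix can be shown with a user-space sketch in which the chain is freed on both the success and the failure path. All names here (ring_node, chain_alloc, chain_free, bind_rings, map_rings) are hypothetical stand-ins for the hclge functions, and live_nodes exists only so the example can check for leaks.

```c
#include <assert.h>
#include <stdlib.h>

struct ring_node {
	struct ring_node *next;
};

static int live_nodes;	/* outstanding allocations, for leak checking */

static void chain_free(struct ring_node *head)
{
	while (head) {
		struct ring_node *next = head->next;

		assert(live_nodes > 0);
		free(head);
		live_nodes--;
		head = next;
	}
}

/* Build a chain of n nodes; returns NULL if n is 0 or allocation fails. */
static struct ring_node *chain_alloc(unsigned int n)
{
	struct ring_node *head = NULL;

	for (unsigned int i = 0; i < n; i++) {
		struct ring_node *node = malloc(sizeof(*node));

		if (!node) {
			chain_free(head);
			return NULL;
		}
		node->next = head;
		head = node;
		live_nodes++;
	}
	return head;
}

/* Stand-in for the bind step; fails on request. */
static int bind_rings(const struct ring_node *head, int simulate_failure)
{
	(void)head;
	return simulate_failure ? -1 : 0;
}

static int map_rings(unsigned int n, int simulate_failure)
{
	struct ring_node *chain = chain_alloc(n);
	int ret;

	if (!chain)
		return -1;

	ret = bind_rings(chain, simulate_failure);
	chain_free(chain);	/* runs whether or not the bind succeeded */
	return ret;
}
```

Returning `ret` only after the free is what closes the leak; the buggy shape returned straight from the failed bind.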

Fixes: 84e095d64ed9 ("net: hns3: Change PF to add ring-vect binding & resetQ to mailbox")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: adjust hns3_uninit_phy()'s location in the hns3_client_uninit()
Huazhong Tan [Tue, 28 May 2019 09:03:01 +0000 (17:03 +0800)]
net: hns3: adjust hns3_uninit_phy()'s location in the hns3_client_uninit()

hns3_uninit_phy() should be called before checking
HNS3_NIC_STATE_INITED flags, otherwise when this checking fails,
there is nobody to call hns3_uninit_phy().

Fixes: c8a8045b2d0a ("net: hns3: Fix NULL deref when unloading driver")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: stop schedule reset service while unloading driver
Huazhong Tan [Tue, 28 May 2019 09:03:00 +0000 (17:03 +0800)]
net: hns3: stop schedule reset service while unloading driver

When unloading the driver, the reset task should not be scheduled
anymore. If the IRQ is disabled before the ongoing reset task is
cancelled, the IRQ may be re-enabled by the reset task.

This patch uses the HCLGE_STATE_REMOVING/HCLGEVF_STATE_REMOVING
flag to indicate that the driver is unloading, so that no new
reset service is scheduled; otherwise the reset service would
access resources that have already been freed by unloading.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: add handshake with hardware while doing reset
Huazhong Tan [Tue, 28 May 2019 09:02:59 +0000 (17:02 +0800)]
net: hns3: add handshake with hardware while doing reset

When reset happens, the hardware reset should begin after the
driver has finished its preparatory work, otherwise it may cause
some hardware error.

Before the hardware resets, it waits for the driver to write
bit HCLGE_NIC_CMQ_ENABLE of register HCLGE_NIC_CSQ_DEPTH_REG
to 1, which the driver does once it has finished its preparatory work.
Since this register is cleared in some cases, some sync time is
needed before the driver writes it.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: modify hclgevf_init_client_instance()
Huazhong Tan [Tue, 28 May 2019 09:02:58 +0000 (17:02 +0800)]
net: hns3: modify hclgevf_init_client_instance()

hclgevf_init_client_instance() is a little bloated and there is
some duplicated code. This patch adds some cleanup for it.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: modify hclge_init_client_instance()
Huazhong Tan [Tue, 28 May 2019 09:02:57 +0000 (17:02 +0800)]
net: hns3: modify hclge_init_client_instance()

hclge_init_client_instance() is a little bloated and there is
some duplicated code. This patch adds some cleanup for it.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: use HCLGEVF_STATE_NIC_REGISTERED to indicate VF NIC client has registered
Huazhong Tan [Tue, 28 May 2019 09:02:56 +0000 (17:02 +0800)]
net: hns3: use HCLGEVF_STATE_NIC_REGISTERED to indicate VF NIC client has registered

When VF NIC client's init_instance() succeeds, it means this client
has been registered successfully, so we use HCLGEVF_STATE_NIC_REGISTERED
to indicate that. And before calling VF NIC client's uninit_instance(),
we clear this state.

So any operation of VF NIC client from HCLGEVF is not allowed if this
state is not set.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: use HCLGE_STATE_ROCE_REGISTERED to indicate PF ROCE client has registered
Huazhong Tan [Tue, 28 May 2019 09:02:55 +0000 (17:02 +0800)]
net: hns3: use HCLGE_STATE_ROCE_REGISTERED to indicate PF ROCE client has registered

When PF ROCE client's init_instance() succeeds, it means this client
has been registered successfully, so we use HCLGE_STATE_ROCE_REGISTERED
to indicate that. And before calling PF ROCE client's uninit_instance(),
we clear this state.

So any operation of the ROCE client from HCLGE is not allowed if this
state is not set.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: use HCLGE_STATE_NIC_REGISTERED to indicate PF NIC client has registered
Huazhong Tan [Tue, 28 May 2019 09:02:54 +0000 (17:02 +0800)]
net: hns3: use HCLGE_STATE_NIC_REGISTERED to indicate PF NIC client has registered

When PF NIC client's init_instance() succeeds, it means this client
has been registered successfully, so we use HCLGE_STATE_NIC_REGISTERED
to indicate that. And before calling PF NIC client's uninit_instance(),
we clear this state.

So any operation of PF NIC client from HCLGE is not allowed if this
state is not set.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: add support for dump firmware statistics by debugfs
Zhongzhu Liu [Tue, 28 May 2019 09:02:53 +0000 (17:02 +0800)]
net: hns3: add support for dump firmware statistics by debugfs

This patch prints firmware statistics information.

debugfs command:
echo dump m7 info > cmd

estuary:/dbg/hns3/0000:7d:00.0$ echo dump m7 info > cmd
[  172.577240] hns3 0000:7d:00.0: 0x00000000  0x00000000  0x00000000
[  172.583471] hns3 0000:7d:00.0: 0x00000000  0x00000000  0x00000000
[  172.589552] hns3 0000:7d:00.0: 0x00000030  0x00000000  0x00000000
[  172.595632] hns3 0000:7d:00.0: 0x00000000  0x00000000  0x00000000
estuary:/dbg/hns3/0000:7d:00.0$

Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix for HNS3_RXD_GRO_SIZE_M macro
Yunsheng Lin [Tue, 28 May 2019 09:02:52 +0000 (17:02 +0800)]
net: hns3: fix for HNS3_RXD_GRO_SIZE_M macro

According to the hardware user manual, GRO_SIZE is 14 bits wide, but
HNS3_RXD_GRO_SIZE_M is currently only 10 bits wide, which may cause
errors in packets received with hardware GRO.
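
The effect of a mask that is too narrow can be shown with a small example. The field position (GRO_SIZE_SHIFT) is an assumption made for illustration, and FIELD_MASK() and gro_size_get() are hypothetical helpers rather than the driver's macros.

```c
#include <assert.h>
#include <stdint.h>

#define GRO_SIZE_SHIFT 16u	/* assumed field position, illustration only */
#define FIELD_MASK(bits, shift) \
	((((uint32_t)1 << (bits)) - 1u) << (shift))

/* Extract the GRO size from a descriptor word using a mask of the given width. */
static uint32_t gro_size_get(uint32_t desc_word, unsigned int width_bits)
{
	assert(width_bits > 0 && width_bits + GRO_SIZE_SHIFT <= 32);
	return (desc_word & FIELD_MASK(width_bits, GRO_SIZE_SHIFT)) >>
	       GRO_SIZE_SHIFT;
}
```

A 10-bit mask can represent at most 1023, so any larger size written by hardware loses its upper bits on extraction, while a 14-bit mask covers values up to 16383.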

Fixes: a6d53b97a2e7 ("net: hns3: Adds GRO params to SKB for the stack")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix compile warning without CONFIG_RFS_ACCEL
Jian Shen [Tue, 28 May 2019 09:02:51 +0000 (17:02 +0800)]
net: hns3: fix compile warning without CONFIG_RFS_ACCEL

The ifdef condition of function hclge_add_fd_entry_by_arfs() is
unnecessary. It may cause compile warning when CONFIG_RFS_ACCEL
is not chosen. This patch fixes it by removing the ifdef condition.

Fixes: d93ed94fbeaf ("net: hns3: add aRFS support for PF")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohinic: fix a bug in set rx mode
Xue Chaojing [Mon, 27 May 2019 22:10:05 +0000 (22:10 +0000)]
hinic: fix a bug in set rx mode

In set_rx_mode, __dev_mc_sync and netdev_for_each_mc_addr will
repeatedly set the multicast MAC address, so we delete this loop.

Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'inet-frags-followup'
David S. Miller [Wed, 29 May 2019 00:22:15 +0000 (17:22 -0700)]
Merge branch 'inet-frags-followup'

Eric Dumazet says:

====================
inet: frags: followup to 'inet-frags-avoid-possible-races-at-netns-dismantle'

Latest patch series ('inet-frags-avoid-possible-races-at-netns-dismantle')
brought another syzbot report shown in the third patch changelog.

While fixing the issue, I had to call inet_frags_fini() later
in IPv6 and ilowpan.

Also I believe a completion is needed to ensure proper dismantle
at module removal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: frags: fix use-after-free read in inet_frag_destroy_rcu
Eric Dumazet [Mon, 27 May 2019 23:56:49 +0000 (16:56 -0700)]
inet: frags: fix use-after-free read in inet_frag_destroy_rcu

As caught by syzbot [1], the rcu grace period that is respected
before fqdir_rwork_fn() proceeds and frees fqdir is not enough
to prevent inet_frag_destroy_rcu() being run after the freeing.

We need a proper rcu_barrier() synchronization to replace
the one we had in inet_frags_fini()

We also have to fix a potential problem at module removal :
inet_frags_fini() needs to make sure that all queued work queues
(fqdir_rwork_fn) have completed, otherwise we might
call kmem_cache_destroy() too soon and get another use-after-free.

[1]
BUG: KASAN: use-after-free in inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201
Read of size 8 at addr ffff88806ed47a18 by task swapper/1/0

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.0-rc1+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201
 __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
 rcu_do_batch kernel/rcu/tree.c:2092 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2310 [inline]
 rcu_core+0xba5/0x1500 kernel/rcu/tree.c:2291
 __do_softirq+0x25c/0x94c kernel/softirq.c:293
 invoke_softirq kernel/softirq.c:374 [inline]
 irq_exit+0x180/0x1d0 kernel/softirq.c:414
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
 </IRQ>
RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61
Code: ff ff 48 89 df e8 f2 95 8c fa eb 82 e9 07 00 00 00 0f 00 2d e4 45 4b 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d d4 45 4b 00 fb f4 <c3> 90 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 8e 18 42 fa e8 99
RSP: 0018:ffff8880a98e7d78 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff1164e11 RBX: ffff8880a98d4340 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffff8880a98d4bbc
RBP: ffff8880a98e7da8 R08: ffff8880a98d4340 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffffffff88b27078 R14: 0000000000000001 R15: 0000000000000000
 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
 default_idle_call+0x36/0x90 kernel/sched/idle.c:94
 cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 do_idle+0x377/0x560 kernel/sched/idle.c:263
 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:354
 start_secondary+0x34e/0x4c0 arch/x86/kernel/smpboot.c:267
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243

Allocated by task 8877:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
 kmem_cache_alloc_trace+0x151/0x750 mm/slab.c:3555
 kmalloc include/linux/slab.h:547 [inline]
 kzalloc include/linux/slab.h:742 [inline]
 fqdir_init include/net/inet_frag.h:115 [inline]
 ipv6_frags_init_net+0x48/0x460 net/ipv6/reassembly.c:513
 ops_init+0xb3/0x410 net/core/net_namespace.c:130
 setup_net+0x2d3/0x740 net/core/net_namespace.c:316
 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439
 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
 ksys_unshare+0x440/0x980 kernel/fork.c:2692
 __do_sys_unshare kernel/fork.c:2760 [inline]
 __se_sys_unshare kernel/fork.c:2758 [inline]
 __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 17:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3432 [inline]
 kfree+0xcf/0x220 mm/slab.c:3755
 fqdir_rwork_fn+0x33/0x40 net/ipv4/inet_fragment.c:154
 process_one_work+0x989/0x1790 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x354/0x420 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88806ed47a00
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 24 bytes inside of
 512-byte region [ffff88806ed47a00, ffff88806ed47c00)
The buggy address belongs to the page:
page:ffffea0001bb51c0 refcount:1 mapcount:0 mapping:ffff8880aa400940 index:0x0
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea000282a788 ffffea0001bb53c8 ffff8880aa400940
raw: 0000000000000000 ffff88806ed47000 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88806ed47900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88806ed47980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88806ed47a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
 ffff88806ed47a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88806ed47b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
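
As a reading aid for the dump above, here is a minimal sketch of the
arithmetic that places the caret. It assumes generic KASAN, where one shadow
byte describes 8 bytes of memory, so a 16-byte row covers 128 bytes; under
that assumption fb marks freed memory and fc a redzone. The helper names are
hypothetical and are not kernel functions.

```c
#include <stdint.h>

/* Assumption: generic KASAN, one shadow byte per 8 bytes of memory. */
#define SHADOW_GRANULE 8u
/* Each row of the dump prints 16 shadow bytes. */
#define SHADOW_BYTES_PER_ROW 16u

/* Bytes of real memory described by one row of the dump. */
static unsigned int row_span(void)
{
	return SHADOW_GRANULE * SHADOW_BYTES_PER_ROW;
}

/* Index, within its row, of the shadow byte that covers addr. */
static unsigned int shadow_index_in_row(uint64_t row_base, uint64_t addr)
{
	return (unsigned int)((addr - row_base) / SHADOW_GRANULE);
}
```

With the object at ffff88806ed47a00 and the access 24 bytes inside it, the
index is 3, which matches the caret under the fourth fb of the marked row.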

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: frags: call inet_frags_fini() after unregister_pernet_subsys()
Eric Dumazet [Mon, 27 May 2019 23:56:48 +0000 (16:56 -0700)]
inet: frags: call inet_frags_fini() after unregister_pernet_subsys()

Both IPv6 and 6lowpan are calling inet_frags_fini() too soon.

inet_frags_fini() is dismantling a kmem_cache, that might be needed
later when unregister_pernet_subsys() eventually has to remove
frags queues from hash tables and free them.

This fixes potential use-after-free, and is a prereq for the following patch.
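
The ordering problem can be sketched in a small userspace model. This is an
illustration only: every name below is a hypothetical stand-in, with
model_unregister() playing the role of unregister_pernet_subsys() and
model_cache_destroy() the role of inet_frags_fini().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kmem_cache that backs the frag queues. */
struct model_cache {
	bool alive;        /* false once the cache has been destroyed */
	int live_objects;  /* queues still allocated from the cache */
};

/* Stand-in for inet_frags_fini(): dismantles the cache. */
static void model_cache_destroy(struct model_cache *c)
{
	c->alive = false;
}

/* Frees one queue; returns false if the cache is already gone. */
static bool model_queue_free(struct model_cache *c)
{
	if (!c->alive)
		return false; /* would be a use-after-free in the kernel */
	c->live_objects--;
	return true;
}

/* Stand-in for unregister_pernet_subsys(): frees the remaining queues. */
static bool model_unregister(struct model_cache *c)
{
	while (c->live_objects > 0)
		if (!model_queue_free(c))
			return false;
	return true;
}

/* Order after this patch: free the queues first, then drop the cache. */
static bool teardown_fixed(struct model_cache *c)
{
	bool ok = model_unregister(c);

	model_cache_destroy(c);
	return ok;
}

/* Order before this patch: the cache is gone when queues are freed. */
static bool teardown_buggy(struct model_cache *c)
{
	model_cache_destroy(c);
	return model_unregister(c);
}
```

In this model teardown_fixed() succeeds with queues outstanding, while
teardown_buggy() reports the late free.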

Fixes: d4ad4d22e7ac ("inet: frags: use kmem_cache for inet_frag_queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: frags: uninline fqdir_init()
Eric Dumazet [Mon, 27 May 2019 23:56:47 +0000 (16:56 -0700)]
inet: frags: uninline fqdir_init()

fqdir_init() is not fast path and is getting bigger.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests/net: ipv6 flowlabel
Willem de Bruijn [Mon, 27 May 2019 20:47:51 +0000 (16:47 -0400)]
selftests/net: ipv6 flowlabel

Test the IPv6 flowlabel control and datapath interfaces:

Acquire and release the right to use flowlabels with socket option
IPV6_FLOWLABEL_MGR.

Then configure flowlabels on send and read them on recv with cmsg
IPV6_FLOWINFO. Also verify auto-flowlabel if not explicitly set.

This helped identify the issue fixed in commit 95c169251bf73 ("ipv6:
invert flowlabel sharing check in process and user mode")

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenetc: Enable TC offloading with mqprio
Camelia Groza [Mon, 27 May 2019 15:21:31 +0000 (18:21 +0300)]
enetc: Enable TC offloading with mqprio

Add support to configure multiple prioritized TX traffic
classes with mqprio.

Configure one BD ring per TC for the moment, one netdev
queue per TC.

Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'stmmac-SPDX'
David S. Miller [Wed, 29 May 2019 00:09:15 +0000 (17:09 -0700)]
Merge branch 'stmmac-SPDX'

Neil Armstrong says:

====================
net: stmmac: dwmac-meson: update with SPDX Licence identifier

Update the SPDX Licence identifier for the Amlogic Meson6 and Meson8 dwmac
glue drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: dwmac-meson8b: update with SPDX Licence identifier
Neil Armstrong [Mon, 27 May 2019 13:46:23 +0000 (15:46 +0200)]
net: stmmac: dwmac-meson8b: update with SPDX Licence identifier

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: dwmac-meson: update with SPDX Licence identifier
Neil Armstrong [Mon, 27 May 2019 13:46:22 +0000 (15:46 +0200)]
net: stmmac: dwmac-meson: update with SPDX Licence identifier

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoigc: Cleanup the redundant code
Sasha Neftin [Sun, 21 Apr 2019 08:17:23 +0000 (11:17 +0300)]
igc: Cleanup the redundant code

The default flow control setting for the i225 device is both
'rx' and 'tx' pause frames. It does not depend on the NVM value.
This patch fixes that and cleans up the driver code.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Add flow control support
Sasha Neftin [Thu, 18 Apr 2019 07:11:08 +0000 (10:11 +0300)]
igc: Add flow control support

This change adds flow control settings. This is required to
enable the legacy flow control support.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoe1000e: start network tx queue only when link is up
Konstantin Khlebnikov [Wed, 17 Apr 2019 08:13:20 +0000 (11:13 +0300)]
e1000e: start network tx queue only when link is up

The driver does not want to keep packets in the Tx queue when the link is
lost. But the present code only resets the NIC to flush them and does not
prevent new packets from being queued. Moreover, the reset sequence itself
could generate new packets via netconsole, so the NIC falls into an endless
reset loop.

This patch wakes the Tx queue only when the NIC is ready to send packets.

This is the proper fix for the problem addressed by commit 0f9e980bf5ee
("e1000e: fix cyclic resets at link up with active tx").

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoRevert "e1000e: fix cyclic resets at link up with active tx"
Konstantin Khlebnikov [Wed, 17 Apr 2019 08:13:16 +0000 (11:13 +0300)]
Revert "e1000e: fix cyclic resets at link up with active tx"

This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.

That change caused a false-positive warning about a hardware hang:

e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
   TDH                  <0>
   TDT                  <1>
   next_to_use          <1>
   next_to_clean        <0>
buffer_info[next_to_clean]:
   time_stamp           <fffba7a7>
   next_to_watch        <0>
   jiffies              <fffbb140>
   next_to_watch.status <0>
MAC Status             <40080080>
PHY Status             <7949>
PHY 1000BASE-T Status  <0>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

Besides the warning, everything works fine.
The original issue will be fixed properly in the following patch.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Remove the obsolete workaround
Sasha Neftin [Mon, 15 Apr 2019 11:10:35 +0000 (14:10 +0300)]
igc: Remove the obsolete workaround

The workaround that enables a resend request after a completion timeout is
not relevant for the i225 device. This patch removes the code related to
this workaround.
Minor cosmetic fixes: replace spaces with tabs.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Clean up unused pointers
Sasha Neftin [Thu, 4 Apr 2019 10:26:53 +0000 (13:26 +0300)]
igc: Clean up unused pointers

A few function pointers from the phy_operations structure were unused.
This patch removes them.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Fix double definitions
Sasha Neftin [Wed, 3 Apr 2019 13:58:04 +0000 (16:58 +0300)]
igc: Fix double definitions

The collision threshold and the threshold's shift have been defined twice.
This patch fixes that.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigb: mark expected switch fall-through
Gustavo A. R. Silva [Fri, 29 Mar 2019 23:38:46 +0000 (16:38 -0700)]
igb: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

drivers/net/ethernet/intel/igb/e1000_82575.c: In function ‘igb_get_invariants_82575’:
drivers/net/ethernet/intel/igb/e1000_82575.c:636:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
   if (igb_sgmii_uses_mdio_82575(hw)) {
      ^
drivers/net/ethernet/intel/igb/e1000_82575.c:642:2: note: here
  case E1000_CTRL_EXT_LINK_MODE_PCIE_SERDES:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
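
For readers unfamiliar with the annotation, a minimal sketch of the pattern
follows. The event codes and try_add() are hypothetical and only loosely
mirror the shape of the driver code; the point is that GCC at
-Wimplicit-fallthrough=3 accepts a "fall through" comment placed directly
before the next case label.

```c
/* Hypothetical event codes, for illustration only. */
enum event { EV_ADD, EV_REMOVE, EV_OTHER };

/* Hypothetical helper: returns 0 on success, -1 on failure. */
static int try_add(int can_add)
{
	return can_add ? 0 : -1;
}

/* Returns 1 if the removal path ran, 0 otherwise. */
static int handle_event(enum event ev, int can_add)
{
	int removed = 0;

	switch (ev) {
	case EV_ADD:
		if (try_add(can_add) == 0)
			break;
		/* fall through */
	case EV_REMOVE:
		removed = 1;
		break;
	default:
		break;
	}
	return removed;
}
```

Here a failed add deliberately continues into the removal case, and the
comment tells both the compiler and the reader that this is intended.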

Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigb: mark expected switch fall-through
Gustavo A. R. Silva [Fri, 29 Mar 2019 23:38:45 +0000 (16:38 -0700)]
igb: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

drivers/net/ethernet/intel/igb/igb_main.c: In function ‘__igb_notify_dca’:
drivers/net/ethernet/intel/igb/igb_main.c:6694:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
   if (dca_add_requester(dev) == 0) {
      ^
drivers/net/ethernet/intel/igb/igb_main.c:6701:2: note: here
  case DCA_PROVIDER_REMOVE:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigb/igc: warn when fatal read failure happens
Feng Tang [Wed, 13 Feb 2019 02:41:54 +0000 (10:41 +0800)]
igb/igc: warn when fatal read failure happens

Failing to read a HW register is very serious for the igb/igc driver,
as its hw_addr will be set to NULL, causing the adapter to be seen as
"REMOVED".

We saw the error only a few times in the MTBF test for suspend/resume,
but could hardly get any useful info to debug it.

Add WARN() so that we can get the necessary information about
where and how it happens, and use it for root causing and fixing
this "PCIe link lost" issue.

This affects igb, igc.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofsl/fman: include IPSEC SPI in the Keygen extraction
Madalin Bucur [Mon, 27 May 2019 12:32:12 +0000 (15:32 +0300)]
fsl/fman: include IPSEC SPI in the Keygen extraction

The keygen extracted fields are used as input for the hash that
determines the incoming frames distribution. Adding IPSEC SPI so
different IPSEC flows can be distributed to different CPUs.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: cls: Check RSS table index validity when creating a context
Maxime Chevallier [Mon, 27 May 2019 11:52:01 +0000 (13:52 +0200)]
net: mvpp2: cls: Check RSS table index validity when creating a context

Make sure we don't use an out-of-bound index for the per-port RSS
context array.

As of today, the global context creation in mvpp22_rss_context_create
will prevent us from reaching this case, but we should still make sure
we are using a sane value anyway.
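
The kind of check described here can be sketched as follows. This is not
the driver's code: the table size, the structure, and the function name are
assumptions made for illustration.

```c
#include <errno.h>

/* Hypothetical number of per-port RSS context slots. */
#define MODEL_RSS_TABLES 8

/* Hypothetical per-port state holding the RSS context array. */
struct model_port {
	int rss_ctx[MODEL_RSS_TABLES];
};

/* Stores ctx at idx, rejecting any index outside the array. */
static int model_set_rss_ctx(struct model_port *port, unsigned int idx,
			     int ctx)
{
	if (idx >= MODEL_RSS_TABLES)
		return -EINVAL;

	port->rss_ctx[idx] = ctx;
	return 0;
}
```

Even if a caller higher up already limits the index, validating it at the
point of use keeps the array access safe should that caller change.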

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenetc: fix le32/le16 degrading to integer warnings
Y.b. Lu [Mon, 27 May 2019 03:55:20 +0000 (03:55 +0000)]
enetc: fix le32/le16 degrading to integer warnings

Fix the below sparse warnings introduced by a previous patch.
- restricted __le32 degrades to integer
- restricted __le16 degrades to integer
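
As background on these warnings: sparse complains when a little-endian
typed value is used as a plain integer without an explicit conversion. A
minimal userspace sketch of the idea follows, using hypothetical struct
wrappers in place of the kernel's __le32/__le16 types and le32_to_cpu()/
le16_to_cpu() helpers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for __le32/__le16: wrapping the bytes in a
 * struct means the value cannot silently be used as an integer. */
typedef struct { uint8_t b[4]; } model_le32;
typedef struct { uint8_t b[2]; } model_le16;

/* Explicit conversion, playing the role of le32_to_cpu(). */
static uint32_t model_le32_to_cpu(model_le32 v)
{
	return (uint32_t)v.b[0] |
	       ((uint32_t)v.b[1] << 8) |
	       ((uint32_t)v.b[2] << 16) |
	       ((uint32_t)v.b[3] << 24);
}

/* Explicit conversion, playing the role of le16_to_cpu(). */
static uint16_t model_le16_to_cpu(model_le16 v)
{
	return (uint16_t)(v.b[0] | (v.b[1] << 8));
}
```

The conversion gives the same result on any host byte order, which is what
the explicit helpers guarantee in the kernel as well.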

Fixes: d39823121911 ("enetc: add hardware timestamping support")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: remove support for RTL_GIGA_MAC_VER_01
Heiner Kallweit [Sat, 25 May 2019 19:14:39 +0000 (21:14 +0200)]
r8169: remove support for RTL_GIGA_MAC_VER_01

RTL_GIGA_MAC_VER_01 is RTL8169, the ancestor of the chip family.
It didn't have an internal PHY and I've never seen it in the wild.
What isn't there doesn't need to be maintained, so let's remove
support for it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: improve RTL8168d PHY initialization
Heiner Kallweit [Sat, 25 May 2019 18:57:42 +0000 (20:57 +0200)]
r8169: improve RTL8168d PHY initialization

Certain parts of the PHY initialization are the same for sub versions
1 and 2 of RTL8168d. So let's factor this out to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'r8169-small-improvements'
David S. Miller [Mon, 27 May 2019 05:19:39 +0000 (22:19 -0700)]
Merge branch 'r8169-small-improvements'

Heiner Kallweit says:

====================
r8169: small improvements

Series with small improvements.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: change type of member mac_version in rtl8169_private
Heiner Kallweit [Sat, 25 May 2019 18:45:04 +0000 (20:45 +0200)]
r8169: change type of member mac_version in rtl8169_private

Use the appropriate enum type for member mac_version. And don't assign
a fixed value to RTL_GIGA_MAC_NONE, there's no benefit in it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: remove unneeded return statement in rtl_hw_init_8168g
Heiner Kallweit [Sat, 25 May 2019 18:44:01 +0000 (20:44 +0200)]
r8169: remove unneeded return statement in rtl_hw_init_8168g

Remove the unneeded return statement.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: remove rtl_hw_init_8168ep
Heiner Kallweit [Sat, 25 May 2019 18:43:25 +0000 (20:43 +0200)]
r8169: remove rtl_hw_init_8168ep

rtl_hw_init_8168ep() can be removed, this simplifies the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: Make t4_get_tp_e2c_map static
YueHaibing [Sat, 25 May 2019 12:45:10 +0000 (20:45 +0800)]
cxgb4: Make t4_get_tp_e2c_map static

Fix sparse warning:

drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:6216:14:
 warning: symbol 't4_get_tp_e2c_map' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftest: Fixes for icmp_redirect test
David Ahern [Fri, 24 May 2019 23:37:07 +0000 (16:37 -0700)]
selftest: Fixes for icmp_redirect test

I was really surprised that the IPv6 mtu exception followed by redirect
test was passing as nothing about the code suggests it should. The problem
is actually with the logic in the test script.

Fix the test cases as follows:
1. add debug function to dump the initial and redirect gateway addresses
   for ipv6. This is shown only in verbose mode. It helps verify the
   output of 'route get'.

2. fix the check_exception logic for the reset case to make sure that
   for IPv4 neither mtu nor redirect appears in the 'route get' output.
   For IPv6, make sure mtu is not present and the gateway is the initial
   R1 lladdr.

3. fix the reset logic by using a function to delete the routes added by
   initial_route_*. This format works better for the nexthop version of
   the tests.

While improving the test cases, go ahead and ensure that forwarding is
disabled since IPv6 redirect requires it.

Also, runs with kernel debugging enabled sometimes show a failure with
one of the ipv4 tests, so spread the pings over a longer time interval.

The end result is that 2 tests now show failures:

TEST: IPv6: mtu exception plus redirect                    [FAIL]

and the VRF version.

This is a bug in the IPv6 logic that will need to be fixed
separately. Redirect followed by MTU works because __ip6_rt_update_pmtu
hits the 'if (!rt6_cache_allowed_for_pmtu(rt6))' path and updates the
mtu on the exception rt6_info.

MTU followed by redirect does not have this logic. rt6_do_redirect
creates a new exception and then rt6_insert_exception removes the old
one which has the MTU exception.

Fixes: ec8105352869 ("selftests: Add redirect tests")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4: remove redundant assignment to n
Colin Ian King [Fri, 24 May 2019 21:56:58 +0000 (22:56 +0100)]
ipv4: remove redundant assignment to n

The pointer n is assigned a value; however, this value is
never read in the code block, and the end of the code block
continues to the next loop iteration. Clean up the code by
removing the redundant assignment.

Fixes: 1bff1a0c9bbda ("ipv4: Add function to send route updates")
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>