git.monstr.eu Git - linux-2.6-microblaze.git/log

projects / linux-2.6-microblaze.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Petr Machata [Thu, 17 Sep 2020 06:49:03 +0000 (09:49 +0300)]

mlxsw: spectrum_qdisc: Disable port buffer autoresize with qdiscs

There are two interfaces to configure ETS: qdiscs and DCB. Historically,
DCB ETS configuration was projected to ingress as well, and configured port
buffers. Qdisc was not.

Keep qdiscs behaving this way, and if an offloaded qdisc is configured on a
port, move this port's headroom to a manual mode, thus allowing
configuration of port buffers through dcbnl_setbuffer.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Thu, 17 Sep 2020 06:49:02 +0000 (09:49 +0300)]

mlxsw: spectrum_dcb: Implement dcbnl_setbuffer / getbuffer

Add dcbnl_setbuffer, which bounces requests if a headroom is in DCB mode.
Implement dcbnl_getbuffer such that it can always be used to determine
port-buffer configuration, regardless of headroom mode.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Thu, 17 Sep 2020 06:49:01 +0000 (09:49 +0300)]

mlxsw: spectrum_buffers: Support two headroom modes

There are two interfaces to configure ETS: qdiscs and DCB. Historically,
DCB ETS configuration was projected to ingress as well, and configured port
buffers. Qdisc was not.

So as not to break clients that today use DCB ETS and PFC and rely on
getting a reasonable ingress buffer priomap, keep the ETS mirroring in
effect.

Since qdiscs have not done this mirroring historically, it is reasonable
not to introduce it, but rather permit manual ingress configuration through
dcbnl_setbuffer only in the qdisc mode.

This will require a toggle to indicate whether buffer sizes should be
autocomputed or taken from dcbnl_setbuffer, and likewise for priomaps.
Introduce such and initialize it, and guard port buffer size configuration
as appropriate. The toggle is currently left in the DCB position. In a
following patch, qdisc code will switch it.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yang Yingliang [Thu, 17 Sep 2020 03:32:23 +0000 (11:32 +0800)]

netlink: add spaces around '&' in netlink_recv/sendmsg()

It's hard to read the code without spaces around '&',
for better reading, add spaces around '&'.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

YueHaibing [Thu, 17 Sep 2020 02:19:10 +0000 (10:19 +0800)]

netdev: Remove unused functions

There is no callers in tree, so can remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ye Bin [Thu, 17 Sep 2020 01:12:33 +0000 (09:12 +0800)]

mptcp: Fix unsigned 'max_seq' compared with zero in mptcp_data_queue_ofo

Fixes coccicheck warnig:
net/mptcp/protocol.c:164:11-18: WARNING: Unsigned expression compared with zero: max_seq > 0

Fixes: ab174ad8ef76 ("mptcp: move ooo skbs into msk out of order queue")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Thu, 17 Sep 2020 23:35:47 +0000 (16:35 -0700)]

Merge branch 'net-marvell-prestera-Add-Switchdev-driver-for-Prestera-family-ASIC-device-98DX3255-AC3x'

Vadym Kochan says:

====================
net: marvell: prestera: Add Switchdev driver for Prestera family ASIC device 98DX3255 (AC3x)

Marvell Prestera 98DX3255 integrates up to 24 ports of 1GbE with 8
ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely
wireless SMB deployment.

Prestera Switchdev is a firmware based driver that operates via PCI bus.  The
current implementation supports only boards designed for the Marvell Switchdev
solution and requires special firmware.

This driver implementation includes only L1, basic L2 support, and RX/TX.

The core Prestera switching logic is implemented in prestera_main.c, there is
an intermediate hw layer between core logic and firmware. It is
implemented in prestera_hw.c, the purpose of it is to encapsulate hw
related logic, in future there is a plan to support more devices with
different HW related configurations.

The following Switchdev features are supported:

    - VLAN-aware bridge offloading
    - VLAN-unaware bridge offloading
    - FDB offloading (learning, ageing)
    - Switchport configuration

The original firmware image is uploaded to the linux-firmware repository.

PATCH v9:
    1) Replace read_poll_timeout_atomic() by original 'do {} while()' loop
       because it works much better than read_poll_timeout_atomic()
       considering the TX rate. Also it fixes warning reported on v8.

    2) Use ENOENT instead of EEXIST when item is not found in few
       places - prestera_hw.c and prestera_rxtx.c

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices

PATCH v8:
    1) Put license in one line.

    2) Sort includes.

    3) Add missing comma for last enum member

    4) Return original error code from last called func
       in places where instead other error code was used.

    5) Add comma for last member in initialized struct in prestera_hw.c

    6) Do not initialize 'int err = 0' where it is not needed.

    7) Simplify device-tree "marvell,prestera" node parsing by removing not
       needed checking on 'np == NULL'.

    8) Use u32p_replace_bits() instead of open-coded ((word & ~mask) | val)

    9) Use dev_warn_ratelimited() instead of pr_warn_ratelimited to indicate the device
        instance in prestera_rxtx.c

    10) Simplify circular buffer list creation in prestera_sdma_{rx,tx}_init() by using
        do { } while (prev != tail) construction.

    11) Use MSEC_PER_SEC instead of hard-coded 1000.

    12) Use traditional error handling pattern:

       err = F();
       if (err)
           return err;

    13) Use ether_addr_copy() instead of memcpy() for mac FDB copying in prestera_hw.c

    14) Drop swdev->ageing_time member which is not used.

    15) Fix ageing macro to be in ms instead of seconds.

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices
[2] net: marvell: prestera: Add PCI interface support
        [3] net: marvell: prestera: Add basic devlink support
[4] net: marvell: prestera: Add ethtool interface support
[5] net: marvell: prestera: Add Switchdev driver implementation

PATCH v7:
    1) Use ether_addr_copy() in prestera_main.c:prestera_port_set_mac_address()
       instead of memcpy().

    2) Removed not needed device's DMA address range check on
       dma_pool_alloc() in prestera_rxtx.c:prestera_sdma_buf_init(),
       this should be handled by dma_xxx() API considerig device's DMA mask.

    3) Removed not needed device's DMA address range check on
       dma_map_single() in prestera_rxtx.c:prestera_sdma_rx_skb_alloc(),
       this should be handled by dma_xxx() API considerig device's DMA mask.

    4) Add comment about port mac address limitation in the code where
       it is used and checked - prestera_main.c:

           - prestera_is_valid_mac_addr()
           - prestera_port_create()

    5) Add missing destroy_workqueue(swdev_wq) in prestera_switchdev.c:prestera_switchdev_init()
       on error path handling.

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices
        [5] net: marvell: prestera: Add Switchdev driver implementation

PATCH v6:
    1) Use rwlock to protect port list on create/delete stages. The list
       is mostly readable by fw event handler or packets receiver, but
       updated only on create/delete port which are performed on switch init/fini
       stages.

    2) Remove not needed variable initialization in prestera_dsa.c:prestera_dsa_parse()

    3) Get rid of bounce buffer used by tx handler in prestera_rxtx.c,
       the bounce buffer should be handled by dma_xxx API via swiotlb.

    4) Fix PRESTERA_SDMA_RX_DESC_PKT_LEN macro by using correct GENMASK(13, 0) in prestera_rxtx.c

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices

PATCH v5:
    0) add Co-developed tags for people who was involved in development.

    1) Make SPDX license as separate comment

    2) Change 'u8 *' -> 'void *', It allows to avoid not-needed u8* casting.

    3) Remove "," in terminated enum's.

    4) Use GENMASK(end, start) where it is applicable in.

    5) Remove not-needed 'u8 *' casting.

    6) Apply common error-check pattern

    7) Use ether_addr_copy instead of memcpy

    8) Use define for maximum MAC address range (255)

    9) Simplify prestera_port_state_set() in prestera_main.c by
       using separate if-blocks for state setting:

        if (is_up) {
        ...
        } else {
        ...
        }

      which makes logic more understandable.

    10) Simplify sdma tx wait logic when checking/updating tx_ring->burst.

    11) Remove not-needed packed & aligned attributes

    12) Use USEC_PER_MSEC as multiplier when converting ms -> usec on calling
        readl_poll_timeout.

    13) Simplified some error path handling by simple return error code in.

    14) Remove not-needed err assignment in.

    15) Use dev_err() in prestera_devlink_register(...).

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices
[2] net: marvell: prestera: Add PCI interface support
        [3] net: marvell: prestera: Add basic devlink support
[4] net: marvell: prestera: Add ethtool interface support
[5] net: marvell: prestera: Add Switchdev driver implementation

PATCH v4:
    1) Use prestera_ prefix in netdev_ops variable.

    2) Kconfig: use 'default PRESTERA' build type for CONFIG_PRESTERA_PCI to be
       synced by default with prestera core module.

    3) Use memcpy_xxio helpers in prestera_pci.c for IO buffer copying.

    4) Generate fw image path via snprintf() instead of macroses.

    5) Use pcim_ helpers in prestera_pci.c which simplified the
       probe/remove logic.

    6) Removed not needed initializations of variables which are used in
       readl_poll_xxx() helpers.

    7) Fixed few grammar mistakes in patch[2] description.

    8) Export only prestera_ethtool_ops struct instead of each
       ethtool handler.

    9) Add check for prestera_dev_check() in switchdev event handling to
       make sure there is no wrong topology.

    Patches updated:
        [1] net: marvell: prestera: Add driver for Prestera family ASIC devices
[2] net: marvell: prestera: Add PCI interface support
[4] net: marvell: prestera: Add ethtool interface support
[5] net: marvell: prestera: Add Switchdev driver implementation

PATCH v3:
    1) Simplify __be32 type casting in prestera_dsa.c

    2) Added per-patch changelog under "---" line.

PATCH v2:
    1) Use devlink_port_type_clear()

    2) Add _MS prefix to timeout defines.

    3) Remove not-needed packed attribute from the firmware ipc structs,
       also the firmware image needs to be uploaded too (will do it soon).

    4) Introduce prestera_hw_switch_fini(), to be mirrored with init and
       do simple validation if the event handlers are unregistered.

    5) Use kfree_rcu() for event handler unregistering.

    6) Get rid of rcu-list usage when dealing with ports, not needed for
       now.

    7) Little spelling corrections in the error/info messages.

    8) Make pci probe & remove logic mirrored.

    9) Get rid of ETH_FCS_LEN in headroom setting, not needed.

PATCH:
    1) Fixed W=1 warnings

    2) Renamed PCI driver name to be more generic "Prestera DX" because
       there will be more devices supported.

    3) Changed firmware image dir path: marvell/ -> mrvl/prestera/
       to be aligned with location in linux-firmware.git (if such
       will be accepted).

RFC v3:
    1) Fix prestera prefix in prestera_rxtx.c

    2) Protect concurrent access from multiple ports on multiple CPU system
       on tx path by spinlock in prestera_rxtx.c

    3) Try to get base mac address from device-tree, otherwise use a random generated one.

    4) Move ethtool interface support into separate prestera_ethtool.c file.

    5) Add basic devlink support and get rid of physical port naming ops.

    6) Add STP support in Switchdev driver.

    7) Removed MODULE_AUTHOR

    8) Renamed prestera.c -> prestera_main.c, and kernel module to
       prestera.ko

RFC v2:
    1) Use "pestera_" prefix in struct's and functions instead of mvsw_pr_

    2) Original series split into additional patches for Switchdev ethtool support.

    3) Use major and minor firmware version numbers in the firmware image filename.

    4) Removed not needed prints.

    5) Use iopoll API for waiting on register's value in prestera_pci.c

    6) Use standart approach for describing PCI ID matching section instead of using
       custom wrappers in prestera_pci.c

    7) Add RX/TX support in prestera_rxtx.c.

    8) Rewritten prestera_switchdev.c with following changes:
       - handle netdev events from prestera.c

       - use struct prestera_bridge for bridge objects, and get rid of
         struct prestera_bridge_device which may confuse.

       - use refcount_t

    9) Get rid of macro usage for sending fw requests in prestera_hw.c

    10) Add base_mac setting as module parameter. base_mac is required for
        generation default port's mac.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vadym Kochan [Wed, 16 Sep 2020 16:31:02 +0000 (19:31 +0300)]

dt-bindings: marvell,prestera: Add description for device-tree bindings

Add brief description how to configure base mac address binding in
device-tree.

Describe requirement for the PCI port which is connected to the ASIC, to
allow access to the firmware related registers.

Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vadym Kochan [Wed, 16 Sep 2020 16:31:01 +0000 (19:31 +0300)]

net: marvell: prestera: Add Switchdev driver implementation

The following features are supported:

    - VLAN-aware bridge offloading
    - VLAN-unaware bridge offloading
    - FDB offloading (learning, ageing)
    - Switchport configuration

Currently there are some limitations like:

    - Only 1 VLAN-aware bridge instance supported
    - FDB ageing timeout parameter is set globally per device

Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Co-developed-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu>
Signed-off-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu>
Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vadym Kochan [Wed, 16 Sep 2020 16:31:00 +0000 (19:31 +0300)]

net: marvell: prestera: Add ethtool interface support

The ethtool API provides support for the configuration of the following
features: speed and duplex, auto-negotiation, MDI-x, forward error
correction, port media type. The API also provides information about the
port status, hardware and software statistic. The following limitation
exists:

    - port media type should be configured before speed setting
    - ethtool -m option is not supported
    - ethtool -p option is not supported
    - ethtool -r option is supported for RJ45 port only
    - the following combination of parameters is not supported:

          ethtool -s sw1pX port XX autoneg on

    - forward error correction feature is supported only on SFP ports, 10G
      speed

    - auto-negotiation and MDI-x features are not supported on
      Copper-to-Fiber SFP module

Co-developed-by: Andrii Savka <andrii.savka@plvision.eu>
Signed-off-by: Andrii Savka <andrii.savka@plvision.eu>
Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vadym Kochan [Wed, 16 Sep 2020 16:30:59 +0000 (19:30 +0300)]

net: marvell: prestera: Add basic devlink support

Add very basic support for devlink interface:

    - driver name
    - fw version
    - devlink ports

Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vadym Kochan [Wed, 16 Sep 2020 16:30:58 +0000 (19:30 +0300)]

net: marvell: prestera: Add PCI interface support

Add PCI interface driver for Prestera Switch ASICs family devices, which
provides:

    - Firmware loading mechanism
    - Requests & events handling to/from the firmware
    - Access to the firmware on the bus level

The firmware has to be loaded each time the device is reset. The driver
is loading it from:

    /lib/firmware/mrvl/prestera/mvsw_prestera_fw-v{MAJOR}.{MINOR}.img

The full firmware image version is located within the internal header
and consists of 3 numbers - MAJOR.MINOR.PATCH. Additionally, driver has
hard-coded minimum supported firmware version which it can work with:

    MAJOR - reflects the support on ABI level between driver and loaded
            firmware, this number should be the same for driver and loaded
            firmware.

    MINOR - this is the minimum supported version between driver and the
            firmware.

    PATCH - indicates only fixes, firmware ABI is not changed.

Firmware image file name contains only MAJOR and MINOR numbers to make
driver be compatible with any PATCH version.

Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vadym Kochan [Wed, 16 Sep 2020 16:30:57 +0000 (19:30 +0300)]

net: marvell: prestera: Add driver for Prestera family ASIC devices

Marvell Prestera 98DX326x integrates up to 24 ports of 1GbE with 8
ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely
wireless SMB deployment.

The current implementation supports only boards designed for the Marvell
Switchdev solution and requires special firmware.

The core Prestera switching logic is implemented in prestera_main.c,
there is an intermediate hw layer between core logic and firmware. It is
implemented in prestera_hw.c, the purpose of it is to encapsulate hw
related logic, in future there is a plan to support more devices with
different HW related configurations.

This patch contains only basic switch initialization and RX/TX support
over SDMA mechanism.

Currently supported devices have DMA access range <= 32bit and require
ZONE_DMA to be enabled, for such cases SDMA driver checks if the skb
allocated in proper range supported by the Prestera device.

Also meanwhile there is no TX interrupt support in current firmware
version so recycling work is scheduled on each xmit.

Port's mac address is generated from the switch base mac which may be
provided via device-tree (static one or as nvme cell), or randomly
generated. This is required by the firmware.

Co-developed-by: Andrii Savka <andrii.savka@plvision.eu>
Signed-off-by: Andrii Savka <andrii.savka@plvision.eu>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
Co-developed-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu>
Signed-off-by: Serhiy Pshyk <serhiy.pshyk@plvision.eu>
Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

YueHaibing [Wed, 16 Sep 2020 14:17:28 +0000 (22:17 +0800)]

genetlink: Remove unused function genl_err_attr()

It is never used, so can remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

YueHaibing [Wed, 16 Sep 2020 14:16:29 +0000 (22:16 +0800)]

net/sched: Remove unused function qdisc_queue_drop_head()

It is not used since commit a09ceb0e0814 ("sched: remove qdisc->drop")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Matthieu Baerts [Wed, 16 Sep 2020 13:13:51 +0000 (15:13 +0200)]

selftests: mptcp: interpret \n as a new line

In case of errors, this message was printed:

  (...)
  # read: Resource temporarily unavailable
  #  client exit code 0, server 3
  # \nnetns ns1-0-BJlt5D socket stat for 10003:
  (...)

Obviously, the idea was to add a new line before the socket stat and not
print "\nnetns".

Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN")
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Xie He [Wed, 16 Sep 2020 12:23:08 +0000 (05:23 -0700)]

net/packet: Fix a comment about mac_header

1. Change all "dev->hard_header" to "dev->header_ops"

2. On receiving incoming frames when header_ops == NULL:

The comment only says what is wrong, but doesn't say what is right.
This patch changes the comment to make it clear what is right.

3. On transmitting and receiving outgoing frames when header_ops == NULL:

The comment explains that the LL header will be later added by the driver.

However, I think it's better to simply say that the LL header is invisible
to us. This phrasing is better from a software engineering perspective,
because this makes it clear that what happens in the driver should be
hidden from us and we should not care about what happens internally in the
driver.

4. On resuming the LL header (for RAW frames) when header_ops == NULL:

The comment says we are "unlikely" to restore the LL header.

However, we should say that we are "unable" to restore it.
It's not possible (rather than not likely) to restore it, because:

1) There is no way for us to restore because the LL header internally
processed by the driver should be invisible to us.

2) In function packet_rcv and tpacket_rcv, the code only tries to restore
the LL header when header_ops != NULL.

Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Thu, 17 Sep 2020 23:14:28 +0000 (16:14 -0700)]

Merge branch 'net-hns3-updates-for-next'

Huazhong Tan says:

====================
net: hns3: updates for -next

There are some optimizations related to IO path.

Change since V1:
- fixes a unsuitable handling in hns3_lb_clear_tx_ring() of #6 which
pointed out by Saeed Mahameed.

previous version:
V1: https://patchwork.ozlabs.org/project/netdev/cover/1600085217-26245-1-git-send-email-tanhuazhong@huawei.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 16 Sep 2020 09:33:50 +0000 (17:33 +0800)]

net: hns3: use napi_consume_skb() when cleaning tx desc

Use napi_consume_skb() to batch consuming skb when cleaning
tx desc in NAPI polling.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 16 Sep 2020 09:33:49 +0000 (17:33 +0800)]

net: hns3: use writel() to optimize the barrier operation

writel() can be used to order I/O vs memory by default when
writing portable drivers. Use writel() to replace wmb() +
writel_relaxed(), and writel() is dma_wmb() + writel_relaxed()
for ARM64, so there is an optimization here because dma_wmb()
is a lighter barrier than wmb().

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 16 Sep 2020 09:33:48 +0000 (17:33 +0800)]

net: hns3: optimize the rx clean process

Currently HNS3_RING_RX_RING_FBDNUM_REG register is read to determine
how many rx desc can be cleaned. To avoid the register read operation
in the critical data path, use the valid bit in the rx desc to determine
if a specific rx desc can be cleaned.

The hns3 driver clear valid bit in the rx desc before notifying the
rx desc to the hw, and hw will only set the valid bit of the rx desc
after corresponding buffer is filled with packet data and other field
in the rx desc is set accordingly.

Add hns3_rx_ring_move_fw() function to clear the valid bit in the rx
desc before moving rx ring's next_to_clean forward to avoid double
cleaning a rx desc, also add a dma_rmb() barrier in hns3_handle_rx_bd()
to make sure valid bit is set before reading other field in the rx desc.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 16 Sep 2020 09:33:47 +0000 (17:33 +0800)]

net: hns3: optimize the tx clean process

Currently HNS3_RING_TX_RING_HEAD_REG register is read to determine
how many tx desc can be cleaned. To avoid the register read operation
in the critical data path, use the valid bit in the tx desc to determine
if a specific tx desc can be cleaned.

The hns3 driver sets valid bit in the tx desc before ringing a doorbell
to the hw, and hw will only clear the valid bit of the tx desc after
corresponding packet is sent out to the wire. And because next_to_use
for tx ring is a changing variable when the driver is filling the tx
desc, so reuse the pull_len for rx ring to record the tx desc that has
notified to the hw, so that hns3_nic_reclaim_desc() can decide how many
tx desc's valid bit need checking when reclaiming tx desc.

And io_err_cnt stat is also removed for it is not used anymore.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 16 Sep 2020 09:33:46 +0000 (17:33 +0800)]

net: hns3: batch tx doorbell operation

Use netdev_xmit_more() to defer the tx doorbell operation when
the skb is passed to the driver continuously. By doing this we
can improve the overall xmit performance by avoid some doorbell
operations.

Also, the tx_err_cnt stat is not used, so rename it to tx_more
stat.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yunsheng Lin [Wed, 16 Sep 2020 09:33:45 +0000 (17:33 +0800)]

net: hns3: batch the page reference count updates

Batch the page reference count updates instead of doing them
one at a time. By doing this we can improve the overall receive
performance by avoid some atomic increment operations when the
rx page is reused.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Liu Shixin [Wed, 16 Sep 2020 02:50:18 +0000 (10:50 +0800)]

cxgb4vf: convert to use DEFINE_SEQ_ATTRIBUTE macro

Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Shannon Nelson [Tue, 15 Sep 2020 23:59:03 +0000 (16:59 -0700)]

ionic: dynamic interrupt moderation

Use the dim library to manage dynamic interrupt
moderation in ionic.

v3: rebase
v2: untangled declarations in ionic_dim_work()

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Karsten Graul [Tue, 15 Sep 2020 20:57:09 +0000 (22:57 +0200)]

net/smc: check variable before dereferencing in smc_close.c

smc->clcsock and smc->clcsock->sk are used before the check if they can
be dereferenced. Fix this by checking the variables first.

Fixes: a60a2b1e0af1 ("net/smc: reduce active tcp_listen workers")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nikolay Aleksandrov [Tue, 15 Sep 2020 14:57:24 +0000 (17:57 +0300)]

net: bridge: mcast: don't ignore return value of __grp_src_toex_excl

When we're handling TO_EXCLUDE report in EXCLUDE filter mode we should
not ignore the return value of __grp_src_toex_excl() as we'll miss
sending notifications about group changes.

Fixes: 5bf1e00b6849 ("net: bridge: mcast: support for IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Song, Yoong Siang [Wed, 16 Sep 2020 07:40:20 +0000 (15:40 +0800)]

net: stmmac: Add support to Ethtool get/set ring parameters

This patch add support to --show-ring & --set-ring Ethtool functions:
- Adding min, max, power of two check to new ring parameter's value.
- Bring down the network interface before changing the value of ring
parameters.
- Bring up the network interface after changing the value of ring
parameters.

Signed-off-by: Song, Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 16 Sep 2020 22:19:30 +0000 (15:19 -0700)]

Merge branch 'mlxsw-Refactor-headroom-management'

Ido Schimmel says:

====================
mlxsw: Refactor headroom management

Petr says:

On Spectrum, port buffers, also called port headroom, is where packets are
stored while they are parsed and the forwarding decision is being made. For
lossless traffic flows, in case shared buffer admission is not allowed,
headroom is also where to put the extra traffic received before the sent
PAUSE takes effect. Another aspect of the port headroom is the so called
internal buffer, which is used for egress mirroring.

Linux supports two DCB interfaces related to the headroom: dcbnl_setbuffer
for configuration, and dcbnl_getbuffer for inspection. In order to make it
possible to implement these interfaces, it is first necessary to clean up
headroom handling, which is currently strewn in several places in the
driver.

The end goal is an architecture whereby it is possible to take a copy of
the current configuration, adjust parameters, and then hand the proposed
configuration over to the system to implement it. When everything works,
the proposed configuration is accepted and saved. First, this centralizes
the reconfiguration handling to one function, which takes care of
coordinating buffer size changes and priority map changes to avoid
introducing drops. Second, the fact that the configuration is all in one
place makes it easy to keep a backup and handle error path rollbacks, which
were previously hard to understand.

Patch #1 introduces struct mlxsw_sp_hdroom, which will keep port headroom
configuration.

Patch #2 unifies handling of delay provision between PFC and PAUSE. From
now on, delay is to be measured in bytes of extra space, and will not
include MTU. PFC handler sets the delay directly from the parameter it gets
through the DCB interface. For PAUSE, MLXSW_SP_PAUSE_DELAY is converted to
have the same meaning.

In patches #3-#5, MTU, lossiness and priorities are gradually moved over to
struct mlxsw_sp_hdroom.

In patches #6-#11, handling of buffer resizing and priority maps is moved
from spectrum.c and spectrum_dcb.c to spectrum_buffers.c. The API is
gradually adapted so that struct mlxsw_sp_hdroom becomes the main interface
through which the various clients express how the headroom should be
configured.

Patch #12 is a small cleanup that the previous transformation made
possible.

In patch #13, the port init code becomes a boring client of the headroom
code, instead of rolling its own thing.

Patches #14 and #15 move handling of internal mirroring buffer to the new
headroom code as well. Previously, this code was in the SPAN module. This
patchset converts the SPAN module to another boring client of the headroom
code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:28 +0000 (09:35 +0300)]

mlxsw: spectrum_buffers: Manage internal buffer in the hdroom code

Traffic mirroring modes that are in-chip implemented on egress need an
internal buffer to work. As the only client, the SPAN module was managing
the buffer so far. However logically it belongs to the buffers module. E.g.
buffer size validation needs to take the size of the internal buffer into
account.

Therefore move the related code from SPAN to spectrum_buffers. Move over
the callbacks that determine the minimum buffer size as a function of
maximum speed and MTU. Add a field describing the internal buffer to struct
mlxsw_sp_hdroom. Extend mlxsw_sp_hdroom_bufs_reset_sizes() to take care of
sizing the internal buffer as well. Change the SPAN module to invoke that
function and mlxsw_sp_hdroom_configure() like all the other hdroom clients.
Drop the now-unnecessary mlxsw_sp_span_port_buffer_disable().

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:27 +0000 (09:35 +0300)]

mlxsw: spectrum_buffers: Introduce shared buffer ops

The size of the internal buffer is currently calculated in the SPAN module.
Logically it belongs to the spectrum_buffers module, where it should be
moved. However, that being a chip-specific operation, it needs dynamic
dispatch. There currently is a chip-specific structure for description of
shared buffer values, struct mlxsw_sp_sb_vals. However placing ops into
this structure would be confusing. Therefore introduce a new per-chip
structure, currently empty, and initialize the ops pointer as appropriate.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:26 +0000 (09:35 +0300)]

mlxsw: spectrum_buffers: Convert mlxsw_sp_port_headroom_init()

Currently mlxsw_sp_port_headroom_init() configures both priomap and buffers
by hand. Additionally, for port buffers, it configures buffer 0 with a size
that it will never again have if PFC configuration is touched.

Rewrite the init code to become a client of the new hdroom code. The only
difference in invocation is that the configuration is forced, so that it is
issued even if the desired configuration happens to match what is contained
in (hitherto not initialized with meaningful values) mlxsw_sp_port->hdroom.

Since now mlxsw_sp_port_headroom_init() initializes all the PG buffers to
meaningful values, mlxsw_sp_hdroom_configure_buffers() can avoid querying
the current configuration, and can fill the whole PBMC itself.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:25 +0000 (09:35 +0300)]

mlxsw: spectrum_buffers: Inline mlxsw_sp_sb_max_headroom_cells()

This function is now only used from the buffers module, and is a trivial
field reference. Just inline it and drop the related artifacts.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:24 +0000 (09:35 +0300)]

mlxsw: spectrum_buffers: Move here the new headroom code

Move all the headroom code to the spectrum_buffers module, where it
belongs.

Rename mlxsw_sp_pg_buf_threshold_get() and mlxsw_sp_pg_buf_pack() to
..._hdroom_... to match the naming convention of the new headroom code.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:23 +0000 (09:35 +0300)]

mlxsw: spectrum: Move here the three-step headroom configuration from DCB

The ETS handler performs the headroom configuration in three steps: first
it resizes the buffers and adds any new ones. Then it redirects priorities
to the new buffers. And finally it sets the size of the now-unused buffers
to zero. This way no packet drops are introduced.

This sort of careful approach will also be useful for configuring port
buffer sizes and priority map by hand, through dcbnl_setbuffer. Therefore
move the code from the DCB handler to the generic headroom function.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:22 +0000 (09:35 +0300)]

mlxsw: spectrum_dcb: Convert mlxsw_sp_port_pg_prio_map() to hdroom code

The new hdroom code has certain conventions: iteration over priorities is
done through a variable named `prio', configuration is not pushed unless it
is dirty, but a `force' flag can be used to override this, updated
configuration is written to port. Convert the function
mlxsw_sp_port_pg_prio_map() to use these conventions and rename
appropriately to fit in.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:21 +0000 (09:35 +0300)]

mlxsw: spectrum_dcb: Convert ETS handler fully to mlxsw_sp_hdroom_configure()

The ETS handler performs the headroom configuration in three steps: first
it resizes the buffers and adds any new ones. Then it redirects priorities
to the new buffers. And finally it sets the size of the now-unused buffers
to zero. This way no packet drops are introduced.

Both of the buffer size configuration operations are simply buffer size
configurations, there is no material difference between setting buffers to
zero and any other value. Therefore simply invoke the same
mlxsw_sp_hdroom_configure(), and drop mlxsw_sp_port_pg_destroy() and
mlxsw_sp_ets_has_pg() which are now unused.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:20 +0000 (09:35 +0300)]

mlxsw: spectrum: Split headroom autoresize out of buffer configuration

Split mlxsw_sp_port_headroom_set() to three functions.
mlxsw_sp_hdroom_bufs_reset_sizes() changes the sizes of the individual PG
buffers, and mlxsw_sp_hdroom_configure_buffers() will actually apply the
configuration. A third function, mlxsw_sp_hdroom_bufs_fit(), verifies that
the requested buffer configuration matches total headroom size
requirements.

Add wrappers, mlxsw_sp_hdroom_configure() and __..., that will eventually
perform full headroom configuration, but for now, only have them verify the
configured headroom size, and invoke mlxsw_sp_hdroom_configure_buffers().
Have them take the `force` argument to prepare for a later patch, even
though it is currently unused.

Note that the loop in mlxsw_sp_hdroom_configure_buffers() only goes through
DCBX_MAX_BUFFERS. Since there is no logic to configure the control buffer,
it needs to keep the values queried from the FW. Eventually this function
should configure all the PGs.

Note that conversion of __mlxsw_sp_dcbnl_ieee_setets() is not trivial. That
function performs the headroom configuration in three steps: first it
resizes the buffers and adds any new ones. Then it redirects priorities to
the new buffers. And finally it sets the size of the now-unused buffers to
zero. This way no packet drops are introduced.

So after invoking mlxsw_sp_hdroom_bufs_reset_sizes(), tweak the
configuration to keep the old sizes of PG buffers for those buffers whose
size was set to zero.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:19 +0000 (09:35 +0300)]

mlxsw: spectrum: Track buffer sizes in struct mlxsw_sp_hdroom

So far, port buffers were always autoconfigured. When dcbnl_setbuffer
callback is implemented, it will allow the user to change the buffer size
configuration by hand. The sizes therefore need to be a configuration
parameter, not always deduced, and therefore belong to struct
mlxsw_sp_hdroom, where the configuration routine should take them from.

Update mlxsw_sp_port_headroom_set() to update these sizes. Have the
function update the sizes even for the case that a given buffer is not
used.

Additionally, change the loop iteration end to DCBX_MAX_BUFFERS instead of
IEEE_8021QAZ_MAX_TCS. The value is the same, but the semantics differ.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:18 +0000 (09:35 +0300)]

mlxsw: spectrum: Track lossiness in struct mlxsw_sp_hdroom

Client-side configuration has lossiness as an attribute of a priority.
Therefore add a "lossy" attribute to struct mlxsw_sp_hdroom_prio.

To a Spectrum ASIC, lossiness is a feature of a port buffer. Therefore add
struct mlxsw_sp_hdroom_buf, which in the following patches will get more
attributes, but right now only use it to track port buffer lossiness.

Instead of passing around the primary indicators of PFC and pause_en, add a
function mlxsw_sp_hdroom_bufs_reset_lossiness() to compute the buffer
lossiness from the priority map and priority lossiness. Change
mlxsw_sp_port_headroom_set() to take the buffer lossy flag from the
headroom configuration. Have the PFC and pause handlers configure priority
lossiness in mlxsw_sp_hdroom, from where it will propagate.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:17 +0000 (09:35 +0300)]

mlxsw: spectrum: Track priorities in struct mlxsw_sp_hdroom

The mapping from priorities to buffers determines which buffers should be
configured. Lossiness of these priorities combined with the mapping
determines whether a given buffer should be lossy.

Currently this configuration is stored implicitly in DCB ETS, PFC and
ethtool PAUSE configuration. Keeping it together with the rest of the
headroom configuration and deriving it as needed from PFC / ETS / PAUSE
will make things clearer. To that end, add a field "prios" to struct
mlxsw_sp_hdroom.

Previously, __mlxsw_sp_port_headroom_set() took prio_tc as an argument, and
assumed that the same mapping as we use on the egress should be used on
ingress as well. Instead, track this configuration at each priority, so
that it can be adjusted flexibly.

In the following patches, as dcbnl_setbuffer is implemented, it will need
to store its own mapping, and it will also be sometimes necessary to revert
back to the original ETS mapping. Therefore track two buffer indices: the
one for chip configuration (buf_idx), and the source one (ets_buf_idx).
Introduce a function to configure the chip-level buffer index, and for now
have it simply copy the ETS mapping over to the chip mapping.

Update the ETS handler to project prio_tc to the ets_buf_idx and invoke the
buf_idx recomputation.

Now that there is a canonical place to look for this configuration,
mlxsw_sp_port_headroom_set() does not need to invent def_prio_tc to use if
DCB is compiled out.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:16 +0000 (09:35 +0300)]

mlxsw: spectrum: Track MTU in struct mlxsw_sp_hdroom

MTU influences sizes of auto-allocated buffers. Make it a part of port
buffer configuration and have __mlxsw_sp_port_headroom_set() take it from
there, instead of as an argument.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:15 +0000 (09:35 +0300)]

mlxsw: spectrum: Unify delay handling between PFC and pause

When a priority is marked as lossless using DCB PFC, or when pause frames
are enabled on a port, mlxsw adds to port buffers an extra space to cover
the traffic that will arrive between the time that a pause or PFC frame is
emitted, and the time traffic actually stops. This is called the delay. The
concept is the same in PFC and pause, however the way the extra buffer
space is calculated differs.

In this patch, unify this handling. Delay is to be measured in bytes of
extra space, and will not include MTU. PFC handler sets the delay directly
from the parameter it gets through the DCB interface.

To convert pause handler, move MLXSW_SP_PAUSE_DELAY to ethtool module,
convert to bytes, and reduce it by maximum MTU, and divide by two. Then it
has the same meaning as the delay_bytes set by the PFC handler.

Keep the delay_bytes value in struct mlxsw_sp_hdroom introduced in the
previous patch. Change PFC and pause handlers to store the new delay value
there and have __mlxsw_sp_port_headroom_set() take it from there.

Instead of mlxsw_sp_pfc_delay_get() and mlxsw_sp_pg_buf_delay_get(),
introduce mlxsw_sp_hdroom_buf_delay_get() to calculate the delay provision.
Drop the unnecessary MLXSW_SP_CELL_FACTOR, and instead add an explanatory
comment describing the formula used.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Wed, 16 Sep 2020 06:35:14 +0000 (09:35 +0300)]

mlxsw: spectrum_buffers: Add struct mlxsw_sp_hdroom

The port headroom handling is currently strewn across several modules and
tricky to follow: MTU, DCB PFC, DCB ETS and ethtool pause all influence the
settings, and then there is the completely separate initial configuraion in
spectrum_buffers. A following patch will implement the dcbnl_setbuffer
callback, which is going to further complicate the landscape.

In order to simplify work with port buffers, the following patches are
going to centralize all port-buffer handling in spectrum_buffers. As a
first step, introduce a (currently empty) struct mlxsw_sp_hdroom that will
keep the configuration parameters, and allocate and free it in appropriate
places.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 16 Sep 2020 22:16:51 +0000 (15:16 -0700)]

Merge tag 'mlx5-updates-2020-09-15' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-09-15

Various updates to mlx5 driver,

1) Eli adds support for TC trap action.
2) Eran, minor improvements to clock.c code structure
3) Better handling of error reporting in LAG from Jianbo
4) IPv6 traffic class (DSCP) header rewrite support from Maor
5) Ofer Levi adds support for CQE compression of multi-strides packets
6) Vu, Enables use of vport meta data by default.
7) Some minor code cleanup
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 15 Sep 2020 23:31:44 +0000 (16:31 -0700)]

Merge branch 'nexthop-Small-changes'

Ido Schimmel says:

====================
nexthop: Small changes

This patch set contains a few small changes that I split out of the RFC
I sent last week [1]. Main change is the conversion of the nexthop
notification chain to a blocking chain so that it could be reused by
device drivers for nexthop objects programming in the future.

Tested with fib_nexthops.sh:

Tests passed: 164
Tests failed: 0

[1] https://lore.kernel.org/netdev/20200908091037.2709823-1-idosch@idosch.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Tue, 15 Sep 2020 11:41:03 +0000 (14:41 +0300)]

selftests: fib_nexthops: Test cleanup of FDB entries following nexthop deletion

Commit c7cdbe2efc40 ("vxlan: support for nexthop notifiers") registered
a listener in the VXLAN driver to the nexthop notification chain. Its
purpose is to cleanup FDB entries that use a nexthop that is being
deleted.

Test that such FDB entries are removed when the nexthop group that they
use is deleted. Test that entries are not deleted when a single nexthop
in the group is deleted.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Tue, 15 Sep 2020 11:41:02 +0000 (14:41 +0300)]

nexthop: Only emit a notification when nexthop is actually deleted

Currently, the in-kernel delete notification is emitted from the error
path of nexthop_add() and replace_nexthop(), which can be confusing to
in-kernel listeners as they are not familiar with the nexthop.

Instead, only emit the notification when the nexthop is actually
deleted. The following sub-cases are covered:

1. User space deletes the nexthop
2. The nexthop is deleted by the kernel due to a netdev event (e.g.,
nexthop device going down)
3. A group is deleted because its last nexthop is being deleted
4. The network namespace of the nexthop device is deleted

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Tue, 15 Sep 2020 11:41:01 +0000 (14:41 +0300)]

nexthop: Convert to blocking notification chain

Currently, the only listener of the nexthop notification chain is the
VXLAN driver. Subsequent patches will add more listeners (e.g., device
drivers such as netdevsim) that need to be able to block when processing
notifications.

Therefore, convert the notification chain to a blocking one. This is
safe as notifications are always emitted from process context.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Tue, 15 Sep 2020 11:41:00 +0000 (14:41 +0300)]

nexthop: Remove NEXTHOP_EVENT_ADD

Not used anywhere.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Tue, 15 Sep 2020 11:40:59 +0000 (14:40 +0300)]

nexthop: Remove unused function declaration from header file

Not used or implemented anywhere.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Geert Uytterhoeven [Tue, 15 Sep 2020 09:35:51 +0000 (11:35 +0200)]

chelsio/chtls: Re-add dependencies on CHELSIO_T4 to fix modular CHELSIO_T4

As CHELSIO_INLINE_CRYPTO is bool, and CHELSIO_T4 is tristate, the
dependency of CHELSIO_INLINE_CRYPTO on CHELSIO_T4 is not sufficient to
protect CRYPTO_DEV_CHELSIO_TLS and CHELSIO_IPSEC_INLINE.  The latter two
are also tristate, hence if CHELSIO_T4=n, they cannot be builtin, as
that would lead to link failures like:

    drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c:259: undefined reference to `cxgb4_port_viid'

and

    drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c:752: undefined reference to `cxgb4_reclaim_completed_tx'

Fix this by re-adding dependencies on CHELSIO_T4 to tristate symbols.
The dependency of CHELSIO_INLINE_CRYPTO on CHELSIO_T4 is kept to avoid
asking the user.

Fixes: 6bd860ac1c2a0ec2 ("chelsio/chtls: CHELSIO_INLINE_CRYPTO should depend on CHELSIO_T4")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 15 Sep 2020 22:57:16 +0000 (15:57 -0700)]

Merge branch 'mlxsw-Introduce-fw_fatal-health-reporter-and-test-cmd-to
-trigger-test-event'

Ido Schimmel says:

====================
mlxsw: Introduce fw_fatal health reporter and test cmd to trigger test event

Jiri says:

This patch set introduces a health reporter for mlxsw that reports FW
fatal events. Alongside that, it introduces a test command that is used
to trigger a dummy FW fatal event by user:

$ sudo devlink health test pci/0000:03:00.0 reporter fw_fatal

$ devlink health
pci/0000:03:00.0:
  reporter fw_fatal
    state error error 1 recover 0 last_dump_date 2020-07-27 last_dump_time 16:33:27 auto_dump true

$ sudo devlink health dump show pci/0000:03:00.0 reporter fw_fatal -j -p
{
    "irisc_id": 0,
    "event": [
        "id": 3 ],
    "method": "query",
    "long_process": false,
    "command_type": "mad",
    "reg_attr_id": 0
}

As a dependency, the FW validation and flashing is moved to core.c.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Tue, 15 Sep 2020 08:40:58 +0000 (11:40 +0300)]

mlxsw: core: Introduce fw_fatal health reporter

Introduce devlink health reporter to report FW fatal events. Implement
the event listener using MFDE trap and enable the events to be
propagated using MFGD register configuration.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Tue, 15 Sep 2020 08:40:57 +0000 (11:40 +0300)]

devlink: introduce the health reporter test command

Introduce a test command for health reporters. User might use this
command to trigger test event on a reporter if the reporter supports it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Tue, 15 Sep 2020 08:40:56 +0000 (11:40 +0300)]

mlxsw: reg: Add Monitoring FW General Debug Register

Introduce MFGD register that is used to configure firmware debugging.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Tue, 15 Sep 2020 08:40:55 +0000 (11:40 +0300)]

mlxsw: reg: Add Monitoring FW Debug Register

Introduce MFDE register that is passed through MFDE trap in case of
fatal FW event.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Tue, 15 Sep 2020 08:40:54 +0000 (11:40 +0300)]

mlxsw: Move fw_load_policy devlink param into core.c

As the fw flashing code was moved to core.c, move the param which is
related to it there as well. Remove unnecessary parentheses on the way.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Tue, 15 Sep 2020 08:40:53 +0000 (11:40 +0300)]

mlxsw: core: Push code doing params register/unregister into separate helpers

Extract the code calling params register/unregister driver ops into
separate functions. Call publish/unpublish unconditionally.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Tue, 15 Sep 2020 08:40:52 +0000 (11:40 +0300)]

mlxsw: Move fw flashing code into core.c

As the firmware flashing is not specific to Spectrum, move the code to
core.c and avoid one op call and 2 exported symbols. Also, this allows
to do flash before call of driver->init function and possibly do other
core calls in between.

Do some small renaming here and there on the way to be consistent with
the rest of core.c code.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Tue, 15 Sep 2020 08:40:51 +0000 (11:40 +0300)]

mlxsw: Bump firmware version to XX.2008.1310

Among other changes, this version supports FW monitoring.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 15 Sep 2020 22:40:04 +0000 (15:40 -0700)]

Merge branch 'net-stmmac-Add-ethtool-support-for-get-set-channels'

Wong Vee Khee says:

====================
net: stmmac: Add ethtool support for get|set channels

This patch set is to add support for user to get or set Tx/Rx channel
via ethtool. There are two patches that fixes bug introduced on upstream
in order to have the feature work.

Tested on Intel Tigerlake Platform.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ong Boon Leong [Tue, 15 Sep 2020 01:28:40 +0000 (09:28 +0800)]

net: stmmac: use netif_tx_start|stop_all_queues() function

The current implementation of stmmac_stop_all_queues() and
stmmac_start_all_queues() will not work correctly when the value of
tx_queues_to_use is changed through ethtool -L DEVNAME rx N tx M command.

Also, netif_tx_start|stop_all_queues() are only needed in driver open()
and close() only.

Fixes: c22a3f48 net: stmmac: adding multiple napi mechanism

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Aashish Verma [Tue, 15 Sep 2020 01:28:39 +0000 (09:28 +0800)]

net: stmmac: Fix incorrect location to set real_num_rx|tx_queues

netif_set_real_num_tx_queues() & netif_set_real_num_rx_queues() should be
used to inform network stack about the real Tx & Rx queue (active) number
in both stmmac_open() and stmmac_resume(), therefore, we move the code
from stmmac_dvr_probe() to stmmac_hw_setup().

Fixes: c02b7a914551 net: stmmac: use netif_set_real_num_{rx,tx}_queues

Signed-off-by: Aashish Verma <aashishx.verma@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ong Boon Leong [Tue, 15 Sep 2020 01:28:38 +0000 (09:28 +0800)]

net: stmmac: add ethtool support for get/set channels

Restructure NAPI add and delete process so that we can call them
accordingly in open() and ethtool_set_channels() accordingly.

Introduced stmmac_reinit_queues() to handle the transition needed
for changing Rx & Tx channels accordingly.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 15 Sep 2020 20:26:29 +0000 (13:26 -0700)]

Merge branch 'ethtool-add-pause-frame-stats'

Jakub Kicinski says:

====================
ethtool: add pause frame stats

This is the first (small) series which exposes some stats via
the corresponding ethtool interface. Here (thanks to the
excitability of netlink) we expose pause frame stats via
the same interfaces as ethtool -a / -A.

In particular the following stats from the standard:
- 30.3.4.2 aPAUSEMACCtrlFramesTransmitted
- 30.3.4.3 aPAUSEMACCtrlFramesReceived

4 real drivers are converted, I believe we got confirmation
from maintainers that all exposed stats match the standard.

v3:
- fix mlx5 build
- adjust the init logic in patch 1
v2:
- netdevsim: add missing static
- bnxt: fix sparse warning
- mlx5: address Saeed's comments
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Sep 2020 00:11:59 +0000 (17:11 -0700)]

mlx4: add pause frame stats

Check if the pause stats are reported by HW by checking the bitmap.
Calculation is based on the order of strings in main_strings from
ethtool -S. Hopefully the semantics of these stats match the standard..

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Sep 2020 00:11:58 +0000 (17:11 -0700)]

mlx5: add pause frame stats

Plumb through all the indirection and copy some code from
ethtool -S. The names of the group indicate that these are
the stats we are after (and Saeed confirms it).

v3:
- fix build in mlx5_rep
v2:
- drop the ethool helper and call stats directly
- don't pass 0 as initialized to in buffer
- use local buffer

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Sep 2020 00:11:57 +0000 (17:11 -0700)]

ixgbe: add pause frame stats

Report standard pause frame stats. They are already aggregated
in struct ixgbe_hw_stats.

The combination of the registers is suggested as equivalent to
PAUSEMACCtrlFramesTransmitted / PAUSEMACCtrlFramesReceived
by the Intel 82576EB datasheet, I could not find any information
in the HW actually supported by ixgbe.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Sep 2020 00:11:56 +0000 (17:11 -0700)]

bnxt: add pause frame stats

These stats are already reported in ethtool -S.
Michael confirms they are equivalent to standard stats.

v2: - fix sparse warning about endian by using the macro
- use u64 for pointer type

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Sep 2020 00:11:55 +0000 (17:11 -0700)]

selftests: add a test for ethtool pause stats

Make sure the empty nest is reported even without stats.
Make sure reporting only selected stats works fine.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Sep 2020 00:11:54 +0000 (17:11 -0700)]

netdevsim: add pause frame stats

Add minimal ethtool interface for testing ethtool pause stats.

v2: add missing static on nsim_ethtool_ops

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Sep 2020 00:11:53 +0000 (17:11 -0700)]

docs: net: include the new ethtool pause stats in the stats doc

Tell people that there now is an interface for querying pause frames.
A little bit of restructuring is needed given this is a first source
of such statistics.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jakub Kicinski [Tue, 15 Sep 2020 00:11:52 +0000 (17:11 -0700)]

ethtool: add standard pause stats

Currently drivers have to report their pause frames statistics
via ethtool -S, and there is a wide variety of names used for
these statistics.

Add the two statistics defined in IEEE 802.3x to the standard
API. Create a new ethtool request header flag for including
statistics in the response to GET commands.

Always create the ETHTOOL_A_PAUSE_STATS nest in replies when
flag is set. Testing if driver declares the op is not a reliable
way of checking if any stats will actually be included and therefore
we don't want to give the impression that presence of
ETHTOOL_A_PAUSE_STATS indicates driver support.

Note that this patch does not include PFC counters, which may fit
better in dcbnl? But mostly I don't need them/have a setup to test
them so I haven't looked deeply into exposing them :)

v3:
- add a helper for "uninitializing" stats, rather than a cryptic
memset() (Andrew)

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 15 Sep 2020 20:21:47 +0000 (13:21 -0700)]

Merge branch 's390-qeth-next'

Julian Wiedmann says:

====================
s390/qeth: updates 2020-09-10

subject to positive review by the bridge maintainers on patch 5,
please apply the following patch series to netdev's net-next tree.

Alexandra adds BR_LEARNING_SYNC support to qeth. In addition to the
main qeth changes (controlling the feature, and raising switchdev
events), this also needs
- Patch 1 and 2 for some s390/cio infrastructure improvements
  (acked by Heiko to go in via net-next), and
- Patch 5 to introduce a new switchdev_notifier_type, so that a driver
  can clear all previously learned entries from the bridge FDB in case
  things go out-of-sync later on.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexandra Winter [Thu, 10 Sep 2020 17:23:51 +0000 (19:23 +0200)]

s390/qeth: implement ndo_bridge_setlink for learning_sync

Documentation/networking/switchdev.txt and 'man bridge' indicate that the
learning_sync bridge attribute is used to control whether a given
device will sync MAC addresses learned on its device port to a master
bridge FDB, where they will show up as 'extern_learn offload'. So we map
qeth_l2_dev2br_an_set() to the learning_sync bridge link attribute.

Turning off learning_sync will flush all extern_learn entries from the
bridge fdb and all pending events from the card's work queue.

When the hardware interface goes offline with learning_sync on
(e.g. for HW recovery), all extern_learn entries will be flushed from the
bridge fdb and all pending events from the card's work queue. When the
interface goes online again, it will send new notifications for all then
valid MACs. learning_sync attribute can not be modified while interface is
offline. See
'commit e6e771b3d897 ("s390/qeth: detach netdevice while card is offline")'

An alternative implementation would be to always offload the 'learning'
attribute of a software bridge to the hardware interface attached to it
and thus implicitly enable fdb notification. This was not chosen for 2
reasons:
1) In our case the software bridge is NOT a representation of a hardware
switch. It is just connected to a smart NIC that is able to inform
about the addresses attached to it. It is not necessarily using source
MAC learning for this and other bridgeports can be attached to other
NICs with different properties.
2) We want a means to enable this notification explicitly. There may be
cases where a bridgeport is set to 'learning', but we do not want to
enable the notification.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexandra Winter [Thu, 10 Sep 2020 17:23:50 +0000 (19:23 +0200)]

s390/qeth: implement ndo_bridge_getlink for learning_sync

Documentation/networking/switchdev.txt and 'man bridge' indicate that the
learning_sync bridge attribute is used to indicate whether a given
device will sync MAC addresses learned on its device port to a master
bridge FDB.

learning_sync attribute can not be read while interface is offline (down).
See
'commit e6e771b3d897 ("s390/qeth: detach netdevice while card is offline")'
We return EOPNOTSUPP and not EONODEV in this case, because EONOTSUPP is the
only rc that is tolerated by 'bridge -d link show'.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexandra Winter [Thu, 10 Sep 2020 17:23:49 +0000 (19:23 +0200)]

s390/qeth: Reset address notification in case of buffer overflow

In case hardware sends more device-to-bridge-address-change notfications
than the qeth-l2 driver can handle, the hardware will send an overflow
event and then stop sending any events. It expects software to flush its
FDB and start over again. Re-enabling address-change-notification will
report all current addresses.

In order to re-enable address-change-notification this patch defines
the functions qeth_l2_dev2br_an_set() and qeth_l2_dev2br_an_set_cb
to enable or disable dev-to-bridge-address-notification.

A following patch will use the learning_sync bridgeport flag to trigger
enabling or disabling of address-change-notification, so we define
priv->brport_features to store the current setting. BRIDGE_INFO and
ADDR_INFO functionality are mutually exclusive, whereas ADDR_INFO and
qeth_l2_vnicc* can be used together.

Alternative implementations to handle buffer overflow:
Just re-enabling notification and adding all newly reported addresses
would cover any lost 'add' events, but not the lost 'delete' events.
Then these invalid addresses would stay in the bridge FDB as long as the
device exists.
Setting the net device down and up, would be an alternative, but is a bit
drastic. If the net device has many secondary addresses this will create
many delete/add events at its peers which could de-stabilize the
network segment.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexandra Winter [Thu, 10 Sep 2020 17:23:48 +0000 (19:23 +0200)]

bridge: Add SWITCHDEV_FDB_FLUSH_TO_BRIDGE notifier

so the switchdev can notifiy the bridge to flush non-permanent fdb entries
for this port. This is useful whenever the hardware fdb of the switchdev
is reset, but the netdev and the bridgeport are not deleted.

Note that this has the same effect as the IFLA_BRPORT_FLUSH attribute.

CC: Jiri Pirko <jiri@resnulli.us>
CC: Ivan Vecera <ivecera@redhat.com>
CC: Roopa Prabhu <roopa@nvidia.com>
CC: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexandra Winter [Thu, 10 Sep 2020 17:23:47 +0000 (19:23 +0200)]

s390/qeth: Translate address events into switchdev notifiers

A qeth-l2 HiperSockets card can show switch-ish behaviour in the sense,
that it can report all MACs that are reachable via this interface. Just
like a switch device, it can notify the software bridge about changes
to its fdb. This patch exploits this device-to-bridge-notification and
extracts the relevant information from the hardware events to generate
notifications to an attached software bridge.

There are 2 sources for this information:
1) The reply message of Perform-Network-Subchannel-Operations (PNSO)
(operation code ADDR_INFO) reports all addresses that are currently
reachable (implemented in a later patch).
2) As long as device-to-bridge-notification is enabled, hardware will
generate address change notification events, whenever the content of
the hardware fdb changes (this patch).

The bridge_hostnotify feature (PNSO operation code BRIDGE_INFO) uses
the same address change notification events. We need to distinguish
between qeth_pnso_mode QETH_PNSO_BRIDGEPORT and QETH_PNSO_ADDR_INFO
and call a different handler. In both cases deadlocks must be
prevented, if the workqueue is drained under lock and QETH_PNSO_NONE,
when notification is disabled.

bridge_hostnotify generates udev events, there is no intend to do the same
for dev2br. Instead this patch will generate SWITCHDEV_FDB_ADD_TO_BRIDGE
and SWITCHDEV_FDB_DEL_TO_BRIDGE notifications, that will cause the
software bridge to add (or delete) entries to its fdb as 'extern_learn
offload'.

Documentation/networking/switchdev.txt proposes to add
"depends NET_SWITCHDEV" to driver's Kconfig. This is not done here,
so even in absence of the NET_SWITCHDEV module, the QETH_L2 module will
still be built, but then the switchdev notifiers will have no effect.

No VLAN filtering is done on the entries and VLAN information is not
passed on to the bridge fdb entries. This could be added later.
For now VLAN interfaces can be defined on the upper bridge interface.

Multicast entries are not passed on to the bridge fdb.
This could be added later. For now mcast flooding can be used in the
bridge.

The card reports all MACs that are in its FDB, but we must not pass on
MACs that are registered for this interface.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexandra Winter [Thu, 10 Sep 2020 17:23:46 +0000 (19:23 +0200)]

s390/qeth: Detect PNSO OC3 capability

This patch detects whether device-to-bridge-notification, provided
by the Perform Network Subchannel Operation (PNSO) operation code
ADDR_INFO (OC3), is supported by this card. A following patch will
map this to the learning_sync bridgeport flag, so we store it in
priv->brport_hw_features in bridgeport flag format.

Only IQD cards provide PNSO.
There is a feature bit to indicate whether the machine provides OC3,
unfortunately it is not set on old machines.
So PNSO is called to find out. As this will disable notification
and is exclusive with bridgeport_notification, this must be done
during card initialisation before previous settings are restored.

PNSO functionality requires some configuration values that are added to
the qeth_card.info structure. Some helper functions are defined to fill
them out when the card is brought online and some other places are
adapted, that can also benefit from these fields.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexandra Winter [Thu, 10 Sep 2020 17:23:45 +0000 (19:23 +0200)]

s390/cio: Helper functions to read CSSID, IID, and CHID

Add helper functions to expose Channel Subsystem ID (CSSID), MIF Image Id
(IID), Channel ID (CHID) and Channel Path ID (CHPID).
These values are required by the qeth driver's exploitation of network-
address-change-notifications to determine which entries belong to this
interface.

Store the Partition identifier in System log, as this may be used to map
a Linux view to a Hardware view for debugging purpose.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexandra Winter [Thu, 10 Sep 2020 17:23:44 +0000 (19:23 +0200)]

s390/cio: Add new Operation Code OC3 to PNSO

Add support for operation code 3 (OC3) of the
Perform-Network-Subchannel-Operations (PNSO) function
of the Channel-Subsystem-Call (CHSC) instruction.

PNSO provides 2 operation codes:
OC0 - BRIDGE_INFO
OC3 - ADDR_INFO (new)

Extend the function calls to *pnso* to pass the OC and
add new response code 0108.

Support for OC3 is indicated by a flag in the css_general_characteristics.

Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ofer Levi [Sun, 17 May 2020 07:16:49 +0000 (10:16 +0300)]

net/mlx5e: Add CQE compression support for multi-strides packets

Add CQE compression support for completions of packets that span
multiple strides in a Striding RQ, per the HW capability.
In our memory model, we use small strides (256B as of today) for the
non-linear SKB mode. This feature allows CQE compression to work also
for multiple strides packets. In this case decompressing the mini CQE
array will use stride index provided by HW as part of the mini CQE.
Before this feature, compression was possible only for single-strided
packets, i.e. for packets of size up to 256 bytes when in non-linear
mode, and the index was maintained by SW.
This feature is supported for ConnectX-5 and above.

Feature performance test:
This was whitebox-tested, we reduced the PCI speed from 125Gb/s to
62.5Gb/s to overload pci and manipulated mlx5 driver to drop incoming
packets before building the SKB to achieve low cpu utilization.
Outcome is low cpu utilization and bottleneck on pci only.
Test setup:
Server: Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz server, 32 cores
NIC: ConnectX-6 DX.
Sender side generates 300 byte packets at full pci bandwidth.
Receiver side configuration:
Single channel, one cpu processing with one ring allocated. Cpu utilization
is ~20% while pci bandwidth is fully utilized.
For the generated traffic and interface MTU of 4500B (to activate the
non-linear SKB mode), packet rate improvement is about 19% from ~17.6Mpps
to ~21Mpps.
Without this feature, counters show no CQE compression blocks for
this setup, while with the feature, counters show ~20.7Mpps compressed CQEs
in ~500K compression blocks.

Signed-off-by: Ofer Levi <oferle@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

commit | commitdiff | tree

Maor Dickman [Thu, 3 Sep 2020 08:02:10 +0000 (11:02 +0300)]

net/mlx5e: Add IPv6 traffic class (DSCP) header rewrite support

Add support for rewriting of IPV6 DSCP part of traffic class field.
Next commands, for example, can be used to offload rewrite action:

OVS:
$ ovs-ofctl add-flow ovs-sriov "tcpv6, in_port=REP, \
       actions=mod_nw_tos:68, output:NIC"

iproute2:
$ tc filter add dev REP ingress protocol ipv6 prio 1 flower skip_sw \
       ip_proto tcp \
       action pedit ex munge ip6 traffic_class set 68 retain 0xfc pipe \
       action mirred egress redirect dev NIC

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

commit | commitdiff | tree

Eli Cohen [Sun, 23 Aug 2020 12:54:43 +0000 (15:54 +0300)]

net/mlx5e: Add support for tc trap

Support tc trap such that packets can explicitly be forwarded to slow
path if they match a specific rule.

In the example below, we want packets with src IP equals 7.7.7.8 to be
forwarded to software, in which case it will get to the appropriate
representor net device.

$ tc filter add dev eth1 protocol ip prio 1 root flower skip_sw \
src_ip 7.7.7.8 action trap

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

commit | commitdiff | tree

Vu Pham [Mon, 18 May 2020 22:02:45 +0000 (15:02 -0700)]

net/mlx5: E-Switch, Use vport metadata matching by default

Multiple features use metadata matching such as bond vport
in live migration, multi-port RoCE mode, stacked devices;
hence, enable vport metadata matching by default.

Fixes: 1e62e222db2e ("net/mlx5: E-Switch, Use vport metadata matching only when mandatory")
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

commit | commitdiff | tree

Vu Pham [Fri, 22 May 2020 18:48:38 +0000 (11:48 -0700)]

net/mlx5: E-Switch, Setup all vports' metadata to support peer miss rule

In merged eswitch configuration, peer miss rule is setup for all
vports. If metadata is enabled, peer miss rule with metadata matching
will be configured instead of source port matching; however, some
vports that have not yet been enabled don't have default_metadata
setup and their default_metadata will be zero.

Hence, setup/cleanup default metadata for all vports when eswitch moves
in/out of offloads mode.

Fixes: 133dcfc577ea ("net/mlx5: E-Switch, Alloc and free unique metadata for match")
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

commit | commitdiff | tree

Vu Pham [Tue, 23 Jun 2020 18:53:34 +0000 (11:53 -0700)]

net/mlx5: E-Switch, Dedicated metadata for uplink vport

Uplink vport must have a dedicated metadata with vhca_id
being part of the metadata.

Fixes: 133dcfc577ea ("net/mlx5: E-Switch, Alloc and free unique metadata for match")
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

commit | commitdiff | tree

Vu Pham [Tue, 23 Jun 2020 19:32:31 +0000 (12:32 -0700)]

net/mlx5: E-Switch, Check and enable metadata support flag before using

Check E-Switch capabilities and enable metadata support flag
before using it to setup other features that need metadata.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

commit | commitdiff | tree

Jianbo Liu [Fri, 17 Apr 2020 08:53:16 +0000 (08:53 +0000)]

net/mlx5e: Add LAG warning if bond slave is not lag master

LAG offload can't be enabled if the enslaved PF is not lag master,
which is indicated by HCA capabilities bit. It is cleared if more than
64 VFs are configured for this PF.

Previously, a data structure is created to store lag info, including
PFs to be enslaved, then a handler is registered for netdev notifier.
However, this initialization is skipped if PF is not lag master. So
PF can't handle CHANGEUPPER event from upper bond device. Even worse,
PF is enslaved silently, and LAG offload is not activated.

Fix this by registering netdev notifier for PFs which are not lag
masters. When CHANGEUPPER event is received, and both physical ports
(and only them) on the same NIC are about to be enslaved, a warning is
returned for user to know it.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

commit | commitdiff | tree

Jianbo Liu [Fri, 17 Apr 2020 02:55:46 +0000 (02:55 +0000)]

net/mlx5e: Add LAG warning for unsupported tx type

If bond tx type is not active-backup or hash, LAG offload can't be
enabled. When CHANGEUPPER event is received, and both PFs (and only
them) under the same lag master are about to be enslaved, a warning
is returned for user to know the offload failure, otherwise PFs are
enslaved silently without LAG offload activated.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

commit | commitdiff | tree

Jianbo Liu [Thu, 30 Apr 2020 01:59:20 +0000 (01:59 +0000)]

net/mlx5e: Return a valid errno if can't get lag device index

Change the return value to -ENOENT, to make it more meaningful.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

commit | commitdiff | tree

Eran Ben Elisha [Tue, 16 Jun 2020 09:07:10 +0000 (12:07 +0300)]

net/mlx5: Don't call timecounter cyc2time directly from 1PPS flow

Before calling timecounter_cyc2time(), clock->lock must be taken.
Use mlx5_timecounter_cyc2time instead which guarantees a safe access.

Fixes: afc98a0b46d8 ("net/mlx5: Update ptp_clock_event foreach PPS event")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>

commit | commitdiff | tree

Eran Ben Elisha [Tue, 19 May 2020 09:00:57 +0000 (12:00 +0300)]

net/mlx5: Release clock lock before scheduling a PPS work

Holding the clock lock is not required when scheduling a PPS work.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>

commit | commitdiff | tree

Eran Ben Elisha [Tue, 9 Jun 2020 07:58:31 +0000 (10:58 +0300)]

net/mlx5: Rename ptp clock info

Fix a typo in ptp_clock_info naming: mlx5_p2p -> mlx5_ptp.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>

commit | commitdiff | tree

Eran Ben Elisha [Wed, 13 May 2020 08:06:47 +0000 (11:06 +0300)]

net/mlx5: Always use container_of to find mdev pointer from clock struct

Clock struct is part of struct mlx5_core_dev. Code was inconsistent, on
some cases used container_of and on another used clock->mdev.

Align code to use container_of amd remove clock->mdev pointer.
While here, fix reverse xmas tree coding style.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>

commit | commitdiff | tree

Dan Carpenter [Mon, 3 Aug 2020 14:34:48 +0000 (17:34 +0300)]

net/mlx5: remove erroneous fallthrough

This isn't a fall through because it was after a return statement.  The
fall through annotation leads to a Smatch warning:

    drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:246
    mlx5e_ethtool_get_sset_count() warn: ignoring unreachable code.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>

commit | commitdiff | tree

Moshe Tal [Thu, 16 Jul 2020 11:59:30 +0000 (14:59 +0300)]

net/mlx5: Fix uninitialized variable warning

Add variable initialization to eliminate the warning
"variable may be used uninitialized".

Fixes: 5f29458b77d5 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Moshe Tal <moshet@mellanox.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

MicroBlaze GIT Repository