linux-2.6-microblaze.git
2 years agortnetlink: add extack support in fdb del handlers
Alaa Mohamed [Thu, 5 May 2022 15:09:57 +0000 (17:09 +0200)]
rtnetlink: add extack support in fdb del handlers

Add extack support to .ndo_fdb_del in netdevice.h and
all related methods.

Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'wwan-t7xx'
David S. Miller [Mon, 9 May 2022 09:51:59 +0000 (10:51 +0100)]
Merge branch 'wwan-t7xx'

Ricardo Martinez says:

====================
net: wwan: t7xx: PCIe driver for MediaTek M.2 modem

t7xx is the PCIe host device driver for Intel 5G 5000 M.2 solution which
is based on MediaTek's T700 modem to provide WWAN connectivity.
The driver uses the WWAN framework infrastructure to create the following
control ports and network interfaces:
* /dev/wwan0mbim0 - Interface conforming to the MBIM protocol.
  Applications like libmbim [1] or Modem Manager [2] from v1.16 onwards
  with [3][4] can use it to enable data communication towards WWAN.
* /dev/wwan0at0 - Interface that supports AT commands.
* wwan0 - Primary network interface for IP traffic.

The main blocks in t7xx driver are:
* PCIe layer - Implements probe, removal, and power management callbacks.
* Port-proxy - Provides a common interface to interact with different types
  of ports such as WWAN ports.
* Modem control & status monitor - Implements the entry point for modem
  initialization, reset and exit, as well as exception handling.
* CLDMA (Control Layer DMA) - Manages the HW used by the port layer to send
  control messages to the modem using MediaTek's CCCI (Cross-Core
  Communication Interface) protocol.
* DPMAIF (Data Plane Modem AP Interface) - Controls the HW that provides
  uplink and downlink queues for the data path. The data exchange takes
  place using circular buffers to share data buffer addresses and metadata
  to describe the packets.
* MHCCIF (Modem Host Cross-Core Interface) - Provides interrupt channels
  for bidirectional event notification such as handshake, exception, PM and
  port enumeration.

The compilation of the t7xx driver is enabled by the CONFIG_MTK_T7XX config
option which depends on CONFIG_WWAN.
This driver was originally developed by MediaTek. Intel adapted t7xx to
the WWAN framework, optimized and refactored the driver source code in close
collaboration with MediaTek. This will enable getting the t7xx driver on the
Approved Vendor List for interested OEM's and ODM's productization plans
with Intel 5G 5000 M.2 solution.

List of contributors:
Amir Hanania <amir.hanania@intel.com>
Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Dinesh Sharma <dinesh.sharma@intel.com>
Eliot Lee <eliot.lee@intel.com>
Haijun Liu <haijun.liu@mediatek.com>
M Chetan Kumar <m.chetan.kumar@intel.com>
Mika Westerberg <mika.westerberg@linux.intel.com>
Moises Veleta <moises.veleta@intel.com>
Pierre-louis Bossart <pierre-louis.bossart@intel.com>
Chiranjeevi Rapolu <chiranjeevi.rapolu@intel.com>
Ricardo Martinez <ricardo.martinez@linux.intel.com>
Madhusmita Sahu <madhusmita.sahu@intel.com>
Muralidharan Sethuraman <muralidharan.sethuraman@intel.com>
Soumya Prakash Mishra <Soumya.Prakash.Mishra@intel.com>
Sreehari Kancharla <sreehari.kancharla@intel.com>
Suresh Nagaraj <suresh.nagaraj@intel.com>

[1] https://www.freedesktop.org/software/libmbim/
[2] https://www.freedesktop.org/software/ModemManager/
[3] https://gitlab.freedesktop.org/mobile-broadband/ModemManager/-/merge_requests/582
[4] https://gitlab.freedesktop.org/mobile-broadband/ModemManager/-/merge_requests/523

V8:
- Rebase skb_data_area_size() patch (02).

V7:
- Delete unused macros.
- Avoid duplicated calls to le32_to_cpu().
- Fix 'out of bounds' compilation error.
- Rename port_number to port_count.
- Remove '!!' when the destination variable is a boolean.
- Remove unneeded spinlock around rx_length_th.
- Remove common field from union inside dpmaif_drb struct.
- Use 'goto' for the exit flow in t7xx_pci_enable_sleep().
- Merge CLDMA tgpd and rgpd structs.
- Add comments to clarify skb consumption by ports.
- Introduce skb_data_area_size() helper.
- Declare the port config array as constant.
- Update CLDMA_JUMBO_BUFF_SZ definition when ccci_header
  is introduced by port-proxy patch.
- Update Reviewed-by tags.
- Simplify t7xx_dpmaif_tx_send_skb() and make t7xx_dpmaif_add_skb_to_ring()
  report the tx queue full state early.

V6:
- Remove unneeded initializations and bit masks.
- Remove t7xx_common.h file.
- Add comment to circular linking in GPD list.
- Use min instead of min_t.
- Use int for local indexes instead of short or char.
- Update the commit message in CLDMA patch about dependencies on core patch.
- Add space between contributor name and email address.
- Rename registers with double negatives
  e.g. DIS_ASPM_LOWPWR_CLR_0 -> ENABLE_ASPM_LOWPWR.
- Fix a race condition in pci sleep resource locking.
- Initialize interrupts with t7xx_pcie_mac_set_int() instead of 'clear'.
- Remove duplicate spin_lock_init(&md->exp_lock).
- Remove .ndo_select_queue callback due to singular TX queue.
- Remove call to deprecated netif_rx_any_context().
- Fix include guard name in t7xx_hif_dpmaif.h.
- Remove unused q_num parameter in DPMAIF functions.
- Do not serialize the drb_wr_idx write in t7xx_dpmaif_add_skb_to_ring()
  and the read from t7xx_txq_drb_wr_available().
- Fix potential leak in t7xx_dpmaif_add_skb_to_ring().
- Unionize:
    DRB structs: msg and pd.
    PIT structs: msg and pd.
- Replace list_head & spinlock with skb_buff_head in dpmaif_tx_queue.
- Remove rx_length_th check in TX WWAN port flow.
- Remove wwan_remove_port() from the critical section in WWAN port uninit.
- Use skb_end_pointer() to avoid conditional compilation.
- Simplify the loop in t7xx_port_ctrl_tx() by checking the buffer offset
  instead of calculating the number of required packets.
- Remove the code for unused channel PORT_CH_STATUS_RX.
- Remove bit flags from ports. Ports can check chan_enable instead of the
  PORT_F_RX_ALLOW_DROP flag.
- Use INVALID_SEQ_NUM to identify the first seq number.
- Rename port_static to port_conf and ports_private to ports.
- Implement t7xx_port_send_skb() and t7xx_port_send_ctl_skb() in a layered
  approach to reduce duplicated code and simplify the CCCI header handling.
- Move wwan_port_rx() call from port-proxy to WWAN port.
- Rename t7xx_port_recv_skb() to t7xx_port_enqueue_skb().
- Move control message parsing logic from port-proxy to control port,
  preserve the endianness when parsing the message and make port-proxy
  export a function to enable/disable ports.
- Use flexible arrays for:
    port-proxy ports.
    payload data in t7xx_fsm_event, port_msg, and mtk_runtime_feature.

v5:
- Update Intel's copyright years to 2021-2022.
- Remove circular dependency between DPMAIF HW (07) and HIF (08).
- Keep separate patches for CLDMA (02) and Core (03)
  but improve the code split by decoupling CLDMA from
  modem ops and cleaning up t7xx_common.h.
- Rename ID_CLDMA0/ID_CLDMA1 to CLDMA_ID_AP/CLDMA_ID_MD.
- Consistently use CLDMA's ring_lock to protect tr_ring.
- Free resources first and then print messages.
- Implement suggested name changes.
- Do not explicitly include dev_printk.h.
- Remove redundant dev_err()s.
- Fix possible memory leak during probe.
- Remove infrastructure for legacy interrupts.
- Remove unused macros and variables, including those that
  can be replaced with constants.
- Remove PCIE_MAC_MSIX_MSK_SET macro which is duplicated code.
- Refactor __t7xx_pci_pm_suspend() for clarity.
- Refactor t7xx_cldma_rx_ring_init() and t7xx_cldma_tx_ring_init().
- Do not use & for function callbacks.
- Declare a structure to access skb->cb[] data.
- Use skb_put_data instead of memcpy.
- No need to use kfree_sensitive.
- Use dev_kfree_skb() instead of kfree_skb().
- Refactor t7xx_prepare_device_rt_data() to remove potential leaks,
  avoid unneeded memset and keep rt_data and packet_size updates
  inside the same 'if' block.
- Set port's rx_length_th back to 0 during uninit.
- Remove unneeded 'blocking' parameter from t7xx_cldma_send_skb().
- Return -EIO in t7xx_cldma_send_skb() if the queue is inactive.
- Refactor t7xx_cldma_qs_are_active() to use pci_device_is_present().
- Simplify t7xx_cldma_stop_q() and rename it to t7xx_cldma_stop_all_qs().
- Fix potential leaks in t7xx_cldma_init().
- Improve error handling in fsm_append_event and fsm_routine_starting().
- Propagate return codes from fsm_append_cmd() and t7xx_fsm_append_event().
- Refactor fsm_wait_for_event() to avoid unnecessary sleep.
- Create the WWAN ports and net device only after the modem is in
  the ready state.
- Refactor t7xx_port_proxy_recv_skb() and port_recv_skb().
- Rename t7xx_port_check_rx_seq_num() as t7xx_port_next_rx_seq_num()
  and fix the seq_num logic to handle overflows.
- Declare seq_nums as u16 instead of short.
- Use unsigned int for local indexes.
- Use min_t instead of the ternary operator.
- Refactor the loop in t7xx_dpmaif_rx_data_collect() to avoid a dead
  condition check.
- Use a bitmap (bat_bitmap) instead of an array to keep track of
  the DRB status. Used in t7xx_dpmaif_avail_pkt_bat_cnt().
- Refactor t7xx_dpmaif_tx_send_skb() to protect tx_submit_skb_cnt
  with spinlock and remove the misleading tx_drb_available variable.
- Consolidate bit operations before endianness conversion.
- Use C bit fields in dpmaif_drb_skb struct which is not HW related.
- Add back the que_started check in t7xx_select_tx_queue().
- Create a helper function to get the DRB count.
- Simplify the use of 'usage' during t7xx_ccmni_close().
- Enforce CCMNI MTU selection with BUILD_BUG_ON() instead of a comment.
- Remove t7xx_ccmni_ctrl->capability parameter which remains constant.

v4:
- Implement list_prev_entry_circular() and list_next_entry_circular() macros.
- Remove inline from all c files.
- Define ioread32_poll_timeout_atomic() helper macro.
- Fix return code for WWAN port tx op.
- Allow AT commands fragmentation same as MBIM commands.
- Introduce t7xx_common.h file in the first patch.
- Rename functions and variables as suggested in v3.
- Reduce code duplication by creating fsm_wait_for_event() helper function.
- Remove unneeded dev_err in t7xx_fsm_clr_event().
- Remove unused variable last_state from struct t7xx_fsm_ctl.
- Remove unused variable txq_select_times from struct dpmaif_ctrl.
- Replace ETXTBSY with EBUSY.
- Refactor t7xx_dpmaif_rx_buf_alloc() to remove an unneeded allocation.
- Fix potential leak at t7xx_dpmaif_rx_frag_alloc().
- Simplify return value handling at t7xx_dpmaif_rx_start().
- Add a helper to handle the common part of CCCI header initialization.
- Make sure interrupts are enabled during PM resume.
- Add a parameter to t7xx_fsm_append_cmd() to tell if it is in interrupt context.

v3:
- Avoid unneeded ping-pong changes between patches.
- Use t7xx_ prefix in functions.
- Use t7xx_ prefix in generic structs where mtk_ or ccci prefix was used.
- Update Authors/Contributors header.
- Remove skb pools used for control path.
- Remove skb pools used for RX data path.
- Do not use dedicated TX queue for ACK-only packets.
- Remove __packed attribute from GPD structs.
- Remove the infrastructure for test and debug ports.
- Use the skb control buffer to store metadata.
- Get the IP packet type from RX PIT.
- Merge variable declaration and simple assignments.
- Use preferred coding patterns.
- Remove global variables.
- Declare HW facing structure members as little endian.
- Rename goto tags to describe what is going to be done.
- Do not use variable length arrays.
- Remove unneeded blank lines source code and kdoc headers.
- Use C99 initialization format for port-proxy ports.
- Clean up comments.
- Review included headers.
- Better use of 100 column limit.
- Remove unneeded mb() in CLDMA.
- Remove unneeded spin locks and atomics.
- Handle read_poll_timeout error.
- Use dev_err_ratelimited() where required.
- Fix resource leak when requesting IRQs.
- Use generic DEFAULT_TX_QUEUE_LEN instead custom macro.
- Use ETH_DATA_LEN instead of defining WWAN_DEFAULT_MTU.
- Use sizeof() instead of defines when the size of structures is required.
- Remove unneeded code from netdev:
    No need to configure HW address length
    No need to implement .ndo_change_mtu
    Remove random address generation
- Code simplifications by using kernel provided functions and macros such as:
    module_pci_driver
    PTR_ERR_OR_ZERO
    for_each_set_bit
    pci_device_is_present
    skb_queue_purge
    list_prev_entry
    __ffs64

v2:
- Replace pdev->driver->name with dev_driver_string(&pdev->dev).
- Replace random_ether_addr() with eth_random_addr().
- Update kernel-doc comment for enum data_policy.
- Indicate the driver is 'Supported' instead of 'Maintained'.
- Fix the Signed-of-by and Co-developed-by tags in the patches.
- Added authors and contributors in the top comment of the src files.
====================

2 years agonet: wwan: t7xx: Add maintainers and documentation
Ricardo Martinez [Fri, 6 May 2022 18:13:10 +0000 (11:13 -0700)]
net: wwan: t7xx: Add maintainers and documentation

Adds maintainers and documentation for MediaTek t7xx 5G WWAN modem
device driver.

Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Device deep sleep lock/unlock
Haijun Liu [Fri, 6 May 2022 18:13:09 +0000 (11:13 -0700)]
net: wwan: t7xx: Device deep sleep lock/unlock

Introduce the mechanism to lock/unlock the device 'deep sleep' mode.
When the PCIe link state is L1.2 or L2, the host side still can keep
the device is in D0 state from the host side point of view. At the same
time, if the device's 'deep sleep' mode is unlocked, the device will
go to 'deep sleep' while it is still in D0 state on the host side.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Runtime PM
Haijun Liu [Fri, 6 May 2022 18:13:08 +0000 (11:13 -0700)]
net: wwan: t7xx: Runtime PM

Enables runtime power management callbacks including runtime_suspend
and runtime_resume. Autosuspend is used to prevent overhead by frequent
wake-ups.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Eliot Lee <eliot.lee@intel.com>
Signed-off-by: Eliot Lee <eliot.lee@intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Introduce power management
Haijun Liu [Fri, 6 May 2022 18:13:07 +0000 (11:13 -0700)]
net: wwan: t7xx: Introduce power management

Implements suspend, resumes, freeze, thaw, poweroff, and restore
`dev_pm_ops` callbacks.

From the host point of view, the t7xx driver is one entity. But, the
device has several modules that need to be addressed in different ways
during power management (PM) flows.
The driver uses the term 'PM entities' to refer to the 2 DPMA and
2 CLDMA HW blocks that need to be managed during PM flows.
When a dev_pm_ops function is called, the PM entities list is iterated
and the matching function is called for each entry in the list.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Add WWAN network interface
Haijun Liu [Fri, 6 May 2022 18:13:06 +0000 (11:13 -0700)]
net: wwan: t7xx: Add WWAN network interface

Creates the Cross Core Modem Network Interface (CCMNI) which implements
the wwan_ops for registration with the WWAN framework, CCMNI also
implements the net_device_ops functions used by the network device.
Network device operations include open, close, start transmission, TX
timeout and change MTU.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Co-developed-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Add data path interface
Haijun Liu [Fri, 6 May 2022 18:13:05 +0000 (11:13 -0700)]
net: wwan: t7xx: Add data path interface

Data Path Modem AP Interface (DPMAIF) HIF layer provides methods
for initialization, ISR, control and event handling of TX/RX flows.

DPMAIF TX
Exposes the 'dmpaif_tx_send_skb' function which can be used by the
network device to transmit packets.
The uplink data management uses a Descriptor Ring Buffer (DRB).
First DRB entry is a message type that will be followed by 1 or more
normal DRB entries. Message type DRB will hold the skb information
and each normal DRB entry holds a pointer to the skb payload.

DPMAIF RX
The downlink buffer management uses Buffer Address Table (BAT) and
Packet Information Table (PIT) rings.
The BAT ring holds the address of skb data buffer for the HW to use,
while the PIT contains metadata about a whole network packet including
a reference to the BAT entry holding the data buffer address.
The driver reads the PIT and BAT entries written by the modem, when
reaching a threshold, the driver will reload the PIT and BAT rings.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Data path HW layer
Haijun Liu [Fri, 6 May 2022 18:13:04 +0000 (11:13 -0700)]
net: wwan: t7xx: Data path HW layer

Data Path Modem AP Interface (DPMAIF) HW layer provides HW abstraction
for the upper layer (DPMAIF HIF). It implements functions to do the HW
configuration, TX/RX control and interrupt handling.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Add AT and MBIM WWAN ports
Chandrashekar Devegowda [Fri, 6 May 2022 18:13:03 +0000 (11:13 -0700)]
net: wwan: t7xx: Add AT and MBIM WWAN ports

Adds AT and MBIM ports to the port proxy infrastructure.
The initialization method is responsible for creating the corresponding
ports using the WWAN framework infrastructure. The implemented WWAN port
operations are start, stop, and TX.

Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Add control port
Haijun Liu [Fri, 6 May 2022 18:13:02 +0000 (11:13 -0700)]
net: wwan: t7xx: Add control port

Control Port implements driver control messages such as modem-host
handshaking, controls port enumeration, and handles exception messages.

The handshaking process between the driver and the modem happens during
the init sequence. The process involves the exchange of a list of
supported runtime features to make sure that modem and host are ready
to provide proper feature lists including port enumeration. Further
features can be enabled and controlled in this handshaking process.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Add port proxy infrastructure
Haijun Liu [Fri, 6 May 2022 18:13:01 +0000 (11:13 -0700)]
net: wwan: t7xx: Add port proxy infrastructure

Port-proxy provides a common interface to interact with different types
of ports. Ports export their configuration via `struct t7xx_port` and
operate as defined by `struct port_ops`.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Co-developed-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Add core components
Haijun Liu [Fri, 6 May 2022 18:13:00 +0000 (11:13 -0700)]
net: wwan: t7xx: Add core components

Registers the t7xx device driver with the kernel. Setup all the core
components: PCIe layer, Modem Host Cross Core Interface (MHCCIF),
modem control operations, modem state machine, and build
infrastructure.

* PCIe layer code implements driver probe and removal.
* MHCCIF provides interrupt channels to communicate events
  such as handshake, PM and port enumeration.
* Modem control implements the entry point for modem init,
  reset and exit.
* The modem status monitor is a state machine used by modem control
  to complete initialization and stop. It is used also to propagate
  exception events reported by other components.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Add control DMA interface
Haijun Liu [Fri, 6 May 2022 18:12:59 +0000 (11:12 -0700)]
net: wwan: t7xx: Add control DMA interface

Cross Layer DMA (CLDMA) Hardware interface (HIF) enables the control
path of Host-Modem data transfers. CLDMA HIF layer provides a common
interface to the Port Layer.

CLDMA manages 8 independent RX/TX physical channels with data flow
control in HW queues. CLDMA uses ring buffers of General Packet
Descriptors (GPD) for TX/RX. GPDs can represent multiple or single
data buffers (DB).

CLDMA HIF initializes GPD rings, registers ISR handlers for CLDMA
interrupts, and initializes CLDMA HW registers.

CLDMA TX flow:
1. Port Layer write
2. Get DB address
3. Configure GPD
4. Triggering processing via HW register write

CLDMA RX flow:
1. CLDMA HW sends a RX "done" to host
2. Driver starts thread to safely read GPD
3. DB is sent to Port layer
4. Create a new buffer for GPD ring

Note: This patch does not enable compilation since it has dependencies
such as t7xx_pcie_mac_clear_int()/t7xx_pcie_mac_set_int() and
struct t7xx_pci_dev which are added by the core patch.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Co-developed-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: skb: introduce skb_data_area_size()
Ricardo Martinez [Fri, 6 May 2022 18:12:58 +0000 (11:12 -0700)]
net: skb: introduce skb_data_area_size()

Helper to calculate the linear data space in the skb.

Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agolist: Add list_next_entry_circular() and list_prev_entry_circular()
Ricardo Martinez [Fri, 6 May 2022 18:12:57 +0000 (11:12 -0700)]
list: Add list_next_entry_circular() and list_prev_entry_circular()

Add macros to get the next or previous entries and wraparound if
needed. For example, calling list_next_entry_circular() on the last
element should return the first element in the list.

Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'batadv-next-pullrequest-20220508' of git://git.open-mesh.org/linux-merge
David S. Miller [Sun, 8 May 2022 16:10:59 +0000 (17:10 +0100)]
Merge tag 'batadv-next-pullrequest-20220508' of git://git.open-mesh.org/linux-merge

This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - remove unnecessary type castings, by Yu Zhe

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mlxsw-dedicated-router-notification-block'
David S. Miller [Sun, 8 May 2022 10:46:21 +0000 (11:46 +0100)]
Merge branch 'mlxsw-dedicated-router-notification-block'

Ido Schimmel says:

====================
mlxsw: A dedicated notifier block for router code

Petr says:

Currently all netdevice events are handled in the centralized notifier
handler maintained by spectrum.c. Since a number of events are involving
router code, spectrum.c needs to dispatch them to spectrum_router.c. The
spectrum module therefore needs to know more about the router code than it
should have, and there is are several API points through which the two
modules communicate.

In this patchset, move bulk of the router-related event handling to the
router code. Some of the knowledge has to stay: spectrum.c cannot veto
events that the router supports, and vice versa. But beyond that, the two
can ignore each other's details, which leads to more focused and simpler
code.

As a side effect, this fixes L3 HW stats support on tunnel netdevices.

The patch set progresses as follows:

- In patch #1, change spectrum code to not bounce L3 enslavement, which the
  router code supports.

- In patch #2, add a new do-nothing notifier block to the router code.

- In patches #3-#6, move router-specific event handling to the router
  module. In patch #7, clean up a comment.

- In patch #8, use the advantage that all router event handling is in the
  router code and clean up taking router lock.

- mlxsw supports L3 HW stats on tunnels as of this patchset. Patches #9 and
  #10 therefore add a selftest for L3 HW stats support on tunnels.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: forwarding: Add a tunnel-based test for L3 HW stats
Petr Machata [Sun, 8 May 2022 08:08:23 +0000 (11:08 +0300)]
selftests: forwarding: Add a tunnel-based test for L3 HW stats

Add a selftest that uses an IPIP topology and tests that L3 HW stats
reflect the traffic in the tunnel.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: lib: Add a generic helper for obtaining HW stats
Petr Machata [Sun, 8 May 2022 08:08:22 +0000 (11:08 +0300)]
selftests: lib: Add a generic helper for obtaining HW stats

The function get_l3_stats() from the test hw_stats_l3.sh will be useful for
any test that wishes to work with L3 stats. Furthermore, it is easy to
generalize to other HW stats suites (for when such are added). Therefore,
move the code to lib.sh, rewrite it to have the same interface as the other
stats-collecting functions, and generalize to take the name of the HW stats
suite to collect as an argument.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum_router: Take router lock in router notifier handler
Petr Machata [Sun, 8 May 2022 08:08:21 +0000 (11:08 +0300)]
mlxsw: spectrum_router: Take router lock in router notifier handler

For notifications that the router needs to handle, router lock is taken.
Further, at least to determine whether an event is related to a tunnel
underlay, router lock also needs to be taken. Due to this, the router lock
is always taken for each unhandled event, and also for some handled events,
even if they are not related to underlay. Thus each event implies at least
one router lock, sometimes two.

Instead of deferring the locking to the leaf handlers, take the lock in the
router notifier handler always. This simplifies thinking about the locking
state, and in some cases saves one lock cycle.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum: Update a comment
Petr Machata [Sun, 8 May 2022 08:08:20 +0000 (11:08 +0300)]
mlxsw: spectrum: Update a comment

The position of netdevice notifier registration no longer depends on the
router initialization, because the event handler no longer dispatches to
the router code. Update the comment at the registration to that effect.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum: Move handling of tunnel events to router code
Petr Machata [Sun, 8 May 2022 08:08:19 +0000 (11:08 +0300)]
mlxsw: spectrum: Move handling of tunnel events to router code

The events related to IPIP tunnels are handled by the router code. Move the
handling from the central dispatcher in spectrum.c to the new notifier
handler in the router module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum: Move handling of router events to router code
Petr Machata [Sun, 8 May 2022 08:08:18 +0000 (11:08 +0300)]
mlxsw: spectrum: Move handling of router events to router code

The events NETDEV_PRE_CHANGEADDR, NETDEV_CHANGEADDR and NETDEV_CHANGEMTU
have implications for in-ASIC router interface objects, and as such are
handled in the router module. Move the handling from the central dispatcher
in spectrum.c to the new notifier handler in the router module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum: Move handling of HW stats events to router code
Petr Machata [Sun, 8 May 2022 08:08:17 +0000 (11:08 +0300)]
mlxsw: spectrum: Move handling of HW stats events to router code

L3 HW stats are implemented in mlxsw as RIF counters, and therefore the
code resides in spectrum_router. Exclude the offload xstats events from the
mlxsw_sp_netdevice_event_is_router() predicate, and instead recreate the
glue code in the router module.

Previously, the order of dispatch was that for events on tunnels, a
dedicated handler was called, which however did not handle HW stats events.
But there is nothing special about tunnel devices as far as HW stats: there
is a RIF associated with the tunnel netdevice, and that RIF is where the
counter should be installed. Therefore now, HW stats events are tested
first, independent of netdevice type. The upshot is that as of this commit,
mlxsw supports L3 HW stats work on GRE tunnels.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum: Move handling of VRF events to router code
Petr Machata [Sun, 8 May 2022 08:08:16 +0000 (11:08 +0300)]
mlxsw: spectrum: Move handling of VRF events to router code

Events involving VRF, as L3 concern, are handled in the router code, by the
helper mlxsw_sp_netdevice_vrf_event(). The handler is currently invoked
from the centralized dispatcher in spectrum.c. Instead, move the call to
the newly-introduced router-specific notifier handler.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum_router: Add a dedicated notifier block
Petr Machata [Sun, 8 May 2022 08:08:15 +0000 (11:08 +0300)]
mlxsw: spectrum_router: Add a dedicated notifier block

Currently all netdevice events are handled in the centralized notifier
handler maintained by spectrum.c. Since a number of events are involving
router code, spectrum.c needs to dispatch them to spectrum_router.c. The
spectrum module therefore needs to know more about the router code than it
should have, and there is are several API points through which the two
modules communicate.

To simplify the notifier handlers, introduce a new notifier into the router
module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum: Tolerate enslaving of various devices to VRF
Petr Machata [Sun, 8 May 2022 08:08:14 +0000 (11:08 +0300)]
mlxsw: spectrum: Tolerate enslaving of various devices to VRF

Enslaving netdevices to VRF is currently handled through an
mlxsw_sp_is_vrf_event() conditional in mlxsw_sp_netdevice_event(). In the
following patch sets, VRF enslavement will be handled purely in the router
code. Therefore make handlers of NETDEV_PRECHANGEUPPER tolerant of
enslaving to VRF, so that they do not bounce the change.

For NETDEV_CHANGEUPPER, drop the WARN_ON(1) and bounce from
mlxsw_sp_netdevice_port_vlan_event(). This is the only handler that warned
and bounces even in the CHANGEUPPER code, other handler quietly do nothing
when they encounter an unfamiliar upper.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'switch-drivers-to-netif_napi_add_weight'
David S. Miller [Sun, 8 May 2022 10:33:57 +0000 (11:33 +0100)]
Merge branch 'switch-drivers-to-netif_napi_add_weight'

Jakub Kicinski says:

====================
net: switch drivers to netif_napi_add_weight()

The minority of drivers pass a custom weight to netif_napi_add().
Switch those away to the new netif_napi_add_weight(). All drivers
(which can go thru net-next) calling netif_napi_add() will now
be calling it with NAPI_POLL_WEIGHT or 64.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wan: switch to netif_napi_add_weight()
Jakub Kicinski [Fri, 6 May 2022 17:07:51 +0000 (10:07 -0700)]
net: wan: switch to netif_napi_add_weight()

A handful of WAN drivers use custom napi weights,
switch them to the new API.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: virtio: switch to netif_napi_add_weight()
Jakub Kicinski [Fri, 6 May 2022 17:07:50 +0000 (10:07 -0700)]
net: virtio: switch to netif_napi_add_weight()

virtio netdev driver uses a custom napi weight, switch to the new
API for setting custom weight.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agor8152: switch to netif_napi_add_weight()
Jakub Kicinski [Fri, 6 May 2022 17:07:49 +0000 (10:07 -0700)]
r8152: switch to netif_napi_add_weight()

r8152 uses a custom napi weight, switch to the new
API for setting custom weight.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoeth: switch to netif_napi_add_weight()
Jakub Kicinski [Fri, 6 May 2022 17:07:48 +0000 (10:07 -0700)]
eth: switch to netif_napi_add_weight()

Switch all Ethernet drivers which use custom napi weights
to the new API.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agocaif_virtio: switch to netif_napi_add_weight()
Jakub Kicinski [Fri, 6 May 2022 17:07:47 +0000 (10:07 -0700)]
caif_virtio: switch to netif_napi_add_weight()

caif_virtio uses a custom napi weight, switch to the new
API for setting custom weights.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoum: vector: switch to netif_napi_add_weight()
Jakub Kicinski [Fri, 6 May 2022 17:07:46 +0000 (10:07 -0700)]
um: vector: switch to netif_napi_add_weight()

UM's netdev driver uses a custom napi weight, switch to the new
API for setting custom weight.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'simplify-migration-of-host-filtered-addresses-in-felix-driver'
Jakub Kicinski [Sat, 7 May 2022 04:00:15 +0000 (21:00 -0700)]
Merge branch 'simplify-migration-of-host-filtered-addresses-in-felix-driver'

Vladimir Oltean says:

====================
Simplify migration of host filtered addresses in Felix driver

The purpose of this patch set is to remove the functions
dsa_port_walk_fdbs() and dsa_port_walk_mdbs() from the DSA core, which
were introduced when the Felix driver gained support for unicast
filtering on standalone ports. They get called when changing the tagging
protocol back and forth between "ocelot" and "ocelot-8021q".
I did not realize we could get away without having them.

The patch set was regression-tested using the local_termination.sh
selftest using both tagging protocols.
====================

Link: https://lore.kernel.org/r/20220505162213.307684-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: delete dsa_port_walk_{fdbs,mdbs}
Vladimir Oltean [Thu, 5 May 2022 16:22:13 +0000 (19:22 +0300)]
net: dsa: delete dsa_port_walk_{fdbs,mdbs}

All the users of these functions are gone, delete them before they gain
new ones.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: felix: perform MDB migration based on ocelot->multicast list
Vladimir Oltean [Thu, 5 May 2022 16:22:12 +0000 (19:22 +0300)]
net: dsa: felix: perform MDB migration based on ocelot->multicast list

The felix driver is the only user of dsa_port_walk_mdbs(), and there
isn't even a good reason for it, considering that the host MDB entries
are already saved by the ocelot switch lib in the ocelot->multicast list.

Rewrite the multicast entry migration procedure around the
ocelot->multicast list so we can delete dsa_port_walk_mdbs().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: felix: stop migrating FDBs back and forth on tag proto change
Vladimir Oltean [Thu, 5 May 2022 16:22:11 +0000 (19:22 +0300)]
net: dsa: felix: stop migrating FDBs back and forth on tag proto change

I just realized we don't need to migrate the host-filtered FDB entries
when the tagging protocol changes from "ocelot" to "ocelot-8021q".

Host-filtered addresses are learned towards the PGID_CPU "multicast"
port group, reserved by software, which contains BIT(ocelot->num_phys_ports).
That is the "special" port entry in the analyzer block for the CPU port
module.

In "ocelot" mode, the CPU port module's packets are redirected to the
NPI port.

In "ocelot-8021q" mode, felix_8021q_cpu_port_init() does something funny
anyway, and changes PGID_CPU to stop pointing at the CPU port module and
start pointing at the physical port where the DSA master is attached.

The fact that we can alter the destination of packets learned towards
PGID_CPU without altering the MAC table entries themselves means that it
is pointless to walk through the FDB entries, forget that they were
learned towards PGID_CPU, and re-learn them towards the "unicast" PGID
associated with the physical port connected to the DSA master. We can
let the PGID_CPU value change simply alter the destination of the
host-filtered unicast packets in one fell swoop.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: felix: use PGID_CPU for FDB entry migration on NPI port
Vladimir Oltean [Thu, 5 May 2022 16:22:10 +0000 (19:22 +0300)]
net: dsa: felix: use PGID_CPU for FDB entry migration on NPI port

ocelot_fdb_add() redirects FDB entries installed on the NPI port towards
the special reserved PGID_CPU used for host-filtered addresses. PGID_CPU
contains BIT(ocelot->num_phys_ports) in the destination port mask, which
is code name for the CPU port module.

Whereas felix_migrate_fdbs_to_*_port() uses the ocelot->num_phys_ports
PGID directly, and it appears that this works too. Even if this PGID is
set to zero, apparently its number is special and packets still reach
the CPU port module.

Nonetheless, in the end, these addresses end up in the same place
regardless of whether they go through an extra indirection layer or not.
Use PGID_CPU across to have more uniformity.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomlxbf_gige: increase MDIO polling rate to 5us
David Thompson [Thu, 5 May 2022 16:23:09 +0000 (12:23 -0400)]
mlxbf_gige: increase MDIO polling rate to 5us

This patch increases the polling rate used by the
mlxbf_gige driver on the MDIO bus.  The previous
polling rate was every 100us, and the new rate is
every 5us.  With this change the amount of time
spent waiting for the MDIO BUSY signal to de-assert
drops from ~100us to ~27us for each operation.

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Link: https://lore.kernel.org/r/20220505162309.20050-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Fri, 6 May 2022 22:39:28 +0000 (15:39 -0700)]
Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
10GbE Intel Wired LAN Driver Updates 2022-05-05

This series contains updates to ixgbe and igb drivers.

Jeff Daly adjusts type for 'allow_unsupported_sfp' to match the
associated struct value for ixgbe.

Alaa Mohamed converts, deprecated, kmap() call to kmap_local_page() for
igb.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igb: Convert kmap() to kmap_local_page()
  ixgbe: Fix module_param allow_unsupported_sfp type
====================

Link: https://lore.kernel.org/r/20220505155651.2606195-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'tso-gso-limit-split'
David S. Miller [Fri, 6 May 2022 11:07:56 +0000 (12:07 +0100)]
Merge branch 'tso-gso-limit-split'

Jakub Kicinski says:

====================
net: disambiguate the TSO and GSO limits

This series separates the device-reported TSO limitations
from the user space-controlled GSO limits. It used to be that
we only had the former (HW limits) but they were named GSO.
This probably lead to confusion and letting user override them.

The problem came up in the BIG TCP discussion between Eric and
Alex, and seems like something we should address.

Targeting net-next because (a) nobody is reporting problems;
and (b) there is a tiny but non-zero chance that some actually
wants to lift the HW limitations.
====================

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: move netif_set_gso_max helpers
Jakub Kicinski [Fri, 6 May 2022 02:51:34 +0000 (19:51 -0700)]
net: move netif_set_gso_max helpers

These are now internal to the core, no need to expose them.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: make drivers set the TSO limit not the GSO limit
Jakub Kicinski [Fri, 6 May 2022 02:51:33 +0000 (19:51 -0700)]
net: make drivers set the TSO limit not the GSO limit

Drivers should call the TSO setting helper, GSO is controllable
by user space.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: don't allow user space to lift the device limits
Jakub Kicinski [Fri, 6 May 2022 02:51:32 +0000 (19:51 -0700)]
net: don't allow user space to lift the device limits

Up until commit 46e6b992c250 ("rtnetlink: allow GSO maximums to
be set on device creation") the gso_max_segs and gso_max_size
of a device were not controlled from user space.

The quoted commit added the ability to control them because of
the following setup:

 netns A  |  netns B
     veth<->veth   eth0

If eth0 has TSO limitations and user wants to efficiently forward
traffic between eth0 and the veths they should copy the TSO
limitations of eth0 onto the veths. This would happen automatically
for macvlans or ipvlan but veth users are not so lucky (given the
loose coupling).

Unfortunately the commit in question allowed users to also override
the limits on real HW devices.

It may be useful to control the max GSO size and someone may be using
that ability (not that I know of any user), so create a separate set
of knobs to reliably record the TSO limitations. Validate the user
requests.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: add netif_inherit_tso_max()
Jakub Kicinski [Fri, 6 May 2022 02:51:31 +0000 (19:51 -0700)]
net: add netif_inherit_tso_max()

To make later patches smaller create a helper for inheriting
the TSO limitations of a lower device. The TSO in the name
is not an accident, subsequent patches will replace GSO
with TSO in more names.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'nfp-flower-rework'
David S. Miller [Fri, 6 May 2022 10:21:34 +0000 (11:21 +0100)]
Merge branch 'nfp-flower-rework'

Simon Horman says:

====================
nfp: flower: decap neighbour table rework

Louis Peens says:

This patch series reworks the way in which flow rules that outputs to
OVS internal ports gets handled by the nfp driver.

Previously this made use of a small pre_tun_table, but this only used
destination MAC addresses, and made the implicit assumption that there is
only a single source MAC":"destination MAC" mapping per tunnel. In
hindsight this seems to be a pretty obvious oversight, but this was hidden
in plain sight for quite some time.

This series changes the implementation to make use of the same Neighbour
table for decap that is in use for the tunnel encap solution. It stores
any new Neighbour updates in this table. Previously this path was only
triggered for encapsulation candidates, and the entries were send and
forget, not saved on the host as it is after this series. It also keeps
track of any flow rule that outputs to OVS internal ports (and some
other criteria not worth mentioning here), very similar to how it was
done previously, except now these flows are kept track of in a list.

When a new Neighbour entry gets added this list gets iterated for
potential matches, in which case the table gets updated with a reference
to the flow, and the Neighbour entry on the card gets updated with the
relevant host_ctx. The same happens when a new qualifying flow gets
added - the Neighbour table gets iterated for applicable matches, and
once again the firmware gets updated with the host_ctx when any matches
are found.

Since this also requires a firmware change we add a new capability bit,
and keep the old behaviour in case of older firmware without this bit
set.

This series starts by doing some preparation, then adding the new list
and table entries. Next the functionality to link/unlink these entries
are added, and finally this new functionality is enabled by adding the
DECAP_V2 bit to the driver feature list.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: flower: enable decap_v2 bit
Louis Peens [Thu, 5 May 2022 05:43:48 +0000 (14:43 +0900)]
nfp: flower: enable decap_v2 bit

Finally enable the decap_v2 feature bit now that all the
other bits are in place to configure it correctly.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: flower: remove unused neighbour cache
Louis Peens [Thu, 5 May 2022 05:43:47 +0000 (14:43 +0900)]
nfp: flower: remove unused neighbour cache

With the neighbour entries now stored in a dedicated table there
is no use to make use of the tunnel route cache anymore, so remove
this.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: flower: link pre_tun flow rules with neigh entries
Louis Peens [Thu, 5 May 2022 05:43:46 +0000 (14:43 +0900)]
nfp: flower: link pre_tun flow rules with neigh entries

Add helper functions that can create links between flow rules
and cached neighbour entries. Also add the relevant calls to
these functions.

* When a new neighbour entry gets added cycle through the saved
  pre_tun flow list and link any relevant matches. Update the
  neighbour table on the nfp with this new information.
* When a new pre_tun flow rule gets added iterate through the
  save neighbour entries and link any relevant matches. Once
  again update the nfp neighbour table with any new links.
* Do the inverse when deleting - remove any created links and
  also inform the nfp of this.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: flower: rework tunnel neighbour configuration
Louis Peens [Thu, 5 May 2022 05:43:45 +0000 (14:43 +0900)]
nfp: flower: rework tunnel neighbour configuration

This patch updates the way in which the tunnel neighbour entries
are handled. Previously they were mostly send-and-forget, with
just the destination IP's cached in a list. This update changes
to a scheme where the neighbour entry information is stored in
a hash table.

The reason for this is that the neighbour table will now also
be used on the decapsulation path, whereas previously it was
only used for encapsulation. We need to save more of the neighbour
information in order to link them with flower flows in follow
up patches.

Updating of the neighbour table is now also handled by the same
function, instead of separate  *_write_neigh_vX functions.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: flower: update nfp_tun_neigh structs
Louis Peens [Thu, 5 May 2022 05:43:44 +0000 (14:43 +0900)]
nfp: flower: update nfp_tun_neigh structs

Prepare for more rework in following patches by updating
the existing nfp_neigh_structs. The update allows for
the same headers to be used for both old and new firmware,
with a slight length adjustment when sending the control message
to the firmware.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: flower: fixup ipv6/ipv4 route lookup for neigh events
Louis Peens [Thu, 5 May 2022 05:43:43 +0000 (14:43 +0900)]
nfp: flower: fixup ipv6/ipv4 route lookup for neigh events

When a callback is received to invalidate a neighbour entry
there is no need to try and populate any other flow information.
Only the flowX->daddr information is needed as lookup key to delete
an entry from the NFP neighbour table. Fix this by only doing the
lookup if the callback is for a new entry.

As part of this cleanup remove the setting of flow6.flowi6_proto, as
this is not needed either, it looks to be a possible leftover from a
previous implementation.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: flower: enforce more strict pre_tun checks
Louis Peens [Thu, 5 May 2022 05:43:42 +0000 (14:43 +0900)]
nfp: flower: enforce more strict pre_tun checks

Make sure that the rule also matches on source MAC address. On top
of that also now save the src and dst MAC addresses similar to how
vlan_tci is saved - this will be used in later comparisons with
neighbour entries. Indicate if the flow matched on ipv4 or ipv6.
Populate the vlan_tpid field that got added to the pre_run_rule
struct as well.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: flower: add/remove predt_list entries
Louis Peens [Thu, 5 May 2022 05:43:41 +0000 (14:43 +0900)]
nfp: flower: add/remove predt_list entries

Add calls to add and remove flows to the predt_table. This very simply
just allocates and add a new pretun entry if detected as such, and
removes it when encountered on a delete flow.

Compatibility for older firmware is kept in place through the
DECAP_V2 feature bit.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: flower: add infrastructure for pre_tun rework
Louis Peens [Thu, 5 May 2022 05:43:40 +0000 (14:43 +0900)]
nfp: flower: add infrastructure for pre_tun rework

The previous implementation of using a pre_tun_table for decap has
some limitations, causing flows to end up unoffloaded when in fact
we are able to offload them. This is because the pre_tun_table does
not have enough matching resolution. The next step is to instead make
use of the neighbour table which already exists for the encap direction.
This patch prepares for this by:

- Moving nfp_tun_neigh/_v6 to main.h.
- Creating two new "wrapping" structures, one to keep track of neighbour
  entries (previously they were send-and-forget), and another to keep
  track of pre_tun flows.
- Create a new list in nfp_flower_priv to keep track of pre_tunnel flows
- Create a new table in nfp_flower_priv to keep track of next neighbour
  entries
- Initialising and destroying these new list/tables
- Extending nfp_fl_payload->pre_tun_rule to save more information for
  future use.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Fri, 6 May 2022 09:50:05 +0000 (10:50 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-05-05
This series contains updates to ice driver only.

Wan Jiabing converts an open coded min selection to min_t().

Maciej commonizes on a single find VSI function and removes the
duplicated implementation.

Wojciech adjusts the return value when exceeding ICE_MAX_CHAIN_WORDS to,
a more appropriate, -ENOSPC and allows for the error to be propagated.

Michal adds support for ndo_get_devlink_port().

Jake does some cleanup related to virtualization code. Mainly involving
function header comments and wording changes. NULL checks are added to
ice_get_vf_vsi() calls in order to prevent static analysis tools from
complaining that a NULL value could be dereferenced.
---
v2: Dropped patch 1: "ice: Add support for classid based queue selection"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'mptcp-improve-mptcp-level-window-tracking'
Jakub Kicinski [Fri, 6 May 2022 02:00:19 +0000 (19:00 -0700)]
Merge branch 'mptcp-improve-mptcp-level-window-tracking'

Mat Martineau says:

====================
mptcp: Improve MPTCP-level window tracking

This series improves MPTCP receive window compliance with RFC 8684 and
helps increase throughput on high-speed links. Note that patch 3 makes a
change in tcp_output.c

For the details, Paolo says:

I've been chasing bad/unstable performance with multiple subflows
on very high speed links.

It looks like the root cause is due to the current mptcp-level
congestion window handling. There are apparently a few different
sub-issues:

- the rcv_wnd is not effectively shared on the tx side, as each
  subflow takes in account only the value received by the underlaying
  TCP connection. This is addressed in patch 1/5

- The mptcp-level offered wnd right edge is currently allowed to shrink.
  Reading section 3.3.4.:

"""
   The receive window is relative to the DATA_ACK.  As in TCP, a
   receiver MUST NOT shrink the right edge of the receive window (i.e.,
   DATA_ACK + receive window).  The receiver will use the data sequence
   number to tell if a packet should be accepted at the connection
   level.
"""

I read the above as we need to reflect window right-edge tracking
on the wire, see patch 4/5.

- The offered window right edge tracking can happen concurrently on
  multiple subflows, but there is no mutex protection. We need an
  additional atomic operation - still patch 4/5

This series additionally bumps a few new MIBs to track all the above
(ensure/observe that the suspected races actually take place).

I could not access again the host where the issue was so
noticeable, still in the current setup the tput changes from
[6-18] Gbps to 19Gbps very stable.
====================

Link: https://lore.kernel.org/r/20220504215408.349318-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: add more offered MIBs counter
Paolo Abeni [Wed, 4 May 2022 21:54:08 +0000 (14:54 -0700)]
mptcp: add more offered MIBs counter

Track the exceptional handling of MPTCP-level offered window
with a few more counters for observability.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: never shrink offered window
Paolo Abeni [Wed, 4 May 2022 21:54:07 +0000 (14:54 -0700)]
mptcp: never shrink offered window

As per RFC, the offered MPTCP-level window should never shrink.
While we currently track the right edge, we don't enforce the
above constraint on the wire.
Additionally, concurrent xmit on different subflows can end-up in
erroneous right edge update.
Address the above explicitly updating the announced window and
protecting the update with an additional atomic operation (sic)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: allow MPTCP to update the announced window
Paolo Abeni [Wed, 4 May 2022 21:54:06 +0000 (14:54 -0700)]
tcp: allow MPTCP to update the announced window

The MPTCP RFC requires that the MPTCP-level receive window's
right edge never moves backward. Currently the MPTCP code
enforces such constraint while tracking the right edge, but it
does not reflects it on the wire, as MPTCP lacks a suitable hook
to update accordingly the TCP header.

This change modifies the existing mptcp_write_options() hook,
providing the current packet's TCP header to the MPTCP protocol,
so that the next patch could implement the above mentioned
constraint.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: add mib for xmit window sharing
Paolo Abeni [Wed, 4 May 2022 21:54:05 +0000 (14:54 -0700)]
mptcp: add mib for xmit window sharing

Bump a counter for counter when snd_wnd is shared among subflow,
for observability's sake.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: really share subflow snd_wnd
Paolo Abeni [Wed, 4 May 2022 21:54:04 +0000 (14:54 -0700)]
mptcp: really share subflow snd_wnd

As per RFC, mptcp subflows use a "shared" snd_wnd: the effective
window is the maximum among the current values received on all
subflows. Without such feature a data transfer using multiple
subflows could block.

Window sharing is currently implemented in the RX side:
__tcp_select_window uses the mptcp-level receive buffer to compute
the announced window.

That is not enough: the TCP stack will stick to the window size
received on the given subflow; we need to propagate the msk window
value on each subflow at xmit time.

Change the packet scheduler to ignore the subflow level window
and use instead the msk level one

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agofirmware: tee_bnxt: Use UUID API for exporting the UUID
Andy Shevchenko [Wed, 4 May 2022 09:14:07 +0000 (12:14 +0300)]
firmware: tee_bnxt: Use UUID API for exporting the UUID

There is export_uuid() function which exports uuid_t to the u8 array.
Use it instead of open coding variant.

This allows to hide the uuid_t internals.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220504091407.70661-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: Make msg_zerocopy_alloc static
David Ahern [Wed, 4 May 2022 17:09:47 +0000 (10:09 -0700)]
net: Make msg_zerocopy_alloc static

msg_zerocopy_alloc is only used by msg_zerocopy_realloc; remove the
export and make static in skbuff.c

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/r/20220504170947.18773-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: move snowflake callers to netif_napi_add_tx_weight()
Jakub Kicinski [Wed, 4 May 2022 16:37:25 +0000 (09:37 -0700)]
net: move snowflake callers to netif_napi_add_tx_weight()

Make the drivers with custom tx napi weight call netif_napi_add_tx_weight().

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220504163725.550782-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: switch to netif_napi_add_tx()
Jakub Kicinski [Wed, 4 May 2022 16:37:24 +0000 (09:37 -0700)]
net: switch to netif_napi_add_tx()

Switch net callers to the new API not requiring
the NAPI_POLL_WEIGHT argument.

Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20220504163725.550782-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agojme: remove an unnecessary indirection
Jakub Kicinski [Wed, 4 May 2022 16:39:39 +0000 (09:39 -0700)]
jme: remove an unnecessary indirection

Remove a define which looks like a OS abstraction layer
and makes spatch conversions on this driver problematic.

Link: https://lore.kernel.org/r/20220504163939.551231-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: Prepare cleanup of powerpc's asm/prom.h
Christophe Leroy [Wed, 4 May 2022 11:32:17 +0000 (13:32 +0200)]
net: ethernet: Prepare cleanup of powerpc's asm/prom.h

powerpc's asm/prom.h includes some headers that it doesn't
need itself.

In order to clean powerpc's asm/prom.h up in a further step,
first clean all files that include asm/prom.h

Some files don't need asm/prom.h at all. For those ones,
just remove inclusion of asm/prom.h

Some files don't need any of the items provided by asm/prom.h,
but need some of the headers included by asm/prom.h. For those
ones, add the needed headers that are brought by asm/prom.h at
the moment and remove asm/prom.h

Some files really need asm/prom.h but also need some of the
headers included by asm/prom.h. For those one, leave asm/prom.h
but also add the needed headers so that they can be removed
from asm/prom.h in a later step.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/09a13d592d628de95d30943e59b2170af5b48110.1651663857.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosungem: Prepare cleanup of powerpc's asm/prom.h
Christophe Leroy [Wed, 4 May 2022 11:16:09 +0000 (13:16 +0200)]
sungem: Prepare cleanup of powerpc's asm/prom.h

powerpc's <asm/prom.h> includes some headers that it doesn't
need itself.

In order to clean powerpc's <asm/prom.h> up in a further step,
first clean all files that include <asm/prom.h>

sungem_phy.c doesn't use any object provided by <asm/prom.h>.

But removing inclusion of <asm/prom.h> leads to the following
errors:

  CC      drivers/net/sungem_phy.o
drivers/net/sungem_phy.c: In function 'bcm5421_init':
drivers/net/sungem_phy.c:448:42: error: implicit declaration of function 'of_get_parent'; did you mean 'dget_parent'? [-Werror=implicit-function-declaration]
  448 |                 struct device_node *np = of_get_parent(phy->platform_data);
      |                                          ^~~~~~~~~~~~~
      |                                          dget_parent
drivers/net/sungem_phy.c:448:42: warning: initialization of 'struct device_node *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
drivers/net/sungem_phy.c:450:35: error: implicit declaration of function 'of_get_property' [-Werror=implicit-function-declaration]
  450 |                 if (np == NULL || of_get_property(np, "no-autolowpower", NULL))
      |                                   ^~~~~~~~~~~~~~~

Remove <asm/prom.h> from included headers but add <linux/of.h> to
handle the above.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/f7a7fab3ec5edf803d934fca04df22631c2b449d.1651662885.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: align SO_RCVMARK required privileges with SO_MARK
Eyal Birger [Wed, 4 May 2022 09:54:59 +0000 (12:54 +0300)]
net: align SO_RCVMARK required privileges with SO_MARK

The commit referenced in the "Fixes" tag added the SO_RCVMARK socket
option for receiving the skb mark in the ancillary data.

Since this is a new capability, and exposes admin configured details
regarding the underlying network setup to sockets, let's align the
needed capabilities with those of SO_MARK.

Fixes: 6fd1d51cfa25 ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20220504095459.2663513-1-eyal.birger@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoRevert "Merge branch 'mlxsw-line-card-model'"
Jakub Kicinski [Wed, 4 May 2022 15:40:37 +0000 (08:40 -0700)]
Revert "Merge branch 'mlxsw-line-card-model'"

This reverts commit 5e927a9f4b9f29d78a7c7d66ea717bb5c8bbad8e, reversing
changes made to cfc1d91a7d78cf9de25b043d81efcc16966d55b3.

The discussion is still ongoing so let's remove the uAPI
until the discussion settles.

Link: https://lore.kernel.org/all/20220425090021.32e9a98f@kernel.org/
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220504154037.539442-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 5 May 2022 20:03:18 +0000 (13:03 -0700)]
Merge git://git./linux/kernel/git/netdev/net

tools/testing/selftests/net/forwarding/Makefile
  f62c5acc800e ("selftests/net/forwarding: add missing tests to Makefile")
  50fe062c806e ("selftests: forwarding: new test, verify host mdb entries")
https://lore.kernel.org/all/20220502111539.0b7e4621@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoice: remove period on argument description in ice_for_each_vf
Jacob Keller [Mon, 11 Apr 2022 23:29:07 +0000 (16:29 -0700)]
ice: remove period on argument description in ice_for_each_vf

The ice_for_each_vf macros have comments describing the implementation. One
of the arguments has a period on the end, which is not our typical style.
Remove the unnecessary period.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: add a function comment for ice_cfg_mac_antispoof
Jacob Keller [Mon, 11 Apr 2022 23:29:06 +0000 (16:29 -0700)]
ice: add a function comment for ice_cfg_mac_antispoof

This function definition was missing a comment describing its
implementation. Add one.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: fix wording in comment for ice_reset_vf
Jacob Keller [Mon, 11 Apr 2022 23:29:05 +0000 (16:29 -0700)]
ice: fix wording in comment for ice_reset_vf

The comment explaining ice_reset_vf has an extraneous "the" with the "if
the resets are disabled". Remove it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: remove return value comment for ice_reset_all_vfs
Jacob Keller [Mon, 11 Apr 2022 23:29:04 +0000 (16:29 -0700)]
ice: remove return value comment for ice_reset_all_vfs

Since commit fe99d1c06c16 ("ice: make ice_reset_all_vfs void"), the
ice_reset_all_vfs function has not returned anything. The function comment
still indicated it did. Fix this.

While here, also add a line to clarify the function resets all VFs at once
in response to hardware resets such as a PF reset.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: always check VF VSI pointer values
Jacob Keller [Mon, 11 Apr 2022 23:29:03 +0000 (16:29 -0700)]
ice: always check VF VSI pointer values

The ice_get_vf_vsi function can return NULL in some cases, such as if
handling messages during a reset where the VSI is being removed and
recreated.

Several places throughout the driver do not bother to check whether this
VSI pointer is valid. Static analysis tools maybe report issues because
they detect paths where a potentially NULL pointer could be dereferenced.

Fix this by checking the return value of ice_get_vf_vsi everywhere.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: add newline to dev_dbg in ice_vf_fdir_dump_info
Jacob Keller [Mon, 11 Apr 2022 23:29:02 +0000 (16:29 -0700)]
ice: add newline to dev_dbg in ice_vf_fdir_dump_info

The debug print in ice_vf_fdir_dump_info does not end in newlines. This can
look confusing when reading the kernel log, as the next print will
immediately continue on the same line.

Fix this by adding the forgotten newline.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: get switch id on switchdev devices
Michal Swiatkowski [Wed, 20 Apr 2022 11:22:43 +0000 (13:22 +0200)]
ice: get switch id on switchdev devices

Switch id should be the same for each netdevice on a driver.
The id must be unique between devices on the same system, but
does not need to be unique between devices on different systems.

The switch id is used to locate ports on a switch and to know if
aggregated ports belong to the same switch.

To meet this requirements, use pci_get_dsn as switch id value, as
this is unique value for each devices on the same system.

Implementing switch id is needed by automatic tools for kubernetes.

Set switch id by setting devlink port attribiutes and calling
devlink_port_attrs_set while creating pf (for uplink) and vf
(for representator) devlink port.

To get switch id (in switchdev mode):
cat /sys/class/net/$PF0/phys_switch_id

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: return ENOSPC when exceeding ICE_MAX_CHAIN_WORDS
Wojciech Drewek [Wed, 20 Apr 2022 10:55:41 +0000 (12:55 +0200)]
ice: return ENOSPC when exceeding ICE_MAX_CHAIN_WORDS

When number of words exceeds ICE_MAX_CHAIN_WORDS, -ENOSPC
should be returned not -EINVAL. Do not overwrite this
error code in ice_add_tc_flower_adv_fltr.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Suggested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: introduce common helper for retrieving VSI by vsi_num
Maciej Fijalkowski [Thu, 24 Mar 2022 11:49:07 +0000 (12:49 +0100)]
ice: introduce common helper for retrieving VSI by vsi_num

Both ice_idc.c and ice_virtchnl.c carry their own implementation of a
helper function that is looking for a given VSI based on provided
vsi_num. Their functionality is the same, so let's introduce the common
function in ice.h that both of the mentioned sites will use.

This is a strictly cleanup thing, no functionality is changed.

Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: use min_t() to make code cleaner in ice_gnss
Wan Jiabing [Mon, 21 Mar 2022 13:59:47 +0000 (21:59 +0800)]
ice: use min_t() to make code cleaner in ice_gnss

Fix the following coccicheck warning:
./drivers/net/ethernet/intel/ice/ice_gnss.c:79:26-27: WARNING opportunity for min()

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoMerge tag 'net-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 5 May 2022 16:45:12 +0000 (09:45 -0700)]
Merge tag 'net-5.18-rc6' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can, rxrpc and wireguard.

  Previous releases - regressions:

   - igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()

   - mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter()

   - rds: acquire netns refcount on TCP sockets

   - rxrpc: enable IPv6 checksums on transport socket

   - nic: hinic: fix bug of wq out of bound access

   - nic: thunder: don't use pci_irq_vector() in atomic context

   - nic: bnxt_en: fix possible bnxt_open() failure caused by wrong RFS
     flag

   - nic: mlx5e:
      - lag, fix use-after-free in fib event handler
      - fix deadlock in sync reset flow

  Previous releases - always broken:

   - tcp: fix insufficient TCP source port randomness

   - can: grcan: grcan_close(): fix deadlock

   - nfc: reorder destructive operations in to avoid bugs

  Misc:

   - wireguard: improve selftests reliability"

* tag 'net-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  NFC: netlink: fix sleep in atomic bug when firmware download timeout
  selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
  tcp: drop the hash_32() part from the index calculation
  tcp: increase source port perturb table to 2^16
  tcp: dynamically allocate the perturb table used by source ports
  tcp: add small random increments to the source port
  tcp: resalt the secret every 10 seconds
  tcp: use different parts of the port_offset for index and offset
  secure_seq: use the 64 bits of the siphash for port offset calculation
  wireguard: selftests: set panic_on_warn=1 from cmdline
  wireguard: selftests: bump package deps
  wireguard: selftests: restore support for ccache
  wireguard: selftests: use newer toolchains to fill out architectures
  wireguard: selftests: limit parallelism to $(nproc) tests at once
  wireguard: selftests: make routing loop test non-fatal
  net/mlx5: Fix matching on inner TTC
  net/mlx5: Avoid double clear or set of sync reset requested
  net/mlx5: Fix deadlock in sync reset flow
  net/mlx5e: Fix trust state reset in reload
  net/mlx5e: Avoid checking offload capability in post_parse action
  ...

2 years agoigb: Convert kmap() to kmap_local_page()
Alaa Mohamed [Tue, 19 Apr 2022 23:43:13 +0000 (01:43 +0200)]
igb: Convert kmap() to kmap_local_page()

kmap() is being deprecated and these usages are all local to the thread
so there is no reason kmap_local_page() can't be used.

Replace kmap() calls with kmap_local_page().

Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoixgbe: Fix module_param allow_unsupported_sfp type
Jeff Daly [Thu, 14 Apr 2022 20:21:04 +0000 (16:21 -0400)]
ixgbe: Fix module_param allow_unsupported_sfp type

The module_param allow_unsupported_sfp should be a boolean to match the
type in the ixgbe_hw struct.

Signed-off-by: Jeff Daly <jeffd@silicom-usa.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agonet: sparx5: Add handling of host MDB entries
Casper Andersson [Tue, 3 May 2022 09:39:22 +0000 (11:39 +0200)]
net: sparx5: Add handling of host MDB entries

Handle adding and removing MDB entries for host

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Link: https://lore.kernel.org/r/20220503093922.1630804-1-casper.casan@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoNFC: netlink: fix sleep in atomic bug when firmware download timeout
Duoming Zhou [Wed, 4 May 2022 05:58:47 +0000 (13:58 +0800)]
NFC: netlink: fix sleep in atomic bug when firmware download timeout

There are sleep in atomic bug that could cause kernel panic during
firmware download process. The root cause is that nlmsg_new with
GFP_KERNEL parameter is called in fw_dnld_timeout which is a timer
handler. The call trace is shown below:

BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265
Call Trace:
kmem_cache_alloc_node
__alloc_skb
nfc_genl_fw_download_done
call_timer_fn
__run_timers.part.0
run_timer_softirq
__do_softirq
...

The nlmsg_new with GFP_KERNEL parameter may sleep during memory
allocation process, and the timer handler is run as the result of
a "software interrupt" that should not call any other function
that could sleep.

This patch changes allocation mode of netlink message from GFP_KERNEL
to GFP_ATOMIC in order to prevent sleep in atomic bug. The GFP_ATOMIC
flag makes memory allocation operation could be used in atomic context.

Fixes: 9674da8759df ("NFC: Add firmware upload netlink command")
Fixes: 9ea7187c53f6 ("NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220504055847.38026-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'ocelot-vcap-cleanups'
Jakub Kicinski [Thu, 5 May 2022 03:42:17 +0000 (20:42 -0700)]
Merge branch 'ocelot-vcap-cleanups'

Vladimir Oltean says:

====================
Ocelot VCAP cleanups

This is a series of minor code cleanups brought to the Ocelot switch
driver logic for VCAP filters.

- don't use list_for_each_safe() in ocelot_vcap_filter_add_to_block
- don't use magic numbers for OCELOT_POLICER_DISCARD
====================

Link: https://lore.kernel.org/r/20220503120150.837233-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: don't use magic numbers for OCELOT_POLICER_DISCARD
Vladimir Oltean [Tue, 3 May 2022 12:01:50 +0000 (15:01 +0300)]
net: mscc: ocelot: don't use magic numbers for OCELOT_POLICER_DISCARD

OCELOT_POLICER_DISCARD helps "kill dropped packets dead" since a
PERMIT/DENY mask mode with a port mask of 0 isn't enough to stop the CPU
port from receiving packets removed from the forwarding path.

The hardcoded initialization done for it in ocelot_vcap_init() is
confusing. All we need from it is to have a rate and a burst size of 0.

Reuse qos_policer_conf_set() for that purpose.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: drop port argument from qos_policer_conf_set
Vladimir Oltean [Tue, 3 May 2022 12:01:49 +0000 (15:01 +0300)]
net: mscc: ocelot: drop port argument from qos_policer_conf_set

The "port" argument is used for nothing else except printing on the
error path. Print errors on behalf of the policer index, which is less
confusing anyway.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: use list_for_each_entry in ocelot_vcap_filter_add_to_block
Vladimir Oltean [Tue, 3 May 2022 12:01:48 +0000 (15:01 +0300)]
net: mscc: ocelot: use list_for_each_entry in ocelot_vcap_filter_add_to_block

Unify the code paths for adding to an empty list and to a list with
elements by keeping a "pos" list_head element that indicates where to
insert. Initialize "pos" with the list head itself in case
list_for_each_entry() doesn't iterate over any element.

Note that list_for_each_safe() isn't needed because no element is
removed from the list while iterating.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: add to tail of empty list in ocelot_vcap_filter_add_to_block
Vladimir Oltean [Tue, 3 May 2022 12:01:47 +0000 (15:01 +0300)]
net: mscc: ocelot: add to tail of empty list in ocelot_vcap_filter_add_to_block

This makes no functional difference but helps in minimizing the delta
for a future change.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: mscc: ocelot: use list_add_tail in ocelot_vcap_filter_add_to_block()
Vladimir Oltean [Tue, 3 May 2022 12:01:46 +0000 (15:01 +0300)]
net: mscc: ocelot: use list_add_tail in ocelot_vcap_filter_add_to_block()

list_add(..., pos->prev) and list_add_tail(..., pos) are equivalent, use
the later form to unify with the case where the list is empty later.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodt-bindings: net: lan966x: fix example
Michael Walle [Tue, 3 May 2022 13:20:38 +0000 (15:20 +0200)]
dt-bindings: net: lan966x: fix example

In commit 4fdabd509df3 ("dt-bindings: net: lan966x: remove PHY reset")
the PHY reset was removed, but I failed to remove it from the example.
Fix it.

Fixes: 4fdabd509df3 ("dt-bindings: net: lan966x: remove PHY reset")
Reported-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220503132038.2714128-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
Vladimir Oltean [Tue, 3 May 2022 12:14:28 +0000 (15:14 +0300)]
selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer

As discussed here with Ido Schimmel:
https://patchwork.kernel.org/project/netdevbpf/patch/20220224102908.5255-2-jianbol@nvidia.com/

the default conform-exceed action is "reclassify", for a reason we don't
really understand.

The point is that hardware can't offload that police action, so not
specifying "conform-exceed" was always wrong, even though the command
used to work in hardware (but not in software) until the kernel started
adding validation for it.

Fix the command used by the selftest by making the policer drop on
exceed, and pass the packet to the next action (goto) on conform.

Fixes: 8cd6b020b644 ("selftests: ocelot: add some example VCAP IS1, IS2 and ES0 tc offloads")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220503121428.842906-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'insufficient-tcp-source-port-randomness'
Jakub Kicinski [Thu, 5 May 2022 02:22:34 +0000 (19:22 -0700)]
Merge branch 'insufficient-tcp-source-port-randomness'

Willy Tarreau says:

====================
insufficient TCP source port randomness

In a not-yet published paper, Moshe Kol, Amit Klein, and Yossi Gilad
report being able to accurately identify a client by forcing it to emit
only 40 times more connections than the number of entries in the
table_perturb[] table, which is indexed by hashing the connection tuple.
The current 2^8 setting allows them to perform that attack with only 10k
connections, which is not hard to achieve in a few seconds.

Eric, Amit and I have been working on this for a few weeks now imagining,
testing and eliminating a number of approaches that Amit and his team were
still able to break or that were found to be too risky or too expensive,
and ended up with the simple improvements in this series that resists to
the attack, doesn't degrade the performance, and preserves a reliable port
selection algorithm to avoid connection failures, including the odd/even
port selection preference that allows bind() to always find a port quickly
even under strong connect() stress.

The approach relies on several factors:
  - resalting the hash secret that's used to choose the table_perturb[]
    entry every 10 seconds to eliminate slow attacks and force the
    attacker to forget everything that was learned after this delay.
    This already eliminates most of the problem because if a client
    stays silent for more than 10 seconds there's no link between the
    previous and the next patterns, and 10s isn't yet frequent enough
    to cause too frequent repetition of a same port that may induce a
    connection failure ;

  - adding small random increments to the source port. Previously, a
    random 0 or 1 was added every 16 ports. Now a random 0 to 7 is
    added after each port. This means that with the default 32768-60999
    range, a worst case rollover happens after 1764 connections, and
    an average of 3137. This doesn't stop statistical attacks but
    requires significantly more iterations of the same attack to
    confirm a guess.

  - increasing the table_perturb[] size from 2^8 to 2^16, which Amit
    says will require 2.6 million connections to be attacked with the
    changes above, making it pointless to get a fingerprint that will
    only last 10 seconds. Due to the size, the table was made dynamic.

  - a few minor improvements on the bits used from the hash, to eliminate
    some unfortunate correlations that may possibly have been exploited
    to design future attack models.

These changes were tested under the most extreme conditions, up to
1.1 million connections per second to one and a few targets, showing no
performance regression, and only 2 connection failures within 13 billion,
which is less than 2^-32 and perfectly within usual values.

The series is split into small reviewable changes and was already reviewed
by Amit and Eric.
====================

Link: https://lore.kernel.org/r/20220502084614.24123-1-w@1wt.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: drop the hash_32() part from the index calculation
Willy Tarreau [Mon, 2 May 2022 08:46:14 +0000 (10:46 +0200)]
tcp: drop the hash_32() part from the index calculation

In commit 190cc82489f4 ("tcp: change source port randomizarion at
connect() time"), the table_perturb[] array was introduced and an
index was taken from the port_offset via hash_32(). But it turns
out that hash_32() performs a multiplication while the input here
comes from the output of SipHash in secure_seq, that is well
distributed enough to avoid the need for yet another hash.

Suggested-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agotcp: increase source port perturb table to 2^16
Willy Tarreau [Mon, 2 May 2022 08:46:13 +0000 (10:46 +0200)]
tcp: increase source port perturb table to 2^16

Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately
identify a client by forcing it to emit only 40 times more connections
than there are entries in the table_perturb[] table. The previous two
improvements consisting in resalting the secret every 10s and adding
randomness to each port selection only slightly improved the situation,
and the current value of 2^8 was too small as it's not very difficult
to make a client emit 10k connections in less than 10 seconds.

Thus we're increasing the perturb table from 2^8 to 2^16 so that the
same precision now requires 2.6M connections, which is more difficult in
this time frame and harder to hide as a background activity. The impact
is that the table now uses 256 kB instead of 1 kB, which could mostly
affect devices making frequent outgoing connections. However such
components usually target a small set of destinations (load balancers,
database clients, perf assessment tools), and in practice only a few
entries will be visited, like before.

A live test at 1 million connections per second showed no performance
difference from the previous value.

Reported-by: Moshe Kol <moshe.kol@mail.huji.ac.il>
Reported-by: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>