Doug Dziggel [Wed, 24 Apr 2019 12:20:50 +0000 (05:20 -0700)]
i40e: Improve AQ log granularity
This patch makes it possible to log only AQ descriptors, without the
entire AQ message buffers being dumped too. It should greatly reduce
kernel log size in cases where a full AQ dump is not needed.
Selection is made by setting flags in hw->debug_mask.
Additionally, some debug messages that preceded an AQ dump have been
moved to I40E_DEBUG_AQ_COMMAND class, which seems more appropriate.
Signed-off-by: Doug Dziggel <douglas.a.dziggel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Piotr Kwapulinski [Wed, 24 Apr 2019 12:20:49 +0000 (05:20 -0700)]
i40e: Add bounds check for ch[] array
Add bounds check for ch[] array.
Use ARRAY_SIZE() to ensure that idx is within the range.
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Wed, 24 Apr 2019 12:20:48 +0000 (05:20 -0700)]
i40e: Use signed variable
The counter variable in i40e_clean_tx_irq starts out negative and climbs
to 0. So it should not be defined as a u16. This was working by accident
due to the fact the u16 overflows and underflows predictably.
Replace the u16 with int, which is signed and can handle the negativity.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Piotr Kwapulinski [Wed, 24 Apr 2019 12:20:47 +0000 (05:20 -0700)]
i40e: add constraints for accessing veb array
Add veb array access boundary checks.
Ensure veb array index is smaller than I40E_MAX_VEB.
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Piotr Kwapulinski [Wed, 24 Apr 2019 12:20:46 +0000 (05:20 -0700)]
i40e: let untrusted VF to create up to 16 VLANs
This patch lets untrusted VF to create up to 16 VLANs.
It was implemented by increasing I40E_VC_MAX_VLAN_PER_VF up to 16.
Without this patch untrusted VF could create only up to 8 VLANs.
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Aleksandr Loktionov [Sat, 30 Mar 2019 00:04:56 +0000 (17:04 -0700)]
i40e: add functions stubs to support EEE
This patch adds functions stubs to support EEE on/off.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kevin Darbyshire-Bryant [Fri, 14 Jun 2019 09:09:44 +0000 (10:09 +0100)]
sched: act_ctinfo: use extack error reporting
Use extack error reporting mechanism in addition to returning -EINVAL
NL_SET_ERR_* code shamelessy copy/paste/adjusted from act_pedit &
sch_cake and used as reference as to what I should have done in the
first place.
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Fri, 14 Jun 2019 07:04:38 +0000 (09:04 +0200)]
l2tp: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Also, there is no need to store the individual debugfs file name, just
remove the whole directory all at once, saving a local variable.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Jun 2019 15:38:27 +0000 (08:38 -0700)]
Merge branch 'r8169-add-and-use-helper-rtl_is_8168evl_up'
Heiner Kallweit says:
====================
r8169: add and use helper rtl_is_8168evl_up
Few registers have been added or changed its purpose with version
RTL8168e-vl, so create a helper for identifying chip versions from
RTL8168e-vl.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 14 Jun 2019 05:55:21 +0000 (07:55 +0200)]
r8169: use helper rtl_is_8168evl_up for setting register MaxTxPacketSize
>From RTL8168e-vl the value in register MaxTxPacketSize is interpreted
differently, therefore use new helper rtl_is_8168evl_up to set this
register.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 14 Jun 2019 05:54:07 +0000 (07:54 +0200)]
r8169: add helper rtl_is_8168evl_up
Add helper rtl_is_8168evl_up to make the code better readable and to
simplify it.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Tue, 11 Jun 2019 19:09:19 +0000 (21:09 +0200)]
r8169: improve rtl_coalesce_info
tp->coalesce_info is used in rtl_coalesce_info() only, so we can
remove this member. In addition replace phy_ethtool_get_link_ksettings
with a direct access to tp->phydev->speed.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Tue, 11 Jun 2019 19:04:09 +0000 (21:04 +0200)]
r8169: let mdio read functions return -ETIMEDOUT
In case of a timeout currently ~0 is returned. Callers often just check
whether a certain bit is set and therefore may behave incorrectly.
So let's return -ETIMEDOUT in case of a timeout.
r8168_phy_ocp_read is used in r8168g_mdio_read only, therefore we can
apply the same change.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 11 Jun 2019 18:47:45 +0000 (21:47 +0300)]
net: dsa: tag_sja1105: Select CONFIG_PACKING
The packing facility is needed to decode Ethernet meta frames containing
source port and RX timestamping information.
The DSA driver selects CONFIG_PACKING, but the tagger did not, and since
taggers can be now compiled as modules independently from the drivers
themselves, this is an issue now, as CONFIG_PACKING is disabled by
default on all architectures.
Fixes:
e53e18a6fe4d ("net: dsa: sja1105: Receive and decode meta frames")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Tue, 11 Jun 2019 16:56:02 +0000 (10:56 -0600)]
net: axienet: move use of resource after validity check
We were accessing the pointer returned from platform_get_resource before
checking if it was valid, causing an oops if it was not. Move this access
after the call to devm_ioremap_resource which does the validity check.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Jun 2019 05:39:32 +0000 (22:39 -0700)]
Merge branch 's390-qeth-next'
Julian Wiedmann says:
====================
s390/qeth: updates 2019-06-11
please apply the following patch series for qeth to net-next.
This brings all sorts of cleanups and minor improvements,
primarily for the control IO path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:38:00 +0000 (18:38 +0200)]
s390/qeth: allocate a single cmd on read channel
We statically allocate 8 cmd buffers on the read channel, when the only
IO left that's still using them is the long-running READ.
Replace this with a single allocated cmd, that gets restarted whenever
the READ completed.
This introduces refcounting for allocated cmds, so that the READ cmd can
survive the IO completion.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:59 +0000 (18:37 +0200)]
s390/qeth: command-chain the IDX sequence
The current IDX sequence first sends one WRITE cmd to activate the
device, and then sends a second cmd that READs the response.
Using qeth_alloc_cmd(), we can combine this into a single IO with two
command-chained CCWs.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:58 +0000 (18:37 +0200)]
s390/qeth: convert RCD code to common IO infrastructure
The RCD code is the last remaining IO path that doesn't use the
qeth_send_control_data() infrastructure. Doing so allows us to remove
all sorts of custom state machinery and logic in the IRQ handler.
Instead of introducing statically allocated cmd buffers for this single
IO on the data channel, use the new qeth_alloc_cmd() helper.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:57 +0000 (18:37 +0200)]
s390/qeth: add support for dynamically allocated cmds
qeth currently uses a fixed set of statically allocated cmd buffers for
the read and write IO channels. This (1) doesn't play well with the single
RCD cmd we need to issue on the data channel, (2) doesn't provide the
necessary flexibility for certain IDX improvements, and (3) is also rather
wasteful since the buffers are idle most of the time.
Add a new type of cmd buffer that is dynamically allocated, and keeps
its ccw chain in the DMA data area. Since this touches most callers of
qeth_setup_ccw(), also add a new CCW flags parameter for future usage.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:56 +0000 (18:37 +0200)]
s390/qeth: remove 'channel' parameter from callbacks
Each cmd buffer maintains a pointer to the IO channel that it was/will
be issued on. So when dealing with cmd buffers, we don't need to pass
around a separate channel pointer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:55 +0000 (18:37 +0200)]
s390/qeth: convert device-specific trace entries
The vast majority of SETUP-classified trace entries can be moved to
their device-specific trace file. This reduces pollution of the global
SETUP file, and provides a consistent trace view of all activity on the
device.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:54 +0000 (18:37 +0200)]
s390/qeth: remove OSN-specific IO code
OSN currently provides a custom code path to submit IPA cmds, without
waiting for the cmd response. Replace it with qeth_send_ipa_cmd(), which
uses the common qeth_send_control_data() IO infrastructure.
By setting a custom iob->callback, we can now provide feedback to the
caller about whether the cmd has been successfully submitted to HW.
Since the callback then immediately wakes up the reply-waiter object, we
maintain the old behaviour of returning early without waiting for the
response.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:53 +0000 (18:37 +0200)]
s390/qeth: remove qeth_wait_for_buffer()
The basic MPC initialization sequence is strictly sequential, and
waiting for an available cmd buffer should never be necessary.
So this change only affects the OSN path, where dangling waiters on an
unbounded wait_event() are not desirable. Switch to qeth_get_buffers(),
and let OSN callers deal with -ENOMEM.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:52 +0000 (18:37 +0200)]
s390/qeth: clean up setting of BLKT defaults
When called from qeth_core_probe_device(), qeth_determine_capabilities()
initializes the device's BLKT defaults. From all other callers, the
ccw_device has already been set online and the BLKT setting is skipped.
Clean this up by extracting the BLKT setting into a separate helper that
gets called from the right place.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:51 +0000 (18:37 +0200)]
s390/qeth: restart pending READ cmd from callback
The completion of a pending READ cmd is processed via
qeth_issue_next_read_cb(). Let this callback also start the next READ
cmd, instead of hardcoding that step into the IRQ handler.
While at it remove the check of the channel state,
__qeth_issue_next_read() already does this.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:50 +0000 (18:37 +0200)]
s390/qeth: simplify DOWN state handling
When the tear down sequence in qeth_l?_stop_card() has finished, the
card is guaranteed to be in DOWN state and we don't have to check for
it again.
With this insight we can also remove the redundant setting of
card->state in qeth_l?_set_online()'s error path.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:49 +0000 (18:37 +0200)]
s390/qeth: use mm helpers
Slightly reduce the complexity of the core xmit path, by replacing some
open-coded logic with the corresponding helpers.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 11 Jun 2019 16:37:48 +0000 (18:37 +0200)]
s390/qeth: don't mask TX errors on IQD devices
Current code suppresses debug entries when an TX buffer completes in
ERROR state with no error indication set in SBALF15.
This was introduced back with
commit
58490f18071d ("qeth: HiperSockets SIGA retry support on CC=2.").
But qeth no longer retries after CC=2, and this sort of suppression
make no sense anymore. Remove it.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 14 Jun 2019 05:34:55 +0000 (22:34 -0700)]
Merge branch 'mlxsw-Add-support-for-physical-hardware-clock'
Ido Schimmel says:
====================
mlxsw: Add support for physical hardware clock
Shalom says:
This patchset adds support for physical hardware clock for Spectrum-1
ASIC only.
Patches #1, #2 and #3 add the ability to query the free running clock
PCI address.
Patches #4 and #5 add two new register, the Management UTC Register and
the Management Pulse Per Second Register.
Patch #6 publishes scaled_ppm_to_ppb() to allow drivers to use it.
Patch #7 adds the physical hardware clock operations.
Patch #8 initializes the physical hardware clock.
Patch #9 adds a selftest for testing the PTP physical hardware clock.
v2 (Richard):
* s/ptp_clock_scaled_ppm_to_ppb/scaled_ppm_to_ppb/
* imply PTP_1588_CLOCK in mlxsw Kconfig
* s/mlxsw_sp1_ptp_update_phc_settime/mlxsw_sp1_ptp_phc_settime/
* s/mlxsw_sp1_ptp_update_phc_adjfreq/mlxsw_sp1_ptp_phc_adjfreq/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Tue, 11 Jun 2019 15:45:12 +0000 (18:45 +0300)]
selftests: ptp: Add Physical Hardware Clock test
Test the PTP Physical Hardware Clock functionality using the "phc_ctl" (a
part of "linuxptp").
The test contains three sub-tests:
* "settime" test
* "adjtime" test
* "adjfreq" test
"settime" test:
* set the PHC time to 0 seconds.
* wait for 120.5 seconds.
* check if PHC time equal to 120.XX seconds.
"adjtime" test:
* set the PHC time to 0 seconds.
* adjust the time by 10 seconds.
* check if PHC time equal to 10.XX seconds.
"adjfreq" test:
* adjust the PHC frequency to be 1% faster.
* set the PHC time to 0 seconds.
* wait for 100.5 seconds.
* check if PHC time equal to 101.XX seconds.
Usage:
$ ./phc.sh /dev/ptp<X>
It is possible to run a subset of the tests, for example:
* To run only the "settime" test:
$ TESTS="settime" ./phc.sh /dev/ptp<X>
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Tue, 11 Jun 2019 15:45:11 +0000 (18:45 +0300)]
mlxsw: spectrum: PTP physical hardware clock initialization
Initialize the PTP physical hardware clock.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Tue, 11 Jun 2019 15:45:10 +0000 (18:45 +0300)]
mlxsw: spectrum_ptp: Add implementation for physical hardware clock operations
Implement physical hardware clock operations.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Tue, 11 Jun 2019 15:45:09 +0000 (18:45 +0300)]
ptp: ptp_clock: Publish scaled_ppm_to_ppb
Publish scaled_ppm_to_ppb to allow drivers to use it.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Tue, 11 Jun 2019 15:45:08 +0000 (18:45 +0300)]
mlxsw: reg: Add Management Pulse Per Second Register
The MTPPS register provides the device PPS capabilities, configure the PPS
in and out modules and holds the PPS in time stamp.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Tue, 11 Jun 2019 15:45:07 +0000 (18:45 +0300)]
mlxsw: reg: Add Management UTC Register
The MTUTC register configures the HW UTC counter.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Tue, 11 Jun 2019 15:45:06 +0000 (18:45 +0300)]
mlxsw: pci: Query free running clock PCI BAR and offsets
Query free running clock PCI BAR and offsets during the pci_init.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Tue, 11 Jun 2019 15:45:05 +0000 (18:45 +0300)]
mlxsw: core: Add a new interface for reading the hardware free running clock
Add two new bus operations for reading the hardware free running clock.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Tue, 11 Jun 2019 15:45:04 +0000 (18:45 +0300)]
mlxsw: cmd: Free running clock PCI BAR and offsets via query firmware
Add free running clock PCI BAR and offset to query firmware command.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Tue, 11 Jun 2019 14:02:22 +0000 (10:02 -0400)]
tc-tests: updated fw with bind actions by reference use cases
Extended fw TDC tests with use cases where actions are pre-created and
attached to a filter by reference, i.e. by action index.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Jun 2019 21:02:09 +0000 (14:02 -0700)]
Merge branch 'net-stmmac-Convert-to-phylink'
Jose Abreu says:
====================
net: stmmac: Convert to phylink
This converts stmmac to use phylink. Besides the code redution this will
allow to gain more flexibility.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 11 Jun 2019 15:18:47 +0000 (17:18 +0200)]
net: stmmac: Convert to phylink and remove phylib logic
Convert everything to phylink.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 11 Jun 2019 15:18:46 +0000 (17:18 +0200)]
net: stmmac: Start adding phylink support
Start adding the phylink callbacks.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 11 Jun 2019 15:18:45 +0000 (17:18 +0200)]
net: stmmac: Prepare to convert to phylink
In preparation for the convertion, split the adjust_link function into
mac_config and add the mac_link_up and mac_link_down functions.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 11 Jun 2019 14:07:09 +0000 (22:07 +0800)]
qede: Make two functions static
Fix sparse warning:
drivers/net/ethernet/qlogic/qede/qede_main.c:963:6:
warning: symbol 'qede_lock' was not declared. Should it be static?
drivers/net/ethernet/qlogic/qede/qede_main.c:969:6:
warning: symbol 'qede_unlock' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Tue, 11 Jun 2019 13:58:34 +0000 (21:58 +0800)]
net: dsa: sja1105: Make two functions static
Fix sparse warnings:
drivers/net/dsa/sja1105/sja1105_main.c:1848:6:
warning: symbol 'sja1105_port_rxtstamp' was not declared. Should it be static?
drivers/net/dsa/sja1105/sja1105_main.c:1869:6:
warning: symbol 'sja1105_port_txtstamp' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 12 Jun 2019 18:57:25 +0000 (11:57 -0700)]
tcp: add optional per socket transmit delay
Adding delays to TCP flows is crucial for studying behavior
of TCP stacks, including congestion control modules.
Linux offers netem module, but it has unpractical constraints :
- Need root access to change qdisc
- Hard to setup on egress if combined with non trivial qdisc like FQ
- Single delay for all flows.
EDT (Earliest Departure Time) adoption in TCP stack allows us
to enable a per socket delay at a very small cost.
Networking tools can now establish thousands of flows, each of them
with a different delay, simulating real world conditions.
This requires FQ packet scheduler or a EDT-enabled NIC.
This patchs adds TCP_TX_DELAY socket option, to set a delay in
usec units.
unsigned int tx_delay = 10000; /* 10 msec */
setsockopt(fd, SOL_TCP, TCP_TX_DELAY, &tx_delay, sizeof(tx_delay));
Note that FQ packet scheduler limits might need some tweaking :
man tc-fq
PARAMETERS
limit
Hard limit on the real queue size. When this limit is
reached, new packets are dropped. If the value is lowered,
packets are dropped so that the new limit is met. Default
is 10000 packets.
flow_limit
Hard limit on the maximum number of packets queued per
flow. Default value is 100.
Use of TCP_TX_DELAY option will increase number of skbs in FQ qdisc,
so packets would be dropped if any of the previous limit is hit.
Use of a jump label makes this support runtime-free, for hosts
never using the option.
Also note that TSQ (TCP Small Queues) limits are slightly changed
with this patch : we need to account that skbs artificially delayed
wont stop us providind more skbs to feed the pipe (netem uses
skb_orphan_partial() for this purpose, but FQ can not use this trick)
Because of that, using big delays might very well trigger
old bugs in TSO auto defer logic and/or sndbuf limited detection.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Jun 2019 18:23:45 +0000 (11:23 -0700)]
Merge branch 'ena-dynamic-queue-sizes'
Sameeh Jubran says:
====================
Support for dynamic queue size changes
This patchset introduces the following:
* add new admin command for supporting different queue size for Tx/Rx
* add support for Tx/Rx queues size modification through ethtool
* allow queues allocation backoff when low on memory
* update driver version
Difference from v2:
* Dropped superfluous range checks which are already done in ethtool. [patch 5/7]
* Dropped inline keyword from function. [patch 4/7]
* Added a new patch which drops inline keyword all *.c files. [patch 6/7]
Difference from v1:
* Changed ena_update_queue_sizes() signature to use u32 instead of int
type for the size arguments. [patch 5/7]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Tue, 11 Jun 2019 11:58:11 +0000 (14:58 +0300)]
net: ena: update driver version from 2.0.3 to 2.1.0
Update driver version to match device specification.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Tue, 11 Jun 2019 11:58:10 +0000 (14:58 +0300)]
net: ena: remove inline keyword from functions in *.c
Let the compiler decide if the function should be inline in *.c files
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Tue, 11 Jun 2019 11:58:09 +0000 (14:58 +0300)]
net: ena: add ethtool function for changing io queue sizes
Implement the set_ringparam() function of the ethtool interface
to enable the changing of io queue sizes.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Tue, 11 Jun 2019 11:58:08 +0000 (14:58 +0300)]
net: ena: allow queue allocation backoff when low on memory
If there is not enough memory to allocate io queues the driver will
try to allocate smaller queues.
The backoff algorithm is as follows:
1. Try to allocate TX and RX and if successful.
1.1. return success
2. Divide by 2 the size of the larger of RX and TX queues (or both if their size is the same).
3. If TX or RX is smaller than 256
3.1. return failure.
4. else
4.1. go back to 1.
Also change the tx_queue_size, rx_queue_size field names in struct
adapter to requested_tx_queue_size and requested_rx_queue_size, and
use RX and TX queue 0 for actual queue sizes.
Explanation:
The original fields were useless as they were simply used to assign
values once from them to each of the queues in the adapter in ena_probe().
They could simply be deleted. However now that we have a backoff
feature, we have use for them. In case of backoff there is a difference
between the requested queue sizes and the actual sizes. Therefore there
is a need to save the requested queue size for future retries of queue
allocation (for example if allocation failed and then ifdown + ifup was
called we want to start the allocation from the original requested size of
the queues).
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Tue, 11 Jun 2019 11:58:07 +0000 (14:58 +0300)]
net: ena: make ethtool show correct current and max queue sizes
Currently ethtool -g shows the same size for current and max queue
sizes.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sameeh Jubran [Tue, 11 Jun 2019 11:58:06 +0000 (14:58 +0300)]
net: ena: enable negotiating larger Rx ring size
Use MAX_QUEUES_EXT get feature capability to query the device.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Tue, 11 Jun 2019 11:58:05 +0000 (14:58 +0300)]
net: ena: add MAX_QUEUES_EXT get feature admin command
Add a new admin command to support different queue size for Tx/Rx
queues (the change also support different SQ/CQ sizes)
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Jun 2019 18:22:18 +0000 (11:22 -0700)]
Merge branch 'dpaa2-eth-Add-support-for-MQPRIO-offloading'
Ioana Radulescu says:
====================
dpaa2-eth: Add support for MQPRIO offloading
Add support for adding multiple TX traffic classes with mqprio. We can have
up to one netdev queue and hardware frame queue per TC per core.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Radulescu [Tue, 11 Jun 2019 11:50:03 +0000 (14:50 +0300)]
dpaa2-eth: Add mqprio support
Implement mqprio qdisc support by mapping traffic classes to
different hardware enqueue priorities. The maximum number of
supported traffic classes is an attribute of each DPNI object.
The traffic classes map to hardware priorities from highest (0)
to lowest (highest prio number). The skb priority information
received from the stack is used to select the hardware Tx queue
on which to enqueue the frame.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Radulescu [Tue, 11 Jun 2019 11:50:02 +0000 (14:50 +0300)]
dpaa2-eth: Support multiple traffic classes on Tx
DPNI objects can have multiple traffic classes, as reflected by
the num_tc attribute. Until now we ignored its value and only
used traffic class 0.
This patch adds support for multiple Tx traffic classes; we have
num_queues x num_tcs hardware queues available for each interface.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Radulescu [Tue, 11 Jun 2019 11:50:01 +0000 (14:50 +0300)]
dpaa2-eth: Refactor xps code
Move the code configuring xps on the netdev TX queues to a
separate function. A subsequent patch will need to call
this in another context as well.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Tue, 11 Jun 2019 11:16:32 +0000 (14:16 +0300)]
net: ethernet: ti: cpts: fix build failure for powerpc
Add dependency to TI CPTS from Common CLK framework COMMON_CLK to fix
allyesconfig build for Powerpc:
drivers/net/ethernet/ti/cpts.c: In function 'cpts_of_mux_clk_setup':
drivers/net/ethernet/ti/cpts.c:567:2: error: implicit declaration of function 'of_clk_parent_fill'; did you mean 'of_clk_get_parent_name'? [-Werror=implicit-function-declaration]
of_clk_parent_fill(refclk_np, parent_names, num_parents);
^~~~~~~~~~~~~~~~~~
of_clk_get_parent_name
Fixes:
a3047a81ba13 ("net: ethernet: ti: cpts: add support for ext rftclk selection")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 10 Jun 2019 19:31:49 +0000 (12:31 -0700)]
net: dsa: Deal with non-existing PHY/fixed-link
We need to specifically deal with phylink_of_phy_connect() returning
-ENODEV, because this can happen when a CPU/DSA port does connect
neither to a PHY, nor has a fixed-link property. This is a valid use
case that is permitted by the binding and indicates to the switch:
auto-configure port with maximum capabilities.
Fixes:
0e27921816ad ("net: dsa: Use PHYLINK for the CPU/DSA ports")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Wed, 12 Jun 2019 16:42:47 +0000 (12:42 -0400)]
net: dsa: mv88e6xxx: lock mutex in port_fdb_dump
During a port FDB dump operation, the mutex protecting the concurrent
access to the switch registers is currently held by the internal
mv88e6xxx_port_db_dump and mv88e6xxx_port_db_dump_fid helpers.
It must be held at the higher level in mv88e6xxx_port_fdb_dump which
is called directly by DSA through ds->ops->port_fdb_dump. Fix this.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Saenz Julienne [Wed, 12 Jun 2019 12:25:27 +0000 (14:25 +0200)]
dt-bindings: net: wiznet: add w5x00 support
Add bindings for Wiznet's w5x00 series of SPI interfaced Ethernet chips.
Based on the bindings for microchip,enc28j60.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Saenz Julienne [Wed, 12 Jun 2019 12:25:25 +0000 (14:25 +0200)]
net: ethernet: wiznet: w5X00 add device tree support
The w5X00 chip provides an SPI to Ethernet inteface. This patch allows
platform devices to be defined through the device tree.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Wed, 12 Jun 2019 07:14:35 +0000 (10:14 +0300)]
net: sched: ingress: set 'unlocked' flag for Qdisc ops
To remove rtnl lock dependency in tc filter update API when using ingress
Qdisc, set QDISC_CLASS_OPS_DOIT_UNLOCKED flag in ingress Qdisc_class_ops.
Ingress Qdisc ops don't require any modifications to be used without rtnl
lock on tc filter update path. Ingress implementation never changes its
q->block and only releases it when Qdisc is being destroyed. This means it
is enough for RTM_{NEWTFILTER|DELTFILTER|GETTFILTER} message handlers to
hold ingress Qdisc reference while using it without relying on rtnl lock
protection. Unlocked Qdisc ops support is already implemented in filter
update path by unlocked cls API patch set.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 11 Jun 2019 19:22:27 +0000 (12:22 -0700)]
Merge branch 'tls-add-support-for-kernel-driven-resync-and-nfp-RX-offload'
Jakub Kicinski says:
====================
tls: add support for kernel-driven resync and nfp RX offload
This series adds TLS RX offload for NFP and completes the offload
by providing resync strategies. When TLS data stream looses segments
or experiences reorder NIC can no longer perform in line offload.
Resyncs provide information about placement of records in the
stream so that offload can resume.
Existing TLS resync mechanisms are not a great fit for the NFP.
In particular the TX resync is hard to implement for packet-centric
NICs. This patchset adds an ability to perform TX resync in a way
similar to the way initial sync is done - by calling down to the
driver when new record is created after driver indicated sync had
been lost.
Similarly on the RX side, we try to wait for a gap in the stream
and send record information for the next record. This works very
well for RPC workloads which are the primary focus at this time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Jun 2019 04:40:10 +0000 (21:40 -0700)]
nfp: tls: make use of kernel-driven TX resync
When TCP stream gets out of sync (driver stops receiving skbs
with expected TCP sequence numbers) request a TX resync from
the kernel.
We try to distinguish retransmissions from missed transmissions
by comparing the sequence number to expected - if it's further
than the expected one - we probably missed packets.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Jun 2019 04:40:09 +0000 (21:40 -0700)]
net/tls: add kernel-driven resync mechanism for TX
TLS offload drivers keep track of TCP seq numbers to make sure
the packets are fed into the HW in order.
When packets get dropped on the way through the stack, the driver
will get out of sync and have to use fallback encryption, but unless
TCP seq number is resynced it will never match the packets correctly
(or even worse - use incorrect record sequence number after TCP seq
wraps).
Existing drivers (mlx5) feed the entire record on every out-of-order
event, allowing FW/HW to always be in sync.
This patch adds an alternative, more akin to the RX resync. When
driver sees a frame which is past its expected sequence number the
stream must have gotten out of order (if the sequence number is
smaller than expected its likely a retransmission which doesn't
require resync). Driver will ask the stack to perform TX sync
before it submits the next full record, and fall back to software
crypto until stack has performed the sync.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Jun 2019 04:40:08 +0000 (21:40 -0700)]
net/tls: generalize the resync callback
Currently only RX direction is ever resynced, however, TX may
also get out of sequence if packets get dropped on the way to
the driver. Rename the resync callback and add a direction
parameter.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Jun 2019 04:40:07 +0000 (21:40 -0700)]
nfp: tls: enable TLS RX offload
Set ethtool TLS RX feature based on NIC capabilities, and enable
TLS RX when connections are added for decryption.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk van der Merwe [Tue, 11 Jun 2019 04:40:06 +0000 (21:40 -0700)]
nfp: tls: implement RX TLS resync
Enable kernel-controlled RX resync and propagate TLS connection
RX resync from kernel TLS to firmware.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Jun 2019 04:40:05 +0000 (21:40 -0700)]
nfp: add async version of mailbox communication
Some control messages must be sent from atomic context. The mailbox
takes sleeping locks and uses a waitqueue so add a "posted" version
of communication.
Trylock the semaphore and if that's successful kick of the device
communication. The device communication will be completed from
a workqueue, which will also release the semaphore.
If locks are taken queue the message and return. Schedule a
different workqueue to take the semaphore and run the communication.
Note that the there are currently no atomic users which would actually
need the return value, so all replies to posted messages are just
freed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Jun 2019 04:40:04 +0000 (21:40 -0700)]
nfp: rename nfp_ccm_mbox_alloc()
We need the name nfp_ccm_mbox_alloc() for allocating the mailbox
communication channel itself.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk van der Merwe [Tue, 11 Jun 2019 04:40:03 +0000 (21:40 -0700)]
nfp: tls: set skb decrypted flag
Firmware indicates when a packet has been decrypted by reusing the
currently unused BPF flag. Transfer this information into the skb
and provide a statistic of all decrypted segments.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Jun 2019 04:40:02 +0000 (21:40 -0700)]
net/tls: add kernel-driven TLS RX resync
TLS offload device may lose sync with the TCP stream if packets
arrive out of order. Drivers can currently request a resync at
a specific TCP sequence number. When a record is found starting
at that sequence number kernel will inform the device of the
corresponding record number.
This requires the device to constantly scan the stream for a
known pattern (constant bytes of the header) after sync is lost.
This patch adds an alternative approach which is entirely under
the control of the kernel. Kernel tracks records it had to fully
decrypt, even though TLS socket is in TLS_HW mode. If multiple
records did not have any decrypted parts - it's a pretty strong
indication that the device is out of sync.
We choose the min number of fully encrypted records to be 2,
which should hopefully be more than will get retransmitted at
a time.
After kernel decides the device is out of sync it schedules a
resync request. If the TCP socket is empty the resync gets
performed immediately. If socket is not empty we leave the
record parser to resync when next record comes.
Before resync in message parser we peek at the TCP socket and
don't attempt the sync if the socket already has some of the
next record queued.
On resync failure (encrypted data continues to flow in) we
retry with exponential backoff, up to once every 128 records
(with a 16k record thats at most once every 2M of data).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Jun 2019 04:40:01 +0000 (21:40 -0700)]
net/tls: rename handle_device_resync()
handle_device_resync() doesn't describe the function very well.
The function checks if resync should be issued upon parsing of
a new record.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Jun 2019 04:40:00 +0000 (21:40 -0700)]
net/tls: pass record number as a byte array
TLS offload code casts record number to a u64. The buffer
should be aligned to 8 bytes, but its actually a __be64, and
the rest of the TLS code treats it as big int. Make the
offload callbacks take a byte array, drivers can make the
choice to do the ugly cast if they want to.
Prepare for copying the record number onto the stack by
defining a constant for max size of the byte array.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Jun 2019 04:39:59 +0000 (21:39 -0700)]
net/tls: simplify seq calculation in handle_device_resync()
We subtract "TLS_HEADER_SIZE - 1" from req_seq, then if they
match we add the same constant to seq. Just add it to seq,
and we don't have to touch req_seq.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Tue, 11 Jun 2019 01:32:13 +0000 (09:32 +0800)]
packet: remove unused variable 'status' in __packet_lookup_frame_in_block
The variable 'status' in __packet_lookup_frame_in_block() is never used since
introduction in commit
f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer
implementation."), we can remove it.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Sun, 9 Jun 2019 17:19:06 +0000 (02:19 +0900)]
net: openvswitch: remove unnecessary ASSERT_OVSL in ovs_vport_del()
ASSERT_OVSL() in ovs_vport_del() is unnecessary because
ovs_vport_del() is only called by ovs_dp_detach_port() and
ovs_dp_detach_port() calls ASSERT_OVSL() too.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Sun, 9 Jun 2019 17:05:30 +0000 (02:05 +0900)]
net: netlink: make netlink_walk_start() void return type
netlink_walk_start() needed to return an error code because of
rhashtable_walk_init(). but that was converted to rhashtable_walk_enter()
and it is a void type function. so now netlink_walk_start() doesn't need
any return value.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio [Thu, 6 Jun 2019 20:15:09 +0000 (22:15 +0200)]
selftests: pmtu: Introduce list_flush_ipv6_exception test case
This test checks that route exceptions can be successfully listed and
flushed using ip -6 route {list,flush} cache.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 10 Jun 2019 17:44:57 +0000 (10:44 -0700)]
Merge branch 'net-Enable-nexthop-objects-with-IPv4-and-IPv6-routes'
David Ahern says:
====================
net: Enable nexthop objects with IPv4 and IPv6 routes
This is the final set of the initial nexthop object work. When I
started this idea almost 2 years ago, it took 18 seconds to inject
700k+ IPv4 routes with 1 hop and about 28 seconds for 4-paths. Some
of that time was due to inefficiencies in 'ip', but most of it was
kernel side with excessive synchronize_rcu calls in ipv4, and redundant
processing validating a nexthop spec (device, gateway, encap). Worse,
the time increased dramatically as the number of legs in the routes
increased; for example, taking over 72 seconds for 16-path routes.
After this set, with increased dirty memory limits (fib_sync_mem sysctl),
an improved ip and nexthop objects a full internet fib (743,799 routes
based on a pull in January 2019) can be pushed to the kernel in 4.3
seconds. Even better, the time to insert is "almost" constant with
increasing number of paths. The 'almost constant' time is due to
expanding the nexthop definitions when generating notifications. A
follow on patch will be sent adding a sysctl that allows an admin to
avoid the nexthop expansion and truly get constant route insert time
regardless of the number of paths in a route! (Useful once all programs
used for a deployment that care about routes understand nexthop objects).
To be clear, 'ip' is used for benchmarking for no other reason than
'ip -batch' is a trivial to use for the tests. FRR, for example, better
manages nexthops and route changes and the way those are pushed to the
kernel and thus will have less userspace processing times than 'ip -batch'.
Patches 1-10 iterate over fib6_nh with a nexthop invoke a processing
function per fib6_nh. Prior to nexthop objects, a fib6_info referenced
a single fib6_nh. Multipath routes were added as separate fib6_info for
each leg of the route and linked as siblings:
f6i -> sibling -> sibling ... -> sibling
| |
+--------- multipath route ---------+
With nexthop objects a single fib6_info references an external
nexthop which may have a series of fib6_nh:
f6i ---> nexthop ---> fib6_nh
...
fib6_nh
making IPv6 routes similar to IPv4. The side effect is that a single
fib6_info now indirectly references a series of fib6_nh so the code
needs to walk each entry and call the local, per-fib6_nh processing
function.
Patches 11 and 13 wire up use of nexthops with fib entries for IPv4
and IPv6. With these commits you can actually use nexthops with routes.
Patch 12 is an optimization for IPv4 when using nexthops in the most
predominant use case (no metrics).
Patches 14 handles replace of a nexthop config.
Patches 15-18 add update pmtu and redirect tests to use both old and
new routing.
Patches 19 and 20 add new tests for the nexthop infrastructure. The first
is single nexthop is used by multiple prefixes to communicate with remote
hosts. This is on top of the functional tests already committed. The
second verifies multipath selection.
v4
- changed return to 'goto out' in patch 9 since the rcu_read_lock is
held (noticed by Wei)
v3
- removed found arg in patch 7 and changed rt6_nh_remove_exception_rt
to return 1 when a match is found for an exception
v2
- changed ++i to i++ in patches 1 and 14 as noticed by DaveM
- improved commit message for patch 14 (nexthop replace)
- removed the skip_fib argument to remove_nexthop; vestige of an
older design
====================
Reviewed-By: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:41 +0000 (14:53 -0700)]
selftests: Add version of router_multipath.sh using nexthop objects
Add a version of router_multipath.sh that uses nexthop objects for
routes.
Ido requested a version that does not cause regressions with mlxsw
testing since it does not support nexthop objects yet.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:40 +0000 (14:53 -0700)]
selftests: Add test with multiple prefixes using single nexthop
Add tests where multiple FIB entries use the same nexthop object. Generate
per-cpu cached routes for each by running ping on each cpu, and then
generate exceptions unique to each prefix (remote host) with different
mtus.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:39 +0000 (14:53 -0700)]
selftests: icmp_redirect: Add support for routing via nexthop objects
Add a second pass to icmp_redirect.sh to use nexthop objects for
routes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:38 +0000 (14:53 -0700)]
selftests: pmtu: Add support for routing via nexthop objects
Add routing setup using nexthop objects and repeat tests with
old and new routing.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:37 +0000 (14:53 -0700)]
selftests: pmtu: Move route installs to a new function
Move the route add commands to a new function called setup_routing_old.
The '_old' refers to the classic way of installing routes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:36 +0000 (14:53 -0700)]
selftests: pmtu: Move running of test into a new function
Move the block of code that runs a test and prints the verdict to a
new function, run_test.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:35 +0000 (14:53 -0700)]
nexthops: add support for replace
Add support for atomically upating a nexthop config.
When updating a nexthop, walk the lists of associated fib entries and
verify the new config is valid. Replace is done by swapping nh_info
for single nexthops - new config is applied to old nexthop struct, and
old config is moved to new nexthop struct. For nexthop groups the same
applies but for nh_group. In addition for groups the nh_parent reference
needs to be updated. The old config is released by calling __remove_nexthop
on the 'new' nexthop which now has the old config. This is done to avoid
messing around with the list_heads that track which fib entries are
using the nexthop.
After the swap of config data, bump the sequence counters for FIB entries
to invalidate any dst entries and send notifications to userspace. The
notifications include the new nexthop spec as well as any fib entries
using the updated nexthop struct.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:34 +0000 (14:53 -0700)]
ipv6: Allow routes to use nexthop objects
Add support for RTA_NH_ID attribute to allow a user to specify a
nexthop id to use with a route. fc_nh_id is added to fib6_config to
hold the value passed in the RTA_NH_ID attribute. If a nexthop id
is given, the gateway, device, encap and multipath attributes can
not be set.
Update ip6_route_del to check metric and protocol before nexthop
specs. If fc_nh_id is set, then it must match the id in the route
entry. Since IPv6 allows delete of a cached entry (an exception),
add ip6_del_cached_rt_nh to cycle through all of the fib6_nh in
a fib entry if it is using a nexthop.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:33 +0000 (14:53 -0700)]
ipv4: Optimization for fib_info lookup with nexthops
Be optimistic about re-using a fib_info when nexthop id is given and
the route does not use metrics. Avoids a memory allocation which in
most cases is expected to be freed anyways.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:32 +0000 (14:53 -0700)]
ipv4: Allow routes to use nexthop objects
Add support for RTA_NH_ID attribute to allow a user to specify a
nexthop id to use with a route. fc_nh_id is added to fib_config to
hold the value passed in the RTA_NH_ID attribute. If a nexthop id
is given, the gateway, device, encap and multipath attributes can
not be set.
Update fib_nh_match to check ids on a route delete.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:31 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in mtu updates
Use nexthop_for_each_fib6_nh to call fib6_nh_mtu_change for each
fib6_nh in a nexthop for rt6_mtu_change_route. For __ip6_rt_update_pmtu,
we need to find the nexthop that correlates to the device and gateway
in the rt6_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:30 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in rt6_do_redirect
Use nexthop_for_each_fib6_nh and fib6_nh_find_match to find the
fib6_nh in a nexthop that correlates to the device and gateway
in the rt6_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:29 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in __ip6_route_redirect
Add a hook in __ip6_route_redirect to handle a nexthop struct in a
fib6_info. Use nexthop_for_each_fib6_nh and fib6_nh_redirect_match
to call ip6_redirect_nh_match for each fib6_nh looking for a match.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:28 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in exception handling
Add a hook in rt6_flush_exceptions, rt6_remove_exception_rt,
rt6_update_exception_stamp_rt, and rt6_age_exceptions to handle
nexthop struct in a fib6_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:27 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in fib6_info_uses_dev
Add a hook in fib6_info_uses_dev to handle nexthop struct in a fib6_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:26 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in rt6_nlmsg_size
Add a hook in rt6_nlmsg_size to handle nexthop struct in a fib6_info.
rt6_nh_nlmsg_size is used to sum the space needed for all nexthops in
the fib entry.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:25 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in __find_rr_leaf
Add a hook in __find_rr_leaf to handle nexthop struct in a fib6_info.
nexthop_for_each_fib6_nh is used to walk each fib6_nh in a nexthop and
call find_match. On a match, use the fib6_nh saved in the callback arg
to setup fib6_result.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>