git.monstr.eu Git - linux-2.6-microblaze.git/log

net: dsa: microchip: ptp: add packet reception timestamping

Rx Timestamping is done through 4 additional bytes in tail tag.
Whenever the ptp packet is received, the 4 byte hardware time stamped
value is added before 1 byte tail tag. Also, bit 7 in tail tag indicates
it as PTP frame. This 4 byte value is extracted from the tail tag and
reconstructed to absolute time and assigned to skb hwtstamp.
If the packet received in PDelay_Resp, then partial ingress timestamp
is subtracted from the correction field. Since user space tools expects
to be done in hardware.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ptp: add helper for one-step P2P clocks

For P2P delay measurement, the ingress time stamp of the PDelay_Req is
required for the correction field of the PDelay_Resp. The application
echoes back the correction field of the PDelay_Req when sending the
PDelay_Resp.

Some hardware (like the ZHAW InES PTP time stamping IP core) subtracts
the ingress timestamp autonomously from the correction field, so that
the hardware only needs to add the egress timestamp on tx. Other
hardware (like the Microchip KSZ9563) reports the ingress time stamp via
an interrupt and requires that the software provides this time stamp via
tail-tag on tx.

In order to avoid introducing a further application interface for this,
the driver can simply emulate the behavior of the InES device and
subtract the ingress time stamp in software from the correction field.

On egress, the correction field can either be kept as it is (and the
time stamp field in the tail-tag is set to zero) or move the value from
the correction field back to the tail-tag.

Changing the correction field requires updating the UDP checksum (if UDP
is used as transport).

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: ptp: enable interrupt for timestamping

PTP Interrupt mask and status register differ from the global and port
interrupt mechanism by two methods. One is that for global/port
interrupt enabling we have to clear the bit but for ptp interrupt we
have to set the bit. And other is bit12:0 is reserved in ptp interrupt
registers. This forced to not use the generic implementation of
global/port interrupt method routine. This patch implement the ptp
interrupt mechanism to read the timestamp register for sync, pdelay_req
and pdelay_resp.

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: ptp: manipulating absolute time using ptp hw clock

This patch is used for reconstructing the absolute time from the 32bit
hardware time stamping value. The do_aux ioctl is used for reading the
ptp hardware clock and store it to global variable.
The timestamped value in tail tag during rx and register during tx are
32 bit value (2 bit seconds and 30 bit nanoseconds). The time taken to
read entire ptp clock will be time consuming. In order to speed up, the
software clock is maintained. This clock time will be added to 32 bit
timestamp to get the absolute time stamp.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: ptp: add 4 bytes in tail tag when ptp enabled

When the PTP is enabled in hardware bit 6 of PTP_MSG_CONF1 register, the
transmit frame needs additional 4 bytes before the tail tag. It is
needed for all the transmission packets irrespective of PTP packets or
not.
The 4-byte timestamp field is 0 for frames other than Pdelay_Resp. For
the one-step Pdelay_Resp, the switch needs the receive timestamp of the
Pdelay_Req message so that it can put the turnaround time in the
correction field.
Since PTP has to be enabled for both Transmission and reception
timestamping, driver needs to track of the tx and rx setting of the all
the user ports in the switch.
Two flags hw_tx_en and hw_rx_en are added in ksz_port to track the
timestampping setting of each port. When any one of ports has tx or rx
timestampping enabled, bit 6 of PTP_MSG_CONF1 is set and it is indicated
to tag_ksz.c through tagger bytes. This flag adds 4 additional bytes to
the tail tag. When tx and rx timestamping of all the ports are disabled,
then 4 bytes are not added.

Tested using hwstamp -i <interface>

Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com> # mostly api
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: ptp: Initial hardware time stamping support

This patch adds the routine for get_ts_info, hwstamp_get, set. This enables
the PTP support towards userspace applications such as linuxptp.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: ptp: add the posix clock support

This patch implement routines (adjfine, adjtime, gettime and settime)
for manipulating the chip's PTP clock. It registers the ptp caps
to posix clock register.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com> # mostly api
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'add-support-to-offload-macsec-using-netlink-update'

Emeel Hakim says:

====================
Add support to offload macsec using netlink update

This series adds support for offloading macsec as part of the netlink
update routine, command example:

$ ip link set link eth2 macsec0 type macsec offload mac

The above is done using the IFLA_MACSEC_OFFLOAD attribute hence
the second patch of dumping this attribute as part of the macsec
dump.
====================

Link: https://lore.kernel.org/r/20230111150210.8246-1-ehakim@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: dump IFLA_MACSEC_OFFLOAD attribute as part of macsec dump

Support dumping offload netlink attribute in macsec's device
attributes dump.
Change macsec_get_size to consider the offload attribute in
the calculations of the required room for dumping the device
netlink attributes.

Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: add support for IFLA_MACSEC_OFFLOAD in macsec_changelink

Add support for changing Macsec offload selection through the
netlink layer by implementing the relevant changes in
macsec_changelink.

Since the handling in macsec_changelink is similar to macsec_upd_offload,
update macsec_upd_offload to use a common helper function to avoid
duplication.

Example for setting offload for a macsec device:
ip link set macsec0 type macsec offload mac

Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: remove redundant config PCI dependency for some network driver configs

While reviewing dependencies in some Kconfig files, I noticed the redundant
dependency "depends on PCI && PCI_MSI". The config PCI_MSI has always,
since its introduction, been dependent on the config PCI. So, it is
sufficient to just depend on PCI_MSI, and know that the dependency on PCI
is implicitly implied.

Reduce the dependencies of some network driver configs.
No functional change and effective change of Kconfig dependendencies.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Dimitris Michailidis <dmichail@fungible.com>
Link: https://lore.kernel.org/r/20230111125855.19020-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ngbe: Add ngbe mdio bus driver.

Add mdio bus register for ngbe.
The internal phy and external phy need to be handled separately.
Add phy changed event detection.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230111111718.40745-1-mengyuanlou@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-thunderbolt-add-tracepoints'

Mika Westerberg says:

====================
net: thunderbolt: Add tracepoints

This series adds tracepoints and additional logging to the
Thunderbolt/USB4 networking driver. These are useful when debugging
possible issues.

Before that we move the driver into its own directory under drivers/net
so that we can add additional files without trashing the network drivers
main directory, and update the MAINTAINERS accordingly.

v1: https://lore.kernel.org/netdev/20230104081731.45928-1-mika.westerberg@linux.intel.com/
====================

Link: https://lore.kernel.org/r/20230111062633.1385-1-mika.westerberg@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: thunderbolt: Add tracepoints

These are useful when debugging various performance issues.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: thunderbolt: Add debugging when sending/receiving control packets

These can be useful when debugging possible issues around USB4NET
control packet exchange.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: thunderbolt: Move into own directory

We will be adding tracepoints to the driver so instead of littering the
main network driver directory, move the driver into its own directory.
While there, rename the module to thunderbolt_net (with underscore) to
match with the thunderbolt_dma_test convention.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

u64_stat: Remove the obsolete fetch_irq() variants.

There are no more users of the obsolete interface
u64_stats_fetch_begin_irq() and u64_stats_fetch_retry_irq().

Remove the obsolete API.

[bigeasy: Split out the bits from a larger patch].

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20230110160738.974085-1-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

drivers/net/usb/r8152.c
be53771c87f4 ("r8152: add vendor/device ID pair for Microsoft Devkit")
ec51fbd1b8a2 ("r8152: add USB device driver for config selection")
https://lore.kernel.org/all/20230113113339.658c4723@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.2-rc4' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from rxrpc.

  The rxrpc changes are noticeable large: to address a recent regression
  has been necessary completing the threaded refactor.

  Current release - regressions:

   - rxrpc:
       - only disconnect calls in the I/O thread
       - move client call connection to the I/O thread
       - fix incoming call setup race

   - eth: mlx5:
       - restore pkt rate policing support
       - fix memory leak on updating vport counters

  Previous releases - regressions:

   - gro: take care of DODGY packets

   - ipv6: deduct extension header length in rawv6_push_pending_frames

   - tipc: fix unexpected link reset due to discovery messages

  Previous releases - always broken:

   - sched: disallow noqueue for qdisc classes

   - eth: ice: fix potential memory leak in ice_gnss_tty_write()

   - eth: ixgbe: fix pci device refcount leak

   - eth: mlx5:
       - fix command stats access after free
       - fix macsec possible null dereference when updating MAC security
         entity (SecY)"

* tag 'net-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
  r8152: add vendor/device ID pair for Microsoft Devkit
  net: stmmac: add aux timestamps fifo clearance wait
  bnxt: make sure we return pages to the pool
  net: hns3: fix wrong use of rss size during VF rss config
  ipv6: raw: Deduct extension header length in rawv6_push_pending_frames
  net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit()
  net: sched: disallow noqueue for qdisc classes
  iavf/iavf_main: actually log ->src mask when talking about it
  igc: Fix PPS delta between two synchronized end-points
  ixgbe: fix pci device refcount leak
  octeontx2-pf: Fix resource leakage in VF driver unbind
  selftests/net: l2_tos_ttl_inherit.sh: Ensure environment cleanup on failure.
  selftests/net: l2_tos_ttl_inherit.sh: Run tests in their own netns.
  selftests/net: l2_tos_ttl_inherit.sh: Set IPv6 addresses with "nodad".
  net/mlx5e: Fix macsec possible null dereference when updating MAC security entity (SecY)
  net/mlx5e: Fix macsec ssci attribute handling in offload path
  net/mlx5: E-switch, Coverity: overlapping copy
  net/mlx5e: Don't support encap rules with gbp option
  net/mlx5: Fix ptp max frequency adjustment range
  net/mlx5e: Fix memory leak on updating vport counters
  ...

Merge tag 's390-6.2-2' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

- Add various missing READ_ONCE() to cmpxchg() loops prevent the
   compiler from potentially generating incorrect code. This includes a
   rather large change to the s390 specific hardware sampling code and
   its current use of cmpxchg_double().

   Do the fix now to get it out of the way of Peter Zijlstra's
   cmpxchg128() work, and have something that can be backported. The
   added new code includes a private 128 bit cmpxchg variant which will
   be removed again after Peter's rework is available. Also note that
   this 128 bit cmpxchg variant is used to implement 128 bit
   READ_ONCE(), while strictly speaking it wouldn't be necessary, and
   _READ_ONCE() should also be sufficient; even though it isn't obvious
   for all converted locations that this is the case. Therefore use this
   implementation for for the sake of clarity and consistency for now.

- Fix ipl report address handling to avoid kdump failures/hangs.

- Fix misuse of #(el)if in kernel decompressor.

- Define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36,
   caused by the recently changed discard behaviour.

- Make sure _edata and _end symbols are always page aligned.

- The current header guard DEBUG_H in one of the s390 specific header
   files is too generic and conflicts with the ath9k wireless driver.
   Add an _ASM_S390_ prefix to the guard to make it unique.

- Update defconfigs.

* tag 's390-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: update defconfigs
  KVM: s390: interrupt: use READ_ONCE() before cmpxchg()
  s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()
  s390/cpum_sf: add READ_ONCE() semantics to compare and swap loops
  s390/kexec: fix ipl report address for kdump
  s390: fix -Wundef warning for CONFIG_KERNEL_ZSTD
  s390: define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36
  s390: expicitly align _edata and _end symbols on page boundary
  s390/debug: add _ASM_S390_ prefix to header guard

Merge tag 'for-linus-6.2-rc4-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

- two cleanup patches

- a fix of a memory leak in the Xen pvfront driver

- a fix of a locking issue in the Xen hypervisor console driver

* tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvcalls: free active map buffer on pvcalls_front_free_map
  hvc/xen: lock console list traversal
  x86/xen: Remove the unused function p2m_index()
  xen: make remove callback of xen driver void returned

Merge tag 'timers-urgent-2023-01-12' of git://git./linux/kernel/git/tip/tip

Pull timer doc fixes from Ingo Molnar:

- Fix various DocBook formatting errors in kernel/time/ that generated
(justified) warnings during a kernel-doc build.

* tag 'timers-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Fix various kernel-doc problems

Merge tag 'perf-urgent-2023-01-12' of git://git./linux/kernel/git/tip/tip

Pull perf events hw enablement from Ingo Molnar:

- More hardware-enablement for Intel Meteor Lake & Emerald Rapid
   systems: pure model ID enumeration additions that do not affect other
   systems.

* tag 'perf-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add Emerald Rapids
  perf/x86/msr: Add Emerald Rapids
  perf/x86/msr: Add Meteor Lake support
  perf/x86/cstate: Add Meteor Lake support

Merge tag 'sched-urgent-2023-01-12' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:

- Fix scheduler frequency invariance bug related to overly long
   tickless periods triggering an integer overflow and disabling the
   feature.

- Fix use-after-free bug in dup_user_cpus_ptr().

- Fix do_set_cpus_allowed() deadlock scenarios related to calling
   kfree() with the pi_lock held. NOTE: the rcu_free() is the 'lazy'
   solution here - we looked at patches to free the structure after the
   pi_lock got dropped, but that looked quite a bit messier - and none
   of this is truly performance critical. We can revisit this if it's
   too lazy of a solution ...

* tag 'sched-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Use kfree_rcu() in do_set_cpus_allowed()
  sched/core: Fix use-after-free bug in dup_user_cpus_ptr()
  sched/core: Fix arch_scale_freq_tick() on tickless systems

Merge tag 'core-urgent-2023-01-12' of git://git./linux/kernel/git/tip/tip

Pull objtool fix from Ingo Molnar:

- Fix objtool to be more permissive with hand-written assembly that
uses non-function symbols in executable sections.

* tag 'core-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Tolerate STT_NOTYPE symbols at end of section

Merge tag 'urgent-nolibc.2023.01.09a' of git://git./linux/kernel/git/paulmck/linux-rcu

Pull nolibc fixes from Paul McKenney:

- The fd_set structure was incorrectly defined as arrays of u32 instead
   of long, which breaks BE64. Fix courtesy of Sven Schnelle.

- S_ISxxx macros were incorrectly testing the bits after applying them
   instead of bitwise ANDing S_FMT with the value. Fix from Warner Losh.

- The mips code was randomly broken due to an unprotected "noreorder"
   directive in the _start code that could prevent the assembler from
   filling delayed slots. This in turn resulted in random other
   instructions being placed into those slots. Fix courtesy of Willy
   Tarreau.

- The current nolibc header layout refrains from including files that
   are not explicitly included by the code using nolibc. Unfortunately,
   this causes build failures when such files contain definitions that
   are used (for example) by libgcc. Example definitions include raise()
   and memset(), which are called by some architectures, but only at
   certain optimization levels. Fix courtesy of Willy Tarreau.

- gcc 11.3 in ARM thumb2 mode at -O2 recognized a memset() construction
   inside the memset() definition. The compiler replaced this
   construction with a call to... memset(). Userland cannot be forced to
   build with -ffreestanding, so an empty asm() statement was introduced
   into the loop the loop in order to prevent the compiler from making
   this unproductive transformation. Fix courtesy of Willy Tarreau.

- Most of the O_* macros were wrong on RISCV because their octal values
   were coded as hexadecimal. This resulted in the getdents64() selftest
   failing. Fix courtesy of Willy Tarreau.

This was tested on x86_64, i386, armv5, armv7, thumb1, thumb2, mips and
riscv, all at -O0, -Os and -O3.

* tag 'urgent-nolibc.2023.01.09a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  tools/nolibc: fix the O_* fcntl/open macro definitions for riscv
  tools/nolibc: prevent gcc from making memset() loop over itself
  tools/nolibc: fix missing includes causing build issues at -O0
  tools/nolibc: restore mips branch ordering in the _start block
  tools/nolibc: Fix S_ISxxx macros
  nolibc: fix fd_set type

r8152: add vendor/device ID pair for Microsoft Devkit

The Microsoft Devkit 2023 is a an ARM64 based machine featuring a
Realtek 8153 USB3.0-to-GBit Ethernet adapter. As in their other
machines, Microsoft uses a custom USB device ID.

Add the respective ID values to the driver. This makes Ethernet work on
the MS Devkit device. The chip has been visually confirmed to be a
RTL8153.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20230111133228.190801-1-andre.przywara@arm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'spi-fix-v6.2-rc3' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:

- Fixes for long standing issues with accesses to spidev->spi during
   teardown in the spidev userspace driver.

- Rename the newly added spi-cs-setup-ns DT property to be more in line
   with our other delay properties before it becomes ABI.

- A few driver specific fixes.

* tag 'spi-fix-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spidev: remove debug messages that access spidev->spi without locking
  spi: spidev: fix a race condition when accessing spidev->spi
  spi: Rename spi-cs-setup-ns property to spi-cs-setup-delay-ns
  spi: dt-bindings: Rename spi-cs-setup-ns to spi-cs-setup-delay-ns
  spi: cadence: Fix busy cycles calculation
  spi: mediatek: Enable irq before the spi registration

Merge tag 'regulator-fix-v6.2-rc3' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A couple of small driver specific fixes, one of which I queued for 6.1
  but didn't actually send out so has had *plenty* of testing in -next"

* tag 'regulator-fix-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: qcom-rpmh: PM8550 ldo11 regulator is an nldo
  regulator: da9211: Use irq handler when ready

Merge tag 'mtd/fixes-for-6.2-rc4' of git://git./linux/kernel/git/mtd/linux

Pull MTD fixes from Miquel Raynal:

- cfi: Allow building spi-intel standalone to avoid build issues

- parsers: scpart: Fix __udivdi3 undefined on mips

- parsers: tplink_safeloader: Fix potential memory leak during parsing

- Update email of Tudor Ambarus

* tag 'mtd/fixes-for-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  MAINTAINERS: Update email of Tudor Ambarus
  mtd: cfi: allow building spi-intel standalone
  mtd: parsers: scpart: fix __udivdi3 undefined on mips
  mtd: parsers: Fix potential memory leak in mtd_parser_tplink_safeloader_parse()

Merge branch 'vsock-update-tools-and-error-handling'

Arseniy Krasnov says:

====================
vsock: update tools and error handling

Patchset consists of two parts:

1) Kernel patch
One patch from Bobby Eshleman. I took single patch from Bobby:
https://lore.kernel.org/lkml/d81818b868216c774613dd03641fcfe63cc55a45
.1660362668.git.bobby.eshleman@bytedance.com/ and use only part for
af_vsock.c, as VMCI and Hyper-V parts were rejected.

I used it, because for SOCK_SEQPACKET big messages handling was broken -
ENOMEM was returned instead of EMSGSIZE. And anyway, current logic which
always replaces any error code returned by transport to ENOMEM looks
strange for me also(for example in EMSGSIZE case it was changed to
ENOMEM).

2) Tool patches
Since there is work on several significant updates for vsock(virtio/
vsock especially): skbuff, DGRAM, zerocopy rx/tx, so I think that this
patchset will be useful.

This patchset updates vsock tests and tools a little bit. First of all
it updates test suite: two new tests are added. One test is reworked
message bound test. Now it is more complex. Instead of sending 1 byte
messages with one MSG_EOR bit, it sends messages of random length(one
half of messages are smaller than page size, second half are bigger)
with random number of MSG_EOR bits set. Receiver also don't know total
number of messages. Message bounds control is maintained by hash sum
of messages length calculation. Second test is for SOCK_SEQPACKET - it
tries to send message with length more than allowed. I think both tests
will be useful for DGRAM support also.

Third thing that this patchset adds is small utility to test vsock
performance for both rx and tx. I think this util could be useful as
'iperf'/'uperf', because:
1) It is small comparing to 'iperf' or 'uperf', so it very easy to add
   new mode or feature to it(especially vsock specific).
2) It allows to set SO_RCVLOWAT and SO_VM_SOCKETS_BUFFER_SIZE option.
   Whole throughtput depends on both parameters.
3) It is located in the kernel source tree, so it could be updated by
   the same patchset which changes related kernel functionality in vsock.

I used this util very often to check performance of my rx zerocopy
support(this tool has rx zerocopy support, but not in this patchset).

Here is comparison of outputs from three utils: 'iperf', 'uperf' and
'vsock_perf'. In all three cases sender was at guest side. rx and
tx buffers were always 64Kb(because by default 'uperf' uses 8K).

iperf:

   [ ID] Interval           Transfer     Bitrate
   [  5]   0.00-10.00  sec  12.8 GBytes  11.0 Gbits/sec sender
   [  5]   0.00-10.00  sec  12.8 GBytes  11.0 Gbits/sec receiver

uperf:

   Total     16.27GB /  11.36(s) =    12.30Gb/s       23455op/s

vsock_perf:

   tx performance: 12.301529 Gbits/s
   rx performance: 12.288011 Gbits/s

Results are almost same in all three cases.

Patchset was rebased and tested on skbuff v9 patch from Bobby Eshleman:
https://lore.kernel.org/netdev/20230107002937.899605-1-bobby.eshleman@bytedance.com/

====================

Link: https://lore.kernel.org/r/67cd2d0a-1c58-baac-7b39-b8d4ea44f719@sberdevices.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

test/vsock: vsock_perf utility

This adds utility to check vsock rx/tx performance.

Usage as sender:
./vsock_perf --sender <cid> --port <port> --bytes <bytes to send>
Usage as receiver:
./vsock_perf --port <port> --rcvlowat <SO_RCVLOWAT>

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

test/vsock: add big message test

This adds test for sending message, bigger than peer's buffer size.
For SOCK_SEQPACKET socket it must fail, as this type of socket has
message size limit.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

test/vsock: rework message bounds test

This updates message bound test making it more complex. Instead of
sending 1 bytes messages with one MSG_EOR bit, it sends messages of
random length(one half of messages are smaller than page size, second
half are bigger) with random number of MSG_EOR bits set. Receiver
also don't know total number of messages.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

vsock: return errors other than -ENOMEM to socket

This removes behaviour, where error code returned from any transport
was always switched to ENOMEM. For example when user tries to send too
big message via SEQPACKET socket, transport layers return EMSGSIZE, but
this error code was always replaced with ENOMEM and returned to user.

Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Ten small fixes (less the one that cleaned up a reverted removal),
  nine in drivers of which the ufs one is the most critical.

  The single core patch is a minor speedup to error handling"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: libsas: Grab the ATA port lock in sas_ata_device_link_abort()
  scsi: hisi_sas: Fix tag freeing for reserved tags
  scsi: ufs: core: WLUN suspend SSU/enter hibern8 fail recovery
  scsi: scsi_debug: Delete unreachable code in inquiry_vpd_b0()
  scsi: mpi3mr: Refer CONFIG_SCSI_MPI3MR in Makefile
  scsi: core: scsi_error: Do not queue pointless abort workqueue functions
  scsi: storvsc: Fix swiotlb bounce buffer leak in confidential VM
  scsi: iscsi: Fix multiple iSCSI session unbind events sent to userspace
  scsi: mpi3mr: Remove usage of dma_get_required_mask() API
  scsi: mpt3sas: Remove usage of dma_get_required_mask() API

net: ethernet: mtk_wed: get rid of queue lock for rx queue

Queue spinlock is currently held in mtk_wed_wo_queue_rx_clean and
mtk_wed_wo_queue_refill routines for MTK Wireless Ethernet Dispatcher
MCU rx queue. mtk_wed_wo_queue_refill() is running during initialization
and in rx tasklet while mtk_wed_wo_queue_rx_clean() is running in
mtk_wed_wo_hw_deinit() during hw de-init phase after rx tasklet has been
disabled. Since mtk_wed_wo_queue_rx_clean and mtk_wed_wo_queue_refill
routines can't run concurrently get rid of spinlock for mcu rx queue.

Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/36ec3b729542ea60898471d890796f745479ba32.1673342990.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: add aux timestamps fifo clearance wait

Add timeout polling wait for auxiliary timestamps snapshot FIFO clear bit
(ATSFC) to clear. This is to ensure no residue fifo value is being read
erroneously.

Fixes: f4da56529da6 ("net: stmmac: Add support for external trigger timestamping")
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Noor Azura Ahmad Tarmizi <noor.azura.ahmad.tarmizi@intel.com>
Link: https://lore.kernel.org/r/20230111050200.2130-1-noor.azura.ahmad.tarmizi@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-10 (ixgbe, igc, iavf)

This series contains updates to ixgbe, igc, and iavf drivers.

Yang Yingliang adds calls to pci_dev_put() for proper ref count tracking
on ixgbe.

Christopher adds setting of Toggle on Target Time bits for proper
pulse per second (PPS) synchronization for igc.

Daniil Tatianin fixes, likely, copy/paste issue that misreported
destination instead of source for IP mask for iavf error.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  iavf/iavf_main: actually log ->src mask when talking about it
  igc: Fix PPS delta between two synchronized end-points
  ixgbe: fix pci device refcount leak
====================

Link: https://lore.kernel.org/r/20230110223825.648544-1-anthony.l.nguyen@intel.com
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: keep the instance mutex alive until references are gone

The reference needs to keep the instance memory around, but also
the instance lock must remain valid. Users will take the lock,
check registration status and release the lock. mutex_destroy()
etc. belong in the same place as the freeing of the memory.

Unfortunately lockdep_unregister_key() sleeps so we need
to switch the an rcu_work.

Note that the problem is a bit hard to repro, because
devlink_pernet_pre_exit() iterates over registered instances.
AFAIU the instances must get devlink_free()d concurrently with
the namespace getting deleted for the problem to occur.

Reported-by: syzbot+d94d214ea473e218fc89@syzkaller.appspotmail.com
Reported-by: syzbot+9f0dd863b87113935acf@syzkaller.appspotmail.com
Fixes: 9053637e0da7 ("devlink: remove the registration guarantee of references")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230111042908.988199-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt: make sure we return pages to the pool

Before the commit under Fixes the page would have been released
from the pool before the napi_alloc_skb() call, so normal page
freeing was fine (released page == no longer in the pool).

After the change we just mark the page for recycling so it's still
in the pool if the skb alloc fails, we need to recycle.

Same commit added the same bug in the new bnxt_rx_multi_page_skb().

Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Link: https://lore.kernel.org/r/20230111042547.987749-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: fix wrong use of rss size during VF rss config

Currently, it used old rss size to get current tc mode. As a result, the
rss size is updated, but the tc mode is still configured based on the old
rss size.

So this patch fixes it by using the new rss size in both process.

Fixes: 93969dc14fcd ("net: hns3: refactor VF rss init APIs with new common rss init APIs")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230110115359.10163-1-lanhao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: disable ASPM in case of tx timeout

There are still single reports of systems where ASPM incompatibilities
cause tx timeouts. It's not clear whom to blame, so let's disable
ASPM in case of a tx timeout.

v2:
- add one-time warning for informing the user

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/92369a92-dc32-4529-0509-11459ba0e391@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of git://git./linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Make 'perf kmem' cope with the removal of some
   kmem:kmem_cache_alloc_node and kmem:kmalloc_node in the
   11e9734bcb6a7361 ("mm/slab_common: unify NUMA and UMA version of
   tracepoints") commit, making sure it works with Linux >= 6.2 as well
   as with older kernels where those tracepoints are present.

- Also make it handle the new "node" kmem:kmalloc and
   kmem:kmem_cache_alloc tracepoint field introduced in that same
   commit.

- Fix hardware tracing PMU address filter duplicate symbol selection,
   that was preventing to match with static functions with the same name
   present in different object files.

- Fix regression on what linux/types.h file gets used to build the "BPF
   prologue" 'perf test' entry, the system one lacks the fmode_t
   definition used in this test, so provide that type in the test
   itself.

- Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1. If the
   user asks for linking with the libbpf package provided by the distro,
   then it has to be >= 0.8.0. Using the libbpf supplied with the kernel
   would be a fallback in that case.

- Fix the build when libbpf isn't available or explicitly disabled via
   NO_LIBBPF=1.

- Don't try to install libtraceevent plugins as its not anymore in the
   kernel sources and will thus always fail.

* tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf auxtrace: Fix address filter duplicate symbol selection
  perf bpf: Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1
  perf build: Fix build error when NO_LIBBPF=1
  perf tools: Don't install libtraceevent plugins as its not anymore in the kernel sources
  perf kmem: Support field "node" in evsel__process_alloc_event() coping with recent tracepoint restructuring
  perf kmem: Support legacy tracepoints
  perf build: Properly guard libbpf includes
  perf tests bpf prologue: Fix bpf-script-test-prologue test compile issue with clang

s390: update defconfigs

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

Merge branch 'dt-bindings-first-batch-of-dt-schema-conversions-for-amlogic-meson-bindings'

Neil Armstrong says:

====================
dt-bindings: first batch of dt-schema conversions for Amlogic Meson bindings

Batch conversion of the following bindings:
[...]
- mdio-mux-meson-g12a.txt
====================

Link: https://lore.kernel.org/r/20221117-b4-amlogic-bindings-convert-v2-0-36ad050bb625@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: convert mdio-mux-meson-g12a.txt to dt-schema

Convert MDIO bus multiplexer/glue of Amlogic G12a SoC family bindings
to dt-schema.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

perf auxtrace: Fix address filter duplicate symbol selection

When a match has been made to the nth duplicate symbol, return
success not error.

Example:

  Before:

    $ cat file.c
    cat: file.c: No such file or directory
    $ cat file1.c
    #include <stdio.h>

    static void func(void)
    {
            printf("First func\n");
    }

    void other(void);

    int main()
    {
            func();
            other();
            return 0;
    }
    $ cat file2.c
    #include <stdio.h>

    static void func(void)
    {
            printf("Second func\n");
    }

    void other(void)
    {
            func();
    }

    $ gcc -Wall -Wextra -o test file1.c file2.c
    $ perf record -e intel_pt//u --filter 'filter func @ ./test' -- ./test
    Multiple symbols with name 'func'
    #1      0x1149  l       func
                    which is near           main
    #2      0x1179  l       func
                    which is near           other
    Disambiguate symbol name by inserting #n after the name e.g. func #2
    Or select a global symbol by inserting #0 or #g or #G
    Failed to parse address filter: 'filter func @ ./test'
    Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
    Where multiple filters are separated by space or comma.
    $ perf record -e intel_pt//u --filter 'filter func #2 @ ./test' -- ./test
    Failed to parse address filter: 'filter func #2 @ ./test'
    Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
    Where multiple filters are separated by space or comma.

  After:

    $ perf record -e intel_pt//u --filter 'filter func #2 @ ./test' -- ./test
    First func
    Second func
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.016 MB perf.data ]
    $ perf script --itrace=b -Ftime,flags,ip,sym,addr --ns
    1231062.526977619:   tr strt                               0 [unknown] =>     558495708179 func
    1231062.526977619:   tr end  call               558495708188 func =>     558495708050 _init
    1231062.526979286:   tr strt                               0 [unknown] =>     55849570818d func
    1231062.526979286:   tr end  return             55849570818f func =>     55849570819d other

Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters")
Reported-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230110185659.15979-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

KVM: s390: interrupt: use READ_ONCE() before cmpxchg()

Use READ_ONCE() before cmpxchg() to prevent that the compiler generates
code that fetches the to be compared old value several times from memory.

Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20230109145456.2895385-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()

Make sure that *ptr__ within arch_this_cpu_to_op_simple() is only
dereferenced once by using READ_ONCE(). Otherwise the compiler could
generate incorrect code.

Cc: <stable@vger.kernel.org>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

s390/cpum_sf: add READ_ONCE() semantics to compare and swap loops

The current cmpxchg_double() loops within the perf hw sampling code do not
have READ_ONCE() semantics to read the old value from memory. This allows
the compiler to generate code which reads the "old" value several times
from memory, which again allows for inconsistencies.

For example:

        /* Reset trailer (using compare-double-and-swap) */
        do {
                te_flags = te->flags & ~SDB_TE_BUFFER_FULL_MASK;
                te_flags |= SDB_TE_ALERT_REQ_MASK;
        } while (!cmpxchg_double(&te->flags, &te->overflow,
                 te->flags, te->overflow,
                 te_flags, 0ULL));

The compiler could generate code where te->flags used within the
cmpxchg_double() call may be refetched from memory and which is not
necessarily identical to the previous read version which was used to
generate te_flags. Which in turn means that an incorrect update could
happen.

Fix this by adding READ_ONCE() semantics to all cmpxchg_double()
loops. Given that READ_ONCE() cannot generate code on s390 which atomically
reads 16 bytes, use a private compare-and-swap-double implementation to
achieve that.

Also replace cmpxchg_double() with the private implementation to be able to
re-use the old value within the loops.

As a side effect this converts the whole code to only use bit fields
to read and modify bits within the hws trailer header.

Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/linux-s390/Y71QJBhNTIatvxUT@osiris/T/#ma14e2a5f7aa8ed4b94b6f9576799b3ad9c60f333
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

spi: Merge rename of spi-cs-setup-ns DT property

The newly added spi-cs-setup-ns doesn't really fit with the existing
property names for delays, rename it so that it does before it makes it
into a release and becomes ABI.

spi: spidev: remove debug messages that access spidev->spi without locking

The two debug messages in spidev_open() dereference spidev->spi without
taking the lock and without checking if it's not null. This can lead to
a crash. Drop the messages as they're not needed - the user-space will
get informed about ENOMEM with the syscall return value.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230106100719.196243-2-brgl@bgdev.pl
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: spidev: fix a race condition when accessing spidev->spi

There's a spinlock in place that is taken in file_operations callbacks
whenever we check if spidev->spi is still alive (not null). It's also
taken when spidev->spi is set to NULL in remove().

This however doesn't protect the code against driver unbind event while
one of the syscalls is still in progress. To that end we need a lock taken
continuously as long as we may still access spidev->spi. As both the file
ops and the remove callback are never called from interrupt context, we
can replace the spinlock with a mutex.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230106100719.196243-1-brgl@bgdev.pl
Signed-off-by: Mark Brown <broonie@kernel.org>

Merge tag 'mlx5-fixes-2023-01-09' of git://git./linux/kernel/git/saeed/linux

mlx5-fixes-2023-01-09

ipv6: raw: Deduct extension header length in rawv6_push_pending_frames

The total cork length created by ip6_append_data includes extension
headers, so we must exclude them when comparing them against the
IPV6_CHECKSUM offset which does not include extension headers.

Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Fixes: 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can corrupt kernel memory")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2023-01-10' of git://git./linux/kernel/git/saeed/linux

mlx5-updates-2023-01-10

1) From Gal: Add debugfs entries for netdev nic driver
   - ktls, flow steering and hairpin info
   - useful for debug and performance analysis
   - e.g hairpin queue attributes, dump ktls tx pool size, etc

2) From Maher: Update shared buffer configuration on PFC commands
   2.1) For every change of buffer's headroom, recalculate the size of shared
       buffer to be equal to "total_buffer_size" - "new_headroom_size".
       The new shared buffer size will be split in ratio of 3:1 between
       lossy and lossless pools, respectively.

   2.2) For each port buffer change, count the number of lossless buffers.
       If there is only one lossless buffer, then set its lossless pool
       usage threshold to be infinite. Otherwise, if there is more than
       one lossless buffer, set a usage threshold for each lossless buffer.

    While at it, add more verbosity to debug prints when handling user
    commands, to assist in future debug.

3) From Tariq: Throttle high rate FW commands

4) From Shay: Properly initialize management PF

5) Various cleanup patches

Merge branch 'NCN26000-PLCA-RS-support'

Piergiorgio Beruto says:

====================
net: add PLCA RS support and onsemi NCN26000

This patchset adds support for getting/setting the Physical Layer
Collision Avoidace (PLCA) Reconciliation Sublayer (RS) configuration and
status on Ethernet PHYs that supports it.

PLCA is a feature that provides improved media-access performance in terms
of throughput, latency and fairness for multi-drop (P2MP) half-duplex PHYs.
PLCA is defined in Clause 148 of the IEEE802.3 specifications as amended
by 802.3cg-2019. Currently, PLCA is supported by the 10BASE-T1S single-pair
Ethernet PHY defined in the same standard and related amendments. The OPEN
Alliance SIG TC14 defines additional specifications for the 10BASE-T1S PHY,
including a standard register map for PHYs that embeds the PLCA RS (see
PLCA management registers at https://www.opensig.org/about/specifications/).

The changes proposed herein add the appropriate ethtool netlink interface
for configuring the PLCA RS on PHYs that supports it. A separate patchset
further modifies the ethtool userspace program to show and modify the
configuration/status of the PLCA RS.

Additionally, this patchset adds support for the onsemi NCN26000
Industrial Ethernet 10BASE-T1S PHY that uses the newly added PLCA
infrastructure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/phy: add driver for the onsemi NCN26000 10BASE-T1S PHY

This patch adds support for the onsemi NCN26000 10BASE-T1S industrial
Ethernet PHY. The driver supports Point-to-Multipoint operation without
auto-negotiation and with link control handling. The PHY also features
PLCA for improving performance in P2MP mode.

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/phy: add helpers to get/set PLCA configuration

This patch adds support in phylib to read/write PLCA configuration for
Ethernet PHYs that support the OPEN Alliance "10BASE-T1S PLCA
Management Registers" specifications. These can be found at
https://www.opensig.org/about/specifications/

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/phy: add connection between ethtool and phylib for PLCA

This patch adds the required connection between netlink ethtool and
phylib to resolve PLCA get/set config and get status messages.

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/phy: add the link modes for the 10BASE-T1S Ethernet PHY

This patch adds the link modes for the IEEE 802.3cg Clause 147 10BASE-T1S
Ethernet PHY. According to the specifications, the 10BASE-T1S supports
Point-To-Point Full-Duplex, Point-To-Point Half-Duplex and/or
Point-To-Multipoint (AKA Multi-Drop) Half-Duplex operations.

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ethtool: add netlink interface for the PLCA RS

Add support for configuring the PLCA Reconciliation Sublayer on
multi-drop PHYs that support IEEE802.3cg-2019 Clause 148 (e.g.,
10BASE-T1S). This patch adds the appropriate netlink interface
to ethtool.

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit()

If ptp was not enabled due to missing IRQ for instance,
lan966x_ptp_deinit() will dereference NULL pointers.

Fixes: d096459494a8 ("net: lan966x: Add support for ptp clocks")
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Use kzalloc() in mlx5e_accel_fs_tcp_create()

'accel_tcp' is allocted by kvzalloc() now, which is a small chunk.
Use kzalloc() directly instead of kvzalloc().

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: remove redundant ret variable

Return value from mlx5dr_send_postsend_action() directly instead of taking
this in another redundant variable.

Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Replace 0-length array with flexible array

Zero-length arrays are deprecated[1]. Replace struct mlx5e_rx_wqe_cyc's
"data" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_alloc_rq':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:827:42: warning: array subscript f is outside array bounds of 'struct mlx5_wqe_data_seg[0]' [-Warray-bounds=]
  827 |                                 wqe->data[f].byte_count = 0;
      |                                 ~~~~~~~~~^~~
In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h:11,
                 from drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:48,
                 from drivers/net/ethernet/mellanox/mlx5/core/en_main.c:42:
drivers/net/ethernet/mellanox/mlx5/core/en.h:250:39: note: while referencing 'data'
  250 |         struct mlx5_wqe_data_seg      data[0];
      |                                       ^~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Replace zero-length array with flexible-array member

Zero-length arrays are deprecated[1] and we are moving towards
adopting C99 flexible-array members instead. So, replace zero-length
array declaration in struct mlx5e_flow_meter_aso_obj with flex-array
member.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [2].

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Link: https://github.com/KSPP/linux/issues/78
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Prevent high-rate FW commands from populating all slots

Certain connection-based device-offload protocols (like TLS) use
per-connection HW objects to track the state, maintain the context, and
perform the offload properly. Some of these objects are created,
modified, and destroyed via FW commands. Under high connection rate,
this type of FW commands might continuously populate all slots of the FW
command interface and throttle it, while starving other critical control
FW commands.

Limit these throttle commands to using only up to a portion (half) of
the FW command interface slots. FW commands maximal rate is not hit, and
the same high rate is still reached when applying this limitation.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Introduce and use opcode getter in command interface

Introduce an opcode getter in the FW command interface, and use it.
Initialize the entry's opcode field early in cmd_alloc_ent() and use it
when possible.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Enable management PF initialization

Enable initialization of DPU Management PF, which is a new loopback PF
designed for communication with BMC.
For now Management PF doesn't support nor require most upper layer
protocols so avoid them.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add hairpin debugfs files

We refer to a TC NIC rule that involves forwarding as "hairpin".
Hairpin queues are mlx5 hardware specific implementation for hardware
forwarding of such packets.

For debug purposes, introduce debugfs files which:
* Expose the number of active hairpins
* Dump the hairpin table
* Allow control over the number and size of the hairpin queues instead
  of the hard-coded values.

This allows us to get visibility of the feature in order to improve it
for next generation hardware.

Add debugfs files:
  fs/tc/hairpin_num_active
  fs/tc/hairpin_num_queues
  fs/tc/hairpin_queue_size
  fs/tc/hairpin_table_dump

Note that the new values will only take effect on the next queues
creation, it does not affect existing queues.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add flow steering debugfs directory

Add a debugfs directory for flow steering related information.
The directory is currently empty, and will hold the 'tc' subdirectory in
a downstream patch.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add hairpin params structure

In preparation for downstream work to expose hairpin queues parameters,
introduce a hairpin parameters struct as part of the tc structure.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: kTLS, Add debugfs

Add TLS debugfs to improve observability by exposing the size of the tls
TX pool.

To observe the size of the TX pool:
$ cat /sys/kernel/debug/mlx5/<pci>/nic/tls/tx/pool_size

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Co-developed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add Ethernet driver debugfs

Similar to the mlx5_core debugfs, lay the groundwork for mlx5e debugfs
files under /sys/kernel/debug/mlx5/<pci>/nic/..

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Update shared buffer along with device buffer changes

Currently, the user can modify device's receive buffer size, modify the
mapping between QoS priority groups to buffers and change the buffer
state to become lossy/lossless via pfc command.

However, the shared receive buffer pool alignments, as a result of
such commands, is performed only when the shared buffer is in FW ownership.
When a user changes the mapping of priority groups or buffer size,
the shared buffer is moved to SW ownership.

Therefore, for devices that support shared buffer, handle the shared buffer
alignments in accordance to user's desired configurations.

Meaning, the following will be performed:
1. For every change of buffer's headroom, recalculate the size of shared
   buffer to be equal to "total_buffer_size" - "new_headroom_size".
   The new shared buffer size will be split in ratio of 3:1 between
   lossy and lossless pools, respectively.

2. For each port buffer change, count the number of lossless buffers.
   If there is only one lossless buffer, then set its lossless pool
   usage threshold to be infinite. Otherwise, if there is more than
   one lossless buffer, set a usage threshold for each lossless buffer.

While at it, add more verbosity to debug prints when handling user
commands, to assist in future debug.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add API to query/modify SBPR and SBCM registers

To allow users to configure shared receive buffer parameters through
dcbnl callbacks, expose an API to query and modify SBPR and SBCM registers,
which will be used in the upcoming patch.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Expose shared buffer registers bits and structs

Add the shared receive buffer management and configuration registers:
1. SBPR - Shared Buffer Pools Register
2. SBCM - Shared Buffer Class Management Register

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-09 (ice)

This series contains updates to ice driver only.

Jiasheng Jiang frees allocated cmd_buf if write_buf allocation failed to
prevent memory leak.

Yuan Can adds check, and proper cleanup, of gnss_tty_port allocation call
to avoid memory leaks.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Add check for kzalloc
ice: Fix potential memory leak in ice_gnss_tty_write()
====================

Link: https://lore.kernel.org/r/20230109225358.3478060-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sched: disallow noqueue for qdisc classes

While experimenting with applying noqueue to a classful queue discipline,
we discovered a NULL pointer dereference in the __dev_queue_xmit()
path that generates a kernel OOPS:

    # dev=enp0s5
    # tc qdisc replace dev $dev root handle 1: htb default 1
    # tc class add dev $dev parent 1: classid 1:1 htb rate 10mbit
    # tc qdisc add dev $dev parent 1:1 handle 10: noqueue
    # ping -I $dev -w 1 -c 1 1.1.1.1

[    2.172856] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    2.173217] #PF: supervisor instruction fetch in kernel mode
...
[    2.178451] Call Trace:
[    2.178577]  <TASK>
[    2.178686]  htb_enqueue+0x1c8/0x370
[    2.178880]  dev_qdisc_enqueue+0x15/0x90
[    2.179093]  __dev_queue_xmit+0x798/0xd00
[    2.179305]  ? _raw_write_lock_bh+0xe/0x30
[    2.179522]  ? __local_bh_enable_ip+0x32/0x70
[    2.179759]  ? ___neigh_create+0x610/0x840
[    2.179968]  ? eth_header+0x21/0xc0
[    2.180144]  ip_finish_output2+0x15e/0x4f0
[    2.180348]  ? dst_output+0x30/0x30
[    2.180525]  ip_push_pending_frames+0x9d/0xb0
[    2.180739]  raw_sendmsg+0x601/0xcb0
[    2.180916]  ? _raw_spin_trylock+0xe/0x50
[    2.181112]  ? _raw_spin_unlock_irqrestore+0x16/0x30
[    2.181354]  ? get_page_from_freelist+0xcd6/0xdf0
[    2.181594]  ? sock_sendmsg+0x56/0x60
[    2.181781]  sock_sendmsg+0x56/0x60
[    2.181958]  __sys_sendto+0xf7/0x160
[    2.182139]  ? handle_mm_fault+0x6e/0x1d0
[    2.182366]  ? do_user_addr_fault+0x1e1/0x660
[    2.182627]  __x64_sys_sendto+0x1b/0x30
[    2.182881]  do_syscall_64+0x38/0x90
[    2.183085]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
...
[    2.187402]  </TASK>

Previously in commit d66d6c3152e8 ("net: sched: register noqueue
qdisc"), NULL was set for the noqueue discipline on noqueue init
so that __dev_queue_xmit() falls through for the noqueue case. This
also sets a bypass of the enqueue NULL check in the
register_qdisc() function for the struct noqueue_disc_ops.

Classful queue disciplines make it past the NULL check in
__dev_queue_xmit() because the discipline is set to htb (in this case),
and then in the call to __dev_xmit_skb(), it calls into htb_enqueue()
which grabs a leaf node for a class and then calls qdisc_enqueue() by
passing in a queue discipline which assumes ->enqueue() is not set to NULL.

Fix this by not allowing classes to be assigned to the noqueue
discipline. Linux TC Notes states that classes cannot be set to
the noqueue discipline. [1] Let's enforce that here.

Links:
1. https://linux-tc-notes.sourceforge.net/tc/doc/sch_noqueue.txt

Fixes: d66d6c3152e8 ("net: sched: register noqueue qdisc")
Cc: stable@vger.kernel.org
Signed-off-by: Frederick Lawler <fred@cloudflare.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20230109163906.706000-1-fred@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: fix a typo in comment

Fix a typo of "permision" which should be "permission".

Signed-off-by: Dai Shixin <dai.shixin@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Link: https://lore.kernel.org/r/202301091935262709751@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mdio-start-separating-c22-and-c45'

Michael Walle says:

====================
net: mdio: Start separating C22 and C45

This patch set starts the separation of C22 and C45 MDIO bus
transactions at the API level to the MDIO Bus drivers. C45 read and
write ops are added to the MDIO bus driver structure, and the MDIO
core will try to use these ops if requested to perform a C45
transfer. If not available a fallback to the older API is made, to
allow backwards compatibility until all drivers are converted.

A few drivers are then converted to this new API.

The core DSA patch was dropped for now as there is still an ongoing
discussion. It will be picked up in a later series again.

v2: https://lore.kernel.org/r/20221227-v6-2-rc1-c45-seperation-v2-0-ddb37710e5a7@walle.cc
v1: https://lore.kernel.org/r/20220508153049.427227-1-andrew@lunn.ch
====================

Link: https://lore.kernel.org/r/20221227-v6-2-rc1-c45-seperation-v3-0-ade1deb438da@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: Separate C22 and C45 transactions

The global2 SMI MDIO bus driver can perform both C22 and C45
transfers. Create separate functions for each and register the C45
versions using the new API calls where appropriate. Update the SERDES
code to make use of these new accessors.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: add mdiobus_c45_read/write_nested helpers

Some DSA devices pass through PHY access to the MDIO bus the switch is
on. Add C45 versions of the current C22 helpers for nested accesses to
MDIO busses, so that C22 and C45 can be separated in these DSA
drivers.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fec: Separate C22 and C45 transactions

The fec MDIO bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls where appropriate.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: xgmac_mdio: Separate C22 and C45 transactions

The xgmac MDIO bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls where appropriate.

While at it, remove the misleading comment. According to Vladimir
Oltean:
- miimcom is a register accessed by fsl_pq_mdio.c, not by xgmac_mdio.c
- "device dev" doesn't really refer to anything (maybe "dev_addr").
- I don't understand what is meant by the comment "All PHY
   configuration has to be done through the TSEC1 MIIM regs". Or rather
   said, I think I understand, but it is irrelevant to the driver for 2
   reasons:
    * TSEC devices use the fsl_pq_mdio.c driver, not this one
    * It doesn't matter to this driver whose TSEC registers are used for
      MDIO access. The driver just works with the registers it's given,
      which is a concern for the device tree.
- barring the above, the rest just describes the MDIO bus API, which is
   superfluous

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mvmdio: Convert XSMI bus to new API

The marvell MDIO driver supports two different hardware blocks. The
XSMI block is C45 only. Convert this block to the new API, and only
populate the c45 calls in the bus structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mdio-bitbang: Separate C22 and C45 transactions

The bitbbanging bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new driver API calls.

The SH Ethernet driver places wrappers around these functions. In
order to not break boards which might be using C45, add similar
wrappers for C45 operations.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: Move mdiobus_c45_addr() next to users

Now that mdiobus_c45_addr() is only used within the MDIO code during
fallback, move the function next to its only users. This function
should not be used any more in drivers, the c45 helpers should be used
in its place, so hiding it away will prevent any new users from being
added.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: C22 is now optional, EOPNOTSUPP if not provided

When performing a C22 operation, check that the bus driver actually
provides the methods, and return -EOPNOTSUPP if not. C45 only busses
do exist, and in future their C22 methods will be NULL.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mdiobus_register: update validation test

Now that C45 uses its own read/write methods, the validation performed
when a bus is registers needs updating. All combinations of C22 and
C45 are supported, but both read and write methods must be provided,
read only busses are not supported etc.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pcs: pcs-xpcs: Use C45 MDIO API

Convert the PCS-XPCS driver to make use of the C45 MDIO bus API for
modify_change().

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: Add dedicated C45 API to MDIO bus drivers

Currently C22 and C45 transactions are mixed over a combined API calls
which make use of a special bit in the reg address to indicate if a
C45 transaction should be performed. This makes it impossible to know
if the bus driver actually supports C45. Additionally, many C22 only
drivers don't return -EOPNOTSUPP when asked to perform a C45
transaction, they mistaking perform a C22 transaction.

This is the first step to cleanly separate C22 from C45. To maintain
backwards compatibility until all drivers which are capable of
performing C45 are converted to this new API, the helper functions
will fall back to the older API if the new API is not
supported. Eventually this fallback will be removed.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nfsd-6.2-3' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- Fix a race when creating NFSv4 files

- Revert the use of relaxed bitops

* tag 'nfsd-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Use set_bit(RQ_DROPME)
  Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"
  nfsd: fix handling of cached open files in nfsd4_open codepath

Merge tag 'xtensa-20230110' of https://github.com/jcmvbkbc/linux-xtensa

Pull xtensa fixes from Max Filippov:

- fix xtensa allmodconfig build broken by the kcsan test

- drop unused members of struct thread_struct

* tag 'xtensa-20230110' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: drop unused members of struct thread_struct
kcsan: test: don't put the expect array on the stack

iavf/iavf_main: actually log ->src mask when talking about it

This fixes a copy-paste issue where dev_err would log the dst mask even
though it is clearly talking about src.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Fixes: 0075fa0fadd0 ("i40evf: Add support to apply cloud filters")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Fix PPS delta between two synchronized end-points

This patch fix the pulse per second output delta between
two synchronized end-points.

Based on Intel Discrete I225 Software User Manual Section
4.2.15 TimeSync Auxiliary Control Register, ST0[Bit 4] and
ST1[Bit 7] must be set to ensure that clock output will be
toggles based on frequency value defined. This is to ensure
that output of the PPS is aligned with the clock.

How to test:

1) Running time synchronization on both end points.
Ex: ptp4l --step_threshold=1 -m -f gPTP.cfg -i <interface name>

2) Configure PPS output using below command for both end-points
Ex: SDP0 on I225 REV4 SKU variant

./testptp -d /dev/ptp0 -L 0,2
./testptp -d /dev/ptp0 -p 1000000000

3) Measure the output using analyzer for both end-points

Fixes: 87938851b6ef ("igc: enable auxiliary PHC functions for the i225")
Signed-off-by: Christopher S Hall <christopher.s.hall@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbe: fix pci device refcount leak

As the comment of pci_get_domain_bus_and_slot() says, it
returns a PCI device with refcount incremented, when finish
using it, the caller must decrement the reference count by
calling pci_dev_put().

In ixgbe_get_first_secondary_devfn() and ixgbe_x550em_a_has_mii(),
pci_dev_put() is called to avoid leak.

Fixes: 8fa10ef01260 ("ixgbe: register a mdiobus")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

perf bpf: Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1

In 746bd29e348f99b4 ("perf build: Use tools/lib headers from install
path") we stopped having the tools/lib/ directory from the kernel
sources in the header include path unconditionally, which breaks the
build on systems with older versions of libbpf-devel, in this case 0.7.0
as some of the structures and function declarations present in the newer
version of libbpf included in the kernel sources (tools/lib/bpf) are not
anymore used, just the ones in the system libbpf.

So instead of trying to provide alternative functions when the
libbpf-bpf_program__set_insns feature test fails, fail a
LIBBPF_DYNAMIC=1 build (requesting the use of the system's libbpf) and
emit this build error message:

  $ make LIBBPF_DYNAMIC=1 -C tools/perf
  Makefile.config:593: *** Error: libbpf devel library needs to be >= 0.8.0 to build with LIBBPF_DYNAMIC, update or build statically with the version that comes with the kernel sources.  Stop.
  $

For v6.3 these tests will be revamped and we'll require libbpf 1.0 as a
minimal version for using LIBBPF_DYNAMIC=1, most distros should have it
by now or at v6.3 time.

Fixes: 746bd29e348f99b4 ("perf build: Use tools/lib headers from install path")
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/CAP-5=fVa51_URGsdDFVTzpyGmdDRj_Dj2EKPuDHNQ0BYgMSzUA@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>