linux-2.6-microblaze.git
11 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 3 Aug 2023 21:29:50 +0000 (14:29 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

net/dsa/port.c
  9945c1fb03a3 ("net: dsa: fix older DSA drivers using phylink")
  a88dd7538461 ("net: dsa: remove legacy_pre_march2020 detection")
https://lore.kernel.org/all/20230731102254.2c9868ca@canb.auug.org.au/

net/xdp/xsk.c
  3c5b4d69c358 ("net: annotate data-races around sk->sk_mark")
  b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
https://lore.kernel.org/all/20230731102631.39988412@canb.auug.org.au/

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  37b61cda9c16 ("bnxt: don't handle XDP in netpoll")
  2b56b3d99241 ("eth: bnxt: handle invalid Tx completions more gracefully")
https://lore.kernel.org/all/20230801101708.1dc7faac@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
  62da08331f1a ("net/mlx5e: Set proper IPsec source port in L4 selector")
  fbd517549c32 ("net/mlx5e: Add function to get IPsec offload namespace")

drivers/net/ethernet/sfc/selftest.c
  55c1528f9b97 ("sfc: fix field-spanning memcpy in selftest")
  ae9d445cd41f ("sfc: Miscellaneous comment removals")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge tag 'net-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 3 Aug 2023 21:00:02 +0000 (14:00 -0700)]
Merge tag 'net-6.5-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and wireless.

  Nothing scary here. Feels like the first wave of regressions from v6.5
  is addressed - one outstanding fix still to come in TLS for the
  sendpage rework.

  Current release - regressions:

   - udp: fix __ip_append_data()'s handling of MSG_SPLICE_PAGES

   - dsa: fix older DSA drivers using phylink

  Previous releases - regressions:

   - gro: fix misuse of CB in udp socket lookup

   - mlx5: unregister devlink params in case interface is down

   - Revert "wifi: ath11k: Enable threaded NAPI"

  Previous releases - always broken:

   - sched: cls_u32: fix match key mis-addressing

   - sched: bind logic fixes for cls_fw, cls_u32 and cls_route

   - add bound checks to a number of places which hand-parse netlink

   - bpf: disable preemption in perf_event_output helpers code

   - qed: fix scheduling in a tasklet while getting stats

   - avoid using APIs which are not hardirq-safe in couple of drivers,
     when we may be in a hard IRQ (netconsole)

   - wifi: cfg80211: fix return value in scan logic, avoid page
     allocator warning

   - wifi: mt76: mt7615: do not advertise 5 GHz on first PHY of MT7615D
     (DBDC)

  Misc:

   - drop handful of inactive maintainers, put some new in place"

* tag 'net-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (98 commits)
  MAINTAINERS: update TUN/TAP maintainers
  test/vsock: remove vsock_perf executable on `make clean`
  tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
  tcp_metrics: annotate data-races around tm->tcpm_net
  tcp_metrics: annotate data-races around tm->tcpm_vals[]
  tcp_metrics: annotate data-races around tm->tcpm_lock
  tcp_metrics: annotate data-races around tm->tcpm_stamp
  tcp_metrics: fix addr_same() helper
  prestera: fix fallback to previous version on same major version
  udp: Fix __ip_append_data()'s handling of MSG_SPLICE_PAGES
  net/mlx5e: Set proper IPsec source port in L4 selector
  net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio
  net/mlx5: fs_core: Make find_closest_ft more generic
  wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1()
  vxlan: Fix nexthop hash size
  ip6mr: Fix skb_under_panic in ip6mr_cache_report()
  s390/qeth: Don't call dev_close/dev_open (DOWN/UP)
  net: tap_open(): set sk_uid from current_fsuid()
  net: tun_chr_open(): set sk_uid from current_fsuid()
  net: dcb: choose correct policy to parse DCB_ATTR_BCN
  ...

11 months agoMAINTAINERS: update TUN/TAP maintainers
Jakub Kicinski [Wed, 2 Aug 2023 18:28:43 +0000 (11:28 -0700)]
MAINTAINERS: update TUN/TAP maintainers

Willem and Jason have agreed to take over the maintainer
duties for TUN/TAP, thank you!

There's an existing entry for TUN/TAP which only covers
the user mode Linux implementation.
Since we haven't heard from Maxim on the list for almost
a decade, extend that entry and take it over, rather than
adding a new one.

Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20230802182843.4193099-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 3 Aug 2023 18:22:53 +0000 (11:22 -0700)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf

Martin KaFai Lau says:

====================
pull-request: bpf 2023-08-03

We've added 5 non-merge commits during the last 7 day(s) which contain
a total of 3 files changed, 37 insertions(+), 20 deletions(-).

The main changes are:

1) Disable preemption in perf_event_output helpers code,
   from Jiri Olsa

2) Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing,
   from Lin Ma

3) Multiple warning splat fixes in cpumap from Hou Tao

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, cpumap: Handle skb as well when clean up ptr_ring
  bpf, cpumap: Make sure kthread is running before map update returns
  bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing
  bpf: Disable preemption in bpf_event_output
  bpf: Disable preemption in bpf_perf_event_output
====================

Link: https://lore.kernel.org/r/20230803181429.994607-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge tag 'wireless-2023-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 3 Aug 2023 18:05:46 +0000 (11:05 -0700)]
Merge tag 'wireless-2023-08-03' of git://git./linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.5

We did some house cleaning in MAINTAINERS file so several patches
about that. Few regressions fixed and also fix some recently enabled
memcpy() warnings. Only small commits and nothing special standing
out.

* tag 'wireless-2023-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1()
  wifi: ray_cs: Replace 1-element array with flexible array
  MAINTAINERS: add Jeff as ath10k, ath11k and ath12k maintainer
  MAINTAINERS: wifi: mark mlw8k as orphan
  MAINTAINERS: wifi: mark b43 as orphan
  MAINTAINERS: wifi: mark zd1211rw as orphan
  MAINTAINERS: wifi: mark wl3501 as orphan
  MAINTAINERS: wifi: mark rndis_wlan as orphan
  MAINTAINERS: wifi: mark ar5523 as orphan
  MAINTAINERS: wifi: mark cw1200 as orphan
  MAINTAINERS: wifi: atmel: mark as orphan
  MAINTAINERS: wifi: rtw88: change Ping as the maintainer
  Revert "wifi: ath6k: silence false positive -Wno-dangling-pointer warning on GCC 12"
  wifi: cfg80211: Fix return value in scan logic
  Revert "wifi: ath11k: Enable threaded NAPI"
  MAINTAINERS: Update mwifiex maintainer list
  wifi: mt76: mt7615: do not advertise 5 GHz on first phy of MT7615D (DBDC)
====================

Link: https://lore.kernel.org/r/20230803140058.57476C433C9@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotest/vsock: remove vsock_perf executable on `make clean`
Stefano Garzarella [Thu, 3 Aug 2023 08:54:54 +0000 (10:54 +0200)]
test/vsock: remove vsock_perf executable on `make clean`

We forgot to add vsock_perf to the rm command in the `clean`
target, so now we have a left over after `make clean` in
tools/testing/vsock.

Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility")
Cc: AVKrasnov@sberdevices.ru
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Link: https://lore.kernel.org/r/20230803085454.30897-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'tcp_metrics-series-of-fixes'
Jakub Kicinski [Thu, 3 Aug 2023 17:58:27 +0000 (10:58 -0700)]
Merge branch 'tcp_metrics-series-of-fixes'

Eric Dumazet says:

====================
tcp_metrics: series of fixes

This series contains a fix for addr_same() and various
data-race annotations.

We still have to address races over tm->tcpm_saddr and
tm->tcpm_daddr later.
====================

Link: https://lore.kernel.org/r/20230802131500.1478140-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
Eric Dumazet [Wed, 2 Aug 2023 13:15:00 +0000 (13:15 +0000)]
tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen

Whenever tcpm_new() reclaims an old entry, tcpm_suck_dst()
would overwrite data that could be read from tcp_fastopen_cache_get()
or tcp_metrics_fill_info().

We need to acquire fastopen_seqlock to maintain consistency.

For newly allocated objects, tcpm_new() can switch to kzalloc()
to avoid an extra fastopen_seqlock acquisition.

Fixes: 1fe4c481ba63 ("net-tcp: Fast Open client - cookie cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotcp_metrics: annotate data-races around tm->tcpm_net
Eric Dumazet [Wed, 2 Aug 2023 13:14:59 +0000 (13:14 +0000)]
tcp_metrics: annotate data-races around tm->tcpm_net

tm->tcpm_net can be read or written locklessly.

Instead of changing write_pnet() and read_pnet() and potentially
hurt performance, add the needed READ_ONCE()/WRITE_ONCE()
in tm_net() and tcpm_new().

Fixes: 849e8a0ca8d5 ("tcp_metrics: Add a field tcpm_net and verify it matches on lookup")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotcp_metrics: annotate data-races around tm->tcpm_vals[]
Eric Dumazet [Wed, 2 Aug 2023 13:14:58 +0000 (13:14 +0000)]
tcp_metrics: annotate data-races around tm->tcpm_vals[]

tm->tcpm_vals[] values can be read or written locklessly.

Add needed READ_ONCE()/WRITE_ONCE() to document this,
and force use of tcp_metric_get() and tcp_metric_set()

Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotcp_metrics: annotate data-races around tm->tcpm_lock
Eric Dumazet [Wed, 2 Aug 2023 13:14:57 +0000 (13:14 +0000)]
tcp_metrics: annotate data-races around tm->tcpm_lock

tm->tcpm_lock can be read or written locklessly.

Add needed READ_ONCE()/WRITE_ONCE() to document this.

Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotcp_metrics: annotate data-races around tm->tcpm_stamp
Eric Dumazet [Wed, 2 Aug 2023 13:14:56 +0000 (13:14 +0000)]
tcp_metrics: annotate data-races around tm->tcpm_stamp

tm->tcpm_stamp can be read or written locklessly.

Add needed READ_ONCE()/WRITE_ONCE() to document this.

Also constify tcpm_check_stamp() dst argument.

Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotcp_metrics: fix addr_same() helper
Eric Dumazet [Wed, 2 Aug 2023 13:14:55 +0000 (13:14 +0000)]
tcp_metrics: fix addr_same() helper

Because v4 and v6 families use separate inetpeer trees (respectively
net->ipv4.peers and net->ipv6.peers), inetpeer_addr_cmp(a, b) assumes
a & b share the same family.

tcp_metrics use a common hash table, where entries can have different
families.

We must therefore make sure to not call inetpeer_addr_cmp()
if the families do not match.

Fixes: d39d14ffa24c ("net: Add helper function to compare inetpeer addresses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoprestera: fix fallback to previous version on same major version
Jonas Gorski [Wed, 2 Aug 2023 09:23:56 +0000 (11:23 +0200)]
prestera: fix fallback to previous version on same major version

When both supported and previous version have the same major version,
and the firmwares are missing, the driver ends in a loop requesting the
same (previous) version over and over again:

    [   76.327413] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
    [   76.339802] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
    [   76.352162] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
    [   76.364502] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
    [   76.376848] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
    [   76.389183] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
    [   76.401522] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
    [   76.413860] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
    [   76.426199] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
    ...

Fix this by inverting the check to that we aren't yet at the previous
version, and also check the minor version.

This also catches the case where both versions are the same, as it was
after commit bb5dbf2cc64d ("net: marvell: prestera: add firmware v4.0
support").

With this fix applied:

    [   88.499622] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
    [   88.511995] Prestera DX 0000:01:00.0: failed to request previous firmware: mrvl/prestera/mvsw_prestera_fw-v4.0.img
    [   88.522403] Prestera DX: probe of 0000:01:00.0 failed with error -2

Fixes: 47f26018a414 ("net: marvell: prestera: try to load previous fw version")
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Acked-by: Elad Nachman <enachman@marvell.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Taras Chornyi <taras.chornyi@plvision.eu>
Link: https://lore.kernel.org/r/20230802092357.163944-1-jonas.gorski@bisdn.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'docs-net-page_pool-sync-dev-and-kdoc'
Jakub Kicinski [Thu, 3 Aug 2023 16:54:25 +0000 (09:54 -0700)]
Merge branch 'docs-net-page_pool-sync-dev-and-kdoc'

Jakub Kicinski says:

====================
docs: net: page_pool: sync dev and kdoc

Document PP_FLAG_DMA_SYNC_DEV based on recent conversation.
Use kdoc to document structs and functions, to avoid duplication.

Olek, this will conflict with your work, but I think that trying
to make progress in parallel is the best course of action...
Retargetting at net-next to make it a little less bad.
====================

Link: https://lore.kernel.org/r/20230802161821.3621985-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodocs: net: page_pool: use kdoc to avoid duplicating the information
Jakub Kicinski [Wed, 2 Aug 2023 16:18:21 +0000 (09:18 -0700)]
docs: net: page_pool: use kdoc to avoid duplicating the information

All struct members of the driver-facing APIs are documented twice,
in the code and under Documentation. This is a bit tedious.

I also get the feeling that a lot of developers will read the header
when coding, rather than the doc. Bring the two a little closer
together by using kdoc for structs and functions.

Using kdoc also gives us links (mentioning a function or struct
in the text gets replaced by a link to its doc).

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230802161821.3621985-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodocs: net: page_pool: document PP_FLAG_DMA_SYNC_DEV parameters
Jakub Kicinski [Wed, 2 Aug 2023 16:18:20 +0000 (09:18 -0700)]
docs: net: page_pool: document PP_FLAG_DMA_SYNC_DEV parameters

Using PP_FLAG_DMA_SYNC_DEV is a bit confusing. It was perhaps
more obvious when it was introduced but the page pool use
has grown beyond XDP and beyond packet-per-page so now
making the heads and tails out of this feature is not
trivial.

Obviously making the API more user friendly would be
a better fix, but until someone steps up to do that
let's at least document what the parameters are.

Relevant discussion in the first Link.

Link: https://lore.kernel.org/all/20230731114427.0da1f73b@kernel.org/
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://lore.kernel.org/r/20230802161821.3621985-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge tag 'nfsd-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Thu, 3 Aug 2023 16:26:34 +0000 (09:26 -0700)]
Merge tag 'nfsd-6.5-3' of git://git./linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix tmpfs splice read support

* tag 'nfsd-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: Fix reading via splice

11 months agoMerge tag 'erofs-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 3 Aug 2023 16:20:50 +0000 (09:20 -0700)]
Merge tag 'erofs-for-6.5-rc5-fixes' of git://git./linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

 - Fix data corruption caused by insufficient decompression on
   deduplicated compressed extents

 - Drop a useless s_magic checking in erofs_kill_sb()

* tag 'erofs-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: drop unnecessary WARN_ON() in erofs_kill_sb()
  erofs: fix wrong primary bvec selection on deduplicated extents

11 months agoMerge tag 's390-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Thu, 3 Aug 2023 16:06:38 +0000 (09:06 -0700)]
Merge tag 's390-6.5-4' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

 - Split kernel large page mappings into 4k mappings in case debug
   pagealloc is enabled again. This got accidentally removed by commit
   bb1520d581a3 ("s390/mm: start kernel with DAT enabled")

 - Fix error handling in KVM's sthyi handling

 - Add missing include to s390's uapi ptrace.h

 - Update defconfigs

* tag 's390-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/ptrace: add missing linux/const.h include
  KVM: s390: fix sthyi error handling
  s390: update defconfigs
  s390/vmem: split pages when debug pagealloc is enabled

11 months agonet/mlx4: Remove many unnecessary NULL values
Ruan Jinjie [Wed, 2 Aug 2023 04:00:26 +0000 (12:00 +0800)]
net/mlx4: Remove many unnecessary NULL values

The NULL initialization of the pointers assigned by kzalloc() first is
not necessary, because if the kzalloc() failed, the pointers will be
assigned NULL, otherwise it works as usual. so remove it.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230802040026.2588675-1-ruanjinjie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoMerge branch 'selftests-openvswitch-add-flow-programming-cases'
Paolo Abeni [Thu, 3 Aug 2023 13:05:43 +0000 (15:05 +0200)]
Merge branch 'selftests-openvswitch-add-flow-programming-cases'

Aaron Conole says:

====================
selftests: openvswitch: add flow programming cases

The openvswitch selftests currently contain a few cases for managing the
datapath, which includes creating datapath instances, adding interfaces,
and doing some basic feature / upcall tests.  This is useful to validate
the control path.

Add the ability to program some of the more common flows with actions. This
can be improved overtime to include regression testing, etc.
====================

Link: https://lore.kernel.org/r/20230801212226.909249-1-aconole@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoselftests: openvswitch: add ct-nat test case with ipv4
Aaron Conole [Tue, 1 Aug 2023 21:22:26 +0000 (17:22 -0400)]
selftests: openvswitch: add ct-nat test case with ipv4

Building on the previous work, add a very simplistic NAT case
using ipv4.  This just tests dnat transformation

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoselftests: openvswitch: add basic ct test case parsing
Aaron Conole [Tue, 1 Aug 2023 21:22:25 +0000 (17:22 -0400)]
selftests: openvswitch: add basic ct test case parsing

Forwarding via ct() action is an important use case for openvswitch, but
generally would require using a full ovs-vswitchd to get working. Add a
ct action parser for basic ct test case.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoselftests: openvswitch: add a test for ipv4 forwarding
Aaron Conole [Tue, 1 Aug 2023 21:22:24 +0000 (17:22 -0400)]
selftests: openvswitch: add a test for ipv4 forwarding

This is a simple ipv4 bidirectional connectivity test.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoselftests: openvswitch: support key masks
Adrian Moreno [Tue, 1 Aug 2023 21:22:23 +0000 (17:22 -0400)]
selftests: openvswitch: support key masks

The default value for the mask actually depends on the value (e.g: if
the value is non-null, the default is full-mask), so change the convert
functions to accept the full, possibly masked string and let them figure
out how to parse the different values.

Also, implement size-aware int parsing.

With this patch we can now express flows such as the following:
"eth(src=0a:ca:fe:ca:fe:0a/ff:ff:00:00:ff:00)"
"eth(src=0a:ca:fe:ca:fe:0a)" -> mask = ff:ff:ff:ff:ff:ff
"ipv4(src=192.168.1.1)" -> mask = 255.255.255.255
"ipv4(src=192.168.1.1/24)"
"ipv4(src=192.168.1.1/255.255.255.0)"
"tcp(src=8080)" -> mask = 0xffff
"tcp(src=8080/0xf0f0)"

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoselftests: openvswitch: add an initial flow programming case
Aaron Conole [Tue, 1 Aug 2023 21:22:22 +0000 (17:22 -0400)]
selftests: openvswitch: add an initial flow programming case

The openvswitch self-tests can test much of the control side of
the module (ie: what a vswitchd implementation would process),
but the actual packet forwarding cases aren't supported, making
the testing of limited value.

Add some flow parsing and an initial ARP based test case using
arping utility.  This lets us display flows, add some basic
output flows with simple matches, and test against a known good
forwarding case.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoudp6: Fix __ip6_append_data()'s handling of MSG_SPLICE_PAGES
David Howells [Wed, 2 Aug 2023 07:36:50 +0000 (08:36 +0100)]
udp6: Fix __ip6_append_data()'s handling of MSG_SPLICE_PAGES

__ip6_append_data() can has a similar problem to __ip_append_data()[1] when
asked to splice into a partially-built UDP message that has more than the
frag-limit data and up to the MTU limit, but in the ipv6 case, it errors
out with EINVAL.  This can be triggered with something like:

        pipe(pfd);
        sfd = socket(AF_INET6, SOCK_DGRAM, 0);
        connect(sfd, ...);
        send(sfd, buffer, 8137, MSG_CONFIRM|MSG_MORE);
        write(pfd[1], buffer, 8);
        splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0);

where the amount of data given to send() is dependent on the MTU size (in
this instance an interface with an MTU of 8192).

The problem is that the calculation of the amount to copy in
__ip6_append_data() goes negative in two places, but a check has been put
in to give an error in this case.

This happens because when pagedlen > 0 (which happens for MSG_ZEROCOPY and
MSG_SPLICE_PAGES), the terms in:

        copy = datalen - transhdrlen - fraggap - pagedlen;

then mostly cancel when pagedlen is substituted for, leaving just -fraggap.

Fix this by:

 (1) Insert a note about the dodgy calculation of 'copy'.

 (2) If MSG_SPLICE_PAGES, clear copy if it is negative from the above
     equation, so that 'offset' isn't regressed and 'length' isn't
     increased, which will mean that length and thus copy should match the
     amount left in the iterator.

 (3) When handling MSG_SPLICE_PAGES, give a warning and return -EIO if
     we're asked to splice more than is in the iterator.  It might be
     better to not give the warning or even just give a 'short' write.

 (4) If MSG_SPLICE_PAGES, override the copy<0 check.

[!] Note that this should also affect MSG_ZEROCOPY, but that will return
-EINVAL for the range of send sizes that requires the skbuff to be split.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/000000000000881d0606004541d1@google.com/
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1580952.1690961810@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: gemini: Do not check for 0 return after calling platform_get_irq()
Ruan Jinjie [Wed, 2 Aug 2023 08:52:16 +0000 (16:52 +0800)]
net: gemini: Do not check for 0 return after calling platform_get_irq()

It is not possible for platform_get_irq() to return 0. Use the
return value from platform_get_irq().

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20230802085216.659238-1-ruanjinjie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agodrivers: net: xgene: Do not check for 0 return after calling platform_get_irq()
Ruan Jinjie [Wed, 2 Aug 2023 09:06:57 +0000 (17:06 +0800)]
drivers: net: xgene: Do not check for 0 return after calling platform_get_irq()

It is not possible for platform_get_irq() to return 0. Use the
return value from platform_get_irq().

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20230802090657.969923-1-ruanjinjie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agotipc: Remove unused function declarations
Yue Haibing [Wed, 2 Aug 2023 03:46:59 +0000 (11:46 +0800)]
tipc: Remove unused function declarations

Commit d50ccc2d3909 ("tipc: add 128-bit node identifier") declared but never
implemented tipc_node_id2hash().
Also commit 5c216e1d28c8 ("tipc: Allow run-time alteration of default link settings")
never implemented tipc_media_set_priority() and tipc_media_set_window(),
commit cad2929dc432 ("tipc: update a binding service via broadcast") only declared
tipc_named_bcast().

Since commit be07f056396d ("tipc: simplify the finalize work queue")
tipc_sched_net_finalize() is removed and declaration is unused.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230802034659.39840-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: ethernet: mtk_eth_soc: support per-flow accounting on MT7988
Daniel Golle [Wed, 2 Aug 2023 03:31:09 +0000 (04:31 +0100)]
net: ethernet: mtk_eth_soc: support per-flow accounting on MT7988

NETSYS_V3 uses 64 bits for each counters while older SoCs are using
48/40 bits for each counter.
Support reading per-flow byte and package counters on NETSYS_V3.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/37a0928fa8c1253b197884c68ce1f54239421ac5.1690946442.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agobonding: support balance-alb with openvswitch
Mateusz Kowalski [Tue, 1 Aug 2023 12:37:50 +0000 (14:37 +0200)]
bonding: support balance-alb with openvswitch

Commit d5410ac7b0ba ("net:bonding:support balance-alb interface with
vlan to bridge") introduced a support for balance-alb mode for
interfaces connected to the linux bridge by fixing missing matching of
MAC entry in FDB. In our testing we discovered that it still does not
work when the bond is connected to the OVS bridge as show in diagram
below:

eth1(mac:eth1_mac)--bond0(balance-alb,mac:eth0_mac)--eth0(mac:eth0_mac)
                         |
                       bond0.150(mac:eth0_mac)
                         |
                       ovs_bridge(ip:bridge_ip,mac:eth0_mac)

This patch fixes it by checking not only if the device is a bridge but
also if it is an openvswitch.

Signed-off-by: Mateusz Kowalski <mko@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/9fe7297c-609e-208b-c77b-3ceef6eb51a4@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoudp: Fix __ip_append_data()'s handling of MSG_SPLICE_PAGES
David Howells [Tue, 1 Aug 2023 15:48:53 +0000 (16:48 +0100)]
udp: Fix __ip_append_data()'s handling of MSG_SPLICE_PAGES

__ip_append_data() can get into an infinite loop when asked to splice into
a partially-built UDP message that has more than the frag-limit data and up
to the MTU limit.  Something like:

        pipe(pfd);
        sfd = socket(AF_INET, SOCK_DGRAM, 0);
        connect(sfd, ...);
        send(sfd, buffer, 8161, MSG_CONFIRM|MSG_MORE);
        write(pfd[1], buffer, 8);
        splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0);

where the amount of data given to send() is dependent on the MTU size (in
this instance an interface with an MTU of 8192).

The problem is that the calculation of the amount to copy in
__ip_append_data() goes negative in two places, and, in the second place,
this gets subtracted from the length remaining, thereby increasing it.

This happens when pagedlen > 0 (which happens for MSG_ZEROCOPY and
MSG_SPLICE_PAGES), because the terms in:

        copy = datalen - transhdrlen - fraggap - pagedlen;

then mostly cancel when pagedlen is substituted for, leaving just -fraggap.
This causes:

        length -= copy + transhdrlen;

to increase the length to more than the amount of data in msg->msg_iter,
which causes skb_splice_from_iter() to be unable to fill the request and it
returns less than 'copied' - which means that length never gets to 0 and we
never exit the loop.

Fix this by:

 (1) Insert a note about the dodgy calculation of 'copy'.

 (2) If MSG_SPLICE_PAGES, clear copy if it is negative from the above
     equation, so that 'offset' isn't regressed and 'length' isn't
     increased, which will mean that length and thus copy should match the
     amount left in the iterator.

 (3) When handling MSG_SPLICE_PAGES, give a warning and return -EIO if
     we're asked to splice more than is in the iterator.  It might be
     better to not give the warning or even just give a 'short' write.

[!] Note that this ought to also affect MSG_ZEROCOPY, but MSG_ZEROCOPY
avoids the problem by simply assuming that everything asked for got copied,
not just the amount that was in the iterator.  This is a potential bug for
the future.

Fixes: 7ac7c987850c ("udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES")
Reported-by: syzbot+f527b971b4bdc8e79f9e@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000881d0606004541d1@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1420063.1690904933@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'introduce-ndo_hwtstamp_get-and-ndo_hwtstamp_set'
Jakub Kicinski [Thu, 3 Aug 2023 02:11:08 +0000 (19:11 -0700)]
Merge branch 'introduce-ndo_hwtstamp_get-and-ndo_hwtstamp_set'

Vladimir Oltean says:

====================
Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set()

Based on previous RFCs from Maxim Georgiev:
https://lore.kernel.org/netdev/20230502043150.17097-1-glipus@gmail.com/

this series attempts to introduce new API for the hardware timestamping
control path (SIOCGHWTSTAMP and SIOCSHWTSTAMP handling).

I don't have any board with phylib hardware timestamping, so I would
appreciate testing (especially on lan966x, the most intricate
conversion). I was, however, able to test netdev level timestamping,
because I also have some more unsubmitted conversions in progress:

https://github.com/vladimiroltean/linux/commits/ndo-hwtstamp-v9

I hope that the concerns expressed in the comments of previous series
were addressed, and that Köry Maincent's series:
https://lore.kernel.org/netdev/20230406173308.401924-1-kory.maincent@bootlin.com/
can make progress in parallel with the conversion of the rest of drivers.
====================

Link: https://lore.kernel.org/r/20230801142824.1772134-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: remove phy_has_hwtstamp() -> phy_mii_ioctl() decision from converted drivers
Vladimir Oltean [Tue, 1 Aug 2023 14:28:24 +0000 (17:28 +0300)]
net: remove phy_has_hwtstamp() -> phy_mii_ioctl() decision from converted drivers

It is desirable that the new .ndo_hwtstamp_set() API gives more
uniformity, less overhead and future flexibility w.r.t. the PHY
timestamping behavior.

Currently there are some drivers which allow PHY timestamping through
the procedure mentioned in Documentation/networking/timestamping.rst.
They don't do anything locally if phy_has_hwtstamp() is set, except for
lan966x which installs PTP packet traps.

Centralize that behavior in a new dev_set_hwtstamp_phylib() code
function, which calls either phy_mii_ioctl() for the phylib PHY,
or .ndo_hwtstamp_set() of the netdev, based on a single policy
(currently simplistic: phy_has_hwtstamp()).

Any driver converted to .ndo_hwtstamp_set() will automatically opt into
the centralized phylib timestamping policy. Unconverted drivers still
get to choose whether they let the PHY handle timestamping or not.

Netdev drivers with integrated PHY drivers that don't use phylib
presumably don't set dev->phydev, and those will always see
HWTSTAMP_SOURCE_NETDEV requests even when converted. The timestamping
policy will remain 100% up to them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-13-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: phy: provide phylib stubs for hardware timestamping operations
Vladimir Oltean [Tue, 1 Aug 2023 14:28:23 +0000 (17:28 +0300)]
net: phy: provide phylib stubs for hardware timestamping operations

net/core/dev_ioctl.c (built-in code) will want to call phy_mii_ioctl()
for hardware timestamping purposes. This is not directly possible,
because phy_mii_ioctl() is a symbol provided under CONFIG_PHYLIB.

Do something similar to what was done in DSA in commit 5a17818682cf
("net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stub"),
and arrange some indirect calls to phy_mii_ioctl() through a stub
structure containing function pointers, that's provided by phylib as
built-in even when CONFIG_PHYLIB=m, and which phy_init() populates at
runtime (module insertion).

Note: maybe the ownership of the ethtool_phy_ops singleton is backwards,
and the methods exposed by that should be later merged into phylib_stubs.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-12-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: transfer rtnl_lock() requirement from ethtool_set_ethtool_phy_ops() to caller
Vladimir Oltean [Tue, 1 Aug 2023 14:28:22 +0000 (17:28 +0300)]
net: transfer rtnl_lock() requirement from ethtool_set_ethtool_phy_ops() to caller

phy_init() and phy_exit() will have to do more stuff under rtnl_lock()
in a future change. Since rtnl_unlock() -> netdev_run_todo() does a lot
of stuff under the hood, it's a pity to lock and unlock the rtnetlink
mutex twice in a row.

Change the calling convention such that the only caller of
ethtool_set_ethtool_phy_ops(), phy_device.c, provides a context where
the rtnl_mutex is already acquired.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-11-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: lan966x: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Tue, 1 Aug 2023 14:28:21 +0000 (17:28 +0300)]
net: lan966x: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

The hardware timestamping through ndo_eth_ioctl() is going away.
Convert the lan966x driver to the new API before that can be removed.

After removing the timestamping logic from lan966x_port_ioctl(), the
rest is equivalent to phy_do_ioctl().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-10-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: sparx5: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Tue, 1 Aug 2023 14:28:20 +0000 (17:28 +0300)]
net: sparx5: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

The hardware timestamping through ndo_eth_ioctl() is going away.
Convert the sparx5 driver to the new API before that can be removed.

After removing the timestamping logic from sparx5_port_ioctl(), the rest
is equivalent to phy_do_ioctl().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-9-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: fec: delete fec_ptp_disable_hwts()
Vladimir Oltean [Tue, 1 Aug 2023 14:28:19 +0000 (17:28 +0300)]
net: fec: delete fec_ptp_disable_hwts()

Commit 340746398b67 ("net: fec: fix hardware time stamping by external
devices") was overly cautious with calling fec_ptp_disable_hwts() when
cmd == SIOCSHWTSTAMP and use_fec_hwts == false, because use_fec_hwts is
based on a runtime invariant (phy_has_hwtstamp()). Thus, if use_fec_hwts
is false, then fep->hwts_tx_en and fep->hwts_rx_en cannot be changed at
runtime; their values depend on the initial memory allocation, which
already sets them to zeroes.

If the core will ever gain support for switching timestamping layers,
it will arrange for a more organized calling convention and disable
timestamping in the previous layer as a first step. This means that the
code in the FEC driver is not necessary in any case.

The purpose of this change is to arrange the phy_has_hwtstamp() code in
a way in which it can be refactored away into generic logic.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-8-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: fec: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Vladimir Oltean [Tue, 1 Aug 2023 14:28:18 +0000 (17:28 +0300)]
net: fec: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

The hardware timestamping through ndo_eth_ioctl() is going away.
Convert the FEC driver to the new API before that can be removed.

After removing the timestamping logic from fec_enet_ioctl(), the rest
is equivalent to phy_do_ioctl_running().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: bonding: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set()
Maxim Georgiev [Tue, 1 Aug 2023 14:28:17 +0000 (17:28 +0300)]
net: bonding: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set()

bonding is one of the stackable net devices which pass the hardware
timestamping ops to the real device through ndo_eth_ioctl(). This
prevents converting any device driver to the new hwtimestamping API
without regressions.

Remove that limitation in bonding by using the newly introduced helpers
for timestamping through lower devices, that handle both the new and the
old driver API.

Signed-off-by: Maxim Georgiev <glipus@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: macvlan: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set()
Maxim Georgiev [Tue, 1 Aug 2023 14:28:16 +0000 (17:28 +0300)]
net: macvlan: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set()

macvlan is one of the stackable net devices which pass the hardware
timestamping ops to the real device through ndo_eth_ioctl(). This
prevents converting any device driver to the new hwtimestamping API
without regressions.

Remove that limitation in macvlan by using the newly introduced helpers
for timestamping through lower devices, that handle both the new and the
old driver API.

macvlan only implements ndo_eth_ioctl() for these 2 operations, so
delete that method.

Signed-off-by: Maxim Georgiev <glipus@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: vlan: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set()
Maxim Georgiev [Tue, 1 Aug 2023 14:28:15 +0000 (17:28 +0300)]
net: vlan: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set()

8021q is one of the stackable net devices which pass the hardware
timestamping ops to the real device through ndo_eth_ioctl(). This
prevents converting any device driver to the new hwtimestamping API
without regressions.

Remove that limitation in the vlan driver by using the newly introduced
helpers for timestamping through lower devices, that handle both the new
and the old driver API.

Signed-off-by: Maxim Georgiev <glipus@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: add hwtstamping helpers for stackable net devices
Maxim Georgiev [Tue, 1 Aug 2023 14:28:14 +0000 (17:28 +0300)]
net: add hwtstamping helpers for stackable net devices

The stackable net devices with hwtstamping support (vlan, macvlan,
bonding) only pass the hwtstamping ops to the lower (real) device.

These drivers are the first that need to be converted to the new
timestamping API, because if they aren't prepared to handle that,
then no real device driver cannot be converted to the new API either.

After studying what vlan_dev_ioctl(), macvlan_eth_ioctl() and
bond_eth_ioctl() have in common, here we propose two generic
implementations of ndo_hwtstamp_get() and ndo_hwtstamp_set() which
can be called by those 3 drivers, with "dev" being their lower device.

These helpers cover both cases, when the lower driver is converted to
the new API or unconverted.

We need some hacks in case of an unconverted driver, namely to stuff
some pointers in struct kernel_hwtstamp_config which shouldn't have
been there (since the new API isn't supposed to need it). These will
be removed when all drivers will have been converted to the new API.

Signed-off-by: Maxim Georgiev <glipus@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: add NDOs for configuring hardware timestamping
Maxim Georgiev [Tue, 1 Aug 2023 14:28:13 +0000 (17:28 +0300)]
net: add NDOs for configuring hardware timestamping

Current hardware timestamping API for NICs requires implementing
.ndo_eth_ioctl() for SIOCGHWTSTAMP and SIOCSHWTSTAMP.

That API has some boilerplate such as request parameter translation
between user and kernel address spaces, handling possible translation
failures correctly, etc. Since it is the same all across the board, it
would be desirable to handle it through generic code.

Here we introduce .ndo_hwtstamp_get() and .ndo_hwtstamp_set(), which
implement that boilerplate and allow drivers to just act upon requests.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Maxim Georgiev <glipus@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230801142824.1772134-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'net-extend-alloc_skb_with_frags-max-size'
Jakub Kicinski [Thu, 3 Aug 2023 01:44:59 +0000 (18:44 -0700)]
Merge branch 'net-extend-alloc_skb_with_frags-max-size'

Eric Dumazet says:

====================
net: extend alloc_skb_with_frags() max size

alloc_skb_with_frags(), while being able to use high order allocations,
limits the payload size to PAGE_SIZE * MAX_SKB_FRAGS

Reviewing Tahsin Erdogan patch [1], it was clear to me we need
to remove this limitation.

[1] https://lore.kernel.org/netdev/20230731230736.109216-1-trdgn@amazon.com/

v2: Addressed Willem feedback on 1st patch.
====================

Link: https://lore.kernel.org/r/20230801205254.400094-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: tap: change tap_alloc_skb() to allow bigger paged allocations
Eric Dumazet [Tue, 1 Aug 2023 20:52:54 +0000 (20:52 +0000)]
net: tap: change tap_alloc_skb() to allow bigger paged allocations

tap_alloc_skb() is currently calling sock_alloc_send_pskb()
forcing order-0 page allocations.

Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max size by 8x.

Also add logic to increase the linear part if needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tahsin Erdogan <trdgn@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230801205254.400094-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/packet: change packet_alloc_skb() to allow bigger paged allocations
Eric Dumazet [Tue, 1 Aug 2023 20:52:53 +0000 (20:52 +0000)]
net/packet: change packet_alloc_skb() to allow bigger paged allocations

packet_alloc_skb() is currently calling sock_alloc_send_pskb()
forcing order-0 page allocations.

Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max size by 8x.

Also add logic to increase the linear part if needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tahsin Erdogan <trdgn@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230801205254.400094-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: tun: change tun_alloc_skb() to allow bigger paged allocations
Eric Dumazet [Tue, 1 Aug 2023 20:52:52 +0000 (20:52 +0000)]
net: tun: change tun_alloc_skb() to allow bigger paged allocations

tun_alloc_skb() is currently calling sock_alloc_send_pskb()
forcing order-0 page allocations.

Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max allocation size by 8x.

Also add logic to increase the linear part if needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tahsin Erdogan <trdgn@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230801205254.400094-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: allow alloc_skb_with_frags() to allocate bigger packets
Eric Dumazet [Tue, 1 Aug 2023 20:52:51 +0000 (20:52 +0000)]
net: allow alloc_skb_with_frags() to allocate bigger packets

Refactor alloc_skb_with_frags() to allow bigger packets allocations.

Instead of assuming that only order-0 allocations will be attempted,
use the caller supplied max order.

v2: try harder to use high-order pages, per Willem feedback.

Link: https://lore.kernel.org/netdev/CANn89iJQfmc_KeUr3TeXvsLQwo3ZymyoCr7Y6AnHrkWSuz0yAg@mail.gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tahsin Erdogan <trdgn@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230801205254.400094-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agosctp: Remove unused function declarations
Yue Haibing [Mon, 31 Jul 2023 14:10:30 +0000 (22:10 +0800)]
sctp: Remove unused function declarations

These declarations are never implemented since beginning of git history.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20230731141030.32772-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'mlx5-ipsec-fixes'
Jakub Kicinski [Thu, 3 Aug 2023 01:38:45 +0000 (18:38 -0700)]
Merge branch 'mlx5-ipsec-fixes'

Leon Romanovsky says:

====================
mlx5 IPsec fixes

The following patches are combination of Jianbo's work on IPsec eswitch mode
together with our internal review toward addition of TCP protocol selectors
support to IPSec packet offload.

Despite not-being fix, the first patch helps us to make second one more
clear, so I'm asking to apply it anyway as part of this series.
====================

Link: https://lore.kernel.org/r/cover.1690803944.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Set proper IPsec source port in L4 selector
Leon Romanovsky [Mon, 31 Jul 2023 11:58:42 +0000 (14:58 +0300)]
net/mlx5e: Set proper IPsec source port in L4 selector

Fix typo in setup_fte_upper_proto_match() where destination UDP port
was used instead of source port.

Fixes: a7385187a386 ("net/mlx5e: IPsec, support upper protocol selector field offload")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/ffc024a4d192113103f392b0502688366ca88c1f.1690803944.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio
Jianbo Liu [Mon, 31 Jul 2023 11:58:41 +0000 (14:58 +0300)]
net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio

In the cited commit, new type of FS_TYPE_PRIO_CHAINS fs_prio was added
to support multiple parallel namespaces for multi-chains. And we skip
all the flow tables under the fs_node of this type unconditionally,
when searching for the next or previous flow table to connect for a
new table.

As this search function is also used for find new root table when the
old one is being deleted, it will skip the entire FS_TYPE_PRIO_CHAINS
fs_node next to the old root. However, new root table should be chosen
from it if there is any table in it. Fix it by skipping only the flow
tables in the same FS_TYPE_PRIO_CHAINS fs_node when finding the
closest FT for a fs_node.

Besides, complete the connecting from FTs of previous priority of prio
because there should be multiple prevs after this fs_prio type is
introduced. And also the next FT should be chosen from the first flow
table next to the prio in the same FS_TYPE_PRIO_CHAINS fs_prio, if
this prio is the first child.

Fixes: 328edb499f99 ("net/mlx5: Split FDB fast path prio to multiple namespaces")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/7a95754df479e722038996c97c97b062b372591f.1690803944.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5: fs_core: Make find_closest_ft more generic
Jianbo Liu [Mon, 31 Jul 2023 11:58:40 +0000 (14:58 +0300)]
net/mlx5: fs_core: Make find_closest_ft more generic

As find_closest_ft_recursive is called to find the closest FT, the
first parameter of find_closest_ft can be changed from fs_prio to
fs_node. Thus this function is extended to find the closest FT for the
nodes of any type, not only prios, but also the sub namespaces.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/d3962c2b443ec8dde7a740dc742a1f052d5e256c.1690803944.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'mlx5-ipsec-packet-offload-support-in-eswitch-mode'
Jakub Kicinski [Thu, 3 Aug 2023 01:37:38 +0000 (18:37 -0700)]
Merge branch 'mlx5-ipsec-packet-offload-support-in-eswitch-mode'

Leon Romanovsky says:

====================
mlx5 IPsec packet offload support in eswitch mode

This series from Jianbo adds mlx5 IPsec packet offload support in eswitch
offloaded mode.

It works exactly like "regular" IPsec, nothing special, except
now users can switch to switchdev before adding IPsec rules.

 devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Same configurations as here:

https://lore.kernel.org/netdev/cover.1670005543.git.leonro@nvidia.com/

Packet offload mode:
  ip xfrm state offload packet dev <if-name> dir <in|out>
  ip xfrm policy .... offload packet dev <if-name>
Crypto offload mode:
  ip xfrm state offload crypto dev <if-name> dir <in|out>
or (backward compatibility)
  ip xfrm state offload dev <if-name> dir <in|out>

v0: https://lore.kernel.org/all/cover.1689064922.git.leonro@nvidia.com
====================

Link: https://lore.kernel.org/r/cover.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev
Jianbo Liu [Mon, 31 Jul 2023 11:28:24 +0000 (14:28 +0300)]
net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev

For IPsec packet offload mode, the order of TC offload and IPsec
offload on the same netdevice is not aligned with the order in the
non-offload software. For example, for RX, the software performs TC
first and then IPsec transformation, but the implementation for
offload does that in the opposite way.

To resolve the difference for now, either IPsec offload or TC offload,
not both, is allowed for a specific interface.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/8e2e5e3b0984d785066e8663aaf97b3ba1bb873f.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Add get IPsec offload stats for uplink representor
Jianbo Liu [Mon, 31 Jul 2023 11:28:23 +0000 (14:28 +0300)]
net/mlx5e: Add get IPsec offload stats for uplink representor

As IPsec offload is supported in switchdev mode, HW stats can be can be
obtained from uplink rep.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/b43c91c452f1db9c35c10639a029aa10fd8b7895.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Modify and restore TC rules for IPSec TX rules
Jianbo Liu [Mon, 31 Jul 2023 11:28:22 +0000 (14:28 +0300)]
net/mlx5e: Modify and restore TC rules for IPSec TX rules

After IPsec policy/state TX rules are added, any TC flow rule, which
forwards packets to uplink, is modified to forward to IPsec TX tables.
As these tables are destroyed dynamically, whenever there is no
reference to them, the destinations of this kind of rules must be
restored to uplink.

There is a special case for packet encapsulation, as the
packet_reformat_id in the extended destination is used to reformat
packets, but only for the VPORT destination. To forward packet to
IPsec table and do encapsulation in one FTE, move the
packet_reformat_id to flow context, instead of using the extended
destination. As a limitation, multiple encapsulations with table
forwarding, and one together with other VPORT destinations, are not
allowed, so add a check when offloading TC rules.

TC rules are not allowed before IPsec TX rule is added, so only need
to restore TC rules after flush IPSec TX rules. As they are saved in
the vport_rep rhashtables, we walk all the rules in the rhashtables,
and find TC rules with destinations pointing to IPsec tables, and
modify them one by one. To avoid concurrent issue, this handling is
done under the protection of eswitch mode_lock.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/7bcb2c7e2ecf0e0d06b095c8dcc6a37ea7f02faf.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Make IPsec offload work together with eswitch and TC
Jianbo Liu [Mon, 31 Jul 2023 11:28:21 +0000 (14:28 +0300)]
net/mlx5e: Make IPsec offload work together with eswitch and TC

The eswitch mode is not allowed to change if there are any IPsec rules.
Besides, by using mlx5_esw_try_lock() to get eswitch mode lock, IPsec
rules are not allowed to be offloaded if there are any TC rules.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/e442b512b21a931fbdfb87d57ae428c37badd58a.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5: Compare with old_dest param to modify rule destination
Jianbo Liu [Mon, 31 Jul 2023 11:28:20 +0000 (14:28 +0300)]
net/mlx5: Compare with old_dest param to modify rule destination

The rule destination must be compared with the old_dest passed in.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/24adc60d05d7492359ba343c6da1ebbe9fe284f6.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Support IPsec packet offload for TX in switchdev mode
Jianbo Liu [Mon, 31 Jul 2023 11:28:19 +0000 (14:28 +0300)]
net/mlx5e: Support IPsec packet offload for TX in switchdev mode

The IPsec encryption is done at the last, so add new prio for IPsec
offload in FDB, and put it just lower than the slow path prio and
higher than the per-vport prio.
Three levels are added for TX. The first one is for ip xfrm policy.
The sa table is created in the second level for ip xfrm state. The
status table is created at the last to count the number of packets
encrypted.
The rules, which forward packets to uplink, are changed to forward
them to IPsec TX tables first. These rules are restored after those
tables are destroyed, which is done immediately when there is no
reference to them, just as what does in legacy mode. The support for
slow path is added here, by refreshing uplink's channels. But, the
handling for TC fast path, which is more complicated, will be added
later. Besides, reg c4 is used instead to match reqid.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/cfd0e6ffaf0b8c55ebaa9fb0649b7c504b6b8ec6.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Refactor IPsec TX tables creation
Jianbo Liu [Mon, 31 Jul 2023 11:28:18 +0000 (14:28 +0300)]
net/mlx5e: Refactor IPsec TX tables creation

Add attribute for IPsec TX creation, pass all needed parameters in it,
so tx_create() can be used by eswitch.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/24d5ab988b0db2d39b7fde321b44ffe885d47828.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Handle IPsec offload for RX datapath in switchdev mode
Jianbo Liu [Mon, 31 Jul 2023 11:28:17 +0000 (14:28 +0300)]
net/mlx5e: Handle IPsec offload for RX datapath in switchdev mode

Reuse tun opts bits in reg c1, to pass IPsec obj id to datapath.
As this is only for RX SA and there are only 11 bits, xarray is used
to map IPsec obj id to an index, which is between 1 and 0x7ff, and
replace obj id to write to reg c1.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/43d60fbcc9cd672a97d7e2a2f7fe6a3d9e9a776d.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Support IPsec packet offload for RX in switchdev mode
Jianbo Liu [Mon, 31 Jul 2023 11:28:16 +0000 (14:28 +0300)]
net/mlx5e: Support IPsec packet offload for RX in switchdev mode

As decryption must be done first, add new prio for IPsec offload in
FDB, and put it just lower than BYPASS prio and higher than TC prio.
Three levels are added for RX. The first one is for ip xfrm policy. SA
table is created in the second level for ip xfrm state. The status
table is created in the last to check the decryption result. If
success, packets continue with the next process, or dropped otherwise.
For now, the set of reg c1 is removed for swtichdev mode, and the
datapath process will be added in the next patch.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/c91063554cf643fb50b99cf093e8a9bf11729de5.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Refactor IPsec RX tables creation and destruction
Jianbo Liu [Mon, 31 Jul 2023 11:28:15 +0000 (14:28 +0300)]
net/mlx5e: Refactor IPsec RX tables creation and destruction

Add attribute for IPsec RX creation, so rx_create() can be used by
eswitch in later patch. And move the code for TTC dest
connect/disconnect, which are needed only in NIC mode, to individual
functions.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/87478d928479b6a4eee41901204546ea05741815.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Prepare IPsec packet offload for switchdev mode
Jianbo Liu [Mon, 31 Jul 2023 11:28:14 +0000 (14:28 +0300)]
net/mlx5e: Prepare IPsec packet offload for switchdev mode

As the uplink representor is created only in switchdev mode, add a local
variable for IPsec to indicate the device is in this mode.
In this mode, IPsec ROCE is disabled, and crypto offload is kept
as it is. However, as the tables for packet offload are created in FDB,
ipsec->rx_esw and ipsec->tx_esw are added.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/ee242398f3b0a18007749fe79ff6ff19445a0280.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Change the parameter of IPsec RX skb handle function
Jianbo Liu [Mon, 31 Jul 2023 11:28:13 +0000 (14:28 +0300)]
net/mlx5e: Change the parameter of IPsec RX skb handle function

Refactor the function to pass in reg B value only.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/3b3c53f64660d464893eaecc41298b1ce49c6baa.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet/mlx5e: Add function to get IPsec offload namespace
Jianbo Liu [Mon, 31 Jul 2023 11:28:12 +0000 (14:28 +0300)]
net/mlx5e: Add function to get IPsec offload namespace

Add function to get namespace in different directions. It will be
extended for switchdev mode in later patch, but no functionality change
for now.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/ac2982c34f1ed3288d4670cacfd7e1b87a8c96d9.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge tag 'soc-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Thu, 3 Aug 2023 01:21:12 +0000 (18:21 -0700)]
Merge tag 'soc-fixes-6.5-2' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "A couple of platforms get a lone dts fix each:

   - SoCFPGA: Fix incorrect I2C property for SCL signal

   - Renesas: Fix interrupt names for MTU3 channels on RZ/G2L and
     RZ/V2L.

   - Juno/Vexpress: remove a dangling symlink

   - at91: sam9x60 SoC detection compatible strings

   - nspire: Fix arm primecell compatible string

  On the NXP i.MX platform, there multiple issues that get addressed:

   - A couple of ARM DTS fixes for i.MX6SLL usbphy and supported CPU
     frequency of sk-imx53 board

   - Add missing pull-up for imx8mn-var-som onboard PHY reset pinmux

   - A couple of imx8mm-venice fixes from Tim Harvey to diable
     disp_blk_ctrl

   - A couple of phycore-imx8mm fixes from Yashwanth Varakala to correct
     VPU label and gpio-line-names

   - Fix imx8mp-blk-ctrl driver to register HSIO PLL clock as
     bus_power_dev child, so that runtime PM can translate into the
     necessary GPC power domain action

  On the driver side, there are two fixes for tegra memory controller
  drivers addressing regressions from the merge window, a couple of
  minor correctness fixes for SCMI and SMCCC firmware, as well as a
  build fix for an lcd backlight driver"

* tag 'soc-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (22 commits)
  backlight: corgi_lcd: fix missing prototype
  memory: tegra: make icc_set_bw return zero if BWMGR not supported
  arm64: dts: renesas: rzg2l: Update overfow/underflow IRQ names for MTU3 channels
  dt-bindings: serial: atmel,at91-usart: update compatible for sam9x60
  ARM: dts: at91: sam9x60: fix the SOC detection
  ARM: dts: nspire: Fix arm primecell compatible string
  firmware: arm_scmi: Fix chan_free cleanup on SMC
  firmware: arm_scmi: Drop OF node reference in the transport channel setup
  soc: imx: imx8mp-blk-ctrl: register HSIO PLL clock as bus_power_dev child
  ARM: dts: nxp/imx: limit sk-imx53 supported frequencies
  firmware: arm_scmi: Fix signed error return values handling
  firmware: smccc: Fix use of uninitialised results structure
  arm64: dts: freescale: Fix VPU G2 clock
  arm64: dts: imx8mn-var-som: add missing pull-up for onboard PHY reset pinmux
  arm64: dts: phycore-imx8mm: Correction in gpio-line-names
  arm64: dts: phycore-imx8mm: Label typo-fix of VPU
  ARM: dts: nxp/imx6sll: fix wrong property name in usbphy node
  arm64: dts: imx8mm-venice-gw7904: disable disp_blk_ctrl
  arm64: dts: imx8mm-venice-gw7903: disable disp_blk_ctrl
  arm64: dts: arm: Remove the dangling vexpress-v2m-rs1.dtsi symlink
  ...

11 months agoMerge tag 'bitmap-6.5-rc5' of https://github.com:/norov/linux
Linus Torvalds [Thu, 3 Aug 2023 01:10:26 +0000 (18:10 -0700)]
Merge tag 'bitmap-6.5-rc5' of https://github.com:/norov/linux

Pull bitmap fixes from Yury Norov:

 - Fix for bitmap documentation

 - Fix for kernel build under certain configurations

* tag 'bitmap-6.5-rc5' of https://github.com:/norov/linux:
  lib/bitmap: workaround const_eval test build failure
  cpumask: eliminate kernel-doc warnings

11 months agopds_core: Fix documentation for pds_client_register
Brett Creeley [Tue, 1 Aug 2023 16:58:33 +0000 (09:58 -0700)]
pds_core: Fix documentation for pds_client_register

The documentation above pds_client_register states that it returns 0 on
success and negative on error. However, it actually returns a positive
client ID on success and negative on error. Fix the documentation to
state exactly that.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20230801165833.1622-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: switchdev: Remove unused typedef switchdev_obj_dump_cb_t()
Yue Haibing [Tue, 1 Aug 2023 14:42:09 +0000 (22:42 +0800)]
net: switchdev: Remove unused typedef switchdev_obj_dump_cb_t()

Commit 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from switchdev")
leave this unused.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230801144209.27512-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonetlabel: Remove unused declaration netlbl_cipsov4_doi_free()
Yue Haibing [Tue, 1 Aug 2023 14:34:53 +0000 (22:34 +0800)]
netlabel: Remove unused declaration netlbl_cipsov4_doi_free()

Since commit b1edeb102397 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
this declaration is unused and can be removed.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20230801143453.24452-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoila: Remove unnecessary file net/ila.h
Yue Haibing [Tue, 1 Aug 2023 14:31:29 +0000 (22:31 +0800)]
ila: Remove unnecessary file net/ila.h

Commit 642c2c95585d ("ila: xlat changes") removed ila_xlat_outgoing()
and ila_xlat_incoming() functions, then this file became unnecessary.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230801143129.40652-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoudp: Remove unused function declaration udp_bpf_get_proto()
Yue Haibing [Tue, 1 Aug 2023 13:39:02 +0000 (21:39 +0800)]
udp: Remove unused function declaration udp_bpf_get_proto()

commit 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
left behind this.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230801133902.3660-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agocirrus: cs89x0: fix the return value handle and remove redundant dev_warn() for platf...
Ruan Jinjie [Tue, 1 Aug 2023 13:31:21 +0000 (21:31 +0800)]
cirrus: cs89x0: fix the return value handle and remove redundant dev_warn() for platform_get_irq()

There is no possible for platform_get_irq() to return 0
and the return value of platform_get_irq() is more sensible
to show the error reason.

And there is no need to call the dev_warn() function directly to print
a custom message when handling an error from platform_get_irq() function as
it is going to display an appropriate error message in case of a failure.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20230801133121.416319-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: dsa: hellcreek: Replace bogus comment
Kurt Kanzenbach [Tue, 1 Aug 2023 13:16:47 +0000 (15:16 +0200)]
net: dsa: hellcreek: Replace bogus comment

Replace bogus comment about matching the latched timestamp to one of the
received frames. That comment is probably copied from mv88e6xxx and true for
these switches. However, the hellcreek switch is configured to insert the
timestamp directly into the PTP packets.

While here, remove the other comments regarding the list splicing and locking as
well, because it doesn't add any value.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20230801131647.84697-1-kurt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agobnx2x: Remove unnecessary ternary operators
Ruan Jinjie [Tue, 1 Aug 2023 11:19:28 +0000 (19:19 +0800)]
bnx2x: Remove unnecessary ternary operators

There are a little ternary operators, the true or false judgement
of which is unnecessary in C language semantics.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230801111928.300231-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoocteontx2: Remove unnecessary ternary operators
Ruan Jinjie [Tue, 1 Aug 2023 11:26:38 +0000 (19:26 +0800)]
octeontx2: Remove unnecessary ternary operators

There are a little ternary operators, the true or false judgement
of which is unnecessary in C language semantics. So remove it
to clean Code.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sunil Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20230801112638.317149-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: hisilicon: fix the return value handle and remove redundant netdev_err() for...
Ruan Jinjie [Mon, 31 Jul 2023 07:38:58 +0000 (15:38 +0800)]
net: hisilicon: fix the return value handle and remove redundant netdev_err() for platform_get_irq()

There is no possible for platform_get_irq() to return 0
and the return value of platform_get_irq() is more sensible
to show the error reason.

And there is no need to call the netdev_err() function directly to print
a custom message when handling an error from platform_get_irq() function as
it is going to display an appropriate error message in case of a failure.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/20230731073858.3633193-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge tag 'exfat-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/linkin...
Linus Torvalds [Wed, 2 Aug 2023 18:43:06 +0000 (11:43 -0700)]
Merge tag 'exfat-for-6.5-rc5' of git://git./linux/kernel/git/linkinjeon/exfat

Pull exfat fixes from Namjae Jeon:

 - Fix page allocation failure from allocation bitmap by using
   kvmalloc_array/kvfree

 - Add the check to validate if filename entries exceeds max filename
   length

 - Fix potential deadlock condition from dir_emit*()

* tag 'exfat-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: release s_lock before calling dir_emit()
  exfat: check if filename entries exceeds max filename length
  exfat: use kvmalloc_array/kvfree instead of kmalloc_array/kfree

11 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Wed, 2 Aug 2023 18:04:27 +0000 (11:04 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three small fixes, all in drivers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: pm80xx: Fix error return code in pm8001_pci_probe()
  scsi: zfcp: Defer fc_rport blocking until after ADISC response
  scsi: storvsc: Limit max_sectors for virtual Fibre Channel devices

11 months agoword-at-a-time: use the same return type for has_zero regardless of endianness
ndesaulniers@google.com [Tue, 1 Aug 2023 22:22:17 +0000 (15:22 -0700)]
word-at-a-time: use the same return type for has_zero regardless of endianness

Compiling big-endian targets with Clang produces the diagnostic:

  fs/namei.c:2173:13: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
} while (!(has_zero(a, &adata, &constants) | has_zero(b, &bdata, &constants)));
          ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                               ||
  fs/namei.c:2173:13: note: cast one or both operands to int to silence this warning

It appears that when has_zero was introduced, two definitions were
produced with different signatures (in particular different return
types).

Looking at the usage in hash_name() in fs/namei.c, I suspect that
has_zero() is meant to be invoked twice per while loop iteration; using
logical-or would not update `bdata` when `a` did not have zeros.  So I
think it's preferred to always return an unsigned long rather than a
bool than update the while loop in hash_name() to use a logical-or
rather than bitwise-or.

[ Also changed powerpc version to do the same  - Linus ]

Link: https://github.com/ClangBuiltLinux/linux/issues/1832
Link: https://lore.kernel.org/lkml/20230801-bitwise-v1-1-799bec468dc4@google.com/
Fixes: 36126f8f2ed8 ("word-at-a-time: make the interfaces truly generic")
Debugged-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 months agonet: Remove duplicated include in mac.c
Yang Li [Tue, 1 Aug 2023 00:50:41 +0000 (08:50 +0800)]
net: Remove duplicated include in mac.c

./drivers/net/ethernet/freescale/fman/mac.c: linux/of_platform.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6039
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoselftests/net: report rcv_mss in tcp_mmap
Willem de Bruijn [Mon, 31 Jul 2023 18:08:09 +0000 (14:08 -0400)]
selftests/net: report rcv_mss in tcp_mmap

tcp_mmap tests TCP_ZEROCOPY_RECEIVE. If 0% of data is received using
mmap, this may be due to mss. Report rcv_mss to identify this cause.

Output of a run failed due to too small mss:

    received 32768 MB (0 % mmap'ed) in 8.40458 s, 32.7057 Gbit
      cpu usage user:0.027922 sys:8.21126, 251.44 usec per MB, 3252 c-switches, rcv_mss 1428

Output on a successful run:

    received 32768 MB (99.9507 % mmap'ed) in 4.69023 s, 58.6064 Gbit
      cpu usage user:0.029172 sys:2.56105, 79.0473 usec per MB, 57591 c-switches, rcv_mss 4096

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agowifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1()
Hans de Goede [Sat, 29 Jul 2023 14:05:00 +0000 (16:05 +0200)]
wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1()

Using brcmfmac with 6.5-rc3 on a brcmfmac43241b4-sdio triggers
a backtrace caused by the following field-spanning warning:

memcpy: detected field-spanning write (size 120) of single field
  "&params_le->channel_list[0]" at
  drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:1072 (size 2)

The driver still works after this warning. The warning was introduced by the
new field-spanning write checks which were enabled recently.

Fix this by replacing the channel_list[1] declaration at the end of
the struct with a flexible array declaration.

Most users of struct brcmf_scan_params_le calculate the size to alloc
using the size of the non flex-array part of the struct + needed extra
space, so they do not care about sizeof(struct brcmf_scan_params_le).

brcmf_notify_escan_complete() however uses the struct on the stack,
expecting there to be room for at least 1 entry in the channel-list
to store the special -1 abort channel-id.

To make this work use an anonymous union with a padding member
added + the actual channel_list flexible array.

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230729140500.27892-1-hdegoede@redhat.com
11 months agovxlan: Fix nexthop hash size
Benjamin Poirier [Mon, 31 Jul 2023 20:02:08 +0000 (16:02 -0400)]
vxlan: Fix nexthop hash size

The nexthop code expects a 31 bit hash, such as what is returned by
fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash
returned by skb_get_hash() can lead to problems related to the fact that
'int hash' is a negative number when the MSB is set.

In the case of hash threshold nexthop groups, nexthop_select_path_hthr()
will disproportionately select the first nexthop group entry. In the case
of resilient nexthop groups, nexthop_select_path_res() may do an out of
bounds access in nh_buckets[], for example:
    hash = -912054133
    num_nh_buckets = 2
    bucket_index = 65535

which leads to the following panic:

BUG: unable to handle page fault for address: ffffc900025910c8
PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0
Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:nexthop_select_path+0x197/0xbf0
Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
FS:  0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die+0x23/0x70
 ? page_fault_oops+0x1ee/0x5c0
 ? __pfx_is_prefetch.constprop.0+0x10/0x10
 ? __pfx_page_fault_oops+0x10/0x10
 ? search_bpf_extables+0xfe/0x1c0
 ? fixup_exception+0x3b/0x470
 ? exc_page_fault+0xf6/0x110
 ? asm_exc_page_fault+0x26/0x30
 ? nexthop_select_path+0x197/0xbf0
 ? nexthop_select_path+0x197/0xbf0
 ? lock_is_held_type+0xe7/0x140
 vxlan_xmit+0x5b2/0x2340
 ? __lock_acquire+0x92b/0x3370
 ? __pfx_vxlan_xmit+0x10/0x10
 ? __pfx___lock_acquire+0x10/0x10
 ? __pfx_register_lock_class+0x10/0x10
 ? skb_network_protocol+0xce/0x2d0
 ? dev_hard_start_xmit+0xca/0x350
 ? __pfx_vxlan_xmit+0x10/0x10
 dev_hard_start_xmit+0xca/0x350
 __dev_queue_xmit+0x513/0x1e20
 ? __pfx___dev_queue_xmit+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 ? mark_held_locks+0x44/0x90
 ? skb_push+0x4c/0x80
 ? eth_header+0x81/0xe0
 ? __pfx_eth_header+0x10/0x10
 ? neigh_resolve_output+0x215/0x310
 ? ip6_finish_output2+0x2ba/0xc90
 ip6_finish_output2+0x2ba/0xc90
 ? lock_release+0x236/0x3e0
 ? ip6_mtu+0xbb/0x240
 ? __pfx_ip6_finish_output2+0x10/0x10
 ? find_held_lock+0x83/0xa0
 ? lock_is_held_type+0xe7/0x140
 ip6_finish_output+0x1ee/0x780
 ip6_output+0x138/0x460
 ? __pfx_ip6_output+0x10/0x10
 ? __pfx___lock_acquire+0x10/0x10
 ? __pfx_ip6_finish_output+0x10/0x10
 NF_HOOK.constprop.0+0xc0/0x420
 ? __pfx_NF_HOOK.constprop.0+0x10/0x10
 ? ndisc_send_skb+0x2c0/0x960
 ? __pfx_lock_release+0x10/0x10
 ? __local_bh_enable_ip+0x93/0x110
 ? lock_is_held_type+0xe7/0x140
 ndisc_send_skb+0x4be/0x960
 ? __pfx_ndisc_send_skb+0x10/0x10
 ? mark_held_locks+0x65/0x90
 ? find_held_lock+0x83/0xa0
 ndisc_send_ns+0xb0/0x110
 ? __pfx_ndisc_send_ns+0x10/0x10
 addrconf_dad_work+0x631/0x8e0
 ? lock_acquire+0x180/0x3f0
 ? __pfx_addrconf_dad_work+0x10/0x10
 ? mark_held_locks+0x24/0x90
 process_one_work+0x582/0x9c0
 ? __pfx_process_one_work+0x10/0x10
 ? __pfx_do_raw_spin_lock+0x10/0x10
 ? mark_held_locks+0x24/0x90
 worker_thread+0x93/0x630
 ? __kthread_parkme+0xdc/0x100
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x1a5/0x1e0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x34/0x60
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
RIP: 0000:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>
Modules linked in:
CR2: ffffc900025910c8
---[ end trace 0000000000000000 ]---
RIP: 0010:nexthop_select_path+0x197/0xbf0
Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
FS:  0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fix this problem by ensuring the MSB of hash is 0 using a right shift - the
same approach used in fib_multipath_hash() and rt6_multipath_hash().

Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoMerge branch 'icssg-driver'
David S. Miller [Wed, 2 Aug 2023 09:38:12 +0000 (10:38 +0100)]
Merge branch 'icssg-driver'

MD Danish Anwar says:

====================
Introduce ICSSG based ethernet Driver

The Programmable Real-time Unit and Industrial Communication Subsystem
Gigabit (PRU_ICSSG) is a low-latency microcontroller subsystem in the TI
SoCs. This subsystem is provided for the use cases like the implementation
of custom peripheral interfaces, offloading of tasks from the other
processor cores of the SoC, etc.

The subsystem includes many accelerators for data processing like
multiplier and multiplier-accumulator. It also has peripherals like
UART, MII/RGMII, MDIO, etc. Every ICSSG core includes two 32-bit
load/store RISC CPU cores called PRUs.

The above features allow it to be used for implementing custom firmware
based peripherals like ethernet.

This series adds the YAML documentation and the driver with basic EMAC
support for TI AM654 Silicon Rev 2 SoC with the PRU_ICSSG Sub-system.
running dual-EMAC firmware.
This currently supports basic EMAC with 1Gbps and 100Mbps link. 10M and
half-duplex modes are not yet supported because they require the support
of an IEP, which will be added later.
Advanced features like switch-dev and timestamping will be added later.

This is the v13 of the patch series [v1]. This version of the patchset
addresses comments made on v12.

There series doesn't have any dependency.

Changes from v12 to v13 :
*) Rebased the series on latest net-next.
*) Addressed Jakub's comments on ndo_xmit API. Now we will only stop queues
   based on occupancy not on dma errors.
*) Removed limiting the number of serviced packets to budget for Tx NAPI.
   Now Tx NAPI will keep servicing packets.
*) Removed netif_running() check when packet arrives.
*) Introduced prototypes of APIs in the same patch where these APIs are added.
   Dropped __maybe_unused tags as compiler only cares about prototypes
   existing, not whether actual callers are in place. Now prototypes of these
   APIs are present in the same patch where they are introduced but thes APIs
   are called later (in patch 6).

Changes from v11 to v12 :
*) Rebased the series on latest net-next.
*) Addressed Jakub's comments on ndo_xmit API.
*) Added hooks to .get_rmon_stats for the driver. Now tx / rx bucket size
   and frame counts per bucket will be fetched by ethtool_rmon_stats instead
   of ethtool -S.
*) Added __maybe_unused tags to unused config and classifier APIs in patch
   2,3 and 4. These tags are later removed in patch 6.

Changes from v10 to v11 :
*) Rebased the series on latest net-next.
*) Split the ICSSG driver introduction patch into 9 different patches as
   asked by Jakub.
*) Introduced new patch(patch 8/10) to dump Standard network interface
   staticstics via ndo_get_stats64. Now certain stats that are reported by
   ICSSG hardware and are also part of struct rtnl_link_stats64, will be
   reported by ndo_get_stats64. While other stats that are not part of the
   struct rtnl_link_stats64 will be reported by ethtool -S. These stats
   are not duplicated.

Changes from v9 to v10 :
*) Rebased the series on latest net-next.
*) Moved 'ndev prueth->emac[mac] == emac' assignment to the end of function
   prueth_netdev_init().
*) In unsupported phy_mode switch case instead of returning -EINVAL, store
   the error code in ret and 'goto free'

Changes from v8 to v9 :
*) Rebased the series on latest net-next.
*) Fixed smatch and sparse warnings as pointed by Simon.
*) Fixed leaky ndev in prueth_netdev_init() as asked by Simon.

Changes from v7 to v8 :
*) Rebased the series on 6.5-rc1.
*) Fixed few formattings.

Changes from v6 to v7 :
*) Added RB tag of Rob in patch 1 of this series.
*) Addressed Simon's comment on patch 2 of the series.
*) Rebased patchset on next-20230428 linux-next.

Changes from v5 to v6 :
*) Added RB tag of Andrew Lunn in patch 2 of this series.
*) Addressed Rob's comment on patch 1 of the series.
*) Rebased patchset on next-20230421 linux-next.

Changes from v4 to v5 :
*) Re-arranged properties section in ti,icssg-prueth.yaml file.
*) Added requirement for minimum one ethernet port.
*) Fixed some minor formatting errors as asked by Krzysztof.
*) Dropped SGMII mode from enum mii_mode as SGMII mode is not currently
   supported by the driver.
*) Added switch-case block to handle different phy modes by ICSSG driver.

Changes from v3 to v4 :
*) Addressed Krzysztof's comments and fixed dt_binding_check errors in
   patch 1/2.
*) Added interrupt-extended property in ethernet-ports properties section.
*) Fixed comments in file icssg_switch_map.h according to the Linux coding
   style in patch 2/2. Added Documentation of structures in patch 2/2.

Changes from v2 to v3 :
*) Addressed Rob and Krzysztof's comments on patch 1 of this series.
   Fixed indentation. Removed description and pinctrl section from
   ti,icssg-prueth.yaml file.
*) Addressed Krzysztof, Paolo, Randy, Andrew and Christophe's comments on
   patch 2 of this seires.
*) Fixed blanklines in Kconfig and Makefile. Changed structures to const
   as suggested by Krzysztof.
*) Fixed while loop logic in emac_tx_complete_packets() API as suggested
   by Paolo. Previously in the loop's last iteration 'budget' was 0 and
   napi_consume_skb would wrongly assume the caller is not in NAPI context
   Now, budget won't be zero in last iteration of loop.
*) Removed inline functions addr_to_da1() and addr_to_da0() as asked by
   Andrew.
*) Added dev_err_probe() instead of dev_err() as suggested by Christophe.
*) In ti,icssg-prueth.yaml file, in the patternProperties section of
   ethernet-ports, kept the port name as "port" instead of "ethernet-port"
   as all other drivers were using "port". Will change it if is compulsory
   to use "ethernet-port".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agonet: ti: icssg-prueth: Add Power management support
MD Danish Anwar [Tue, 1 Aug 2023 09:14:28 +0000 (14:44 +0530)]
net: ti: icssg-prueth: Add Power management support

Add suspend / resume APIs to support power management in ICSSG ethernet
driver.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agonet: ti: icssg-prueth: Add ethtool ops for ICSSG Ethernet driver
MD Danish Anwar [Tue, 1 Aug 2023 09:14:27 +0000 (14:44 +0530)]
net: ti: icssg-prueth: Add ethtool ops for ICSSG Ethernet driver

Add icssg_ethtool.c file. This file will be used for dumping statistics
via ethtool for ICSSG ethernet driver.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agonet: ti: icssg-prueth: Add Standard network staticstics
MD Danish Anwar [Tue, 1 Aug 2023 09:14:26 +0000 (14:44 +0530)]
net: ti: icssg-prueth: Add Standard network staticstics

Implement .ndo_get_stats64 to dump standard network interface
statistics for ICSSG ethernet driver.

Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agonet: ti: icssg-prueth: Add ICSSG Stats
MD Danish Anwar [Tue, 1 Aug 2023 09:14:25 +0000 (14:44 +0530)]
net: ti: icssg-prueth: Add ICSSG Stats

Add icssg_stats.c to help dump, icssg related driver statistics.

ICSSG has hardware registers for providing statistics like total rx bytes,
total tx bytes, etc. These registers are of 32 bits and hence in case of 1G
link, they overflows in around 32 seconds. The behaviour of these registers
is such that they don't roll back to 0 after overflow but rather stay at
UINT_MAX.

These registers support a feature where the value written to them is
subtracted from the register. This feature can be utilized to fix the
overflowing of stats.

This solution uses a Workqueues based solution where a function gets
called before the registers overflow (every 25 seconds in 1G link, 25000
seconds in 100M link), this function saves the register
values in local variables and writes the last read value to the
register. So any update during the read will be taken care of.

Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agonet: ti: icssg-prueth: Add ICSSG ethernet driver
Roger Quadros [Tue, 1 Aug 2023 09:14:24 +0000 (14:44 +0530)]
net: ti: icssg-prueth: Add ICSSG ethernet driver

This is the Ethernet driver for TI AM654 Silicon rev. 2
with the ICSSG PRU Sub-system running dual-EMAC firmware.

The Programmable Real-time Unit and Industrial Communication Subsystem
Gigabit (PRU_ICSSG) is a low-latency microcontroller subsystem in the TI
SoCs. This subsystem is provided for the use cases like implementation of
custom peripheral interfaces, offloading of tasks from the other
processor cores of the SoC, etc.

Every ICSSG core has two Programmable Real-Time Unit(PRUs),
two auxiliary Real-Time Transfer Unit (RT_PRUs), and
two Transmit Real-Time Transfer Units (TX_PRUs). Each one of these runs
its own firmware. Every ICSSG core has two MII ports connect to these
PRUs and also a MDIO port.

The cores can run different firmwares to support different protocols and
features like switch-dev, timestamping, etc.

It uses System DMA to transfer and receive packets and
shared memory register emulation between the firmware and
driver for control and configuration.

This patch adds support for basic EMAC functionality with 1Gbps
and 100Mbps link speed. 10M and half duplex mode are not supported
currently as they require IEP, the support for which will be added later.
Support for switch-dev, timestamp, etc. will be added later
by subsequent patch series.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agodt-bindings: net: Add ICSSG Ethernet
MD Danish Anwar [Tue, 1 Aug 2023 09:14:23 +0000 (14:44 +0530)]
dt-bindings: net: Add ICSSG Ethernet

Add a YAML binding document for the ICSSG Programmable real time unit
based Ethernet hardware. The ICSSG driver uses the PRU and PRUSS consumer
APIs to interface the PRUs and load/run the firmware for supporting
ethernet functionality.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agonet: ti: icssg-prueth: Add icssg queues APIs and macros
MD Danish Anwar [Tue, 1 Aug 2023 09:14:22 +0000 (14:44 +0530)]
net: ti: icssg-prueth: Add icssg queues APIs and macros

Add icssg_queue.c file. This file introduces macros and APIs related to
ICSSG queues. These will be used by ICSSG Ethernet driver.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agonet: ti: icssg-prueth: Add Firmware config and classification APIs.
MD Danish Anwar [Tue, 1 Aug 2023 09:14:21 +0000 (14:44 +0530)]
net: ti: icssg-prueth: Add Firmware config and classification APIs.

Add icssg_config.h / .c and icssg_classifier.c files. These are firmware
configuration and classification related files. These will be used by
ICSSG ethernet driver.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agonet: ti: icssg-prueth: Add mii helper apis and macros
MD Danish Anwar [Tue, 1 Aug 2023 09:14:20 +0000 (14:44 +0530)]
net: ti: icssg-prueth: Add mii helper apis and macros

Add MII helper APIs and MACROs. These APIs and MACROs will be later used
by ICSSG Ethernet driver. Also introduce icssg_prueth.h which has
definition of prueth related structures.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>