git.monstr.eu Git - linux-2.6-microblaze.git/log

Merge branch 'bpf-xdp-bcast'

Hangbin Liu says:

====================
This patchset is a new implementation for XDP multicast support based
on my previous 2 maps implementation[1]. The reason is that Daniel thinks
the exclude map implementation is missing proper bond support in XDP
context. And there is a plan to add native XDP bonding support. Adding
a exclude map in the helper also increases the complexity of verifier and
has drawbacks on performance.

The new implementation just add two new flags BPF_F_BROADCAST and
BPF_F_EXCLUDE_INGRESS to extend xdp_redirect_map for broadcast support.

With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when do broadcasting.

The patchv11 link is here [2].

  [1] https://lore.kernel.org/bpf/20210223125809.1376577-1-liuhangbin@gmail.com
  [2] https://lore.kernel.org/bpf/20210513070447.1878448-1-liuhangbin@gmail.com

v12: As Daniel pointed out:
  a) defined as const u64 for flag_mask and action_mask in
     __bpf_xdp_redirect_map()
  b) remove BPF_F_ACTION_MASK in uapi header
  c) remove EXPORT_SYMBOL_GPL for xdpf_clone()

v11:
  a) Use unlikely() when checking if this is for broadcast redirecting.
  b) Fix a tracepoint NULL pointer issue Jesper found
  c) Remove BPF_F_REDIR_MASK and just use OR flags to make the reader more
     clear about what's flags we are using
  d) Add the performace number with multi veth interfaces in patch 01
     description.
  e) remove some sleeps to reduce the testing time in patch04. Re-struct the
     test and make clear what flags we are testing.

v10: use READ/WRITE_ONCE when read/write map instead of xchg()
v9: Update patch 01 commit description
v8: use hlist_for_each_entry_rcu() when looping the devmap hash ojbs
v7: No need to free xdpf in dev_map_enqueue_clone() if xdpf_clone failed.
v6: Fix a skb leak in the error path for generic XDP
v5: Just walk the map directly to get interfaces as get_next_key() of devmap
    hash may restart looping from the first key if the device get removed.
    After update the performace has improved 10% compired with v4.
v4: Fix flags never cleared issue in patch 02. Update selftest to cover this.
v3: Rebase the code based on latest bpf-next
v2: fix flag renaming issue in patch 02
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

selftests/bpf: Add xdp_redirect_multi test

Add a bpf selftest for new helper xdp_redirect_map_multi(). In this
test there are 3 forward groups and 1 exclude group. The test will
redirect each interface's packets to all the interfaces in the forward
group, and exclude the interface in exclude map.

Two maps (DEVMAP, DEVMAP_HASH) and two xdp modes (generic, drive) will
be tested. XDP egress program will also be tested by setting pkt src MAC
to egress interface's MAC address.

For more test details, you can find it in the test script. Here is
the test result.
]# time ./test_xdp_redirect_multi.sh
Pass: xdpgeneric arp(F_BROADCAST) ns1-1
Pass: xdpgeneric arp(F_BROADCAST) ns1-2
Pass: xdpgeneric arp(F_BROADCAST) ns1-3
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-1
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-2
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-3
Pass: xdpgeneric IPv6 (no flags) ns1-1
Pass: xdpgeneric IPv6 (no flags) ns1-2
Pass: xdpdrv arp(F_BROADCAST) ns1-1
Pass: xdpdrv arp(F_BROADCAST) ns1-2
Pass: xdpdrv arp(F_BROADCAST) ns1-3
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-1
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-2
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-3
Pass: xdpdrv IPv6 (no flags) ns1-1
Pass: xdpdrv IPv6 (no flags) ns1-2
Pass: xdpegress mac ns1-2
Pass: xdpegress mac ns1-3
Summary: PASS 18, FAIL 0

real    1m18.321s
user    0m0.123s
sys     0m0.350s

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-5-liuhangbin@gmail.com

sample/bpf: Add xdp_redirect_map_multi for redirect_map broadcast test

This is a sample for xdp redirect broadcast. In the sample we could forward
all packets between given interfaces. There is also an option -X that could
enable 2nd xdp_prog on egress interface.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-4-liuhangbin@gmail.com

xdp: Extend xdp_redirect_map with broadcast support

This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
extend xdp_redirect_map for broadcast support.

With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when do broadcasting.

When getting the devices in dev hash map via dev_map_hash_get_next_key(),
there is a possibility that we fall back to the first key when a device
was removed. This will duplicate packets on some interfaces. So just walk
the whole buckets to avoid this issue. For dev array map, we also walk the
whole map to find valid interfaces.

Function bpf_clear_redirect_map() was removed in
commit ee75aef23afe ("bpf, xdp: Restructure redirect actions").
Add it back as we need to use ri->map again.

With test topology:
  +-------------------+             +-------------------+
  | Host A (i40e 10G) |  ---------- | eno1(i40e 10G)    |
  +-------------------+             |                   |
                                    |   Host B          |
  +-------------------+             |                   |
  | Host C (i40e 10G) |  ---------- | eno2(i40e 10G)    |
  +-------------------+             |                   |
                                    |          +------+ |
                                    | veth0 -- | Peer | |
                                    | veth1 -- |      | |
                                    | veth2 -- |  NS  | |
                                    |          +------+ |
                                    +-------------------+

On Host A:
# pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64

On Host B(Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory):
Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
All the veth peers in the NS have a XDP_DROP program loaded. The
forward_map max_entries in xdp_redirect_map_multi is modify to 4.

Testing the performance impact on the regular xdp_redirect path with and
without patch (to check impact of additional check for broadcast mode):

5.12 rc4         | redirect_map        i40e->i40e      |    2.0M |  9.7M
5.12 rc4         | redirect_map        i40e->veth      |    1.7M | 11.8M
5.12 rc4 + patch | redirect_map        i40e->i40e      |    2.0M |  9.6M
5.12 rc4 + patch | redirect_map        i40e->veth      |    1.7M | 11.7M

Testing the performance when cloning packets with the redirect_map_multi
test, using a redirect map size of 4, filled with 1-3 devices:

5.12 rc4 + patch | redirect_map multi  i40e->veth (x1) |    1.7M | 11.4M
5.12 rc4 + patch | redirect_map multi  i40e->veth (x2) |    1.1M |  4.3M
5.12 rc4 + patch | redirect_map multi  i40e->veth (x3) |    0.8M |  2.6M

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com

bpf: Run devmap xdp_prog on flush instead of bulk enqueue

This changes the devmap XDP program support to run the program when the
bulk queue is flushed instead of before the frame is enqueued. This has
a couple of benefits:

- It "sorts" the packets by destination devmap entry, and then runs the
  same BPF program on all the packets in sequence. This ensures that we
  keep the XDP program and destination device properties hot in I-cache.

- It makes the multicast implementation simpler because it can just
  enqueue packets using bq_enqueue() without having to deal with the
  devmap program at all.

The drawback is that if the devmap program drops the packet, the enqueue
step is redundant. However, arguably this is mostly visible in a
micro-benchmark, and with more mixed traffic the I-cache benefit should
win out. The performance impact of just this patch is as follows:

Using 2 10Gb i40e NIC, redirecting one to another, or into a veth interface,
which do XDP_DROP on veth peer. With xdp_redirect_map in sample/bpf, send
pkts via pktgen cmd:
./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64

There are about +/- 0.1M deviation for native testing, the performance
improved for the base-case, but some drop back with xdp devmap prog attached.

Version          | Test                           | Generic | Native | Native + 2nd xdp_prog
5.12 rc4         | xdp_redirect_map   i40e->i40e  |    1.9M |   9.6M |  8.4M
5.12 rc4         | xdp_redirect_map   i40e->veth  |    1.7M |  11.7M |  9.8M
5.12 rc4 + patch | xdp_redirect_map   i40e->i40e  |    1.9M |   9.8M |  8.0M
5.12 rc4 + patch | xdp_redirect_map   i40e->veth  |    1.7M |  12.0M |  9.4M

When bq_xmit_all() is called from bq_enqueue(), another packet will
always be enqueued immediately after, so clearing dev_rx, xdp_prog and
flush_node in bq_xmit_all() is redundant. Move the clear to __dev_flush(),
and only check them once in bq_enqueue() since they are all modified
together.

This change also has the side effect of extending the lifetime of the
RCU-protected xdp_prog that lives inside the devmap entries: Instead of
just living for the duration of the XDP program invocation, the
reference now lives all the way until the bq is flushed. This is safe
because the bq flush happens at the end of the NAPI poll loop, so
everything happens between a local_bh_disable()/local_bh_enable() pair.
However, this is by no means obvious from looking at the call sites; in
particular, some drivers have an additional rcu_read_lock() around only
the XDP program invocation, which only confuses matters further.
Cleaning this up will be done in a separate patch series.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-2-liuhangbin@gmail.com

Merge branch 'libbpf: error reporting changes for v1.0'

Andrii Nakryiko says:

====================

Implement error reporting changes discussed in "Libbpf: the road to v1.0"
([0]) document.

Libbpf gets a new API, libbpf_set_strict_mode() which accepts a set of flags
that turn on a set of libbpf 1.0 changes, that might be potentially breaking.
It's possible to opt-in into all current and future 1.0 features by specifying
LIBBPF_STRICT_ALL flag.

When some of the 1.0 "features" are requested, libbpf APIs might behave
differently. In this patch set a first set of changes are implemented, all
related to the way libbpf returns errors. See individual patches for details.

Patch #1 adds a no-op libbpf_set_strict_mode() functionality to enable
updating selftests.

Patch #2 gets rid of all the bad code patterns that will break in libbpf 1.0
(exact -1 comparison for low-level APIs, direct IS_ERR() macro usage to check
pointer-returning APIs for error, etc). These changes make selftest work in
both legacy and 1.0 libbpf modes. Selftests also opt-in into 100% libbpf 1.0
mode to automatically gain all the subsequent changes, which will come in
follow up patches.

Patch #3 streamlines error reporting for low-level APIs wrapping bpf() syscall.

Patch #4 streamlines errors for all the rest APIs.

Patch #5 ensures that BPF skeletons propagate errors properly as well, as
currently on error some APIs will return NULL with no way of checking exact
error code.

  [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY

v1->v2:
  - move libbpf_set_strict_mode() implementation to patch #1, where it belongs
    (Alexei);
  - add acks, slight rewording of commit messages.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpftool: Set errno on skeleton failures and propagate errors

Follow libbpf's error handling conventions and pass through errors and errno
properly. Skeleton code always returned NULL on errors (not ERR_PTR(err)), so
there are no backwards compatibility concerns. But now we also set errno
properly, so it's possible to distinguish different reasons for failure, if
necessary.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-6-andrii@kernel.org

libbpf: Streamline error reporting for high-level APIs

Implement changes to error reporting for high-level libbpf APIs to make them
less surprising and less error-prone to users:
  - in all the cases when error happens, errno is set to an appropriate error
    value;
  - in libbpf 1.0 mode, all pointer-returning APIs return NULL on error and
    error code is communicated through errno; this applies both to APIs that
    already returned NULL before (so now they communicate more detailed error
    codes), as well as for many APIs that used ERR_PTR() macro and encoded
    error numbers as fake pointers.
  - in legacy (default) mode, those APIs that were returning ERR_PTR(err),
    continue doing so, but still set errno.

With these changes, errno can be always used to extract actual error,
regardless of legacy or libbpf 1.0 modes. This is utilized internally in
libbpf in places where libbpf uses it's own high-level APIs.
libbpf_get_error() is adapted to handle both cases completely transparently to
end-users (and is used by libbpf consistently as well).

More context, justification, and discussion can be found in "Libbpf: the road
to v1.0" document ([0]).

  [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-5-andrii@kernel.org

libbpf: Streamline error reporting for low-level APIs

Ensure that low-level APIs behave uniformly across the libbpf as follows:
  - in case of an error, errno is always set to the correct error code;
  - when libbpf 1.0 mode is enabled with LIBBPF_STRICT_DIRECT_ERRS option to
    libbpf_set_strict_mode(), return -Exxx error value directly, instead of -1;
  - by default, until libbpf 1.0 is released, keep returning -1 directly.

More context, justification, and discussion can be found in "Libbpf: the road
to v1.0" document ([0]).

  [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-4-andrii@kernel.org

selftests/bpf: Turn on libbpf 1.0 mode and fix all IS_ERR checks

Turn ony libbpf 1.0 mode. Fix all the explicit IS_ERR checks that now will be
broken because libbpf returns NULL on error (and sets errno). Fix
ASSERT_OK_PTR and ASSERT_ERR_PTR to work for both old mode and new modes and
use them throughout selftests. This is trivial to do by using
libbpf_get_error() API that all libbpf users are supposed to use, instead of
IS_ERR checks.

A bunch of checks also did explicit -1 comparison for various fd-returning
APIs. Such checks are replaced with >= 0 or < 0 cases.

There were also few misuses of bpf_object__find_map_by_name() in test_maps.
Those are fixed in this patch as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-3-andrii@kernel.org

libbpf: Add libbpf_set_strict_mode() API to turn on libbpf 1.0 behaviors

Add libbpf_set_strict_mode() API that allows application to simulate libbpf
1.0 breaking changes before libbpf 1.0 is released. This will help users
migrate gradually and with confidence.

For now only ALL or NONE options are available, subsequent patches will add
more flags. This patch is preliminary for selftests/bpf changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-2-andrii@kernel.org

xsk: Use kvcalloc to support large umems

Use kvcalloc() instead of kcalloc() to support large umems with, on my
server, one million pages or more in the umem.

Reported-by: Dan Siemon <dan@coverfire.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210521083301.26921-1-magnus.karlsson@gmail.com

bpf: Fix spelling mistakes

Fix some spelling mistakes in comments:
aother ==> another
Netiher ==> Neither
desribe ==> describe
intializing ==> initializing
funciton ==> function
wont ==> won't and move the word 'the' at the end to the next line
accross ==> across
pathes ==> paths
triggerred ==> triggered
excute ==> execute
ether ==> either
conervative ==> conservative
convetion ==> convention
markes ==> marks
interpeter ==> interpreter

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210525025659.8898-2-thunder.leizhen@huawei.com

samples: bpf: Ix kernel-doc syntax in file header

The opening comment mark '/**' is used for highlighting the beginning of
kernel-doc comments.
The header for samples/bpf/ibumad_kern.c follows this syntax, but
the content inside does not comply with kernel-doc.

This line was probably not meant for kernel-doc parsing, but is parsed
due to the presence of kernel-doc like comment syntax(i.e, '/**'), which
causes unexpected warnings from kernel-doc:
warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* ibumad BPF sample kernel side

Provide a simple fix by replacing this occurrence with general comment
format, i.e. '/*', to prevent kernel-doc from parsing it.

Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/bpf/20210523151408.22280-1-yashsri421@gmail.com

libbpf: Add support for new llvm bpf relocations

LLVM patch https://reviews.llvm.org/D102712
narrowed the scope of existing R_BPF_64_64
and R_BPF_64_32 relocations, and added three
new relocations, R_BPF_64_ABS64, R_BPF_64_ABS32
and R_BPF_64_NODYLD32. The main motivation is
to make relocations linker friendly.

This change, unfortunately, breaks libbpf build,
and we will see errors like below:
  libbpf: ELF relo #0 in section #6 has unexpected type 2 in
     /home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o
  Error: failed to link
     '/home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o':
     Unknown error -22 (-22)
The new relocation R_BPF_64_ABS64 is generated
and libbpf linker sanity check doesn't understand it.
Relocation section '.rel.struct_ops' at offset 0x1410 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000018  0000000700000002 R_BPF_64_ABS64         0000000000000000 nogpltcp_init

Look at the selftests/bpf/bpf_tcp_nogpl.c,
  void BPF_STRUCT_OPS(nogpltcp_init, struct sock *sk)
  {
  }

  SEC(".struct_ops")
  struct tcp_congestion_ops bpf_nogpltcp = {
          .init           = (void *)nogpltcp_init,
          .name           = "bpf_nogpltcp",
  };
The new llvm relocation scheme categorizes 'nogpltcp_init' reference
as R_BPF_64_ABS64 instead of R_BPF_64_64 which is used to specify
ld_imm64 relocation in the new scheme.

Let us fix the linker sanity checking by including
R_BPF_64_ABS64 and R_BPF_64_ABS32. There is no need to
check R_BPF_64_NODYLD32 which is used for .BTF and .BTF.ext.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210522162341.3687617-1-yhs@fb.com

Merge branch 'Add lookup_and_delete_elem support to BPF hash map types'

Denis Salopek says:

====================

This patch series extends the existing bpf_map_lookup_and_delete_elem()
functionality with 4 more map types:
- BPF_MAP_TYPE_HASH,
- BPF_MAP_TYPE_PERCPU_HASH,
- BPF_MAP_TYPE_LRU_HASH and
- BPF_MAP_TYPE_LRU_PERCPU_HASH.

Patch 1 adds most of its functionality and logic as well as
documentation.

As it was previously limited to only stacks and queues which do not
support the BPF_F_LOCK flag, patch 2 enables its usage by adding a new
libbpf API bpf_map_lookup_and_delete_elem_flags() based on the existing
bpf_map_lookup_elem_flags().

Patch 3 adds selftests for lookup_and_delete_elem().

Changes in patch 1:
v7: Minor formating nits, add Acked-by.
v6: Remove unneeded flag check, minor code/format fixes.
v5: Split patch to 3 patches. Extend BPF_MAP_LOOKUP_AND_DELETE_ELEM
documentation with this changes.
v4: Fix the return value for unsupported map types.
v3: Add bpf_map_lookup_and_delete_elem_flags() and enable BPF_F_LOCK
flag, change CHECKs to ASSERT_OKs, initialize variables to 0.
v2: Add functionality for LRU/per-CPU, add test_progs tests.

Changes in patch 2:
v7: No change.
v6: Add Acked-by.
v5: Move to the newest libbpf version (0.4.0).

Changes in patch 3:
v7: Remove ASSERT_GE macro which is already added in some other commit,
change ASSERT_OK to ASSERT_OK_PTR, add Acked-by.
v6: Remove PERCPU macros, add ASSERT_GE macro to test_progs.h, remove
leftover code.
v5: Use more appropriate macros. Better check for changed value.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

selftests/bpf: Add bpf_lookup_and_delete_elem tests

Add bpf selftests and extend existing ones for a new function
bpf_lookup_and_delete_elem() for (percpu) hash and (percpu) LRU hash map
types.
In test_lru_map and test_maps we add an element, lookup_and_delete it,
then check whether it's deleted.
The newly added lookup_and_delete prog tests practically do the same
thing but additionally use a BPF program to change the value of the
element for LRU maps.

Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/d30d3e0060c1f750e133579623cf1c60ff58f3d9.1620763117.git.denis.salopek@sartura.hr

bpf: Extend libbpf with bpf_map_lookup_and_delete_elem_flags

Add bpf_map_lookup_and_delete_elem_flags() libbpf API in order to use
the BPF_F_LOCK flag with the map_lookup_and_delete_elem() function.

Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/15b05dafe46c7e0750d110f233977372029d1f62.1620763117.git.denis.salopek@sartura.hr

bpf: Add lookup_and_delete_elem support to hashtab

Extend the existing bpf_map_lookup_and_delete_elem() functionality to
hashtab map types, in addition to stacks and queues.
Create a new hashtab bpf_map_ops function that does lookup and deletion
of the element under the same bucket lock and add the created map_ops to
bpf.h.

Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/4d18480a3e990ffbf14751ddef0325eed3be2966.1620763117.git.denis.salopek@sartura.hr

libbpf: Skip bpf_object__probe_loading for light skeleton

I'm getting the following error when running 'gen skeleton -L' as
regular user:

libbpf: Error in bpf_object__probe_loading():Operation not permitted(1).
Couldn't load trivial BPF program. Make sure your kernel supports BPF
(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
value.

Fixes: 67234743736a ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210521030653.2626513-1-sdf@google.com

ethernet: ucc_geth: Use kmemdup() rather than kmalloc+memcpy

Issue identified with Coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: r6040: Allow restarting auto-negotiation

Use phy_ethtool_nway_reset() since the driver makes use of the PHY
library.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r6040-cleanups'

Florian Fainelli says:

====================
net: r6040: Non-functional changes

These two patches clean up the r6040 driver a little bit in preparation
for adding additional features such as dumping MAC counters and properly
dealing with DMA-API mapping.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: r6040: Use ETH_FCS_LEN

Instead of the open coded constant 4.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: r6040: Use logical or for MDIO operations

This is not a functional change, but we should be using a logical or to
assign the bits we will be writing to the MDIO read and write registers.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Use DEVICE_ATTR_*() macro

Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: falcon: use DEVICE_ATTR_*() macro

Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: use DEVICE_ATTR_*() macro

Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ftgmac100: add missing error return code in ftgmac100_probe()

The variables will be free on path err_phy_connect, it should
return error code, or it will cause double free when calling
ftgmac100_remove().

Fixes: bd466c3fb5a4 ("net/faraday: Support NCSI mode")
Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: don't print error from dpaa2_mac_connect if that's EPROBE_DEFER

When booting a board with DPAA2 interfaces defined statically via DPL
(as opposed to creating them dynamically using restool), the driver will
print an unspecific error message.

This change adds the error code to the message, and avoids printing
altogether if the error code is EPROBE_DEFER, because that is not a
cause of alarm.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dpaa2-eth-of_node'

Ioana Ciornei says:

====================
dpaa2-eth: setup the of_node

This patch set allows DSA to work with a DPAA2 master device by setting
up the of_node to point to the corresponding MAC DTS node, so that
of_find_net_device_by_node() can find it.
As an added benefit, udev rules can now be used to create a naming
scheme based on the physical MAC.

The second patch renames the debugfs directory to use the DPNI name
instead of the netdev name, since the latter can be changed after probe
time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: name the debugfs directory after the DPNI object

Name the debugfs directory after the DPNI object instead of the netdev
name since this can be changed after probe by udev rules.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: setup the of_node field of the device

When the DPNI object is connected to a DPMAC, setup the of_node to point
to the DTS device node of that specific MAC. This enables other drivers,
for example the DSA subsystem, to find the net_device by its device
node.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sja1105-stats'

Vladimir Oltean says:

====================
Ethtool statistics counters cleanup for SJA1105 DSA driver

This series removes some reported data from ethtool -S which were not
counters at all, and reorganizes the code such that counters can be read
individually and not just all at once.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: don't use burst SPI reads for port statistics

The current internal sja1105 driver API is optimized for retrieving many
statistics counters at once. But the switch does not do atomic snapshotting
for them anyway.

In case we start reporting the hardware port counters through
ndo_get_stats64 as well, not just ethtool, it would be good to be able
to read individual port counters and not all of them.

Additionally, since Arnd Bergmann's commit ae1804de93f6 ("dsa: sja1105:
dynamically allocate stats structure"), sja1105_get_ethtool_stats
allocates memory dynamically, since struct sja1105_port_status was
deemed to consume too much stack memory. That is not ideal.
The large structure is only needed because of the burst read.
If we read statistics one by one, we can consume less memory, and
we can avoid dynamic allocation.

Additionally, latency-sensitive interfaces such as PTP operations (for
phc2sys) might suffer if the SPI mutex is being held for too long, which
happens in the case of SPI burst reads. By reading counters one by one,
we give a chance for higher priority processes to preempt and take the
SPI bus mutex for accessing the PTP clock.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: stop reporting the queue levels in ethtool port counters

The queue levels are not counters, but instead they represent the
occupancy of the MAC TX queues. Having these in ethtool port counters is
not helpful, so remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Fix return of uninitialized variable ret

In the unlikely event that rule_cnt is zero the variable ret is
not assigned a value and function hclge_dbg_dump_fd_tcam can end
up returning an unitialized value in ret. Fix this by explicitly
setting ret to zero before the for-loop.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: b5a0b70d77b9 ("net: hns3: refactor dump fd tcam of debugfs")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: Fix typo

Change 'contol' to 'control'.

Signed-off-by: zuoqilin <zuoqilin@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Fix inconsistent indenting

Eliminate the follow smatch warning:

drivers/net/phy/phy_device.c:2886 phy_probe() warn: inconsistent
indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: farch: fix compile warning in efx_farch_dimension_resources()

Fix the following kernel build warning when CONFIG_SFC_SRIOV is disabled:

  drivers/net/ethernet/sfc/farch.c: In function ‘efx_farch_dimension_resources’:
  drivers/net/ethernet/sfc/farch.c:1671:21: warning: variable ‘buftbl_min’ set but not used [-Wunused-but-set-variable]
    unsigned vi_count, buftbl_min, total_tx_channels;

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bonding: bond_alb: Fix some typos in bond_alb.c

s/becase/because/
s/reqeusts/requests/
s/funcions/functions/
s/addreses/addresses/

Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif_virtio: Fix some typos in caif_virtio.c

s/patckets/packets/
s/avilable/available/
s/tbe/the/

Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'wan-cleanups'

Guangbin Huang says:

====================
net: wan: clean up some code style issues

This patchset clean up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: wan: add necessary () to macro argument

Macro argument 'card' and 'port' may be better as
'(card)' and '(port)' to avoid precedence issues.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wan: add braces {} to all arms of the statement

Braces {} should be used on all arms of this statement.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wan: remove redundant blank lines

This patch removes some redundant blank lines.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wan: fix the code style issue about trailing statements

Trailing statements should be on next line.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wan: add some required spaces

Add space required after that close brace '}'.
Add space required before the open parenthesis '('.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wan: fix an code style issue about "foo* bar"

Fix the checkpatch error as "foo* bar" should be "foo *bar".

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sja1105-spi'

Vladimir Oltean says:

====================
Adapt the sja1105 DSA driver to the SPI controller's transfer limits

This series changes the SPI transfer procedure in sja1105 to take into
consideration the buffer size limitations that the SPI controller driver
might have.

Changes in v2:
Remove the driver's use of cs_change and send multiple, smaller SPI
messages instead of a single large one.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: adapt to a SPI controller with a limited max transfer size

The static config of the sja1105 switch is a long stream of bytes which
is programmed to the hardware in chunks (portions with the chip select
continuously asserted) of max 256 bytes each. Each chunk is a
spi_message composed of 2 spi_transfers: the buffer with the data and a
preceding buffer with the SPI access header.

Only that certain SPI controllers, such as the spi-sc18is602 I2C-to-SPI
bridge, cannot keep the chip select asserted for that long.
The spi_max_transfer_size() and spi_max_message_size() functions are how
the controller can impose its hardware limitations upon the SPI
peripheral driver.

For the sja1105 driver to work with these controllers, both buffers must
be smaller than the transfer limit, and their sum must be smaller than
the message limit.

Regression-tested on a switch connected to a controller with no
limitations (spi-fsl-dspi) as well as with one with caps for both
max_transfer_size and max_message_size (spi-sc18is602).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: send multiple spi_messages instead of using cs_change

The sja1105 driver has been described by Mark Brown as "not using the
[ SPI ] API at all idiomatically" due to the use of cs_change:
https://patchwork.kernel.org/project/netdevbpf/patch/20210520135031.2969183-1-olteanv@gmail.com/

According to include/linux/spi/spi.h, the chip select is supposed to be
asserted for the entire length of a SPI message, as long as cs_change is
false for all member transfers. The cs_change flag changes the following:

(i) When a non-final SPI transfer has cs_change = true, the chip select
    should temporarily deassert and then reassert starting with the next
    transfer.
(ii) When a final SPI transfer has cs_change = true, the chip select
     should remain asserted until the following SPI message.

The sja1105 driver only uses cs_change for its first property, to form a
single SPI message whose layout can be seen below:

                                             this is an entire, single spi_message
           _______________________________________________________________________________________________
          /                                                                                               \
          +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
          | hdr_xfer[0] | chunk_xfer[0] | hdr_xfer[1] | chunk_xfer[1] |     | hdr_xfer[n] | chunk_xfer[n] |
          +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
cs_change      false          true           false           true                false          false

           ____________________________  _____________________________       _____________________________
CS line __/                            \/                             \ ... /                             \__

The fact of the matter is that spi_max_message_size() has an ambiguous
meaning if any non-final transfer has cs_change = true.

If the SPI master has a limitation in that it cannot keep the chip
select asserted for more than, say, 200 bytes (like the spi-sc18is602),
the normal thing for it to do is to implement .max_transfer_size and
.max_message_size, and limit both to 200: in the "worst case" where
cs_change is always false, then the controller can, indeed, not send
messages larger than 200 bytes.

But the fact that the SPI controller's max_message_size does not
necessarily mean that we cannot send messages larger than that.
Notably, if the SPI master special-cases the transfers with cs_change
and treats every chip select toggling as an entirely new transaction,
then a SPI message can easily exceed that limit. So there is a
temptation to ignore the controller's reported max_message_size when
using cs_change = true in non-final transfers.

But that can lead to false conclusions. As Mark points out, the SPI
controller might have a different kind of limitation with the max
message size, that has nothing at all to do with how long it can keep
the chip select asserted.
For example, that might be the case if the device is able to offload the
chip select changes to the hardware as part of the data stream, and it
packs the entire stream of commands+data (corresponding to a SPI
message) into a single DMA transfer that is itself limited in size.

So the only thing we can do is avoid ambiguity by not using cs_change at
all. Instead of sending a single spi_message, we now send multiple SPI
messages as follows:

                  spi_message 0                 spi_message 1                       spi_message n
           ____________________________   ___________________________        _____________________________
          /                            \ /                           \      /                             \
          +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
          | hdr_xfer[0] | chunk_xfer[0] | hdr_xfer[1] | chunk_xfer[1] |     | hdr_xfer[n] | chunk_xfer[n] |
          +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
cs_change      false          true           false           true                false          false

           ____________________________  _____________________________       _____________________________
CS line __/                            \/                             \ ... /                             \__

which is clearer because the max_message_size limit is now easier to
enforce. What is transmitted on the wire stays, of course, the same.

Additionally, because we send no more than 2 transfers at a time, we now
avoid dynamic memory allocation too, which might be seen as an
improvement by some.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add driver for Motorcomm yt8511 phy

Add a driver for the Motorcomm yt8511 phy that will be used in the
production Pine64 rk3566-quartz64 development board.
It supports gigabit transfer speeds, rgmii, and 125mhz clk output.

Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: xilinx_emaclite: Do not print real IOMEM pointer

Printing kernel pointers is discouraged because they might leak kernel
memory layout. This fixes smatch warning:

drivers/net/ethernet/xilinx/xilinx_emaclite.c:1191 xemaclite_of_probe() warn:
argument 4 to %08lX specifier is cast from pointer

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cdc_ncm: use DEVICE_ATTR_RW macro

Use DEVICE_ATTR_RW helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: hso: use DEVICE_ATTR_RO macro

Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atm: use DEVICE_ATTR_RO macro

Use DEVICE_ATTR_RO helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: net: devlink_port_split.py: skip the test if no devlink device

When there is no devlink device, the following command will return:
  $ devlink -j dev show
  {dev:{}}

This will cause IndexError when trying to access the first element
in dev of this json dataset. Use the kselftest framework skip code
to skip this test in this case.

Example output with this change:
  # selftests: net: devlink_port_split.py
  # no devlink device was found, test skipped
  ok 7 selftests: net: devlink_port_split.py # SKIP

Link: https://bugs.launchpad.net/bugs/1928889
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: remove default label from to_string switch

This way the compiler warns when a new value is added to the enum but
not to the string translation like:

drivers/net/ethernet/ibm/ibmvnic.c: In function 'adapter_state_to_string':
drivers/net/ethernet/ibm/ibmvnic.c:832:2: warning: enumeration value 'VNIC_FOOBAR' not handled in switch [-Wswitch]
  switch (state) {
  ^~~~~~
drivers/net/ethernet/ibm/ibmvnic.c: In function 'reset_reason_to_string':
drivers/net/ethernet/ibm/ibmvnic.c:1935:2: warning: enumeration value 'VNIC_RESET_FOOBAR' not handled in switch [-Wswitch]
  switch (reason) {
  ^~~~~~

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Acked-by: Lijun Pan <lijunp213@gmail.com>
Link: https://lore.kernel.org/netdev/CAOhMmr701LecfuNM+EozqbiTxFvDiXjFdY2aYeKJYaXq9kqVDg@mail.gmail.com/
Signed-off-by: David S. Miller <davem@davemloft.net>

NFC: st21nfca: remove unnecessary variable and labels

assign vlue (EIO/EPROTO) to variable r, and goto exit label,
but just return r follow exit label, so we delete exit label,
and just replace with return sentence.

Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bond-cleanups'

Guangbin Huang says:

====================
net: bonding: clean up some code style issues

This patchset cleans up some code style issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bonding: use tabs instead of space for code indent

Code indent should use tabs where possible, so
use tabs instead of space for code indent.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bonding: remove unnecessary braces

Braces {} are not necessary for single statement blocks,
so remove these braces {}.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bonding: fix code indent for conditional statements

Fix incorrect code indent for conditional statements.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bonding: add some required blank lines

Add some blank lines after declarations as required.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-05-20

This series contains updates to igc driver only.

Andre Guedes says:

This series adds AF_XDP zero-copy feature to igc driver.

The initial patches do some code refactoring, preparing the code base to
land the AF_XDP zero-copy feature, avoiding code duplications. The last
patches of the series are the ones implementing the feature.

The last patch which indeed implements AF_XDP zero-copy support was
originally way too lengthy so, for the sake of code review, I broke it
up into two patches: one adding support for the RX functionality and the
other one adding TX support.
---
v2:
Patch 8/9 - "igc: Enable RX via AF_XDP zero-copy"
* In XDP_PASS flow, copy metadata too into the skb.
* When HW timestamp is added by the NIC, after copying it into
   a local variable, update xdp_buff->data_meta so that
   metadata length when XDP program is called 0.
* In igc_xdp_enable_pool(), call xsk_pool_dma_unmap() on
   failure.

Known issues:
When an XDP application is running in Tx-Only mode with Zero-Copy
enabled, it is not expected to add the frames to the fill-queue. I have
noticed the following two issues in this scenario:
- If XDP_USE_NEED_WAKEUP flag is not set by application, igc_poll()
   will go into infinite loop because the buffer allocation resulting
   in igc_clean_rx_irq_zc() indicating that all work is not done and NAPI
   should keep polling. This does not occur if XDP_USE_NEED_WAKEUP
   flag is set.
- Since there are no buffers allocated by userspace for the fill
   queue, there is no memory allocated for the NIC to copy the data
   to. If the packet received is destined to the hardware queue where
   XDP application is running, no packets are received even on other
   queues.
Both these issues can be mitigated by adding a few frames to the
fill queue. The second issue can also be mitigated by making sure no
packets are being received on the hardware queue where Rx is running.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-leading-spaces'

Hui Tang says:

====================
net: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running the
following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mii: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ifb: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: appletalk: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fddi: skfp: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hamradio: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ppp: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: slip: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wan: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Cc: Xie He <xie.he.0141@gmail.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-debugfs'

Huazhong Tan says:

====================
net: hns3: refactor some debugfs commands

This series refactors the debugfs command to the new
process and removes the useless debugfs file node cmd
for the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: remove the useless debugfs file node cmd

Currently, all debugfs commands have been reconstructed, and the
debugfs file node cmd is useless. So remove this debugfs file node.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump serv info of debugfs

Currently, the debugfs command for serv info is implemented by
"echo xxxx > cmd", and record the inforamtion in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"serv_info" for it, and query it by command "cat serv_info",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat service_task_info
local_clock: [ 114.203321]
delta: 784(ms)
last_service_task_processed: 4294918512(jiffies)
last_service_task_cnt: 4

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump mac tnl status of debugfs

Currently, the debugfs command for dump mac tnl status is
implemented by "echo xxxx > cmd", and record the information
in dmesg. It's unnecessary and heavy. To improve it, create
a single file "mac_tnl_status" for it, and query it by command
"cat mac_tnl_status", return the result to userspace, rather
than record in dmesg.

The display style is below:
$ cat mac_tnl_status
Recently generated mac tnl interruption:
[0111204.175437] status = 0x30
[0154120.329912] status = 0x30

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump qs shaper of debugfs

Currently, user gets qset shaper parameters by implementing debugfs
command "echo dump qs shaper > cmd", this command will dump info in
dmesg. It's unnecessary and heavy.

As there is "tm_qset" file in tm directory for dump qset info, to
optimize these command, merge qset shaper parameters to tm_qset
file and use cat command to get them.

The display style is below:
$ cat tm_qset
ID    MAP_PRI  LINK_VLD  MODE  DWRR  IR_B  IR_U  IR_S  BS_B  BS_S  FLAG
0000     0        1      dwrr  100   150     7     0     5    20     0
0001     0        0        sp    0   150     7     0     5    20     0

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump qos buf cfg of debugfs

Currently, user gets qos buffer config by implementing debugfs command
"echo dump qos buf cfg > cmd", this command will dump info in dmesg.
It's unnecessary and heavy.

To optimize it, create a single file "qos_buf_cfg" in tm directory
and use cat command to get info. It will return info to userspace,
rather than record in dmesg.

The display style is below:
$ cat qos_buf_cfg
tx_packet_buf_tc_0: 0x120
tx_packet_buf_tc_1: 0x120
tx_packet_buf_tc_2: 0x120
tx_packet_buf_tc_3: 0x120
tx_packet_buf_tc_4: 0x0
tx_packet_buf_tc_5: 0x0
tx_packet_buf_tc_6: 0x0
tx_packet_buf_tc_7: 0x0
......

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump qos pri map of debugfs

Currently, user gets priority map by implementing debugfs command
"echo dump qos pri map > cmd", this command will dump info in dmesg.
It's unnecessary and heavy.

To optimize it, create a single file "qos_pri_map" in tm directory
and use cat command to get info. It will return info to userspace,
rather than record in dmesg.

The display style is below:
$ cat qos_pri_map
vlan_to_pri: 0
PRI    TC
0       0
1       1
2       2
3       3
4       0
5       1
6       2

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump qos pause cfg of debugfs

Currently, user gets pause config by implementing debugfs command
"echo dump qos pause cfg > cmd", this command will dump info in dmesg.
It's unnecessary and heavy.

To optimize it, create a single file "qos_pause_cfg" in tm directory
and use cat command to get info. It will return info to userspace,
rather than record in dmesg.

The display style is below:
$ cat qos_pause_cfg
pause_trans_gap: 0x7f
pause_trans_time: 0xffff

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump tc of debugfs

Currently, user gets tc schedule info by implementing debugfs command
"echo dump tc > cmd", this command will dump info in dmesg. It's
unnecessary and heavy.

To optimize it, create a single file "tc_sch_info" and use cat command
to get info. It will return info to userspace, rather than record in
dmesg.

The display style is below:
$ cat tc_sch_info
enabled tc number: 4
weight_offset: 14
TC    MODE  WEIGHT
0     dwrr     25
1     dwrr     25
2     dwrr     25
3     dwrr     25
4     dwrr      0
5     dwrr      0
6     dwrr      0
7     dwrr      0

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump tm of debugfs

Currently, user gets some tm info by implementing debugfs command
"echo dump tm > cmd", this command will dump info in dmesg. It's
unnecessary and heavy.

In addition, the info of this command mixes info of qset, priority,
pg and port. Qset and priority have their own command to get info of
themself, so can remove info of qset and priority from this command.

To optimize it, create two new files "tm_pg", "tm_port" in tm directory
and use cat command to separately get info of pg and port.

The display style is below:
$ cat tm_pg
ID  PRI_MAP  MODE DWRR  C_IR_B  C_IR_U  C_IR_S  C_BS_B  C_BS_S ...
00   0x1f    dwrr  1       75       9       0      31      20  ...

$ cat tm_port
IR_B  IR_U  IR_S  BS_B  BS_S  FLAG  RATE(Mbps)
75     9     0    31    20    1     200000

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump tm map of debugfs

Currently, the debugfs command for tm map is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"tm_map" for it, and query it by command "cat tm_map",
return the result to userspace, rather than record in dmesg.

As user can't specify queue id in cat command, driver will return info
of all queue id.

The display style is below:
$ cat tm_map
queue_id   qset_id   pri_id   tc_id
0000         0000      00       00
INDEX | TM BP QSET MAPPING:
0000  | 00000000:00000000:00000000:00000000:00000000:00000000:00000000
0256  | 00000000:00000000:00000000:00000000:00000000:00000002:00000000
0512  | 00000000:00000000:00000000:00000004:00000000:00000000:00000000
0768  | 00000000:00000008:00000000:00000000:00000000:00000000:00000000

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump fd tcam of debugfs

Currently, the debugfs command for fd tcam is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"fd_tcam" for it, and query it by command "cat fd_tcam",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat fd_tcam
read result tcam key x(31):
00000000
00000000
00000000
08000000
00000600
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
read result tcam key y(31):
00000000
00000000
00000000
f7ff0000
0000f900
00000000
00000000
00000000
00000000
00000000
00000000
00000000
0000fff8

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor queue info of debugfs

Currently, the debugfs command for queue info is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create two files
"rx_queue_info" and "tx_queue_info" for it, and query it
by command "cat rx_queue_info" and "cat tx_queue_info",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat rx_queue_info
QUEUE_ID  BD_NUM  BD_LEN  TAIL  HEAD  FBDNUM  PKTNUM   ...
0           0       0     0     0       0       0      ...
1           0       0     0     0       0       0      ...
2           0       0     0     0       0       0      ...

$ cat tx_queue_info
QUEUE_ID  BD_NUM  TC  TAIL  HEAD  FBDNUM  OFFSET  PKTNUM  ...
0           0     0     0     0       0       0        0  ...
1           0     0     0     0       0       0        0  ...
2           0     0     0     0       0       0        0  ...

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor queue map of debugfs

Currently, the debugfs command for queue map is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"queue_map" for it, and query it by command "cat queue_map",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat queue_map
local_queue_id global_queue_id vector_id
0 0 341

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump reg dcb info of debugfs

Currently, the debugfs command for reg dcb info is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create a single file
"dcb" for it, and query it by command "cat dcb",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat dcb
qset_id  roce_qset_mask  nic_qset_mask  qset_shaping_pass  qset_bp_status
0000           0x1            0x1             0x1               0x0
0001           0x1            0x1             0x1               0x0
0002           0x1            0x1             0x1               0x0
0003           0x1            0x1             0x1               0x0
0004           0x1            0x1             0x1               0x0
0005           0x1            0x1             0x1               0x0
0006           0x1            0x1             0x1               0x0
0007           0x1            0x1             0x1               0x0
pri_id  pri_mask  pri_cshaping_pass  pri_pshaping_pass
000       0x1           0x0                0x1
001       0x1           0x0                0x0
002       0x1           0x0                0x0
003       0x1           0x0                0x0
004       0x1           0x0                0x0
005       0x1           0x0                0x0
006       0x1           0x0                0x0
007       0x1           0x0                0x0
pg_id  pg_mask  pg_cshaping_pass  pg_pshaping_pass
000      0x1           0x0               0x1

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor dump reg of debugfs

Currently, the debugfs command for reg is implemented by
"echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create some files
"bios_common/ssu/igu_egu/rpu/ncsi/rtc/ppp/rcb/tqp/mac" for it,
and query it by command "cat xxx", return the result to
userspace, rather than record in dmesg.

The display style is below:
$ cat bios_common
BP_CPU_STATE: 0x0
DFX_MSIX_INFO_NIC_0: 0xc000
DFX_MSIX_INFO_NIC_1: 0x0
DFX_MSIX_INFO_NIC_2: 0x0
DFX_MSIX_INFO_NIC_3: 0x0
DFX_MSIX_INFO_ROC_0: 0xc000
DFX_MSIX_INFO_ROC_1: 0x0
DFX_MSIX_INFO_ROC_2: 0x0
DFX_MSIX_INFO_ROC_3: 0x0

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Remove obsolete PIPELINE_DEBUG debugging information

As Leon Romanovsky's tips:
The definition of macro PIPELINE_DEBUG is commented more than 10 years ago
and can be seen as a dead code that should be removed.

Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igc: Enable TX via AF_XDP zero-copy

Add support for transmitting packets via AF_XDP zero-copy mechanism.

The packet transmission itself is implemented by igc_xdp_xmit_zc() which
is called from igc_clean_tx_irq() when the ring has AF_XDP zero-copy
enabled. Likewise i40e and ice drivers, the transmission budget used is
the number of descriptors available on the ring.

A new tx buffer type is introduced to 'enum igc_tx_buffer_type' to
indicate the tx buffer uses memory from xsk pool so it can be properly
cleaned after transmission or when the ring is cleaned.

The I225 controller has only 4 Tx hardware queues so the main difference
between igc and other Intel drivers that support AF_XDP zero-copy is
that there is no tx ring dedicated exclusively to XDP. Instead, tx
rings are shared between the network stack and XDP, and netdev queue
lock is used to ensure mutual exclusion. This is the same approach
implemented to support XDP_TX and XDP_REDIRECT actions.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Enable RX via AF_XDP zero-copy

Add support for receiving packets via AF_XDP zero-copy mechanism.

Add a new flag to 'enum igc_ring_flags_t' to indicate the ring has
AF_XDP zero-copy enabled so proper ring setup is carried out during ring
configuration in igc_configure_rx_ring().

RX buffers can now be allocated via the shared pages mechanism (default
behavior of the driver) or via xsk pool (when AF_XDP zero-copy is
enabled) so a union is added to the 'struct igc_rx_buffer' to cover both
cases.

When AF_XDP zero-copy is enabled, rx buffers are allocated from the xsk
pool using the new helper igc_alloc_rx_buffers_zc() which is the
counterpart of igc_alloc_rx_buffers().

Likewise other Intel drivers that support AF_XDP zero-copy, in igc we
have a dedicated path for cleaning up rx irqs when zero-copy is enabled.
This avoids adding too many checks within igc_clean_rx_irq(), resulting
in a more readable and efficient code since this function is called from
the hot-path of the driver.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Replace IGC_TX_FLAGS_XDP flag by an enum

Up to this point, Tx buffers are associated with either a skb or a xdpf,
and the IGC_TX_FLAGS_XDP flag was enough to distinguish between these
two case. However, with upcoming patches that will add AF_XDP zero-copy
support, a third case will be introduced so this flag-based approach
won't fit well.

In preparation to land AF_XDP zero-copy support, replace the
IGC_TX_FLAGS_XDP flag by an enum which will be extended once zero-copy
support is introduced to the driver.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Introduce igc_unmap_tx_buffer() helper

In preparation for AF_XDP zero-copy support, encapsulate the code that
unmaps Tx buffers into its own local helper so we can reuse it, avoiding
code duplication.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Introduce TX/RX stats helpers

In preparation for AF_XDP zero-copy support, encapsulate the code that
updates the driver RX stats in its own local helper so it can be reused
in the zero-copy path. Likewise, encapsulate TX stats code as well.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Refactor XDP rxq info registration

Refactor XDP rxq info registration code, preparing the driver for AF_XDP
zero-copy support which is added by upcoming patches.

Currently, xdp_rxq and memory model are both registered during RX
resource setup time by igc_xdp_register_rxq_info() helper. With AF_XDP,
we want to register the memory model later on while configuring the ring
because we will know which memory model type to register
(MEM_TYPE_PAGE_SHARED or MEM_TYPE_XSK_BUFF_POOL).

The helpers igc_xdp_register_rxq_info() and igc_xdp_unregister_rxq_
info() are not useful anymore so they are removed.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igc: Refactor igc_clean_rx_ring()

Refactor igc_clean_rx_ring() helper, preparing the code for AF_XDP
zero-copy support which is added by upcoming patches.

The refactor consists of encapsulating page-shared specific code into
its own helper, leaving common code that will be shared by both
page-shared and xsk pool in igc_clean_rx_ring().

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>