i40e: Use batched xsk Tx interfaces to increase performance
author: Magnus Karlsson <magnus.karlsson@intel.com>
Mon, 16 Nov 2020 11:12:47 +0000 (12:12 +0100)
committer: Daniel Borkmann <daniel@iogearbox.net>
Tue, 17 Nov 2020 21:07:40 +0000 (22:07 +0100)
commit: 3106c580fb7cf26691c1ce3aba2223f3ae56d846
tree: fd8b06d91a34b79b6f9b456ed2d6509033f39e1b
parent: 9349eb3a9d2ae0151510dd98b6640dfaeebee9cc

Use the new batched xsk interfaces for the Tx path in the i40e driver
to improve performance. On my machine, this yields a throughput
increase of 4% for the l2fwd sample app in xdpsock. If we instead look
only at the Tx part, this patch set increases throughput by more than
20%.

Note that I had to explicitly unroll the inner loop with a pragma to
reach this performance level. The pragma is honored by both clang and
gcc and should be ignored by versions that do not support it. Using
the -funroll-loops compiler command line switch on the source file
instead unrolled loops at a higher level, which led to a performance
decrease rather than an increase.
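
As an illustration of the pragma-based unrolling described above, here
is a minimal user-space sketch. It is not the driver code: the names
demo_desc, demo_xmit_batch and demo_fill_tx are hypothetical, the batch
size of 4 and the GCC version check are assumptions, and summing the
descriptor lengths only stands in for the real per-descriptor work.
Full batches go through a fixed-count loop that the compiler is asked
to unroll, and any leftover descriptors are handled one at a time.

```c
#include <stddef.h>
#include <stdint.h>

#define PKTS_PER_BATCH 4 /* assumed batch size for this sketch */

/* Select an unroll pragma the compiler understands. Compilers without
 * support for either pragma fall back to a plain for loop. */
#if defined(__clang__)
#define loop_unrolled_for _Pragma("clang loop unroll_count(4)") for
#elif defined(__GNUC__) && __GNUC__ >= 8
#define loop_unrolled_for _Pragma("GCC unroll 4") for
#else
#define loop_unrolled_for for
#endif

/* Hypothetical stand-in for a Tx descriptor. */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
};

/* Process exactly PKTS_PER_BATCH descriptors; the fixed trip count
 * lets the compiler unroll the loop completely. */
static uint32_t demo_xmit_batch(const struct demo_desc *desc)
{
	uint32_t bytes = 0;
	unsigned int i;

	loop_unrolled_for (i = 0; i < PKTS_PER_BATCH; i++)
		bytes += desc[i].len;

	return bytes;
}

/* Split nb_pkts into full batches plus a remainder and return the
 * total number of bytes "sent". */
static uint32_t demo_fill_tx(const struct demo_desc *descs, size_t nb_pkts)
{
	size_t batched = nb_pkts - (nb_pkts % PKTS_PER_BATCH);
	uint32_t bytes = 0;
	size_t i;

	/* Full batches go through the unrolled inner loop. */
	for (i = 0; i < batched; i += PKTS_PER_BATCH)
		bytes += demo_xmit_batch(&descs[i]);

	/* Leftover descriptors are handled one by one. */
	for (; i < nb_pkts; i++)
		bytes += descs[i].len;

	return bytes;
}
```

Calling demo_fill_tx() with six descriptors runs one unrolled batch of
four and then handles the remaining two individually. Keeping the
pragma on the inner loop alone limits unrolling to the hot per-batch
path, unlike a file-wide -funroll-loops.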

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-6-git-send-email-magnus.karlsson@gmail.com
drivers/net/ethernet/intel/i40e/i40e_txrx.c
drivers/net/ethernet/intel/i40e/i40e_txrx.h
drivers/net/ethernet/intel/i40e/i40e_xsk.c
drivers/net/ethernet/intel/i40e/i40e_xsk.h