net: bonding: Use per-cpu rr_tx_counter
author Jussi Maki <joamaki@gmail.com>
Tue, 15 Jun 2021 08:54:15 +0000 (08:54 +0000)
committer David S. Miller <davem@davemloft.net>
Tue, 15 Jun 2021 18:26:15 +0000 (11:26 -0700)
commit 848ca9182a7d25bb54955c3aab9a3a2742bf9678
tree 15449c43a407368fbc404c5d79c5e4189840226e
parent b8f6b0522c298ae9267bd6584e19b942a0636910

The round-robin rr_tx_counter was shared across CPUs, which caused
significant cache thrashing at high packet rates. This patch switches
the round-robin packet counter to a per-cpu variable that is used to
decide the destination slave.
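
The idea can be illustrated with a small userspace sketch. This is not
the code from the patch: struct toy_bond, toy_rr_slave_get(), MAX_CPUS
and CACHE_LINE are hypothetical names chosen for illustration, and the
explicit cpu argument stands in for the kernel's per-cpu accessors.
Each CPU increments only its own cache-line-aligned counter, so no
counter cache line bounces between CPUs on the transmit path.

```c
#include <stdint.h>

#define MAX_CPUS   4   /* hypothetical CPU count for this sketch */
#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* One counter per CPU. The alignment rounds the struct size up to a
 * full cache line, so two CPUs never write to the same line. */
struct toy_rr_counter {
	_Alignas(CACHE_LINE) uint32_t val;
};

/* Hypothetical stand-in for the bonding device state. */
struct toy_bond {
	struct toy_rr_counter rr_tx_counter[MAX_CPUS];
	unsigned int slave_cnt;
};

/* Pick the slave index for the next packet transmitted on `cpu`.
 * Only that CPU's counter is read and written. */
static unsigned int toy_rr_slave_get(struct toy_bond *bond, unsigned int cpu)
{
	uint32_t id = ++bond->rr_tx_counter[cpu].val;

	return id % bond->slave_cnt;
}
```

With two slaves, successive calls for the same CPU alternate between
the slaves, while a different CPU starts from its own independent
count. Round-robin ordering therefore holds per CPU rather than
globally, which is the trade-off that removes the shared counter.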

In a test with a 2x100Gbit ICE NIC and pktgen_sample_04_many_flows.sh
(-s 64 -t 32), the TX rate was 19.6 Mpps before and 22.3 Mpps after
this patch, an improvement of roughly 14%.

"perf top -e cache_misses" before:
    12.31%  [bonding]       [k] bond_xmit_roundrobin_slave_get
    10.59%  [sch_fq_codel]  [k] fq_codel_dequeue
     9.34%  [kernel]        [k] skb_release_data
after:
    15.42%  [sch_fq_codel]  [k] fq_codel_dequeue
    10.06%  [kernel]        [k] __memset
     9.12%  [kernel]        [k] skb_release_data

Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/bonding/bond_main.c
include/net/bonding.h