net: cache for same cpu skb_attempt_defer_free
author Pavel Begunkov <asml.silence@gmail.com>
Wed, 10 Apr 2024 01:28:09 +0000 (02:28 +0100)
committer Jakub Kicinski <kuba@kernel.org>
Thu, 11 Apr 2024 02:27:32 +0000 (19:27 -0700)
Optimise skb_attempt_defer_free() when it runs on the same CPU the skb was
allocated on. Instead of going through __kfree_skb() -> kmem_cache_free(),
we can disable softirqs and put the buffer into the CPU-local caches.
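
For illustration only, here is a minimal userspace sketch of the idea, not
the kernel code. It assumes a thread-local array as a stand-in for the
per-CPU cache, and every name in it (struct buf, buf_alloc(), buf_free(),
CACHE_SIZE, this_cpu) is hypothetical. A buffer freed on the "CPU" that
allocated it is parked in the local cache; clones, foreign buffers and
overflow go back to the general allocator. The kernel would instead queue a
foreign skb for its owning CPU, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SIZE 64	/* hypothetical capacity of the local cache */

/* Hypothetical stand-in for struct sk_buff. */
struct buf {
	int alloc_cpu;	/* "CPU" the buffer was allocated on */
	bool is_clone;	/* stand-in for fclone != SKB_FCLONE_UNAVAILABLE */
};

/* Stand-in for a per-CPU cache; _Thread_local approximates "CPU local". */
struct local_cache {
	struct buf *slots[CACHE_SIZE];
	size_t count;
};

static _Thread_local struct local_cache cache;
static _Thread_local int this_cpu;	/* hypothetical id of the current "CPU" */

/* Allocate a buffer, reusing a locally cached one when available. */
static struct buf *buf_alloc(void)
{
	struct buf *b;

	if (cache.count > 0)
		b = cache.slots[--cache.count];
	else
		b = malloc(sizeof(*b));
	if (!b)
		return NULL;

	b->alloc_cpu = this_cpu;
	b->is_clone = false;
	return b;
}

/*
 * Free a buffer. Returns true if it was parked in the local cache and
 * false if it was handed back to the general allocator.
 */
static bool buf_free(struct buf *b)
{
	if (b->is_clone || b->alloc_cpu != this_cpu ||
	    cache.count == CACHE_SIZE) {
		free(b);	/* slow path, analogous to __kfree_skb() */
		return false;
	}

	cache.slots[cache.count++] = b;	/* fast path: keep it local */
	return true;
}
```

In this sketch a buf_alloc() that follows a same-CPU buf_free() gets the
cached buffer back without touching malloc(), which is the kind of saving
the patch is after on the receive path.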

CPU-bound TCP ping-pong style benchmarking (i.e. netbench) showed a 1%
throughput increase (392.2 -> 396.4 Krps). Cross-checking with profiles,
the total CPU share of skb_attempt_defer_free() dropped by 0.6%. Note,
I'd expect the win to double with rx-only benchmarks, as the optimisation
is for the receive path, but the test spends >55% of CPU doing writes.
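
As a quick check of the quoted figure, the relative gain can be recomputed
from the two throughput numbers above. The helper name
relative_gain_percent() is hypothetical and exists only for this example.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', as a percentage of 'before'. */
static double relative_gain_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Calling relative_gain_percent(392.2, 396.4) gives roughly 1.07, which
matches the 1% increase stated above.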

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/a887463fb219d973ec5ad275e31194812571f1f5.1712711977.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/core/skbuff.c

index 888874e..18612f2 100644
@@ -6974,6 +6974,19 @@ free_now:
 EXPORT_SYMBOL(__skb_ext_put);
 #endif /* CONFIG_SKB_EXTENSIONS */
 
+static void kfree_skb_napi_cache(struct sk_buff *skb)
+{
+       /* if SKB is a clone, don't handle this case */
+       if (skb->fclone != SKB_FCLONE_UNAVAILABLE) {
+               __kfree_skb(skb);
+               return;
+       }
+
+       local_bh_disable();
+       __napi_kfree_skb(skb, SKB_DROP_REASON_NOT_SPECIFIED);
+       local_bh_enable();
+}
+
 /**
  * skb_attempt_defer_free - queue skb for remote freeing
  * @skb: buffer
@@ -6992,7 +7005,7 @@ void skb_attempt_defer_free(struct sk_buff *skb)
        if (WARN_ON_ONCE(cpu >= nr_cpu_ids) ||
            !cpu_online(cpu) ||
            cpu == raw_smp_processor_id()) {
-nodefer:       __kfree_skb(skb);
+nodefer:       kfree_skb_napi_cache(skb);
                return;
        }