bpf: Do not grab the bucket spinlock by default on htab batch ops
Grabbing the spinlock for every bucket, even if it's empty, was causing
a significant performance cost when traversing htab maps that have only
a few entries. This patch addresses the issue by checking bucket_cnt
first: only if the bucket has some entries do we grab the spinlock and
proceed with the batching.
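The idea can be sketched in userspace as below. This is only an
illustration under stated assumptions, not the kernel code: struct node,
struct bucket, bucket_lock(), bucket_unlock(), bucket_count(),
dump_keys() and the lock_acquisitions counter are all hypothetical
stand-ins, and the unlocked peek is only safe here because the sketch is
single-threaded (the kernel path does its unlocked walk under RCU).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's htab types. */
struct node {
	struct node *next;
	int key;
};

struct bucket {
	struct node *head;
	atomic_flag lock;
};

/* Instrumentation for this sketch only: how many times a lock was taken. */
static unsigned long lock_acquisitions;

static void bucket_lock(struct bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		; /* spin until the flag is released */
	lock_acquisitions++;
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Count the entries in a bucket without taking its lock. */
static size_t bucket_count(const struct bucket *b)
{
	size_t cnt = 0;

	for (const struct node *n = b->head; n; n = n->next)
		cnt++;
	return cnt;
}

/*
 * Copy up to cap keys from all buckets into out[] and return how many
 * were copied. Empty buckets are skipped without touching their lock.
 */
static size_t dump_keys(struct bucket *buckets, size_t nbuckets,
			int *out, size_t cap)
{
	size_t total = 0;

	assert(buckets != NULL && out != NULL);

	for (size_t i = 0; i < nbuckets; i++) {
		struct bucket *b = &buckets[i];

		/* Cheap unlocked peek: nothing to copy, so no lock needed. */
		if (bucket_count(b) == 0)
			continue;

		/* Non-empty: walk again under the lock and copy the keys. */
		bucket_lock(b);
		for (struct node *n = b->head; n && total < cap; n = n->next)
			out[total++] = n->key;
		bucket_unlock(b);
	}
	return total;
}
```

With a table of four buckets where only one holds entries, dump_keys()
takes a single lock instead of four, which is where the savings on
sparsely populated maps come from.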
Tested with a htab of size 50K and different numbers of populated entries.
Before:
  Benchmark               Time(ns)        CPU(ns)
  -----------------------------------------------
  BM_DumpHashMap/1         2759655        2752033
  BM_DumpHashMap/10        2933722        2930825
  BM_DumpHashMap/200       3171680        3170265
  BM_DumpHashMap/500       3639607        3635511
  BM_DumpHashMap/1000      4369008        4364981
  BM_DumpHashMap/5k       11171919       11134028
  BM_DumpHashMap/20k      69150080       69033496
  BM_DumpHashMap/39k     190501036      190226162
After:
  Benchmark               Time(ns)        CPU(ns)
  -----------------------------------------------
  BM_DumpHashMap/1          202707         200109
  BM_DumpHashMap/10         213441         210569
  BM_DumpHashMap/200        478641         472350
  BM_DumpHashMap/500        980061         967102
  BM_DumpHashMap/1000      1863835        1839575
  BM_DumpHashMap/5k        8961836        8902540
  BM_DumpHashMap/20k      69761497       69322756
  BM_DumpHashMap/39k     187437830      186551111
Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200218172552.215077-1-brianvv@google.com