LoongArch: Add SIMD-optimized XOR routines
author WANG Xuerui <git@xen0n.name>
Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)
committer Huacai Chen <chenhuacai@loongson.cn>
Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)
commit 75ded18a5e8e51ca2d26d55f010d60ae9aab652c
tree 29464a5abd1b2b6f7141a26374d45d03eaf17078
parent 2478e4b7593a2a55073a4a6bf23dc885c19befd8

Add LSX and LASX implementations of xor operations, operating on 64
bytes (one L1 cache line) at a time, for a balance between memory
utilization and instruction mix. Huacai confirmed that all future
LoongArch implementations by Loongson (that we care about) will likely also
feature 64-byte cache lines, and experiments show no throughput
improvement with further unrolling.
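
To illustrate the "64 bytes at a time" loop shape described above, here is a minimal portable C sketch. It is not the code from this commit: the function name xor_64b_2 and the XOR_LINE_BYTES constant are hypothetical, it uses plain 64-bit integer operations where the commit uses LSX/LASX vector instructions, and it assumes the byte count is a non-zero multiple of 64 and that the two buffers do not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed block size: one 64-byte cache line, as in the commit message. */
#define XOR_LINE_BYTES 64

/*
 * Hypothetical helper, for illustration only: p1[i] ^= p2[i] over `bytes`
 * bytes, handling one 64-byte block (eight 64-bit words) per iteration.
 * Assumes `bytes` is a multiple of 64 and that p1 and p2 do not overlap.
 */
static void xor_64b_2(size_t bytes, uint64_t *restrict p1,
                      const uint64_t *restrict p2)
{
    size_t lines = bytes / XOR_LINE_BYTES;

    while (lines--) {
        /* Unrolled body: exactly one cache line of work, no more. */
        p1[0] ^= p2[0];
        p1[1] ^= p2[1];
        p1[2] ^= p2[2];
        p1[3] ^= p2[3];
        p1[4] ^= p2[4];
        p1[5] ^= p2[5];
        p1[6] ^= p2[6];
        p1[7] ^= p2[7];

        /* Advance both pointers by 64 bytes (8 words). */
        p1 += 8;
        p2 += 8;
    }
}
```

In a SIMD version the eight scalar operations would presumably collapse into four 128-bit or two 256-bit operations per iteration, while the per-iteration footprint stays at one cache line.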

Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:

> 8regs           : 12702 MB/sec
> 8regs_prefetch  : 10920 MB/sec
> 32regs          : 12686 MB/sec
> 32regs_prefetch : 10918 MB/sec
> lsx             : 17589 MB/sec
> lasx            : 26116 MB/sec

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
arch/loongarch/include/asm/xor.h [new file with mode: 0644]
arch/loongarch/include/asm/xor_simd.h [new file with mode: 0644]
arch/loongarch/lib/Makefile
arch/loongarch/lib/xor_simd.c [new file with mode: 0644]
arch/loongarch/lib/xor_simd.h [new file with mode: 0644]
arch/loongarch/lib/xor_simd_glue.c [new file with mode: 0644]
arch/loongarch/lib/xor_template.c [new file with mode: 0644]