riscv: module: Optimize PLT/GOT entry counting
authorSamuel Holland <samuel.holland@sifive.com>
Wed, 9 Apr 2025 17:14:51 +0000 (10:14 -0700)
committerPalmer Dabbelt <palmer@dabbelt.com>
Thu, 5 Jun 2025 18:09:43 +0000 (11:09 -0700)
commitbe17c0df67959fe4f88dac75dc26ed9252d4b133
treef00beb7f8a1b319cc911f400b1b945cc731eebb4
parent881dadf0792c02d7d156e0665dc5eaf18701cd5a
riscv: module: Optimize PLT/GOT entry counting

perf reports that 99.63% of the cycles from `modprobe amdgpu` are spent
inside module_frob_arch_sections(). This is because amdgpu.ko contains
about 300000 relocations in its .rela.text section, and the algorithm in
count_max_entries() takes quadratic time.

Apply two optimizations from the arm64 code, which together reduce the
total execution time by 99.58%. First, sort the relocations so duplicate
entries are adjacent. Second, reduce the number of relocations that must
be sorted by filtering to only relocations that need PLT/GOT entries, as
done in commit d4e0340919fb ("arm64/module: Optimize module load time by
optimizing PLT counting").

Unlike the arm64 code, here the filtering and sorting is done in a
scratch buffer, because the HI20 relocation search optimization in
apply_relocate_add() depends on the original order of the relocations.
This allows accumulating PLT/GOT relocations across sections so sorting
and counting is only done once per module.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250409171526.862481-3-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
arch/riscv/kernel/module-sections.c