From: Michael Ellerman
Date: Thu, 25 Nov 2021 00:23:59 +0000 (+1100)
Subject: Merge branch 'topic/ppc-kvm' into next
X-Git-Tag: microblaze-v5.18~87^2~184
X-Git-Url: http://git.monstr.eu/?a=commitdiff_plain;h=ff0d6be4bf9ad4daba024ba0157b97750c7ad1fb;p=linux-2.6-microblaze.git

Merge branch 'topic/ppc-kvm' into next

This merges Nick's big P9 KVM series, original cover letter follows:

  KVM: PPC: Book3S HV P9: entry/exit optimisations

  This reduces radix guest full entry/exit latency on POWER9 and POWER10
  by 2x.

  Nested HV guests should see smaller improvements in their L1 entry/exit,
  but this is also combined with most L0 speedups also applying to nested
  entry. nginx localhost throughput test in an SMP nested guest is improved
  about 10% (in a direct guest it doesn't change much because it uses XIVE
  for IPIs) when L0 and L1 are patched.

  It does this in several main ways:

  - Rearrange code to optimise SPR accesses. Mainly, avoid scoreboard
    stalls.

  - Test SPR values to avoid mtSPRs where possible. mtSPRs are expensive.

  - Reduce mftb. mftb is expensive.

  - Demand fault certain facilities to avoid saving and/or restoring them
    (at the cost of a fault when they are used, but this is mitigated over
    a number of entries, like the facilities when context switching
    processes). PM, TM, and EBB so far.

  - Defer some sequences that are made just in case a guest is interrupted
    in the middle of a critical section to the case where the guest is
    scheduled on a different CPU, rather than every time (at the cost of
    an extra IPI in this case). Namely the tlbsync sequence for radix with
    GTSE, which is very expensive.

  - Reduce locking, barriers, atomics related to the vcpus-per-vcore > 1
    handling that the P9 path does not require.
---

ff0d6be4bf9ad4daba024ba0157b97750c7ad1fb
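
For illustration only, the "test SPR values to avoid mtSPRs" item from the cover letter can be sketched in plain C. This is not the kernel's code: `struct fake_spr`, `fake_mtspr()`, `load_spr_if_changed()` and `count_spr_writes()` are hypothetical names invented here, and the register is modelled as an ordinary struct whose `value` field stands in for a software copy of what the register currently holds.

```c
#include <stddef.h>

/* Hypothetical model of one special-purpose register (not a kernel type). */
struct fake_spr {
	unsigned long value;   /* software copy of the current contents */
	unsigned long writes;  /* number of simulated mtSPR operations */
};

/* Stand-in for an mtSPR: always performs (and counts) the write. */
static void fake_mtspr(struct fake_spr *spr, unsigned long val)
{
	spr->value = val;
	spr->writes++;
}

/* Test the known value first and skip the write when nothing changes. */
static void load_spr_if_changed(struct fake_spr *spr, unsigned long wanted)
{
	if (spr->value != wanted)
		fake_mtspr(spr, wanted);
}

/*
 * Replay a sequence of wanted values (e.g. one per guest entry/exit) and
 * return how many simulated writes were issued. With test_first == 0 every
 * value is written unconditionally.
 */
size_t count_spr_writes(const unsigned long *wanted, size_t n, int test_first)
{
	struct fake_spr spr = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (test_first)
			load_spr_if_changed(&spr, wanted[i]);
		else
			fake_mtspr(&spr, wanted[i]);
	}
	return spr.writes;
}
```

In this model, a sequence in which the wanted value rarely changes issues far fewer writes when the value is tested first. Whether the compare is cheaper than the write on real hardware is the premise stated in the cover letter, not something this sketch measures.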