KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages
author     David Matlack <dmatlack@google.com>
           Thu, 3 Nov 2022 20:44:21 +0000 (13:44 -0700)
committer  Paolo Bonzini <pbonzini@redhat.com>
           Thu, 17 Nov 2022 16:26:35 +0000 (11:26 -0500)
Do not recover (i.e. zap) an NX Huge Page that is being dirty tracked,
as it will just be faulted back in at the same 4KiB granularity when
accessed by a vCPU. This may need to be changed if KVM ever supports
2MiB (or larger) dirty tracking granularity, or faulting huge pages
during dirty tracking for reads/executes. For now, however, these zaps
are entirely wasteful.
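
For context, the dirty-tracking check the patch relies on (see the diff
below) reduces to a memslot flag test. A minimal sketch, approximating
the kvm_slot_dirty_track_enabled() helper in include/linux/kvm_host.h:

  /*
   * A slot is dirty tracked while userspace has dirty logging enabled
   * on it, i.e. while KVM_MEM_LOG_DIRTY_PAGES is set in the slot's
   * flags.
   */
  static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *slot)
  {
          return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
  }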

To check whether this commit increases the CPU usage of the NX recovery
worker thread, I used a modified version of execute_perf_test
[1] that supports splitting guest memory into multiple slots and reports
/proc/pid/schedstat:se.sum_exec_runtime for the NX recovery worker just
before tearing down the VM. The goal was to force a large number of NX
Huge Page recoveries and see if the recovery worker used any more CPU.
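
For illustration only (this helper is not part of the patch or the
referenced test harness), cumulative on-CPU time for a task can be
sampled from the first field of /proc/<pid>/schedstat, which is time
spent on the CPU in nanoseconds per Documentation/scheduler/sched-stats:

  #include <stdio.h>

  /* Hypothetical sampling helper: return the task's cumulative on-CPU
   * time in nanoseconds, or -1 on error. */
  static long long task_runtime_ns(int pid)
  {
          char path[64];
          long long ns;
          FILE *f;

          snprintf(path, sizeof(path), "/proc/%d/schedstat", pid);
          f = fopen(path, "r");
          if (!f || fscanf(f, "%lld", &ns) != 1)
                  ns = -1;
          if (f)
                  fclose(f);
          return ns;
  }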

Test Setup:

  echo 1000 > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
  echo 10 > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio

Test Command:

  ./execute_perf_test -v64 -s anonymous_hugetlb_1gb -x 16 -o

        | kvm-nx-lpage-re:se.sum_exec_runtime      |
        | ---------------------------------------- |
Run     | Before             | After               |
------- | ------------------ | ------------------- |
1       | 730.084105         | 724.375314          |
2       | 728.751339         | 740.581988          |
3       | 736.264720         | 757.078163          |

Comparing the median results, this commit results in about a 1% increase
in the CPU usage of the NX recovery worker when testing a VM with 16
slots. However, the effect is negligible with the default halving time
for NX pages of 1 hour, rather than the ~10 seconds produced by the
period_ms = 1000, ratio = 10 settings above.
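
For reference on where those numbers come from: the worker's wakeup
cadence is derived from the two module parameters, and each wakeup
recovers roughly 1/ratio of the possible-NX page list. A sketch,
approximating calc_nx_huge_pages_recovery_period() in
arch/x86/kvm/mmu/mmu.c:

  /*
   * Derive the worker's wakeup period from the module params. If
   * period_ms is 0 (the default), pick a period such that a full pass
   * over the possible-NX list takes about an hour.
   */
  static bool calc_nx_huge_pages_recovery_period(uint *period)
  {
          bool enabled = READ_ONCE(nx_huge_pages);
          uint ratio = READ_ONCE(nx_huge_pages_recovery_ratio);

          *period = READ_ONCE(nx_huge_pages_recovery_period_ms);

          if (!enabled || !ratio)
                  return false;

          if (!*period) {
                  /* Make sure the period is not less than one second. */
                  ratio = min(ratio, 3600u);
                  *period = 60 * 60 * 1000 / ratio;
          }

          return true;
  }

With the test settings, the worker wakes every second and recovers a
tenth of the list per wakeup, cycling the whole list in ~10 seconds;
the defaults spread a full pass over roughly an hour instead.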

[1] https://lore.kernel.org/kvm/20221019234050.3919566-2-dmatlack@google.com/

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20221103204421.1146958-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 93c389e..cfff746 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6841,6 +6841,7 @@ static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
 static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 {
        unsigned long nx_lpage_splits = kvm->stat.nx_lpage_splits;
+       struct kvm_memory_slot *slot;
        int rcu_idx;
        struct kvm_mmu_page *sp;
        unsigned int ratio;
@@ -6875,7 +6876,21 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
                                      struct kvm_mmu_page,
                                      possible_nx_huge_page_link);
                WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
-               if (is_tdp_mmu_page(sp))
+               WARN_ON_ONCE(!sp->role.direct);
+
+               slot = gfn_to_memslot(kvm, sp->gfn);
+               WARN_ON_ONCE(!slot);
+
+               /*
+                * Unaccount and do not attempt to recover any NX Huge Pages
+                * that are being dirty tracked, as they would just be faulted
+                * back in as 4KiB pages. The NX Huge Pages in this slot will be
+                * recovered, along with all the other huge pages in the slot,
+                * when dirty logging is disabled.
+                */
+               if (slot && kvm_slot_dirty_track_enabled(slot))
+                       unaccount_nx_huge_page(kvm, sp);
+               else if (is_tdp_mmu_page(sp))
                        flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
                else
                        kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
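
For context on the new unaccount path: skipping the zap does not leave
a stale entry behind, as unaccounting removes the shadow page from the
possible-NX list, so the recovery worker will not revisit it until the
page is re-faulted and re-accounted. A sketch, approximating
unaccount_nx_huge_page() and untrack_possible_nx_huge_page() in
arch/x86/kvm/mmu/mmu.c:

  void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
  {
          sp->nx_huge_page_disallowed = false;

          untrack_possible_nx_huge_page(kvm, sp);
  }

  void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
  {
          /* Nothing to do if the page was never on the list. */
          if (list_empty(&sp->possible_nx_huge_page_link))
                  return;

          --kvm->stat.nx_lpage_splits;
          list_del_init(&sp->possible_nx_huge_page_link);
  }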