x86/mce/therm_throt: Undo thermal polling properly on CPU offline
author Thomas Gleixner <tglx@linutronix.de>
Tue, 25 Feb 2020 13:55:15 +0000 (14:55 +0100)
committer Borislav Petkov <bp@suse.de>
Tue, 25 Feb 2020 20:21:44 +0000 (21:21 +0100)
Chris Wilson reported splats from the thermal throttling workqueue
callback running on offlined CPUs. That callback should never run on an
offlined CPU, but it does because the offlining callback
thermal_throttle_offline() does not symmetrically undo the setup work
done in its onlining counterpart. In other words:

 1. The thermal interrupt vector should be masked first, so that no new
 work is enqueued anymore, before ...

 2. ... cancelling any pending work synchronously.

Do both, in that order, and fix the issue properly.

 [ bp: Write commit message. ]

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Pandruvada, Srinivas <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/158120068234.18291.7938335950259651295@skylake-alporthouse-com
arch/x86/kernel/cpu/mce/therm_throt.c

index 58b4ee3..f36dc07 100644
@@ -486,9 +486,14 @@ static int thermal_throttle_offline(unsigned int cpu)
 {
        struct thermal_state *state = &per_cpu(thermal_state, cpu);
        struct device *dev = get_cpu_device(cpu);
+       u32 l;
+
+       /* Mask the thermal vector before draining evtl. pending work */
+       l = apic_read(APIC_LVTTHMR);
+       apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
 
-       cancel_delayed_work(&state->package_throttle.therm_work);
-       cancel_delayed_work(&state->core_throttle.therm_work);
+       cancel_delayed_work_sync(&state->package_throttle.therm_work);
+       cancel_delayed_work_sync(&state->core_throttle.therm_work);
 
        state->package_throttle.rate_control_active = false;
        state->core_throttle.rate_control_active = false;