x86, MCE: Robustify mcheck_init_device
author    Borislav Petkov <bp@suse.de>
          Fri, 20 Jun 2014 21:16:45 +0000 (23:16 +0200)
committer Borislav Petkov <bp@suse.de>
          Mon, 21 Jul 2014 16:14:32 +0000 (18:14 +0200)
BorisO reports that misc_register() often fails on Xen. In that case the
current code unregisters the CPU hotplug notifier. If a CPU is then
offlined and brought back online, we end up with a second timer running
on that CPU, leading to soft lockups and system hangs.

So leave the hotcpu notifier registered, even if mce_device_create()
failed for some cores, and never unregister it, so that the notifier
callback can still handle the per-CPU timers on CPU hotplug.

Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
arch/x86/kernel/cpu/mcheck/mce.c

index bb92f38..9a79c8d 100644
@@ -2451,6 +2451,12 @@ static __init int mcheck_init_device(void)
        for_each_online_cpu(i) {
                err = mce_device_create(i);
                if (err) {
+                       /*
+                        * Register notifier anyway (and do not unreg it) so
+                        * that we don't leave undeleted timers, see notifier
+                        * callback above.
+                        */
+                       __register_hotcpu_notifier(&mce_cpu_notifier);
                        cpu_notifier_register_done();
                        goto err_device_create;
                }
@@ -2471,10 +2477,6 @@ static __init int mcheck_init_device(void)
 err_register:
        unregister_syscore_ops(&mce_syscore_ops);
 
-       cpu_notifier_register_begin();
-       __unregister_hotcpu_notifier(&mce_cpu_notifier);
-       cpu_notifier_register_done();
-
 err_device_create:
        /*
         * We didn't keep track of which devices were created above, but