cxl: fix nested locking hang during EEH hotplug
authorAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Mon, 6 Feb 2017 01:07:17 +0000 (12:07 +1100)
committerMichael Ellerman <mpe@ellerman.id.au>
Tue, 21 Feb 2017 10:32:52 +0000 (21:32 +1100)
commit171ed0fcd8966d82c45376f1434678e7b9d4d9b1
tree866ae9bbb1d2a386362b1d0fb3bb9b054437981b
parent5e48dc0aa4d9daf93e9bff2a274473623a134ec8
cxl: fix nested locking hang during EEH hotplug

Commit 14a3ae34bfd0 ("cxl: Prevent read/write to AFU config space while AFU
not configured") introduced a rwsem to fix an invalid memory access that
occurred when someone attempts to access the config space of an AFU on a
vPHB whilst the AFU is deconfigured, such as during EEH recovery.

It turns out that it's possible to run into a nested locking issue when EEH
recovery fails and a full device hotplug is required.
cxl_pci_error_detected() deconfigures the AFU, taking a writer lock on
configured_rwsem. When EEH recovery fails, the EEH code calls
pci_hp_remove_devices() to remove the device, which in turn calls
cxl_remove() -> cxl_pci_remove_afu() -> pci_deconfigure_afu(), which tries
to grab the writer lock that's already held.

Standard rwsem semantics don't express what we really want to do here and
don't allow for nested locking. Fix this by replacing the rwsem with an
atomic_t which we can control more finely. Allow the AFU to be locked
multiple times so long as there are no readers.

Fixes: 14a3ae34bfd0 ("cxl: Prevent read/write to AFU config space while AFU not configured")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
drivers/misc/cxl/cxl.h
drivers/misc/cxl/main.c
drivers/misc/cxl/pci.c
drivers/misc/cxl/vphb.c