PM, libnvdimm: Add runtime firmware activation support
author Dan Williams <dan.j.williams@intel.com>
Mon, 20 Jul 2020 22:08:18 +0000 (15:08 -0700)
committer Vishal Verma <vishal.l.verma@intel.com>
Wed, 29 Jul 2020 01:28:32 +0000 (19:28 -0600)
Abstract platform specific mechanics for nvdimm firmware activation
behind a handful of generic ops. At the bus level ->activate_state()
indicates the unified state (idle, busy, armed) of all DIMMs on the bus,
and ->capability() indicates the system state expectations for activate.
At the DIMM level ->activate_state() indicates the per-DIMM state,
->activate_result() indicates the outcome of the last activation
attempt, and ->arm() attempts to transition the DIMM from 'idle' to
'armed'.
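
As a rough illustration of how the bus-level and DIMM-level views relate,
the following is a userspace sketch, not kernel code. The names (fwa_state,
dimm_arm, bus_activate_state) are hypothetical, and the aggregation policy
(busy wins, then armed, else idle) is an assumption; the real unified state
is computed by the platform implementation behind ->activate_state().

```c
#include <stddef.h>

/* Hypothetical userspace model; names only loosely mirror the patch. */
enum fwa_state { FWA_IDLE, FWA_ARMED, FWA_BUSY };

struct dimm {
	enum fwa_state state;
};

/* Analogue of the per-DIMM ->arm(): only an idle DIMM may become armed. */
static int dimm_arm(struct dimm *d)
{
	if (d->state != FWA_IDLE)
		return -1;
	d->state = FWA_ARMED;
	return 0;
}

/*
 * Analogue of the bus-level ->activate_state().
 * Assumed policy: any busy DIMM makes the bus busy, otherwise any armed
 * DIMM makes the bus armed, otherwise the bus is idle.
 */
static enum fwa_state bus_activate_state(const struct dimm *dimms, size_t n)
{
	enum fwa_state unified = FWA_IDLE;
	size_t i;

	for (i = 0; i < n; i++) {
		if (dimms[i].state == FWA_BUSY)
			return FWA_BUSY;
		if (dimms[i].state == FWA_ARMED)
			unified = FWA_ARMED;
	}
	return unified;
}
```

The sketch leaves out the 'overflow' state, which depends on a
platform-specific notion of how many DIMMs can be armed at once.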

A new hibernate_quiet_exec() facility is added to support firmware
activation in an OS defined system quiesce state. It leverages the fact
that the hibernate-freeze state wants to assert that a memory
hibernation snapshot can be taken. This is in contrast to a platform
firmware defined quiesce state that may forcefully quiet the memory
controller independent of whether an individual device-driver properly
supports hibernate-freeze.
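
The overall shape of hibernate_quiet_exec() (enter a series of quiesce
stages, run the callback only if every stage succeeds, then undo the
stages in reverse order) can be sketched in userspace as below. This is a
simplified model under stated assumptions: struct stage, quiet_exec() and
the sample stages are hypothetical, and the kernel function uses goto
labels and real PM calls rather than a table of stages.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stage descriptor: each quiesce step has a matching undo. */
struct stage {
	int (*enter)(void);
	void (*leave)(void);
};

/*
 * Enter every stage in order, run func(data) only if all of them
 * succeed, then leave the stages that were entered in reverse order.
 * Returns func's return value, or the error of the stage that failed.
 */
static int quiet_exec(const struct stage *stages, size_t n,
		      int (*func)(void *data), void *data)
{
	size_t entered = 0;
	int error = 0;

	while (entered < n) {
		error = stages[entered].enter();
		if (error)
			break;
		entered++;
	}
	if (!error)
		error = func(data);
	while (entered > 0)
		stages[--entered].leave();
	return error;
}

/* Sample stages that record the order of calls in a small trace buffer. */
static char trace[64];
static size_t trace_len;

static void mark(char c)
{
	if (trace_len < sizeof(trace) - 1)
		trace[trace_len++] = c;
}

static int freeze_ok(void) { mark('F'); return 0; }
static void thaw(void) { mark('T'); }
static int suspend_ok(void) { mark('S'); return 0; }
static int suspend_fail(void) { mark('s'); return -EBUSY; }
static void resume(void) { mark('R'); }
static int payload(void *data) { mark('X'); return *(int *)data; }
```

With the stages { freeze_ok, thaw } and { suspend_ok, resume }, the payload
runs between suspend and resume; swapping in suspend_fail skips the payload
and only the freeze stage is undone.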

The libnvdimm sysfs interface is extended to support detection of a
firmware activate capability. The mechanism supports enumeration and
triggering of firmware activate, optionally in the
hibernate_quiet_exec() context.
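
A management tool consuming this interface might decide what to write to
'ndbusX/firmware/activate' roughly as sketched below. This is a userspace
illustration only: choose_activate_cmd() and its helpers are hypothetical,
and the policy of writing whatever 'capability' reports is an assumption
(an administrator may still choose 'live' as an override). The error codes
mirror those returned by the bus-level activate_store() in this patch.

```c
#include <errno.h>
#include <string.h>

/* Length of s without one trailing newline; sysfs reads end in "\n". */
static size_t trimmed_len(const char *s)
{
	size_t n = strlen(s);

	if (n && s[n - 1] == '\n')
		n--;
	return n;
}

/* Compare sysfs text against a keyword, ignoring a trailing newline. */
static int text_is(const char *s, const char *word)
{
	size_t n = trimmed_len(s);

	return n == strlen(word) && strncmp(s, word, n) == 0;
}

/*
 * Pick the string to write to 'ndbusX/firmware/activate' from the text
 * read out of the 'activate' (state) and 'capability' attributes.
 * Returns 0 and sets *cmd, or a negative errno: -EBUSY while an
 * activation is in flight, -ENXIO when nothing is armed.
 */
static int choose_activate_cmd(const char *state, const char *capability,
			       const char **cmd)
{
	if (text_is(state, "busy"))
		return -EBUSY;
	if (!text_is(state, "armed") && !text_is(state, "overflow"))
		return -ENXIO;

	if (text_is(capability, "live"))
		*cmd = "live";
	else if (text_is(capability, "quiesce"))
		*cmd = "quiesce";
	else
		return -EINVAL;
	return 0;
}
```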

[rafael: hibernate_quiet_exec() proposal]
[vishal: fix up sparse warning, grammar in Documentation/]

Cc: Pavel Machek <pavel@ucw.cz>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Co-developed-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Documentation/ABI/testing/sysfs-bus-nvdimm [new file with mode: 0644]
Documentation/driver-api/nvdimm/firmware-activate.rst [new file with mode: 0644]
drivers/nvdimm/core.c
drivers/nvdimm/dimm_devs.c
drivers/nvdimm/nd-core.h
include/linux/libnvdimm.h
include/linux/suspend.h
kernel/power/hibernate.c

diff --git a/Documentation/ABI/testing/sysfs-bus-nvdimm b/Documentation/ABI/testing/sysfs-bus-nvdimm
new file mode 100644 (file)
index 0000000..d643802
--- /dev/null
@@ -0,0 +1,2 @@
+The libnvdimm sub-system implements a common sysfs interface for
+platform nvdimm resources. See Documentation/driver-api/nvdimm/.
diff --git a/Documentation/driver-api/nvdimm/firmware-activate.rst b/Documentation/driver-api/nvdimm/firmware-activate.rst
new file mode 100644 (file)
index 0000000..7ee7dec
--- /dev/null
@@ -0,0 +1,86 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+NVDIMM Runtime Firmware Activation
+==================================
+
+Some persistent memory devices run firmware locally on the device /
+"DIMM" to perform tasks like media management, capacity provisioning,
+and health monitoring. The process of updating that firmware typically
+involves a reboot because it has implications for in-flight memory
+transactions. However, reboots are disruptive and at least the Intel
+persistent memory platform implementation, described by the Intel ACPI
+DSM specification [1], has added support for activating firmware at
+runtime.
+
+A native sysfs interface is implemented in libnvdimm to allow platforms
+to advertise and control their local runtime firmware activation
+capability.
+
+The libnvdimm bus object, ndbusX, implements an ndbusX/firmware/activate
+attribute that shows the state of the firmware activation as one of 'idle',
+'armed', 'overflow', and 'busy'.
+
+- idle:
+  No devices are set / armed to activate firmware
+
+- armed:
+  At least one device is armed
+
+- busy:
+  In the busy state armed devices are in the process of transitioning
+  back to idle and completing an activation cycle.
+
+- overflow:
+  If the platform has a concept of incremental work needed to perform
+  the activation it could be the case that too many DIMMs are armed for
+  activation. In that scenario the potential for firmware activation to
+  timeout is indicated by the 'overflow' state.
+
+The 'ndbusX/firmware/activate' property can be written with a value of
+either 'live' or 'quiesce'. A value of 'quiesce' triggers the kernel to
+run firmware activation from within the equivalent of the hibernation
+'freeze' state where drivers and applications are notified to stop their
+modifications of system memory. A value of 'live' attempts
+firmware activation without this hibernation cycle. The
+'ndbusX/firmware/activate' property will be elided completely if no
+firmware activation capability is detected.
+
+Another property 'ndbusX/firmware/capability' indicates a value of
+'live' or 'quiesce', where 'live' indicates that the firmware
+does not require or inflict any quiesce period on the system to update
+firmware. A capability value of 'quiesce' indicates that firmware
+expects and injects a quiet period for the memory controller, but 'live'
+may still be written to 'ndbusX/firmware/activate' as an override to
+assume the risk of racing firmware update with in-flight device and
+application activity. The 'ndbusX/firmware/capability' property will be
+elided completely if no firmware activation capability is detected.
+
+The libnvdimm memory-device / DIMM object, nmemX, implements
+'nmemX/firmware/activate' and 'nmemX/firmware/result' attributes to
+communicate the per-device firmware activation state. Similar to the
+'ndbusX/firmware/activate' attribute, the 'nmemX/firmware/activate'
+attribute indicates 'idle', 'armed', or 'busy'. The state transitions
+from 'armed' to 'idle' once the system is prepared to activate firmware
+(firmware staged and state set to armed) and 'ndbusX/firmware/activate'
+is triggered. After that activation event the 'nmemX/firmware/result'
+attribute reflects the state of the last activation as one of:
+
+- none:
+  No runtime activation triggered since the last time the device was reset.
+
+- success:
+  The last runtime activation completed successfully.
+
+- fail:
+  The last runtime activation failed for device-specific reasons.
+
+- not_staged:
+  The last runtime activation failed due to a sequencing error of the
+  firmware image not being staged.
+
+- need_reset:
+  Runtime firmware activation failed, but the firmware can still be
+  activated via the legacy method of power-cycling the system.
+
+[1]: https://docs.pmem.io/persistent-memory/
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index fe9bd6f..c21ba06 100644 (file)
@@ -4,6 +4,7 @@
  */
 #include <linux/libnvdimm.h>
 #include <linux/badblocks.h>
+#include <linux/suspend.h>
 #include <linux/export.h>
 #include <linux/module.h>
 #include <linux/blkdev.h>
@@ -389,8 +390,156 @@ static const struct attribute_group nvdimm_bus_attribute_group = {
        .attrs = nvdimm_bus_attributes,
 };
 
+static ssize_t capability_show(struct device *dev,
+               struct device_attribute *attr, char *buf)
+{
+       struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+       struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+       enum nvdimm_fwa_capability cap;
+
+       if (!nd_desc->fw_ops)
+               return -EOPNOTSUPP;
+
+       nvdimm_bus_lock(dev);
+       cap = nd_desc->fw_ops->capability(nd_desc);
+       nvdimm_bus_unlock(dev);
+
+       switch (cap) {
+       case NVDIMM_FWA_CAP_QUIESCE:
+               return sprintf(buf, "quiesce\n");
+       case NVDIMM_FWA_CAP_LIVE:
+               return sprintf(buf, "live\n");
+       default:
+               return -EOPNOTSUPP;
+       }
+}
+
+static DEVICE_ATTR_RO(capability);
+
+static ssize_t activate_show(struct device *dev,
+               struct device_attribute *attr, char *buf)
+{
+       struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+       struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+       enum nvdimm_fwa_capability cap;
+       enum nvdimm_fwa_state state;
+
+       if (!nd_desc->fw_ops)
+               return -EOPNOTSUPP;
+
+       nvdimm_bus_lock(dev);
+       cap = nd_desc->fw_ops->capability(nd_desc);
+       state = nd_desc->fw_ops->activate_state(nd_desc);
+       nvdimm_bus_unlock(dev);
+
+       if (cap < NVDIMM_FWA_CAP_QUIESCE)
+               return -EOPNOTSUPP;
+
+       switch (state) {
+       case NVDIMM_FWA_IDLE:
+               return sprintf(buf, "idle\n");
+       case NVDIMM_FWA_BUSY:
+               return sprintf(buf, "busy\n");
+       case NVDIMM_FWA_ARMED:
+               return sprintf(buf, "armed\n");
+       case NVDIMM_FWA_ARM_OVERFLOW:
+               return sprintf(buf, "overflow\n");
+       default:
+               return -ENXIO;
+       }
+}
+
+static int exec_firmware_activate(void *data)
+{
+       struct nvdimm_bus_descriptor *nd_desc = data;
+
+       return nd_desc->fw_ops->activate(nd_desc);
+}
+
+static ssize_t activate_store(struct device *dev,
+               struct device_attribute *attr, const char *buf, size_t len)
+{
+       struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+       struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+       enum nvdimm_fwa_state state;
+       bool quiesce;
+       ssize_t rc;
+
+       if (!nd_desc->fw_ops)
+               return -EOPNOTSUPP;
+
+       if (sysfs_streq(buf, "live"))
+               quiesce = false;
+       else if (sysfs_streq(buf, "quiesce"))
+               quiesce = true;
+       else
+               return -EINVAL;
+
+       nvdimm_bus_lock(dev);
+       state = nd_desc->fw_ops->activate_state(nd_desc);
+
+       switch (state) {
+       case NVDIMM_FWA_BUSY:
+               rc = -EBUSY;
+               break;
+       case NVDIMM_FWA_ARMED:
+       case NVDIMM_FWA_ARM_OVERFLOW:
+               if (quiesce)
+                       rc = hibernate_quiet_exec(exec_firmware_activate, nd_desc);
+               else
+                       rc = nd_desc->fw_ops->activate(nd_desc);
+               break;
+       case NVDIMM_FWA_IDLE:
+       default:
+               rc = -ENXIO;
+       }
+       nvdimm_bus_unlock(dev);
+
+       if (rc == 0)
+               rc = len;
+       return rc;
+}
+
+static DEVICE_ATTR_ADMIN_RW(activate);
+
+static umode_t nvdimm_bus_firmware_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+       struct device *dev = container_of(kobj, typeof(*dev), kobj);
+       struct nvdimm_bus *nvdimm_bus = to_nvdimm_bus(dev);
+       struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+       enum nvdimm_fwa_capability cap;
+
+       /*
+        * Both 'activate' and 'capability' disappear when no ops
+        * detected, or a negative capability is indicated.
+        */
+       if (!nd_desc->fw_ops)
+               return 0;
+
+       nvdimm_bus_lock(dev);
+       cap = nd_desc->fw_ops->capability(nd_desc);
+       nvdimm_bus_unlock(dev);
+
+       if (cap < NVDIMM_FWA_CAP_QUIESCE)
+               return 0;
+
+       return a->mode;
+}
+static struct attribute *nvdimm_bus_firmware_attributes[] = {
+       &dev_attr_activate.attr,
+       &dev_attr_capability.attr,
+       NULL,
+};
+
+static const struct attribute_group nvdimm_bus_firmware_attribute_group = {
+       .name = "firmware",
+       .attrs = nvdimm_bus_firmware_attributes,
+       .is_visible = nvdimm_bus_firmware_visible,
+};
+
 const struct attribute_group *nvdimm_bus_attribute_groups[] = {
        &nvdimm_bus_attribute_group,
+       &nvdimm_bus_firmware_attribute_group,
        NULL,
 };
 
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index b7b77e8..85b53a7 100644 (file)
@@ -446,9 +446,124 @@ static const struct attribute_group nvdimm_attribute_group = {
        .is_visible = nvdimm_visible,
 };
 
+static ssize_t result_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+       struct nvdimm *nvdimm = to_nvdimm(dev);
+       enum nvdimm_fwa_result result;
+
+       if (!nvdimm->fw_ops)
+               return -EOPNOTSUPP;
+
+       nvdimm_bus_lock(dev);
+       result = nvdimm->fw_ops->activate_result(nvdimm);
+       nvdimm_bus_unlock(dev);
+
+       switch (result) {
+       case NVDIMM_FWA_RESULT_NONE:
+               return sprintf(buf, "none\n");
+       case NVDIMM_FWA_RESULT_SUCCESS:
+               return sprintf(buf, "success\n");
+       case NVDIMM_FWA_RESULT_FAIL:
+               return sprintf(buf, "fail\n");
+       case NVDIMM_FWA_RESULT_NOTSTAGED:
+               return sprintf(buf, "not_staged\n");
+       case NVDIMM_FWA_RESULT_NEEDRESET:
+               return sprintf(buf, "need_reset\n");
+       default:
+               return -ENXIO;
+       }
+}
+static DEVICE_ATTR_ADMIN_RO(result);
+
+static ssize_t activate_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+       struct nvdimm *nvdimm = to_nvdimm(dev);
+       enum nvdimm_fwa_state state;
+
+       if (!nvdimm->fw_ops)
+               return -EOPNOTSUPP;
+
+       nvdimm_bus_lock(dev);
+       state = nvdimm->fw_ops->activate_state(nvdimm);
+       nvdimm_bus_unlock(dev);
+
+       switch (state) {
+       case NVDIMM_FWA_IDLE:
+               return sprintf(buf, "idle\n");
+       case NVDIMM_FWA_BUSY:
+               return sprintf(buf, "busy\n");
+       case NVDIMM_FWA_ARMED:
+               return sprintf(buf, "armed\n");
+       default:
+               return -ENXIO;
+       }
+}
+
+static ssize_t activate_store(struct device *dev, struct device_attribute *attr,
+               const char *buf, size_t len)
+{
+       struct nvdimm *nvdimm = to_nvdimm(dev);
+       enum nvdimm_fwa_trigger arg;
+       int rc;
+
+       if (!nvdimm->fw_ops)
+               return -EOPNOTSUPP;
+
+       if (sysfs_streq(buf, "arm"))
+               arg = NVDIMM_FWA_ARM;
+       else if (sysfs_streq(buf, "disarm"))
+               arg = NVDIMM_FWA_DISARM;
+       else
+               return -EINVAL;
+
+       nvdimm_bus_lock(dev);
+       rc = nvdimm->fw_ops->arm(nvdimm, arg);
+       nvdimm_bus_unlock(dev);
+
+       if (rc < 0)
+               return rc;
+       return len;
+}
+static DEVICE_ATTR_ADMIN_RW(activate);
+
+static struct attribute *nvdimm_firmware_attributes[] = {
+       &dev_attr_activate.attr,
+       &dev_attr_result.attr,
+       NULL,
+};
+
+static umode_t nvdimm_firmware_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+       struct device *dev = container_of(kobj, typeof(*dev), kobj);
+       struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
+       struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
+       struct nvdimm *nvdimm = to_nvdimm(dev);
+       enum nvdimm_fwa_capability cap;
+
+       if (!nd_desc->fw_ops)
+               return 0;
+       if (!nvdimm->fw_ops)
+               return 0;
+
+       nvdimm_bus_lock(dev);
+       cap = nd_desc->fw_ops->capability(nd_desc);
+       nvdimm_bus_unlock(dev);
+
+       if (cap < NVDIMM_FWA_CAP_QUIESCE)
+               return 0;
+
+       return a->mode;
+}
+
+static const struct attribute_group nvdimm_firmware_attribute_group = {
+       .name = "firmware",
+       .attrs = nvdimm_firmware_attributes,
+       .is_visible = nvdimm_firmware_visible,
+};
+
 static const struct attribute_group *nvdimm_attribute_groups[] = {
        &nd_device_attribute_group,
        &nvdimm_attribute_group,
+       &nvdimm_firmware_attribute_group,
        NULL,
 };
 
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index ddb9d97..564faa3 100644 (file)
@@ -45,6 +45,7 @@ struct nvdimm {
                struct kernfs_node *overwrite_state;
        } sec;
        struct delayed_work dwork;
+       const struct nvdimm_fw_ops *fw_ops;
 };
 
 static inline unsigned long nvdimm_security_flags(
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index ad9898e..15dbcb7 100644 (file)
@@ -86,6 +86,7 @@ struct nvdimm_bus_descriptor {
        int (*flush_probe)(struct nvdimm_bus_descriptor *nd_desc);
        int (*clear_to_send)(struct nvdimm_bus_descriptor *nd_desc,
                        struct nvdimm *nvdimm, unsigned int cmd, void *data);
+       const struct nvdimm_bus_fw_ops *fw_ops;
 };
 
 struct nd_cmd_desc {
@@ -200,6 +201,49 @@ struct nvdimm_security_ops {
        int (*query_overwrite)(struct nvdimm *nvdimm);
 };
 
+enum nvdimm_fwa_state {
+       NVDIMM_FWA_INVALID,
+       NVDIMM_FWA_IDLE,
+       NVDIMM_FWA_ARMED,
+       NVDIMM_FWA_BUSY,
+       NVDIMM_FWA_ARM_OVERFLOW,
+};
+
+enum nvdimm_fwa_trigger {
+       NVDIMM_FWA_ARM,
+       NVDIMM_FWA_DISARM,
+};
+
+enum nvdimm_fwa_capability {
+       NVDIMM_FWA_CAP_INVALID,
+       NVDIMM_FWA_CAP_NONE,
+       NVDIMM_FWA_CAP_QUIESCE,
+       NVDIMM_FWA_CAP_LIVE,
+};
+
+enum nvdimm_fwa_result {
+       NVDIMM_FWA_RESULT_INVALID,
+       NVDIMM_FWA_RESULT_NONE,
+       NVDIMM_FWA_RESULT_SUCCESS,
+       NVDIMM_FWA_RESULT_NOTSTAGED,
+       NVDIMM_FWA_RESULT_NEEDRESET,
+       NVDIMM_FWA_RESULT_FAIL,
+};
+
+struct nvdimm_bus_fw_ops {
+       enum nvdimm_fwa_state (*activate_state)
+               (struct nvdimm_bus_descriptor *nd_desc);
+       enum nvdimm_fwa_capability (*capability)
+               (struct nvdimm_bus_descriptor *nd_desc);
+       int (*activate)(struct nvdimm_bus_descriptor *nd_desc);
+};
+
+struct nvdimm_fw_ops {
+       enum nvdimm_fwa_state (*activate_state)(struct nvdimm *nvdimm);
+       enum nvdimm_fwa_result (*activate_result)(struct nvdimm *nvdimm);
+       int (*arm)(struct nvdimm *nvdimm, enum nvdimm_fwa_trigger arg);
+};
+
 void badrange_init(struct badrange *badrange);
 int badrange_add(struct badrange *badrange, u64 addr, u64 length);
 void badrange_forget(struct badrange *badrange, phys_addr_t start,
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index b960098..cb9afad 100644 (file)
@@ -453,6 +453,8 @@ extern bool hibernation_available(void);
 asmlinkage int swsusp_save(void);
 extern struct pbe *restore_pblist;
 int pfn_is_nosave(unsigned long pfn);
+
+int hibernate_quiet_exec(int (*func)(void *data), void *data);
 #else /* CONFIG_HIBERNATION */
 static inline void register_nosave_region(unsigned long b, unsigned long e) {}
 static inline void register_nosave_region_late(unsigned long b, unsigned long e) {}
@@ -464,6 +466,10 @@ static inline void hibernation_set_ops(const struct platform_hibernation_ops *op
 static inline int hibernate(void) { return -ENOSYS; }
 static inline bool system_entering_hibernation(void) { return false; }
 static inline bool hibernation_available(void) { return false; }
+
+static inline int hibernate_quiet_exec(int (*func)(void *data), void *data) {
+       return -ENOTSUPP;
+}
 #endif /* CONFIG_HIBERNATION */
 
 #ifdef CONFIG_HIBERNATION_SNAPSHOT_DEV
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 02ec716..e6fab3f 100644 (file)
@@ -795,6 +795,103 @@ int hibernate(void)
        return error;
 }
 
+/**
+ * hibernate_quiet_exec - Execute a function with all devices frozen.
+ * @func: Function to execute.
+ * @data: Data pointer to pass to @func.
+ *
+ * Return the @func return value or an error code if it cannot be executed.
+ */
+int hibernate_quiet_exec(int (*func)(void *data), void *data)
+{
+       int error, nr_calls = 0;
+
+       lock_system_sleep();
+
+       if (!hibernate_acquire()) {
+               error = -EBUSY;
+               goto unlock;
+       }
+
+       pm_prepare_console();
+
+       error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls);
+       if (error) {
+               nr_calls--;
+               goto exit;
+       }
+
+       error = freeze_processes();
+       if (error)
+               goto exit;
+
+       lock_device_hotplug();
+
+       pm_suspend_clear_flags();
+
+       error = platform_begin(true);
+       if (error)
+               goto thaw;
+
+       error = freeze_kernel_threads();
+       if (error)
+               goto thaw;
+
+       error = dpm_prepare(PMSG_FREEZE);
+       if (error)
+               goto dpm_complete;
+
+       suspend_console();
+
+       error = dpm_suspend(PMSG_FREEZE);
+       if (error)
+               goto dpm_resume;
+
+       error = dpm_suspend_end(PMSG_FREEZE);
+       if (error)
+               goto dpm_resume;
+
+       error = platform_pre_snapshot(true);
+       if (error)
+               goto skip;
+
+       error = func(data);
+
+skip:
+       platform_finish(true);
+
+       dpm_resume_start(PMSG_THAW);
+
+dpm_resume:
+       dpm_resume(PMSG_THAW);
+
+       resume_console();
+
+dpm_complete:
+       dpm_complete(PMSG_THAW);
+
+       thaw_kernel_threads();
+
+thaw:
+       platform_end(true);
+
+       unlock_device_hotplug();
+
+       thaw_processes();
+
+exit:
+       __pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL);
+
+       pm_restore_console();
+
+       hibernate_release();
+
+unlock:
+       unlock_system_sleep();
+
+       return error;
+}
+EXPORT_SYMBOL_GPL(hibernate_quiet_exec);
 
 /**
  * software_resume - Resume from a saved hibernation image.