Merge branches 'pm-em' and 'pm-core'

author Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Mon, 3 Aug 2020 11:11:39 +0000 (13:11 +0200)

committer Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Mon, 3 Aug 2020 11:11:39 +0000 (13:11 +0200)
diff --git a/Documentation/ABI/testing/sysfs-class-devfreq b/Documentation/ABI/testing/sysfs-class-devfreq

index 9758eb8..deefffb 100644
--- a/Documentation/ABI/testing/sysfs-class-devfreq
+++ b/Documentation/ABI/testing/sysfs-class-devfreq
@@ -108,3 +108,15 @@ Description:
                 frequency requested by governors and min_freq.
                 The max_freq overrides min_freq because max_freq may be
                 used to throttle devices to avoid overheating.
+
+What:          /sys/class/devfreq/.../timer
+Date:          July 2020
+Contact:       Chanwoo Choi <cw00.choi@samsung.com>
+Description:
+               This ABI shows and stores the kind of work timer by users.
+               This work timer is used by devfreq workqueue in order to
+               monitor the device status such as utilization. The user
+               can change the work timer on runtime according to their demand
+               as following:
+                       echo deferrable > /sys/class/devfreq/.../timer
+                       echo delayed > /sys/class/devfreq/.../timer
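A user-space program can perform the same write that the two `echo` commands in the ABI entry do. The following is a minimal sketch, not part of the commit: `devfreq_set_timer` is a hypothetical helper name, and the caller is assumed to supply the full path of the `timer` attribute, since the `...` in the ABI entry stands for a device-specific directory that is not resolved here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a timer kind to a devfreq "timer" attribute.
 * attr_path is supplied by the caller; only the two values documented in
 * the ABI entry ("deferrable" and "delayed") are accepted.
 * Returns 0 on success or a negative errno value on failure.
 */
static int devfreq_set_timer(const char *attr_path, const char *kind)
{
	FILE *f;
	int err = 0;

	if (strcmp(kind, "deferrable") != 0 && strcmp(kind, "delayed") != 0)
		return -EINVAL;

	f = fopen(attr_path, "w");
	if (!f)
		return errno ? -errno : -EIO;

	/* Mirror what "echo" does: the value followed by a newline. */
	if (fprintf(f, "%s\n", kind) < 0)
		err = -EIO;

	/* The write may only be flushed, and rejected, at close time. */
	if (fclose(f) != 0 && !err)
		err = errno ? -errno : -EIO;

	return err;
}
```

Rejecting unknown values before opening the file keeps the error local to the caller; the kernel side would return an error for them as well.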
diff --git a/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt b/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt

index 0ec6814..a10d1f6 100644
--- a/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt
+++ b/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt
@@ -18,6 +18,8 @@ Optional properties:
                          format depends on the interrupt controller.
                          It should be a DCF interrupt. When DDR DVFS finishes
                          a DCF interrupt is triggered.
+- rockchip,pmu:                 Phandle to the syscon managing the "PMU general register
+                        files".
  
  Following properties relate to DDR timing:
  
diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst

index 90a345d..a6fb986 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -1,15 +1,17 @@
-====================
-Energy Model of CPUs
-====================
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Energy Model of devices
+=======================
  
  1. Overview
  -----------
  
  The Energy Model (EM) framework serves as an interface between drivers knowing
-the power consumed by CPUs at various performance levels, and the kernel
+the power consumed by devices at various performance levels, and the kernel
  subsystems willing to use that information to make energy-aware decisions.
  
-The source of the information about the power consumed by CPUs can vary greatly
+The source of the information about the power consumed by devices can vary greatly
  from one platform to another. These power costs can be estimated using
  devicetree data in some cases. In others, the firmware will know better.
  Alternatively, userspace might be best positioned. And so on. In order to avoid
@@ -25,7 +27,7 @@ framework, and interested clients reading the data from it::
         +---------------+  +-----------------+  +---------------+
         | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
         +---------------+  +-----------------+  +---------------+
-               |                   | em_pd_energy()    |
+               |                   | em_cpu_energy()   |
                 |                   | em_cpu_get()      |
                 +---------+         |         +---------+
                           |         |         |
@@ -35,7 +37,7 @@ framework, and interested clients reading the data from it::
                          |     Framework       |
                          +---------------------+
                             ^       ^       ^
-                           |       |       | em_register_perf_domain()
+                           |       |       | em_dev_register_perf_domain()
                  +----------+       |       +---------+
                  |                  |                 |
          +---------------+  +---------------+  +--------------+
@@ -47,12 +49,12 @@ framework, and interested clients reading the data from it::
          | Device Tree  |   |   Firmware    |  |      ?       |
          +--------------+   +---------------+  +--------------+
  
-The EM framework manages power cost tables per 'performance domain' in the
-system. A performance domain is a group of CPUs whose performance is scaled
-together. Performance domains generally have a 1-to-1 mapping with CPUFreq
-policies. All CPUs in a performance domain are required to have the same
-micro-architecture. CPUs in different performance domains can have different
-micro-architectures.
+In case of CPU devices the EM framework manages power cost tables per
+'performance domain' in the system. A performance domain is a group of CPUs
+whose performance is scaled together. Performance domains generally have a
+1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
+required to have the same micro-architecture. CPUs in different performance
+domains can have different micro-architectures.
  
  
  2. Core APIs
@@ -70,14 +72,16 @@ CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
  Drivers are expected to register performance domains into the EM framework by
  calling the following API::
  
-  int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
-                             struct em_data_callback *cb);
+  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
+               struct em_data_callback *cb, cpumask_t *cpus);
  
-Drivers must specify the CPUs of the performance domains using the cpumask
-argument, and provide a callback function returning <frequency, power> tuples
-for each capacity state. The callback function provided by the driver is free
+Drivers must provide a callback function returning <frequency, power> tuples
+for each performance state. The callback function provided by the driver is free
  to fetch data from any relevant location (DT, firmware, ...), and by any mean
-deemed necessary. See Section 3. for an example of driver implementing this
+deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
+performance domains using cpumask. For other devices than CPUs the last
+argument must be set to NULL.
+See Section 3. for an example of driver implementing this
  callback, and kernel/power/energy_model.c for further documentation on this
  API.
  
@@ -85,13 +89,20 @@ API.
  2.3 Accessing performance domains
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  
+There are two API functions which provide the access to the energy model:
+em_cpu_get() which takes CPU id as an argument and em_pd_get() with device
+pointer as an argument. It depends on the subsystem which interface it is
+going to use, but in case of CPU devices both functions return the same
+performance domain.
+
  Subsystems interested in the energy model of a CPU can retrieve it using the
  em_cpu_get() API. The energy model tables are allocated once upon creation of
  the performance domains, and kept in memory untouched.
  
  The energy consumed by a performance domain can be estimated using the
-em_pd_energy() API. The estimation is performed assuming that the schedutil
-CPUfreq governor is in use.
+em_cpu_energy() API. The estimation is performed assuming that the schedutil
+CPUfreq governor is in use in case of CPU device. Currently this calculation is
+not provided for other type of devices.
  
  More details about the above APIs can be found in include/linux/energy_model.h.
  
@@ -106,42 +117,46 @@ EM framework::
  
    -> drivers/cpufreq/foo_cpufreq.c
  
-  01   static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
-  02   {
-  03           long freq, power;
-  04
-  05           /* Use the 'foo' protocol to ceil the frequency */
-  06           freq = foo_get_freq_ceil(cpu, *KHz);
-  07           if (freq < 0);
-  08                   return freq;
-  09
-  10           /* Estimate the power cost for the CPU at the relevant freq. */
-  11           power = foo_estimate_power(cpu, freq);
-  12           if (power < 0);
-  13                   return power;
-  14
-  15           /* Return the values to the EM framework */
-  16           *mW = power;
-  17           *KHz = freq;
-  18
-  19           return 0;
-  20   }
-  21
-  22   static int foo_cpufreq_init(struct cpufreq_policy *policy)
-  23   {
-  24           struct em_data_callback em_cb = EM_DATA_CB(est_power);
-  25           int nr_opp, ret;
-  26
-  27           /* Do the actual CPUFreq init work ... */
-  28           ret = do_foo_cpufreq_init(policy);
-  29           if (ret)
-  30                   return ret;
-  31
-  32           /* Find the number of OPPs for this policy */
-  33           nr_opp = foo_get_nr_opp(policy);
-  34
-  35           /* And register the new performance domain */
-  36           em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
-  37
-  38           return 0;
-  39   }
+  01   static int est_power(unsigned long *mW, unsigned long *KHz,
+  02                   struct device *dev)
+  03   {
+  04           long freq, power;
+  05
+  06           /* Use the 'foo' protocol to ceil the frequency */
+  07           freq = foo_get_freq_ceil(dev, *KHz);
+  08           if (freq < 0);
+  09                   return freq;
+  10
+  11           /* Estimate the power cost for the dev at the relevant freq. */
+  12           power = foo_estimate_power(dev, freq);
+  13           if (power < 0);
+  14                   return power;
+  15
+  16           /* Return the values to the EM framework */
+  17           *mW = power;
+  18           *KHz = freq;
+  19
+  20           return 0;
+  21   }
+  22
+  23   static int foo_cpufreq_init(struct cpufreq_policy *policy)
+  24   {
+  25           struct em_data_callback em_cb = EM_DATA_CB(est_power);
+  26           struct device *cpu_dev;
+  27           int nr_opp, ret;
+  28
+  29           cpu_dev = get_cpu_device(cpumask_first(policy->cpus));
+  30
+  31           /* Do the actual CPUFreq init work ... */
+  32           ret = do_foo_cpufreq_init(policy);
+  33           if (ret)
+  34                   return ret;
+  35
+  36           /* Find the number of OPPs for this policy */
+  37           nr_opp = foo_get_nr_opp(policy);
+  38
+  39           /* And register the new performance domain */
+  40           em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
+  41
+  42           return 0;
+  43   }
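The callback contract in the documentation example (take a requested frequency in `*KHz`, round it up to a supported one, and report the power at that frequency through `*mW`) can be tried outside the kernel. The sketch below is a user-space mock under stated assumptions: `struct device` here is a stand-in that carries a small table, not the kernel structure, and the table layout is invented for illustration. Note that in the listing both `if (freq < 0);` and `if (power < 0);` end in a semicolon, so if compiled as written the `return` after each would run unconditionally; the mock uses ordinary conditionals.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's struct device; for illustration only. */
struct device {
	const unsigned long *freqs_khz;	/* supported frequencies, ascending */
	const unsigned long *power_mw;	/* power cost at each frequency */
	size_t nr_states;
};

/*
 * Mock callback with the same shape as est_power() in the listing:
 * round *KHz up to the next supported frequency and report its power.
 * Returns 0 on success, -EINVAL if no frequency is high enough.
 */
static int est_power(unsigned long *mW, unsigned long *KHz,
		     struct device *dev)
{
	size_t i;

	for (i = 0; i < dev->nr_states; i++) {
		if (dev->freqs_khz[i] >= *KHz) {
			/* Return the values to the caller. */
			*mW = dev->power_mw[i];
			*KHz = dev->freqs_khz[i];
			return 0;
		}
	}

	/* No supported frequency at or above the request. */
	return -EINVAL;
}
```

A caller would invoke the callback once per performance state, passing the previous result plus one as the next request, to walk the table from the lowest frequency upward.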
diff --git a/MAINTAINERS b/MAINTAINERS

index f0569cf..85d7fb9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11097,6 +11097,15 @@ F:     Documentation/core-api/boot-time-mm.rst
  F:     include/linux/memblock.h
  F:     mm/memblock.c
  
+MEMORY FREQUENCY SCALING DRIVERS FOR NVIDIA TEGRA
+M:     Dmitry Osipenko <digetx@gmail.com>
+L:     linux-pm@vger.kernel.org
+L:     linux-tegra@vger.kernel.org
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git
+S:     Maintained
+F:     drivers/devfreq/tegra20-devfreq.c
+F:     drivers/devfreq/tegra30-devfreq.c
+
  MEMORY MANAGEMENT
  M:     Andrew Morton <akpm@linux-foundation.org>
  L:     linux-mm@kvack.org
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c

index 79742bb..944d7b4 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -279,7 +279,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
         policy->cpuinfo.transition_latency = transition_latency;
         policy->dvfs_possible_from_any_cpu = true;
  
-       dev_pm_opp_of_register_em(policy->cpus);
+       dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
  
         return 0;
  
diff --git a/drivers/cpufreq/imx6q-cpufreq.c b/drivers/cpufreq/imx6q-cpufreq.c

index fdb2fff..ef7b34c 100644
--- a/drivers/cpufreq/imx6q-cpufreq.c
+++ b/drivers/cpufreq/imx6q-cpufreq.c
@@ -193,7 +193,7 @@ static int imx6q_cpufreq_init(struct cpufreq_policy *policy)
         policy->clk = clks[ARM].clk;
         cpufreq_generic_init(policy, freq_table, transition_latency);
         policy->suspend_freq = max_freq;
-       dev_pm_opp_of_register_em(policy->cpus);
+       dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
  
         return 0;
  }
diff --git a/drivers/cpufreq/mediatek-cpufreq.c b/drivers/cpufreq/mediatek-cpufreq.c

index 0c98dd0..7d1212c 100644
--- a/drivers/cpufreq/mediatek-cpufreq.c
+++ b/drivers/cpufreq/mediatek-cpufreq.c
@@ -448,7 +448,7 @@ static int mtk_cpufreq_init(struct cpufreq_policy *policy)
         policy->driver_data = info;
         policy->clk = info->cpu_clk;
  
-       dev_pm_opp_of_register_em(policy->cpus);
+       dev_pm_opp_of_register_em(info->cpu_dev, policy->cpus);
  
         return 0;
  }
diff --git a/drivers/cpufreq/omap-cpufreq.c b/drivers/cpufreq/omap-cpufreq.c

index 8d14b42..3694bb0 100644
--- a/drivers/cpufreq/omap-cpufreq.c
+++ b/drivers/cpufreq/omap-cpufreq.c
@@ -131,7 +131,7 @@ static int omap_cpu_init(struct cpufreq_policy *policy)
  
         /* FIXME: what's the actual transition time? */
         cpufreq_generic_init(policy, freq_table, 300 * 1000);
-       dev_pm_opp_of_register_em(policy->cpus);
+       dev_pm_opp_of_register_em(mpu_dev, policy->cpus);
  
         return 0;
  }
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c

index fc92a88..0a04b6f 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -238,7 +238,7 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
                 goto error;
         }
  
-       dev_pm_opp_of_register_em(policy->cpus);
+       dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
  
         policy->fast_switch_possible = true;
  
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c

index 61623e2..11ee24e 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -103,17 +103,12 @@ scmi_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask)
  }
  
  static int __maybe_unused
-scmi_get_cpu_power(unsigned long *power, unsigned long *KHz, int cpu)
+scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
+                  struct device *cpu_dev)
  {
-       struct device *cpu_dev = get_cpu_device(cpu);
         unsigned long Hz;
         int ret, domain;
  
-       if (!cpu_dev) {
-               pr_err("failed to get cpu%d device\n", cpu);
-               return -ENODEV;
-       }
-
         domain = handle->perf_ops->device_domain_id(cpu_dev);
         if (domain < 0)
                 return domain;
@@ -200,7 +195,7 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
  
         policy->fast_switch_possible = true;
  
-       em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
+       em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
  
         return 0;
  
diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c

index 20d1f85..b0f5388 100644
--- a/drivers/cpufreq/scpi-cpufreq.c
+++ b/drivers/cpufreq/scpi-cpufreq.c
@@ -167,7 +167,7 @@ static int scpi_cpufreq_init(struct cpufreq_policy *policy)
  
         policy->fast_switch_possible = false;
  
-       dev_pm_opp_of_register_em(policy->cpus);
+       dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
  
         return 0;
  
diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c

index 83c85d3..4e8b1de 100644
--- a/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ b/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -450,7 +450,7 @@ static int ve_spc_cpufreq_init(struct cpufreq_policy *policy)
         policy->freq_table = freq_table[cur_cluster];
         policy->cpuinfo.transition_latency = 1000000; /* 1 ms */
  
-       dev_pm_opp_of_register_em(policy->cpus);
+       dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
  
         if (is_bL_switching_enabled())
                 per_cpu(cpu_last_req_freq, policy->cpu) =
diff --git a/drivers/devfreq/devfreq-event.c b/drivers/devfreq/devfreq-event.c

index 8c31b0f..56efbeb 100644
--- a/drivers/devfreq/devfreq-event.c
+++ b/drivers/devfreq/devfreq-event.c
@@ -293,7 +293,7 @@ static void devfreq_event_release_edev(struct device *dev)
  /**
   * devfreq_event_add_edev() - Add new devfreq-event device.
   * @dev                : the device owning the devfreq-event device being created
- * @desc       : the devfreq-event device's decriptor which include essential
+ * @desc       : the devfreq-event device's descriptor which include essential
   *               data for devfreq-event device.
   *
   * Note that this function add new devfreq-event device to devfreq-event class
@@ -385,7 +385,7 @@ static void devm_devfreq_event_release(struct device *dev, void *res)
  /**
   * devm_devfreq_event_add_edev() - Resource-managed devfreq_event_add_edev()
   * @dev                : the device owning the devfreq-event device being created
- * @desc       : the devfreq-event device's decriptor which include essential
+ * @desc       : the devfreq-event device's descriptor which include essential
   *               data for devfreq-event device.
   *
   * Note that this function manages automatically the memory of devfreq-event
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c

index 52b9c3e..561d91b 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -49,6 +49,11 @@ static LIST_HEAD(devfreq_governor_list);
  static LIST_HEAD(devfreq_list);
  static DEFINE_MUTEX(devfreq_list_lock);
  
+static const char timer_name[][DEVFREQ_NAME_LEN] = {
+       [DEVFREQ_TIMER_DEFERRABLE] = { "deferrable" },
+       [DEVFREQ_TIMER_DELAYED] = { "delayed" },
+};
+
  /**
   * find_device_devfreq() - find devfreq struct using device pointer
   * @dev:       device pointer used to lookup device devfreq.
@@ -454,7 +459,17 @@ void devfreq_monitor_start(struct devfreq *devfreq)
         if (devfreq->governor->interrupt_driven)
                 return;
  
-       INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
+       switch (devfreq->profile->timer) {
+       case DEVFREQ_TIMER_DEFERRABLE:
+               INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
+               break;
+       case DEVFREQ_TIMER_DELAYED:
+               INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
+               break;
+       default:
+               return;
+       }
+
         if (devfreq->profile->polling_ms)
                 queue_delayed_work(devfreq_wq, &devfreq->work,
                         msecs_to_jiffies(devfreq->profile->polling_ms));
@@ -771,6 +786,11 @@ struct devfreq *devfreq_add_device(struct device *dev,
         devfreq->data = data;
         devfreq->nb.notifier_call = devfreq_notifier_call;
  
+       if (devfreq->profile->timer < 0
+               || devfreq->profile->timer >= DEVFREQ_TIMER_NUM) {
+               goto err_out;
+       }
+
         if (!devfreq->profile->max_state && !devfreq->profile->freq_table) {
                 mutex_unlock(&devfreq->lock);
                 err = set_freq_table(devfreq);
@@ -1260,18 +1280,20 @@ EXPORT_SYMBOL(devfreq_remove_governor);
  static ssize_t name_show(struct device *dev,
                         struct device_attribute *attr, char *buf)
  {
-       struct devfreq *devfreq = to_devfreq(dev);
-       return sprintf(buf, "%s\n", dev_name(devfreq->dev.parent));
+       struct devfreq *df = to_devfreq(dev);
+       return sprintf(buf, "%s\n", dev_name(df->dev.parent));
  }
  static DEVICE_ATTR_RO(name);
  
  static ssize_t governor_show(struct device *dev,
                              struct device_attribute *attr, char *buf)
  {
-       if (!to_devfreq(dev)->governor)
+       struct devfreq *df = to_devfreq(dev);
+
+       if (!df->governor)
                 return -EINVAL;
  
-       return sprintf(buf, "%s\n", to_devfreq(dev)->governor->name);
+       return sprintf(buf, "%s\n", df->governor->name);
  }
  
  static ssize_t governor_store(struct device *dev, struct device_attribute *attr,
@@ -1282,6 +1304,9 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr,
         char str_governor[DEVFREQ_NAME_LEN + 1];
         const struct devfreq_governor *governor, *prev_governor;
  
+       if (!df->governor)
+               return -EINVAL;
+
         ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor);
         if (ret != 1)
                 return -EINVAL;
@@ -1295,20 +1320,18 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr,
         if (df->governor == governor) {
                 ret = 0;
                 goto out;
-       } else if ((df->governor && df->governor->immutable) ||
-                                       governor->immutable) {
+       } else if (df->governor->immutable || governor->immutable) {
                 ret = -EINVAL;
                 goto out;
         }
  
-       if (df->governor) {
-               ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
-               if (ret) {
-                       dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
-                                __func__, df->governor->name, ret);
-                       goto out;
-               }
+       ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
+       if (ret) {
+               dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
+                        __func__, df->governor->name, ret);
+               goto out;
         }
+
         prev_governor = df->governor;
         df->governor = governor;
         strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN);
@@ -1343,13 +1366,16 @@ static ssize_t available_governors_show(struct device *d,
         struct devfreq *df = to_devfreq(d);
         ssize_t count = 0;
  
+       if (!df->governor)
+               return -EINVAL;
+
         mutex_lock(&devfreq_list_lock);
  
         /*
          * The devfreq with immutable governor (e.g., passive) shows
          * only own governor.
          */
-       if (df->governor && df->governor->immutable) {
+       if (df->governor->immutable) {
                 count = scnprintf(&buf[count], DEVFREQ_NAME_LEN,
                                   "%s ", df->governor_name);
         /*
@@ -1383,27 +1409,37 @@ static ssize_t cur_freq_show(struct device *dev, struct device_attribute *attr,
                              char *buf)
  {
         unsigned long freq;
-       struct devfreq *devfreq = to_devfreq(dev);
+       struct devfreq *df = to_devfreq(dev);
  
-       if (devfreq->profile->get_cur_freq &&
-               !devfreq->profile->get_cur_freq(devfreq->dev.parent, &freq))
+       if (!df->profile)
+               return -EINVAL;
+
+       if (df->profile->get_cur_freq &&
+               !df->profile->get_cur_freq(df->dev.parent, &freq))
                 return sprintf(buf, "%lu\n", freq);
  
-       return sprintf(buf, "%lu\n", devfreq->previous_freq);
+       return sprintf(buf, "%lu\n", df->previous_freq);
  }
  static DEVICE_ATTR_RO(cur_freq);
  
  static ssize_t target_freq_show(struct device *dev,
                                 struct device_attribute *attr, char *buf)
  {
-       return sprintf(buf, "%lu\n", to_devfreq(dev)->previous_freq);
+       struct devfreq *df = to_devfreq(dev);
+
+       return sprintf(buf, "%lu\n", df->previous_freq);
  }
  static DEVICE_ATTR_RO(target_freq);
  
  static ssize_t polling_interval_show(struct device *dev,
                                      struct device_attribute *attr, char *buf)
  {
-       return sprintf(buf, "%d\n", to_devfreq(dev)->profile->polling_ms);
+       struct devfreq *df = to_devfreq(dev);
+
+       if (!df->profile)
+               return -EINVAL;
+
+       return sprintf(buf, "%d\n", df->profile->polling_ms);
  }
  
  static ssize_t polling_interval_store(struct device *dev,
@@ -1531,6 +1567,9 @@ static ssize_t available_frequencies_show(struct device *d,
         ssize_t count = 0;
         int i;
  
+       if (!df->profile)
+               return -EINVAL;
+
         mutex_lock(&df->lock);
  
         for (i = 0; i < df->profile->max_state; i++)
@@ -1551,49 +1590,53 @@ static DEVICE_ATTR_RO(available_frequencies);
  static ssize_t trans_stat_show(struct device *dev,
                                struct device_attribute *attr, char *buf)
  {
-       struct devfreq *devfreq = to_devfreq(dev);
+       struct devfreq *df = to_devfreq(dev);
         ssize_t len;
         int i, j;
-       unsigned int max_state = devfreq->profile->max_state;
+       unsigned int max_state;
+
+       if (!df->profile)
+               return -EINVAL;
+       max_state = df->profile->max_state;
  
         if (max_state == 0)
                 return sprintf(buf, "Not Supported.\n");
  
-       mutex_lock(&devfreq->lock);
-       if (!devfreq->stop_polling &&
-                       devfreq_update_status(devfreq, devfreq->previous_freq)) {
-               mutex_unlock(&devfreq->lock);
+       mutex_lock(&df->lock);
+       if (!df->stop_polling &&
+                       devfreq_update_status(df, df->previous_freq)) {
+               mutex_unlock(&df->lock);
                 return 0;
         }
-       mutex_unlock(&devfreq->lock);
+       mutex_unlock(&df->lock);
  
         len = sprintf(buf, "     From  :   To\n");
         len += sprintf(buf + len, "           :");
         for (i = 0; i < max_state; i++)
                 len += sprintf(buf + len, "%10lu",
-                               devfreq->profile->freq_table[i]);
+                               df->profile->freq_table[i]);
  
         len += sprintf(buf + len, "   time(ms)\n");
  
         for (i = 0; i < max_state; i++) {
-               if (devfreq->profile->freq_table[i]
-                                       == devfreq->previous_freq) {
+               if (df->profile->freq_table[i]
+                                       == df->previous_freq) {
                         len += sprintf(buf + len, "*");
                 } else {
                         len += sprintf(buf + len, " ");
                 }
                 len += sprintf(buf + len, "%10lu:",
-                               devfreq->profile->freq_table[i]);
+                               df->profile->freq_table[i]);
                 for (j = 0; j < max_state; j++)
                         len += sprintf(buf + len, "%10u",
-                               devfreq->stats.trans_table[(i * max_state) + j]);
+                               df->stats.trans_table[(i * max_state) + j]);
  
                 len += sprintf(buf + len, "%10llu\n", (u64)
-                       jiffies64_to_msecs(devfreq->stats.time_in_state[i]));
+                       jiffies64_to_msecs(df->stats.time_in_state[i]));
         }
  
         len += sprintf(buf + len, "Total transition : %u\n",
-                                       devfreq->stats.total_trans);
+                                       df->stats.total_trans);
         return len;
  }
  
@@ -1604,6 +1647,9 @@ static ssize_t trans_stat_store(struct device *dev,
         struct devfreq *df = to_devfreq(dev);
         int err, value;
  
+       if (!df->profile)
+               return -EINVAL;
+
         if (df->profile->max_state == 0)
                 return count;
  
@@ -1625,6 +1671,69 @@ static ssize_t trans_stat_store(struct device *dev,
  }
  static DEVICE_ATTR_RW(trans_stat);
  
+static ssize_t timer_show(struct device *dev,
+                            struct device_attribute *attr, char *buf)
+{
+       struct devfreq *df = to_devfreq(dev);
+
+       if (!df->profile)
+               return -EINVAL;
+
+       return sprintf(buf, "%s\n", timer_name[df->profile->timer]);
+}
+
+static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
+                             const char *buf, size_t count)
+{
+       struct devfreq *df = to_devfreq(dev);
+       char str_timer[DEVFREQ_NAME_LEN + 1];
+       int timer = -1;
+       int ret = 0, i;
+
+       if (!df->governor || !df->profile)
+               return -EINVAL;
+
+       ret = sscanf(buf, "%16s", str_timer);
+       if (ret != 1)
+               return -EINVAL;
+
+       for (i = 0; i < DEVFREQ_TIMER_NUM; i++) {
+               if (!strncmp(timer_name[i], str_timer, DEVFREQ_NAME_LEN)) {
+                       timer = i;
+                       break;
+               }
+       }
+
+       if (timer < 0) {
+               ret = -EINVAL;
+               goto out;
+       }
+
+       if (df->profile->timer == timer) {
+               ret = 0;
+               goto out;
+       }
+
+       mutex_lock(&df->lock);
+       df->profile->timer = timer;
+       mutex_unlock(&df->lock);
+
+       ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
+       if (ret) {
+               dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
+                        __func__, df->governor->name, ret);
+               goto out;
+       }
+
+       ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
+       if (ret)
+               dev_warn(dev, "%s: Governor %s not started(%d)\n",
+                        __func__, df->governor->name, ret);
+out:
+       return ret ? ret : count;
+}
+static DEVICE_ATTR_RW(timer);
+
  static struct attribute *devfreq_attrs[] = {
         &dev_attr_name.attr,
         &dev_attr_governor.attr,
@@ -1636,6 +1745,7 @@ static struct attribute *devfreq_attrs[] = {
         &dev_attr_min_freq.attr,
         &dev_attr_max_freq.attr,
         &dev_attr_trans_stat.attr,
+       &dev_attr_timer.attr,
         NULL,
  };
  ATTRIBUTE_GROUPS(devfreq);
@@ -1657,8 +1767,7 @@ static int devfreq_summary_show(struct seq_file *s, void *data)
         unsigned long cur_freq, min_freq, max_freq;
         unsigned int polling_ms;
  
-       seq_printf(s, "%-30s %-10s %-10s %-15s %10s %12s %12s %12s\n",
-                       "dev_name",
+       seq_printf(s, "%-30s %-30s %-15s %10s %12s %12s %12s\n",
                         "dev",
                         "parent_dev",
                         "governor",
@@ -1666,10 +1775,9 @@ static int devfreq_summary_show(struct seq_file *s, void *data)
                         "cur_freq_Hz",
                         "min_freq_Hz",
                         "max_freq_Hz");
-       seq_printf(s, "%30s %10s %10s %15s %10s %12s %12s %12s\n",
+       seq_printf(s, "%30s %30s %15s %10s %12s %12s %12s\n",
+                       "------------------------------",
                         "------------------------------",
-                       "----------",
-                       "----------",
                         "---------------",
                         "----------",
                         "------------",
@@ -1692,14 +1800,13 @@ static int devfreq_summary_show(struct seq_file *s, void *data)
  #endif
  
                 mutex_lock(&devfreq->lock);
-               cur_freq = devfreq->previous_freq,
+               cur_freq = devfreq->previous_freq;
                 get_freq_range(devfreq, &min_freq, &max_freq);
-               polling_ms = devfreq->profile->polling_ms,
+               polling_ms = devfreq->profile->polling_ms;
                 mutex_unlock(&devfreq->lock);
  
                 seq_printf(s,
-                       "%-30s %-10s %-10s %-15s %10d %12ld %12ld %12ld\n",
-                       dev_name(devfreq->dev.parent),
+                       "%-30s %-30s %-15s %10d %12ld %12ld %12ld\n",
                         dev_name(&devfreq->dev),
                         p_devfreq ? dev_name(&p_devfreq->dev) : "null",
                         devfreq->governor_name,
diff --git a/drivers/devfreq/rk3399_dmc.c b/drivers/devfreq/rk3399_dmc.c
index 24f04f7..027769e 100644
--- a/drivers/devfreq/rk3399_dmc.c
+++ b/drivers/devfreq/rk3399_dmc.c
@@ -95,18 +95,20 @@ static int rk3399_dmcfreq_target(struct device *dev, unsigned long *freq,
  
         mutex_lock(&dmcfreq->lock);
  
-       if (target_rate >= dmcfreq->odt_dis_freq)
-               odt_enable = true;
-
-       /*
-        * This makes a SMC call to the TF-A to set the DDR PD (power-down)
-        * timings and to enable or disable the ODT (on-die termination)
-        * resistors.
-        */
-       arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
-                     dmcfreq->odt_pd_arg1,
-                     ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
-                     odt_enable, 0, 0, 0, &res);
+       if (dmcfreq->regmap_pmu) {
+               if (target_rate >= dmcfreq->odt_dis_freq)
+                       odt_enable = true;
+
+               /*
+                * This makes a SMC call to the TF-A to set the DDR PD
+                * (power-down) timings and to enable or disable the
+                * ODT (on-die termination) resistors.
+                */
+               arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, dmcfreq->odt_pd_arg0,
+                             dmcfreq->odt_pd_arg1,
+                             ROCKCHIP_SIP_CONFIG_DRAM_SET_ODT_PD,
+                             odt_enable, 0, 0, 0, &res);
+       }
  
         /*
          * If frequency scaling from low to high, adjust voltage first.
@@ -371,13 +373,14 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev)
         }
  
         node = of_parse_phandle(np, "rockchip,pmu", 0);
-       if (node) {
-               data->regmap_pmu = syscon_node_to_regmap(node);
-               of_node_put(node);
-               if (IS_ERR(data->regmap_pmu)) {
-                       ret = PTR_ERR(data->regmap_pmu);
-                       goto err_edev;
-               }
+       if (!node)
+               goto no_pmu;
+
+       data->regmap_pmu = syscon_node_to_regmap(node);
+       of_node_put(node);
+       if (IS_ERR(data->regmap_pmu)) {
+               ret = PTR_ERR(data->regmap_pmu);
+               goto err_edev;
         }
  
         regmap_read(data->regmap_pmu, RK3399_PMUGRF_OS_REG2, &val);
@@ -399,6 +402,7 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev)
                 goto err_edev;
         };
  
+no_pmu:
         arm_smccc_smc(ROCKCHIP_SIP_DRAM_FREQ, 0, 0,
                       ROCKCHIP_SIP_CONFIG_DRAM_INIT,
                       0, 0, 0, 0, &res);
diff --git a/drivers/memory/samsung/exynos5422-dmc.c b/drivers/memory/samsung/exynos5422-dmc.c
index 25196d6..53bfe6b 100644
--- a/drivers/memory/samsung/exynos5422-dmc.c
+++ b/drivers/memory/samsung/exynos5422-dmc.c
@@ -12,6 +12,7 @@
  #include <linux/io.h>
  #include <linux/mfd/syscon.h>
  #include <linux/module.h>
+#include <linux/moduleparam.h>
  #include <linux/of_device.h>
  #include <linux/pm_opp.h>
  #include <linux/platform_device.h>
@@ -21,6 +22,10 @@
  #include "../jedec_ddr.h"
  #include "../of_memory.h"
  
+static int irqmode;
+module_param(irqmode, int, 0644);
+MODULE_PARM_DESC(irqmode, "Enable IRQ mode (0=off [default], 1=on)");
+
  #define EXYNOS5_DREXI_TIMINGAREF               (0x0030)
  #define EXYNOS5_DREXI_TIMINGROW0               (0x0034)
  #define EXYNOS5_DREXI_TIMINGDATA0              (0x0038)
@@ -945,6 +950,7 @@ static int exynos5_dmc_get_cur_freq(struct device *dev, unsigned long *freq)
   * It provides to the devfreq framework needed functions and polling period.
   */
  static struct devfreq_dev_profile exynos5_dmc_df_profile = {
+       .timer = DEVFREQ_TIMER_DELAYED,
         .target = exynos5_dmc_target,
         .get_dev_status = exynos5_dmc_get_status,
         .get_cur_freq = exynos5_dmc_get_cur_freq,
@@ -1427,7 +1433,7 @@ static int exynos5_dmc_probe(struct platform_device *pdev)
         /* There is two modes in which the driver works: polling or IRQ */
         irq[0] = platform_get_irq_byname(pdev, "drex_0");
         irq[1] = platform_get_irq_byname(pdev, "drex_1");
-       if (irq[0] > 0 && irq[1] > 0) {
+       if (irq[0] > 0 && irq[1] > 0 && irqmode) {
                 ret = devm_request_threaded_irq(dev, irq[0], NULL,
                                                 dmc_irq_thread, IRQF_ONESHOT,
                                                 dev_name(dev), dmc);
@@ -1465,10 +1471,10 @@ static int exynos5_dmc_probe(struct platform_device *pdev)
                  * Setup default thresholds for the devfreq governor.
                  * The values are chosen based on experiments.
                  */
-               dmc->gov_data.upthreshold = 30;
+               dmc->gov_data.upthreshold = 10;
                 dmc->gov_data.downdifferential = 5;
  
-               exynos5_dmc_df_profile.polling_ms = 500;
+               exynos5_dmc_df_profile.polling_ms = 100;
         }
  
  
@@ -1484,7 +1490,7 @@ static int exynos5_dmc_probe(struct platform_device *pdev)
         if (dmc->in_irq_mode)
                 exynos5_dmc_start_perf_events(dmc, PERF_COUNTER_START_VALUE);
  
-       dev_info(dev, "DMC initialized\n");
+       dev_info(dev, "DMC initialized, in irq mode: %d\n", dmc->in_irq_mode);
  
         return 0;
  
diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index dfbd3d1..0c8c74a 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -118,7 +118,7 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_get_voltage);
   */
  unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp)
  {
-       if (IS_ERR_OR_NULL(opp) || !opp->available) {
+       if (IS_ERR_OR_NULL(opp)) {
                 pr_err("%s: Invalid parameters\n", __func__);
                 return 0;
         }
@@ -2271,6 +2271,7 @@ adjust_put_table:
         dev_pm_opp_put_opp_table(opp_table);
         return r;
  }
+EXPORT_SYMBOL_GPL(dev_pm_opp_adjust_voltage);
  
  /**
   * dev_pm_opp_enable() - Enable a specific OPP
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 314f306..0430290 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -1209,20 +1209,19 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
  
  /*
   * Callback function provided to the Energy Model framework upon registration.
- * This computes the power estimated by @CPU at @kHz if it is the frequency
+ * This computes the power estimated by @dev at @kHz if it is the frequency
   * of an existing OPP, or at the frequency of the first OPP above @kHz otherwise
   * (see dev_pm_opp_find_freq_ceil()). This function updates @kHz to the ceiled
   * frequency and @mW to the associated power. The power is estimated as
- * P = C * V^2 * f with C being the CPU's capacitance and V and f respectively
- * the voltage and frequency of the OPP.
+ * P = C * V^2 * f with C being the device's capacitance and V and f
+ * respectively the voltage and frequency of the OPP.
   *
- * Returns -ENODEV if the CPU device cannot be found, -EINVAL if the power
- * calculation failed because of missing parameters, 0 otherwise.
+ * Returns -EINVAL if the power calculation failed because of missing
+ * parameters, 0 otherwise.
   */
-static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
-                                        int cpu)
+static int __maybe_unused _get_power(unsigned long *mW, unsigned long *kHz,
+                                    struct device *dev)
  {
-       struct device *cpu_dev;
         struct dev_pm_opp *opp;
         struct device_node *np;
         unsigned long mV, Hz;
@@ -1230,11 +1229,7 @@ static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
         u64 tmp;
         int ret;
  
-       cpu_dev = get_cpu_device(cpu);
-       if (!cpu_dev)
-               return -ENODEV;
-
-       np = of_node_get(cpu_dev->of_node);
+       np = of_node_get(dev->of_node);
         if (!np)
                 return -EINVAL;
  
@@ -1244,7 +1239,7 @@ static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
                 return -EINVAL;
  
         Hz = *kHz * 1000;
-       opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz);
+       opp = dev_pm_opp_find_freq_ceil(dev, &Hz);
         if (IS_ERR(opp))
                 return -EINVAL;
  
@@ -1264,30 +1259,38 @@ static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
  
  /**
   * dev_pm_opp_of_register_em() - Attempt to register an Energy Model
- * @cpus       : CPUs for which an Energy Model has to be registered
+ * @dev                : Device for which an Energy Model has to be registered
+ * @cpus       : CPUs for which an Energy Model has to be registered. For
+ *             other types of devices, it should be set to NULL.
   *
   * This checks whether the "dynamic-power-coefficient" devicetree property has
   * been specified, and tries to register an Energy Model with it if it has.
+ * Having this property means the voltages are known for OPPs and the EM
+ * might be calculated.
   */
-void dev_pm_opp_of_register_em(struct cpumask *cpus)
+int dev_pm_opp_of_register_em(struct device *dev, struct cpumask *cpus)
  {
-       struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
-       int ret, nr_opp, cpu = cpumask_first(cpus);
-       struct device *cpu_dev;
+       struct em_data_callback em_cb = EM_DATA_CB(_get_power);
         struct device_node *np;
+       int ret, nr_opp;
         u32 cap;
  
-       cpu_dev = get_cpu_device(cpu);
-       if (!cpu_dev)
-               return;
+       if (IS_ERR_OR_NULL(dev)) {
+               ret = -EINVAL;
+               goto failed;
+       }
  
-       nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
-       if (nr_opp <= 0)
-               return;
+       nr_opp = dev_pm_opp_get_opp_count(dev);
+       if (nr_opp <= 0) {
+               ret = -EINVAL;
+               goto failed;
+       }
  
-       np = of_node_get(cpu_dev->of_node);
-       if (!np)
-               return;
+       np = of_node_get(dev->of_node);
+       if (!np) {
+               ret = -EINVAL;
+               goto failed;
+       }
  
         /*
          * Register an EM only if the 'dynamic-power-coefficient' property is
@@ -1298,9 +1301,20 @@ void dev_pm_opp_of_register_em(struct cpumask *cpus)
          */
         ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap);
         of_node_put(np);
-       if (ret || !cap)
-               return;
+       if (ret || !cap) {
+               dev_dbg(dev, "Couldn't find proper 'dynamic-power-coefficient' in DT\n");
+               ret = -EINVAL;
+               goto failed;
+       }
+
+       ret = em_dev_register_perf_domain(dev, nr_opp, &em_cb, cpus);
+       if (ret)
+               goto failed;
  
-       em_register_perf_domain(cpus, nr_opp, &em_cb);
+       return 0;
+
+failed:
+       dev_dbg(dev, "Couldn't register Energy Model %d\n", ret);
+       return ret;
  }
  EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em);
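
The _get_power() comment above estimates power as P = C * V^2 * f. The following is a minimal userspace sketch of that arithmetic, not the kernel implementation. The function name `sketch_power_mw` and the unit choices (coefficient in uW/MHz/V^2, voltage in mV, frequency in Hz, result in mW) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate dynamic power as P = C * V^2 * f.
 *
 * Assumed units for this sketch:
 *   cap - power coefficient in uW/MHz/V^2
 *   mv  - voltage in millivolts
 *   hz  - frequency in hertz
 * The result is in milliwatts, rounded down.
 */
static unsigned long sketch_power_mw(uint32_t cap, unsigned long mv,
				     unsigned long hz)
{
	uint64_t tmp;

	/* The frequency is reduced to MHz first, so sub-MHz parts are lost. */
	tmp = (uint64_t)cap * mv * mv * (hz / 1000000UL);

	/*
	 * mv * mv carries a factor of 10^6 relative to V^2, and uW to mW
	 * is another 10^3, hence the division by 10^9.
	 */
	return (unsigned long)(tmp / 1000000000ULL);
}
```

With a coefficient of 100, 1000 mV and 1 GHz, the sketch yields 100 mW; halving the voltage quarters the result, as the V^2 term implies.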
diff --git a/drivers/opp/ti-opp-supply.c b/drivers/opp/ti-opp-supply.c
index e3357e9..bd4771f 100644
--- a/drivers/opp/ti-opp-supply.c
+++ b/drivers/opp/ti-opp-supply.c
@@ -1,6 +1,6 @@
  // SPDX-License-Identifier: GPL-2.0
  /*
- * Copyright (C) 2016-2017 Texas Instruments Incorporated - http://www.ti.com/
+ * Copyright (C) 2016-2017 Texas Instruments Incorporated - https://www.ti.com/
   *     Nishanth Menon <nm@ti.com>
   *     Dave Gerlach <d-gerlach@ti.com>
   *
diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index 6c0e1b0..6cf23a5 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -333,18 +333,18 @@ static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev,
                 return false;
  
         policy = cpufreq_cdev->policy;
-       if (!cpumask_equal(policy->related_cpus, to_cpumask(em->cpus))) {
+       if (!cpumask_equal(policy->related_cpus, em_span_cpus(em))) {
                 pr_err("The span of pd %*pbl is misaligned with cpufreq policy %*pbl\n",
-                       cpumask_pr_args(to_cpumask(em->cpus)),
+                       cpumask_pr_args(em_span_cpus(em)),
                         cpumask_pr_args(policy->related_cpus));
                 return false;
         }
  
         nr_levels = cpufreq_cdev->max_level + 1;
-       if (em->nr_cap_states != nr_levels) {
-               pr_err("The number of cap states in pd %*pbl (%u) doesn't match the number of cooling levels (%u)\n",
-                       cpumask_pr_args(to_cpumask(em->cpus)),
-                       em->nr_cap_states, nr_levels);
+       if (em_pd_nr_perf_states(em) != nr_levels) {
+               pr_err("The number of performance states in pd %*pbl (%u) doesn't match the number of cooling levels (%u)\n",
+                       cpumask_pr_args(em_span_cpus(em)),
+                       em_pd_nr_perf_states(em), nr_levels);
                 return false;
         }
  
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 57e871a..12782fb 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -31,6 +31,13 @@
  #define        DEVFREQ_PRECHANGE               (0)
  #define DEVFREQ_POSTCHANGE             (1)
  
+/* DEVFREQ work timers */
+enum devfreq_timer {
+       DEVFREQ_TIMER_DEFERRABLE = 0,
+       DEVFREQ_TIMER_DELAYED,
+       DEVFREQ_TIMER_NUM,
+};
+
  struct devfreq;
  struct devfreq_governor;
  
@@ -70,6 +77,7 @@ struct devfreq_dev_status {
   * @initial_freq:      The operating frequency when devfreq_add_device() is
   *                     called.
   * @polling_ms:                The polling interval in ms. 0 disables polling.
+ * @timer:             Timer type is either deferrable or delayed timer.
   * @target:            The device should set its operating frequency at
   *                     freq or lowest-upper-than-freq value. If freq is
   *                     higher than any operable frequency, set maximum.
@@ -96,6 +104,7 @@ struct devfreq_dev_status {
  struct devfreq_dev_profile {
         unsigned long initial_freq;
         unsigned int polling_ms;
+       enum devfreq_timer timer;
  
         int (*target)(struct device *dev, unsigned long *freq, u32 flags);
         int (*get_dev_status)(struct device *dev,
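
The devfreq_timer enum above pairs with the "deferrable" and "delayed" strings accepted by the sysfs timer attribute. As a rough userspace sketch of how such a string could be mapped to the enum value, the code below uses hypothetical names (`sketch_timer`, `sketch_timer_parse`); it is not the driver's actual store routine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the enum layout shown in the hunk above. */
enum sketch_timer {
	SKETCH_TIMER_DEFERRABLE = 0,
	SKETCH_TIMER_DELAYED,
	SKETCH_TIMER_NUM,
};

/* Names as documented for /sys/class/devfreq/.../timer. */
static const char *const sketch_timer_name[SKETCH_TIMER_NUM] = {
	[SKETCH_TIMER_DEFERRABLE] = "deferrable",
	[SKETCH_TIMER_DELAYED]    = "delayed",
};

/*
 * Return the enum value matching the input, or -1 if it is unknown.
 * A trailing newline (as written by "echo") is ignored.
 */
static int sketch_timer_parse(const char *s)
{
	size_t len = strcspn(s, "\n");
	int i;

	for (i = 0; i < SKETCH_TIMER_NUM; i++) {
		if (strlen(sketch_timer_name[i]) == len &&
		    strncmp(s, sketch_timer_name[i], len) == 0)
			return i;
	}
	return -1;
}
```

Keeping SKETCH_TIMER_NUM as the last enumerator lets the name table and the loop bound stay in sync when a timer type is added.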
diff --git a/include/linux/device.h b/include/linux/device.h
index 5efed86..4e2e9d3 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -13,6 +13,7 @@
  #define _DEVICE_H_
  
  #include <linux/dev_printk.h>
+#include <linux/energy_model.h>
  #include <linux/ioport.h>
  #include <linux/kobject.h>
  #include <linux/klist.h>
@@ -560,6 +561,10 @@ struct device {
         struct dev_pm_info      power;
         struct dev_pm_domain    *pm_domain;
  
+#ifdef CONFIG_ENERGY_MODEL
+       struct em_perf_domain   *em_pd;
+#endif
+
  #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
         struct irq_domain       *msi_domain;
  #endif
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index ade6486..b67a51c 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -2,6 +2,7 @@
  #ifndef _LINUX_ENERGY_MODEL_H
  #define _LINUX_ENERGY_MODEL_H
  #include <linux/cpumask.h>
+#include <linux/device.h>
  #include <linux/jump_label.h>
  #include <linux/kobject.h>
  #include <linux/rcupdate.h>
@@ -10,13 +11,15 @@
  #include <linux/types.h>
  
  /**
- * em_cap_state - Capacity state of a performance domain
- * @frequency: The CPU frequency in KHz, for consistency with CPUFreq
- * @power:     The power consumed by 1 CPU at this level, in milli-watts
+ * em_perf_state - Performance state of a performance domain
+ * @frequency: The frequency in KHz, for consistency with CPUFreq
+ * @power:     The power consumed at this level, in milli-watts (by 1 CPU or
+ *             by a registered device). It can be a total power: static and
+ *             dynamic.
   * @cost:      The cost coefficient associated with this level, used during
   *             energy calculation. Equal to: power * max_frequency / frequency
   */
-struct em_cap_state {
+struct em_perf_state {
         unsigned long frequency;
         unsigned long power;
         unsigned long cost;
@@ -24,102 +27,119 @@ struct em_cap_state {
  
  /**
   * em_perf_domain - Performance domain
- * @table:             List of capacity states, in ascending order
- * @nr_cap_states:     Number of capacity states
- * @cpus:              Cpumask covering the CPUs of the domain
+ * @table:             List of performance states, in ascending order
+ * @nr_perf_states:    Number of performance states
+ * @cpus:              Cpumask covering the CPUs of the domain. It's here
+ *                     for performance reasons to avoid potential cache
+ *                     misses during energy calculations in the scheduler
+ *                     and to simplify allocating/freeing that memory region.
   *
- * A "performance domain" represents a group of CPUs whose performance is
- * scaled together. All CPUs of a performance domain must have the same
- * micro-architecture. Performance domains often have a 1-to-1 mapping with
- * CPUFreq policies.
+ * In the case of a CPU device, a "performance domain" represents a group of
+ * CPUs whose performance is scaled together. All CPUs of a performance domain
+ * must have the same micro-architecture. Performance domains often have
+ * a 1-to-1 mapping with CPUFreq policies. For other devices, the @cpus
+ * field is unused.
   */
  struct em_perf_domain {
-       struct em_cap_state *table;
-       int nr_cap_states;
+       struct em_perf_state *table;
+       int nr_perf_states;
         unsigned long cpus[];
  };
  
+#define em_span_cpus(em) (to_cpumask((em)->cpus))
+
  #ifdef CONFIG_ENERGY_MODEL
-#define EM_CPU_MAX_POWER 0xFFFF
+#define EM_MAX_POWER 0xFFFF
  
  struct em_data_callback {
         /**
-        * active_power() - Provide power at the next capacity state of a CPU
-        * @power       : Active power at the capacity state in mW (modified)
-        * @freq        : Frequency at the capacity state in kHz (modified)
-        * @cpu         : CPU for which we do this operation
+        * active_power() - Provide power at the next performance state of
+        *              a device
+        * @power       : Active power at the performance state in mW
+        *              (modified)
+        * @freq        : Frequency at the performance state in kHz
+        *              (modified)
+        * @dev         : Device for which we do this operation (can be a CPU)
          *
-        * active_power() must find the lowest capacity state of 'cpu' above
+        * active_power() must find the lowest performance state of 'dev' above
          * 'freq' and update 'power' and 'freq' to the matching active power
          * and frequency.
          *
-        * The power is the one of a single CPU in the domain, expressed in
-        * milli-watts. It is expected to fit in the [0, EM_CPU_MAX_POWER]
-        * range.
+        * In case of CPUs, the power is the one of a single CPU in the domain,
+        * expressed in milli-watts. It is expected to fit in the
+        * [0, EM_MAX_POWER] range.
          *
          * Return 0 on success.
          */
-       int (*active_power)(unsigned long *power, unsigned long *freq, int cpu);
+       int (*active_power)(unsigned long *power, unsigned long *freq,
+                           struct device *dev);
  };
  #define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb }
  
  struct em_perf_domain *em_cpu_get(int cpu);
-int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
-                                               struct em_data_callback *cb);
+struct em_perf_domain *em_pd_get(struct device *dev);
+int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
+                               struct em_data_callback *cb, cpumask_t *span);
+void em_dev_unregister_perf_domain(struct device *dev);
  
  /**
- * em_pd_energy() - Estimates the energy consumed by the CPUs of a perf. domain
+ * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
+ *             performance domain
   * @pd         : performance domain for which energy has to be estimated
   * @max_util   : highest utilization among CPUs of the domain
   * @sum_util   : sum of the utilization of all CPUs in the domain
   *
+ * This function must be used only for CPU devices. It does not validate that
+ * the EM is a CPU type with a cpumask allocated, because it is called from
+ * the scheduler code quite frequently, so the checks are omitted.
+ *
   * Return: the sum of the energy consumed by the CPUs of the domain assuming
   * a capacity state satisfying the max utilization of the domain.
   */
-static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
+static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
                                 unsigned long max_util, unsigned long sum_util)
  {
         unsigned long freq, scale_cpu;
-       struct em_cap_state *cs;
+       struct em_perf_state *ps;
         int i, cpu;
  
         /*
-        * In order to predict the capacity state, map the utilization of the
-        * most utilized CPU of the performance domain to a requested frequency,
-        * like schedutil.
+        * In order to predict the performance state, map the utilization of
+        * the most utilized CPU of the performance domain to a requested
+        * frequency, like schedutil.
          */
         cpu = cpumask_first(to_cpumask(pd->cpus));
         scale_cpu = arch_scale_cpu_capacity(cpu);
-       cs = &pd->table[pd->nr_cap_states - 1];
-       freq = map_util_freq(max_util, cs->frequency, scale_cpu);
+       ps = &pd->table[pd->nr_perf_states - 1];
+       freq = map_util_freq(max_util, ps->frequency, scale_cpu);
  
         /*
-        * Find the lowest capacity state of the Energy Model above the
+        * Find the lowest performance state of the Energy Model above the
          * requested frequency.
          */
-       for (i = 0; i < pd->nr_cap_states; i++) {
-               cs = &pd->table[i];
-               if (cs->frequency >= freq)
+       for (i = 0; i < pd->nr_perf_states; i++) {
+               ps = &pd->table[i];
+               if (ps->frequency >= freq)
                         break;
         }
  
         /*
-        * The capacity of a CPU in the domain at that capacity state (cs)
+        * The capacity of a CPU in the domain at the performance state (ps)
          * can be computed as:
          *
-        *             cs->freq * scale_cpu
-        *   cs->cap = --------------------                          (1)
+        *             ps->freq * scale_cpu
+        *   ps->cap = --------------------                          (1)
          *                 cpu_max_freq
          *
          * So, ignoring the costs of idle states (which are not available in
-        * the EM), the energy consumed by this CPU at that capacity state is
-        * estimated as:
+        * the EM), the energy consumed by this CPU at that performance state
+        * is estimated as:
          *
-        *             cs->power * cpu_util
+        *             ps->power * cpu_util
          *   cpu_nrg = --------------------                          (2)
-        *                   cs->cap
+        *                   ps->cap
          *
-        * since 'cpu_util / cs->cap' represents its percentage of busy time.
+        * since 'cpu_util / ps->cap' represents its percentage of busy time.
          *
          *   NOTE: Although the result of this computation actually is in
          *         units of power, it can be manipulated as an energy value
@@ -129,55 +149,64 @@ static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
          * By injecting (1) in (2), 'cpu_nrg' can be re-expressed as a product
          * of two terms:
          *
-        *             cs->power * cpu_max_freq   cpu_util
+        *             ps->power * cpu_max_freq   cpu_util
          *   cpu_nrg = ------------------------ * ---------          (3)
-        *                    cs->freq            scale_cpu
+        *                    ps->freq            scale_cpu
          *
-        * The first term is static, and is stored in the em_cap_state struct
-        * as 'cs->cost'.
+        * The first term is static, and is stored in the em_perf_state struct
+        * as 'ps->cost'.
          *
          * Since all CPUs of the domain have the same micro-architecture, they
-        * share the same 'cs->cost', and the same CPU capacity. Hence, the
+        * share the same 'ps->cost', and the same CPU capacity. Hence, the
          * total energy of the domain (which is the simple sum of the energy of
          * all of its CPUs) can be factorized as:
          *
-        *            cs->cost * \Sum cpu_util
+        *            ps->cost * \Sum cpu_util
          *   pd_nrg = ------------------------                       (4)
          *                  scale_cpu
          */
-       return cs->cost * sum_util / scale_cpu;
+       return ps->cost * sum_util / scale_cpu;
  }
  
  /**
- * em_pd_nr_cap_states() - Get the number of capacity states of a perf. domain
+ * em_pd_nr_perf_states() - Get the number of performance states of a perf.
+ *                             domain
   * @pd         : performance domain for which this must be done
   *
- * Return: the number of capacity states in the performance domain table
+ * Return: the number of performance states in the performance domain table
   */
-static inline int em_pd_nr_cap_states(struct em_perf_domain *pd)
+static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
  {
-       return pd->nr_cap_states;
+       return pd->nr_perf_states;
  }
  
  #else
  struct em_data_callback {};
  #define EM_DATA_CB(_active_power_cb) { }
  
-static inline int em_register_perf_domain(cpumask_t *span,
-                       unsigned int nr_states, struct em_data_callback *cb)
+static inline
+int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
+                               struct em_data_callback *cb, cpumask_t *span)
  {
         return -EINVAL;
  }
+static inline void em_dev_unregister_perf_domain(struct device *dev)
+{
+}
  static inline struct em_perf_domain *em_cpu_get(int cpu)
  {
         return NULL;
  }
-static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
+static inline struct em_perf_domain *em_pd_get(struct device *dev)
+{
+       return NULL;
+}
+static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
                         unsigned long max_util, unsigned long sum_util)
  {
         return 0;
  }
-static inline int em_pd_nr_cap_states(struct em_perf_domain *pd)
+static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
  {
         return 0;
  }
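
The energy estimate documented in em_cpu_energy() reduces to equation (4), pd_nrg = ps->cost * sum_util / scale_cpu, with cost = power * max_frequency / frequency. Below is a minimal userspace sketch of those two steps. The `sketch_` names are hypothetical, and the requested frequency is passed in directly rather than derived from utilization, which is a simplification of this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a performance state table entry. */
struct sketch_perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* mW */
	unsigned long cost;		/* power * max_frequency / frequency */
};

/* Fill in the cost of each state; the table must be in ascending order. */
static void sketch_fill_cost(struct sketch_perf_state *table, size_t n)
{
	unsigned long long fmax = table[n - 1].frequency;
	size_t i;

	for (i = 0; i < n; i++)
		table[i].cost = (unsigned long)(table[i].power * fmax /
						table[i].frequency);
}

/*
 * Pick the lowest state at or above 'freq' (falling back to the highest
 * state if none qualifies) and apply equation (4).
 */
static unsigned long sketch_domain_energy(const struct sketch_perf_state *table,
					  size_t n, unsigned long freq,
					  unsigned long sum_util,
					  unsigned long scale_cpu)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].frequency >= freq)
			break;
	}
	if (i == n)
		i = n - 1;

	return table[i].cost * sum_util / scale_cpu;
}
```

Because the cost is precomputed per state, the per-call work is one table scan, one multiplication and one division, which suits a function called often from scheduler paths.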
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index d5c4a32..ee34c55 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -11,6 +11,7 @@
  #ifndef __LINUX_OPP_H__
  #define __LINUX_OPP_H__
  
+#include <linux/energy_model.h>
  #include <linux/err.h>
  #include <linux/notifier.h>
  
@@ -373,7 +374,11 @@ struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
  struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
  int of_get_required_opp_performance_state(struct device_node *np, int index);
  int dev_pm_opp_of_find_icc_paths(struct device *dev, struct opp_table *opp_table);
-void dev_pm_opp_of_register_em(struct cpumask *cpus);
+int dev_pm_opp_of_register_em(struct device *dev, struct cpumask *cpus);
+static inline void dev_pm_opp_of_unregister_em(struct device *dev)
+{
+       em_dev_unregister_perf_domain(dev);
+}
  #else
  static inline int dev_pm_opp_of_add_table(struct device *dev)
  {
@@ -413,7 +418,13 @@ static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp)
         return NULL;
  }
  
-static inline void dev_pm_opp_of_register_em(struct cpumask *cpus)
+static inline int dev_pm_opp_of_register_em(struct device *dev,
+                                           struct cpumask *cpus)
+{
+       return -ENOTSUPP;
+}
+
+static inline void dev_pm_opp_of_unregister_em(struct device *dev)
  {
  }
  
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 0a9326f..c1ff7fa 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -1,9 +1,10 @@
  // SPDX-License-Identifier: GPL-2.0
  /*
- * Energy Model of CPUs
+ * Energy Model of devices
   *
- * Copyright (c) 2018, Arm ltd.
+ * Copyright (c) 2018-2020, Arm ltd.
   * Written by: Quentin Perret, Arm ltd.
+ * Improvements provided by: Lukasz Luba, Arm ltd.
   */
  
  #define pr_fmt(fmt) "energy_model: " fmt
@@ -15,30 +16,32 @@
  #include <linux/sched/topology.h>
  #include <linux/slab.h>
  
-/* Mapping of each CPU to the performance domain to which it belongs. */
-static DEFINE_PER_CPU(struct em_perf_domain *, em_data);
-
  /*
   * Mutex serializing the registrations of performance domains and letting
   * callbacks defined by drivers sleep.
   */
  static DEFINE_MUTEX(em_pd_mutex);
  
+static bool _is_cpu_device(struct device *dev)
+{
+       return (dev->bus == &cpu_subsys);
+}
+
  #ifdef CONFIG_DEBUG_FS
  static struct dentry *rootdir;
  
-static void em_debug_create_cs(struct em_cap_state *cs, struct dentry *pd)
+static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd)
  {
         struct dentry *d;
         char name[24];
  
-       snprintf(name, sizeof(name), "cs:%lu", cs->frequency);
+       snprintf(name, sizeof(name), "ps:%lu", ps->frequency);
  
-       /* Create per-cs directory */
+       /* Create per-ps directory */
         d = debugfs_create_dir(name, pd);
-       debugfs_create_ulong("frequency", 0444, d, &cs->frequency);
-       debugfs_create_ulong("power", 0444, d, &cs->power);
-       debugfs_create_ulong("cost", 0444, d, &cs->cost);
+       debugfs_create_ulong("frequency", 0444, d, &ps->frequency);
+       debugfs_create_ulong("power", 0444, d, &ps->power);
+       debugfs_create_ulong("cost", 0444, d, &ps->cost);
  }
  
  static int em_debug_cpus_show(struct seq_file *s, void *unused)
@@ -49,22 +52,30 @@ static int em_debug_cpus_show(struct seq_file *s, void *unused)
  }
  DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
  
-static void em_debug_create_pd(struct em_perf_domain *pd, int cpu)
+static void em_debug_create_pd(struct device *dev)
  {
         struct dentry *d;
-       char name[8];
         int i;
  
-       snprintf(name, sizeof(name), "pd%d", cpu);
-
         /* Create the directory of the performance domain */
-       d = debugfs_create_dir(name, rootdir);
+       d = debugfs_create_dir(dev_name(dev), rootdir);
  
-       debugfs_create_file("cpus", 0444, d, pd->cpus, &em_debug_cpus_fops);
+       if (_is_cpu_device(dev))
+               debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus,
+                                   &em_debug_cpus_fops);
+
+       /* Create a sub-directory for each performance state */
+       for (i = 0; i < dev->em_pd->nr_perf_states; i++)
+               em_debug_create_ps(&dev->em_pd->table[i], d);
  
-       /* Create a sub-directory for each capacity state */
-       for (i = 0; i < pd->nr_cap_states; i++)
-               em_debug_create_cs(&pd->table[i], d);
+}
+
+static void em_debug_remove_pd(struct device *dev)
+{
+       struct dentry *debug_dir;
+
+       debug_dir = debugfs_lookup(dev_name(dev), rootdir);
+       debugfs_remove_recursive(debug_dir);
  }
  
  static int __init em_debug_init(void)
@@ -76,58 +87,55 @@ static int __init em_debug_init(void)
  }
  core_initcall(em_debug_init);
  #else /* CONFIG_DEBUG_FS */
-static void em_debug_create_pd(struct em_perf_domain *pd, int cpu) {}
+static void em_debug_create_pd(struct device *dev) {}
+static void em_debug_remove_pd(struct device *dev) {}
  #endif
-static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
-                                               struct em_data_callback *cb)
+
+static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
+                               int nr_states, struct em_data_callback *cb)
  {
         unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
         unsigned long power, freq, prev_freq = 0;
-       int i, ret, cpu = cpumask_first(span);
-       struct em_cap_state *table;
-       struct em_perf_domain *pd;
+       struct em_perf_state *table;
+       int i, ret;
         u64 fmax;
  
-       if (!cb->active_power)
-               return NULL;
-
-       pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL);
-       if (!pd)
-               return NULL;
-
         table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL);
         if (!table)
-               goto free_pd;
+               return -ENOMEM;
  
-       /* Build the list of capacity states for this performance domain */
+       /* Build the list of performance states for this performance domain */
         for (i = 0, freq = 0; i < nr_states; i++, freq++) {
                 /*
                  * active_power() is a driver callback which ceils 'freq' to
-                * lowest capacity state of 'cpu' above 'freq' and updates
+                * lowest performance state of 'dev' above 'freq' and updates
                  * 'power' and 'freq' accordingly.
                  */
-               ret = cb->active_power(&power, &freq, cpu);
+               ret = cb->active_power(&power, &freq, dev);
                 if (ret) {
-                       pr_err("pd%d: invalid cap. state: %d\n", cpu, ret);
-                       goto free_cs_table;
+                       dev_err(dev, "EM: invalid perf. state: %d\n",
+                               ret);
+                       goto free_ps_table;
                 }
  
                 /*
                  * We expect the driver callback to increase the frequency for
-                * higher capacity states.
+                * higher performance states.
                  */
                 if (freq <= prev_freq) {
-                       pr_err("pd%d: non-increasing freq: %lu\n", cpu, freq);
-                       goto free_cs_table;
+                       dev_err(dev, "EM: non-increasing freq: %lu\n",
+                               freq);
+                       goto free_ps_table;
                 }
  
                 /*
                  * The power returned by active_state() is expected to be
                  * positive, in milli-watts and to fit into 16 bits.
                  */
-               if (!power || power > EM_CPU_MAX_POWER) {
-                       pr_err("pd%d: invalid power: %lu\n", cpu, power);
-                       goto free_cs_table;
+               if (!power || power > EM_MAX_POWER) {
+                       dev_err(dev, "EM: invalid power: %lu\n",
+                               power);
+                       goto free_ps_table;
                 }
  
                 table[i].power = power;
@@ -141,12 +149,12 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
                  */
                 opp_eff = freq / power;
                 if (opp_eff >= prev_opp_eff)
-                       pr_warn("pd%d: hertz/watts ratio non-monotonically decreasing: em_cap_state %d >= em_cap_state%d\n",
-                                       cpu, i, i - 1);
+                       dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
+                                       i, i - 1);
                 prev_opp_eff = opp_eff;
         }
  
-       /* Compute the cost of each capacity_state. */
+       /* Compute the cost of each performance state. */
         fmax = (u64) table[nr_states - 1].frequency;
         for (i = 0; i < nr_states; i++) {
                 table[i].cost = div64_u64(fmax * table[i].power,
@@ -154,39 +162,94 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
         }
  
         pd->table = table;
-       pd->nr_cap_states = nr_states;
-       cpumask_copy(to_cpumask(pd->cpus), span);
-
-       em_debug_create_pd(pd, cpu);
+       pd->nr_perf_states = nr_states;
  
-       return pd;
+       return 0;
  
-free_cs_table:
+free_ps_table:
         kfree(table);
-free_pd:
-       kfree(pd);
+       return -EINVAL;
+}
+
+static int em_create_pd(struct device *dev, int nr_states,
+                       struct em_data_callback *cb, cpumask_t *cpus)
+{
+       struct em_perf_domain *pd;
+       struct device *cpu_dev;
+       int cpu, ret;
+
+       if (_is_cpu_device(dev)) {
+               pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL);
+               if (!pd)
+                       return -ENOMEM;
+
+               cpumask_copy(em_span_cpus(pd), cpus);
+       } else {
+               pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+               if (!pd)
+                       return -ENOMEM;
+       }
+
+       ret = em_create_perf_table(dev, pd, nr_states, cb);
+       if (ret) {
+               kfree(pd);
+               return ret;
+       }
+
+       if (_is_cpu_device(dev))
+               for_each_cpu(cpu, cpus) {
+                       cpu_dev = get_cpu_device(cpu);
+                       cpu_dev->em_pd = pd;
+               }
+
+       dev->em_pd = pd;
+
+       return 0;
+}
+
+/**
+ * em_pd_get() - Return the performance domain for a device
+ * @dev : Device to find the performance domain for
+ *
+ * Returns the performance domain to which @dev belongs, or NULL if it doesn't
+ * exist.
+ */
+struct em_perf_domain *em_pd_get(struct device *dev)
+{
+       if (IS_ERR_OR_NULL(dev))
+               return NULL;
  
-       return NULL;
+       return dev->em_pd;
  }
+EXPORT_SYMBOL_GPL(em_pd_get);
  
  /**
   * em_cpu_get() - Return the performance domain for a CPU
   * @cpu : CPU to find the performance domain for
   *
- * Return: the performance domain to which 'cpu' belongs, or NULL if it doesn't
+ * Returns the performance domain to which @cpu belongs, or NULL if it doesn't
   * exist.
   */
  struct em_perf_domain *em_cpu_get(int cpu)
  {
-       return READ_ONCE(per_cpu(em_data, cpu));
+       struct device *cpu_dev;
+
+       cpu_dev = get_cpu_device(cpu);
+       if (!cpu_dev)
+               return NULL;
+
+       return em_pd_get(cpu_dev);
  }
  EXPORT_SYMBOL_GPL(em_cpu_get);
  
  /**
- * em_register_perf_domain() - Register the Energy Model of a performance domain
- * @span       : Mask of CPUs in the performance domain
- * @nr_states  : Number of capacity states to register
+ * em_dev_register_perf_domain() - Register the Energy Model (EM) for a device
+ * @dev                : Device for which the EM is to register
+ * @nr_states  : Number of performance states to register
   * @cb         : Callback functions providing the data of the Energy Model
+ * @cpus       : Pointer to cpumask_t, which in case of a CPU device is
+ *             obligatory. It can be taken from i.e. 'policy->cpus'. For other
+ *             type of devices this should be set to NULL.
   *
   * Create Energy Model tables for a performance domain using the callbacks
   * defined in cb.
@@ -196,14 +259,13 @@ EXPORT_SYMBOL_GPL(em_cpu_get);
   *
   * Return 0 on success
   */
-int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
-                                               struct em_data_callback *cb)
+int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
+                               struct em_data_callback *cb, cpumask_t *cpus)
  {
         unsigned long cap, prev_cap = 0;
-       struct em_perf_domain *pd;
-       int cpu, ret = 0;
+       int cpu, ret;
  
-       if (!span || !nr_states || !cb)
+       if (!dev || !nr_states || !cb)
                 return -EINVAL;
  
         /*
@@ -212,47 +274,79 @@ int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
          */
         mutex_lock(&em_pd_mutex);
  
-       for_each_cpu(cpu, span) {
-               /* Make sure we don't register again an existing domain. */
-               if (READ_ONCE(per_cpu(em_data, cpu))) {
-                       ret = -EEXIST;
-                       goto unlock;
-               }
+       if (dev->em_pd) {
+               ret = -EEXIST;
+               goto unlock;
+       }
  
-               /*
-                * All CPUs of a domain must have the same micro-architecture
-                * since they all share the same table.
-                */
-               cap = arch_scale_cpu_capacity(cpu);
-               if (prev_cap && prev_cap != cap) {
-                       pr_err("CPUs of %*pbl must have the same capacity\n",
-                                                       cpumask_pr_args(span));
+       if (_is_cpu_device(dev)) {
+               if (!cpus) {
+                       dev_err(dev, "EM: invalid CPU mask\n");
                         ret = -EINVAL;
                         goto unlock;
                 }
-               prev_cap = cap;
+
+               for_each_cpu(cpu, cpus) {
+                       if (em_cpu_get(cpu)) {
+                               dev_err(dev, "EM: exists for CPU%d\n", cpu);
+                               ret = -EEXIST;
+                               goto unlock;
+                       }
+                       /*
+                        * All CPUs of a domain must have the same
+                        * micro-architecture since they all share the same
+                        * table.
+                        */
+                       cap = arch_scale_cpu_capacity(cpu);
+                       if (prev_cap && prev_cap != cap) {
+                               dev_err(dev, "EM: CPUs of %*pbl must have the same capacity\n",
+                                       cpumask_pr_args(cpus));
+
+                               ret = -EINVAL;
+                               goto unlock;
+                       }
+                       prev_cap = cap;
+               }
         }
  
-       /* Create the performance domain and add it to the Energy Model. */
-       pd = em_create_pd(span, nr_states, cb);
-       if (!pd) {
-               ret = -EINVAL;
+       ret = em_create_pd(dev, nr_states, cb, cpus);
+       if (ret)
                 goto unlock;
-       }
  
-       for_each_cpu(cpu, span) {
-               /*
-                * The per-cpu array can be read concurrently from em_cpu_get().
-                * The barrier enforces the ordering needed to make sure readers
-                * can only access well formed em_perf_domain structs.
-                */
-               smp_store_release(per_cpu_ptr(&em_data, cpu), pd);
-       }
+       em_debug_create_pd(dev);
+       dev_info(dev, "EM: created perf domain\n");
  
-       pr_debug("Created perf domain %*pbl\n", cpumask_pr_args(span));
  unlock:
         mutex_unlock(&em_pd_mutex);
-
         return ret;
  }
-EXPORT_SYMBOL_GPL(em_register_perf_domain);
+EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
+
+/**
+ * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
+ * @dev                : Device for which the EM is registered
+ *
+ * Unregister the EM for the specified @dev (but not a CPU device).
+ */
+void em_dev_unregister_perf_domain(struct device *dev)
+{
+       if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
+               return;
+
+       if (_is_cpu_device(dev))
+               return;
+
+       /*
+        * The mutex separates all register/unregister requests and protects
+        * from potential clean-up/setup issues in the debugfs directories.
+        * The debugfs directory name is the same as device's name.
+        */
+       mutex_lock(&em_pd_mutex);
+       em_debug_remove_pd(dev);
+
+       kfree(dev->em_pd->table);
+       kfree(dev->em_pd);
+       dev->em_pd = NULL;
+       mutex_unlock(&em_pd_mutex);
+}
+EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

index 04fa8db..14c80bb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6501,7 +6501,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
                 max_util = max(max_util, cpu_util);
         }
  
-       return em_pd_energy(pd->em_pd, max_util, sum_util);
+       return em_cpu_energy(pd->em_pd, max_util, sum_util);
  }
  
  /*
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c

index ba81187..2f91d31 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -272,10 +272,10 @@ static void perf_domain_debug(const struct cpumask *cpu_map,
         printk(KERN_DEBUG "root_domain %*pbl:", cpumask_pr_args(cpu_map));
  
         while (pd) {
-               printk(KERN_CONT " pd%d:{ cpus=%*pbl nr_cstate=%d }",
+               printk(KERN_CONT " pd%d:{ cpus=%*pbl nr_pstate=%d }",
                                 cpumask_first(perf_domain_span(pd)),
                                 cpumask_pr_args(perf_domain_span(pd)),
-                               em_pd_nr_cap_states(pd->em_pd));
+                               em_pd_nr_perf_states(pd->em_pd));
                 pd = pd->next;
         }
  
@@ -313,26 +313,26 @@ static void sched_energy_set(bool has_eas)
   *
   * The complexity of the Energy Model is defined as:
   *
- *              C = nr_pd * (nr_cpus + nr_cs)
+ *              C = nr_pd * (nr_cpus + nr_ps)
   *
   * with parameters defined as:
   *  - nr_pd:    the number of performance domains
   *  - nr_cpus:  the number of CPUs
- *  - nr_cs:    the sum of the number of capacity states of all performance
+ *  - nr_ps:    the sum of the number of performance states of all performance
   *              domains (for example, on a system with 2 performance domains,
- *              with 10 capacity states each, nr_cs = 2 * 10 = 20).
+ *              with 10 performance states each, nr_ps = 2 * 10 = 20).
   *
   * It is generally not a good idea to use such a model in the wake-up path on
   * very complex platforms because of the associated scheduling overheads. The
   * arbitrary constraint below prevents that. It makes EAS usable up to 16 CPUs
- * with per-CPU DVFS and less than 8 capacity states each, for example.
+ * with per-CPU DVFS and less than 8 performance states each, for example.
   */
  #define EM_MAX_COMPLEXITY 2048
  
  extern struct cpufreq_governor schedutil_gov;
  static bool build_perf_domains(const struct cpumask *cpu_map)
  {
-       int i, nr_pd = 0, nr_cs = 0, nr_cpus = cpumask_weight(cpu_map);
+       int i, nr_pd = 0, nr_ps = 0, nr_cpus = cpumask_weight(cpu_map);
         struct perf_domain *pd = NULL, *tmp;
         int cpu = cpumask_first(cpu_map);
         struct root_domain *rd = cpu_rq(cpu)->rd;
@@ -384,15 +384,15 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
                 pd = tmp;
  
                 /*
-                * Count performance domains and capacity states for the
+                * Count performance domains and performance states for the
                  * complexity check.
                  */
                 nr_pd++;
-               nr_cs += em_pd_nr_cap_states(pd->em_pd);
+               nr_ps += em_pd_nr_perf_states(pd->em_pd);
         }
  
         /* Bail out if the Energy Model complexity is too high. */
-       if (nr_pd * (nr_cs + nr_cpus) > EM_MAX_COMPLEXITY) {
+       if (nr_pd * (nr_ps + nr_cpus) > EM_MAX_COMPLEXITY) {
                 WARN(1, "rd %*pbl: Failed to start EAS, EM complexity is too high\n",
                                                 cpumask_pr_args(cpu_map));
                 goto free;
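
Note on the complexity bound: the comment in kernel/sched/topology.c states that the constraint keeps EAS usable up to 16 CPUs with per-CPU DVFS and less than 8 performance states each. The arithmetic can be checked with a minimal user-space sketch. It assumes one performance domain per CPU; `em_too_complex()` is a hypothetical helper name, not a kernel function, and only mirrors the bail-out condition in build_perf_domains().

```c
#include <stdbool.h>

#define EM_MAX_COMPLEXITY 2048

/*
 * Hypothetical helper (not in the kernel): same test as the bail-out
 * condition in build_perf_domains().
 *
 *   nr_pd   - number of performance domains
 *   nr_ps   - sum of performance states over all domains
 *   nr_cpus - number of CPUs in the root domain
 */
static bool em_too_complex(int nr_pd, int nr_ps, int nr_cpus)
{
	return nr_pd * (nr_ps + nr_cpus) > EM_MAX_COMPLEXITY;
}
```

With 16 CPUs and 7 states per domain, C = 16 * (112 + 16) = 2048, which is not above the limit, so em_too_complex(16, 16 * 7, 16) is false. With 8 states per domain, C = 16 * (128 + 16) = 2304, so em_too_complex(16, 16 * 8, 16) is true and EAS would not start.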
Documentation/ABI/testing/sysfs-class-devfreq
Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt
Documentation/power/energy-model.rst
MAINTAINERS
drivers/cpufreq/cpufreq-dt.c
drivers/cpufreq/imx6q-cpufreq.c
drivers/cpufreq/mediatek-cpufreq.c
drivers/cpufreq/omap-cpufreq.c
drivers/cpufreq/qcom-cpufreq-hw.c
drivers/cpufreq/scmi-cpufreq.c
drivers/cpufreq/scpi-cpufreq.c
drivers/cpufreq/vexpress-spc-cpufreq.c
drivers/devfreq/devfreq-event.c
drivers/devfreq/devfreq.c
drivers/devfreq/rk3399_dmc.c
drivers/memory/samsung/exynos5422-dmc.c
drivers/opp/core.c
drivers/opp/of.c
drivers/opp/ti-opp-supply.c
drivers/thermal/cpufreq_cooling.c
include/linux/devfreq.h
include/linux/device.h
include/linux/energy_model.h
include/linux/pm_opp.h
kernel/power/energy_model.c
kernel/sched/fair.c
kernel/sched/topology.c
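
Note on the performance table built in kernel/power/energy_model.c: em_create_perf_table() rejects non-increasing frequencies and powers that are zero or above EM_MAX_POWER, then derives a cost per state from the highest frequency. A rough user-space sketch of those steps follows. The names `struct perf_state`, `fill_costs()` and `SKETCH_MAX_POWER` are hypothetical. The 16-bit limit is taken from the code comment. The divisor in the cost expression is assumed to be the state's own frequency, because the hunk boundary cuts that expression short.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed limit: the comment says power must "fit into 16 bits". */
#define SKETCH_MAX_POWER 0xFFFF

/* Hypothetical stand-in for struct em_perf_state. */
struct perf_state {
	unsigned long frequency;
	unsigned long power;
	unsigned long cost;
};

/* Validate the table and fill in costs; 0 on success, -1 if invalid. */
static int fill_costs(struct perf_state *table, size_t nr_states)
{
	unsigned long prev_freq = 0;
	uint64_t fmax;
	size_t i;

	if (!nr_states)
		return -1;

	for (i = 0; i < nr_states; i++) {
		/* Frequencies must strictly increase with the state index. */
		if (table[i].frequency <= prev_freq)
			return -1;
		/* Power must be positive and within the assumed limit. */
		if (!table[i].power || table[i].power > SKETCH_MAX_POWER)
			return -1;
		prev_freq = table[i].frequency;
	}

	/* cost = fmax * power / frequency (divisor is an assumption). */
	fmax = table[nr_states - 1].frequency;
	for (i = 0; i < nr_states; i++)
		table[i].cost = (unsigned long)(fmax * table[i].power /
						table[i].frequency);

	return 0;
}
```

For a table of (500000, 100), (1000000, 300), (2000000, 900) as (frequency, power) pairs, this sketch produces costs of 400, 600 and 900.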