Commit a9a939cb authored by Rafael J. Wysocki's avatar Rafael J. Wysocki
Browse files

Merge branches 'powercap' and 'pm-misc'

* powercap:
  powercap: intel_rapl: Use topology interface in rapl_init_domains()
  powercap: intel_rapl: Use topology interface in rapl_add_package()
  powercap/intel_rapl: add support for AlderLake Mobile
  powercap/drivers/dtpm: Fix size of object being allocated
  powercap/drivers/dtpm: Fix an IS_ERR() vs NULL check
  powercap/drivers/dtpm: Fix some missing unlock bugs
  powercap/drivers/dtpm: Fix a double shift bug
  powercap/drivers/dtpm: Fix __udivdi3 and __aeabi_uldivmod unresolved symbols
  powercap/drivers/dtpm: Add CPU energy model based support
  powercap/drivers/dtpm: Add API for dynamic thermal power management
  Documentation/powercap/dtpm: Add documentation for dtpm
  units: Add Watt units

* pm-misc:
  PM: Kconfig: remove unneeded "default n" options
  PM: EM: update Kconfig description and drop "default n" option
Loading
Loading
Loading
Loading
+1 −0
Original line number Diff line number Diff line
@@ -30,6 +30,7 @@ Power Management
    userland-swsusp

    powercap/powercap
    powercap/dtpm

    regulator/consumer
    regulator/design
+212 −0
Original line number Diff line number Diff line
.. SPDX-License-Identifier: GPL-2.0

==========================================
Dynamic Thermal Power Management framework
==========================================

On the embedded world, the complexity of the SoC leads to an
increasing number of hotspots which need to be monitored and mitigated
as a whole in order to prevent the temperature to go above the
normative and legally stated 'skin temperature'.

Another aspect is to sustain the performance for a given power budget,
for example virtual reality where the user can feel dizziness if the
performance is capped while a big CPU is processing something else. Or
reduce the battery charging because the dissipated power is too high
compared with the power consumed by other devices.

The user space is the most adequate place to dynamically act on the
different devices by limiting their power given an application
profile: it has the knowledge of the platform.

The Dynamic Thermal Power Management (DTPM) is a technique acting on
the device power by limiting and/or balancing a power budget among
different devices.

The DTPM framework provides an unified interface to act on the
device power.

Overview
========

The DTPM framework relies on the powercap framework to create the
powercap entries in the sysfs directory and implement the backend
driver to do the connection with the power manageable device.

The DTPM is a tree representation describing the power constraints
shared between devices, not their physical positions.

The nodes of the tree are a virtual description aggregating the power
characteristics of the children nodes and their power limitations.

The leaves of the tree are the real power manageable devices.

For instance::

  SoC
   |
   `-- pkg
	|
	|-- pd0 (cpu0-3)
	|
	`-- pd1 (cpu4-5)

The pkg power will be the sum of pd0 and pd1 power numbers::

  SoC (400mW - 3100mW)
   |
   `-- pkg (400mW - 3100mW)
	|
	|-- pd0 (100mW - 700mW)
	|
	`-- pd1 (300mW - 2400mW)

When the nodes are inserted in the tree, their power characteristics are propagated to the parents::

  SoC (600mW - 5900mW)
   |
   |-- pkg (400mW - 3100mW)
   |    |
   |    |-- pd0 (100mW - 700mW)
   |    |
   |    `-- pd1 (300mW - 2400mW)
   |
   `-- pd2 (200mW - 2800mW)

Each node have a weight on a 2^10 basis reflecting the percentage of power consumption along the siblings::

  SoC (w=1024)
   |
   |-- pkg (w=538)
   |    |
   |    |-- pd0 (w=231)
   |    |
   |    `-- pd1 (w=794)
   |
   `-- pd2 (w=486)

   Note the sum of weights at the same level are equal to 1024.

When a power limitation is applied to a node, then it is distributed along the children given their weights. For example, if we set a power limitation of 3200mW at the 'SoC' root node, the resulting tree will be::

  SoC (w=1024) <--- power_limit = 3200mW
   |
   |-- pkg (w=538) --> power_limit = 1681mW
   |    |
   |    |-- pd0 (w=231) --> power_limit = 378mW
   |    |
   |    `-- pd1 (w=794) --> power_limit = 1303mW
   |
   `-- pd2 (w=486) --> power_limit = 1519mW


Flat description
----------------

A root node is created and it is the parent of all the nodes. This
description is the simplest one and it is supposed to give to user
space a flat representation of all the devices supporting the power
limitation without any power limitation distribution.

Hierarchical description
------------------------

The different devices supporting the power limitation are represented
hierarchically. There is one root node, all intermediate nodes are
grouping the child nodes which can be intermediate nodes also or real
devices.

The intermediate nodes aggregate the power information and allows to
set the power limit given the weight of the nodes.

User space API
==============

As stated in the overview, the DTPM framework is built on top of the
powercap framework. Thus the sysfs interface is the same, please refer
to the powercap documentation for further details.

 * power_uw: Instantaneous power consumption. If the node is an
   intermediate node, then the power consumption will be the sum of all
   children power consumption.

 * max_power_range_uw: The power range resulting of the maximum power
   minus the minimum power.

 * name: The name of the node. This is implementation dependent. Even
   if it is not recommended for the user space, several nodes can have
   the same name.

 * constraint_X_name: The name of the constraint.

 * constraint_X_max_power_uw: The maximum power limit to be applicable
   to the node.

 * constraint_X_power_limit_uw: The power limit to be applied to the
   node. If the value contained in constraint_X_max_power_uw is set,
   the constraint will be removed.

 * constraint_X_time_window_us: The meaning of this file will depend
   on the constraint number.

Constraints
-----------

 * Constraint 0: The power limitation is immediately applied, without
   limitation in time.

Kernel API
==========

Overview
--------

The DTPM framework has no power limiting backend support. It is
generic and provides a set of API to let the different drivers to
implement the backend part for the power limitation and create the
power constraints tree.

It is up to the platform to provide the initialization function to
allocate and link the different nodes of the tree.

A special macro has the role of declaring a node and the corresponding
initialization function via a description structure. This one contains
an optional parent field allowing to hook different devices to an
already existing tree at boot time.

For instance::

	struct dtpm_descr my_descr = {
		.name = "my_name",
		.init = my_init_func,
	};

	DTPM_DECLARE(my_descr);

The nodes of the DTPM tree are described with dtpm structure. The
steps to add a new power limitable device is done in three steps:

 * Allocate the dtpm node
 * Set the power number of the dtpm node
 * Register the dtpm node

The registration of the dtpm node is done with the powercap
ops. Basically, it must implements the callbacks to get and set the
power and the limit.

Alternatively, if the node to be inserted is an intermediate one, then
a simple function to insert it as a future parent is available.

If a device has its power characteristics changing, then the tree must
be updated with the new power numbers and weights.

Nomenclature
------------

 * dtpm_alloc() : Allocate and initialize a dtpm structure

 * dtpm_register() : Add the dtpm node to the tree

 * dtpm_unregister() : Remove the dtpm node from the tree

 * dtpm_update_power() : Update the power characteristics of the dtpm node
+13 −0
Original line number Diff line number Diff line
@@ -43,4 +43,17 @@ config IDLE_INJECT
	  CPUs for power capping. Idle period can be injected
	  synchronously on a set of specified CPUs or alternatively
	  on a per CPU basis.

config DTPM
	bool "Power capping for Dynamic Thermal Power Management"
	help
	  This enables support for the power capping for the dynamic
	  thermal power management userspace engine.

config DTPM_CPU
	bool "Add CPU power capping based on the energy model"
	depends on DTPM && ENERGY_MODEL
	help
	  This enables support for CPU power limitation based on
	  energy model.
endif
+2 −0
Original line number Diff line number Diff line
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_DTPM) += dtpm.o
obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o
obj-$(CONFIG_POWERCAP)	+= powercap_sys.o
obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
+480 −0
Original line number Diff line number Diff line
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright 2020 Linaro Limited
 *
 * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
 *
 * The powercap based Dynamic Thermal Power Management framework
 * provides to the userspace a consistent API to set the power limit
 * on some devices.
 *
 * DTPM defines the functions to create a tree of constraints. Each
 * parent node is a virtual description of the aggregation of the
 * children. It propagates the constraints set at its level to its
 * children and collect the children power information. The leaves of
 * the tree are the real devices which have the ability to get their
 * current power consumption and set their power limit.
 */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/dtpm.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/powercap.h>
#include <linux/slab.h>
#include <linux/mutex.h>

#define DTPM_POWER_LIMIT_FLAG 0

static const char *constraint_name[] = {
	"Instantaneous",
};

static DEFINE_MUTEX(dtpm_lock);
static struct powercap_control_type *pct;
static struct dtpm *root;

static int get_time_window_us(struct powercap_zone *pcz, int cid, u64 *window)
{
	return -ENOSYS;
}

static int set_time_window_us(struct powercap_zone *pcz, int cid, u64 window)
{
	return -ENOSYS;
}

static int get_max_power_range_uw(struct powercap_zone *pcz, u64 *max_power_uw)
{
	struct dtpm *dtpm = to_dtpm(pcz);

	mutex_lock(&dtpm_lock);
	*max_power_uw = dtpm->power_max - dtpm->power_min;
	mutex_unlock(&dtpm_lock);

	return 0;
}

static int __get_power_uw(struct dtpm *dtpm, u64 *power_uw)
{
	struct dtpm *child;
	u64 power;
	int ret = 0;

	if (dtpm->ops) {
		*power_uw = dtpm->ops->get_power_uw(dtpm);
		return 0;
	}

	*power_uw = 0;

	list_for_each_entry(child, &dtpm->children, sibling) {
		ret = __get_power_uw(child, &power);
		if (ret)
			break;
		*power_uw += power;
	}

	return ret;
}

static int get_power_uw(struct powercap_zone *pcz, u64 *power_uw)
{
	struct dtpm *dtpm = to_dtpm(pcz);
	int ret;

	mutex_lock(&dtpm_lock);
	ret = __get_power_uw(dtpm, power_uw);
	mutex_unlock(&dtpm_lock);

	return ret;
}

static void __dtpm_rebalance_weight(struct dtpm *dtpm)
{
	struct dtpm *child;

	list_for_each_entry(child, &dtpm->children, sibling) {

		pr_debug("Setting weight '%d' for '%s'\n",
			 child->weight, child->zone.name);

		child->weight = DIV64_U64_ROUND_CLOSEST(
			child->power_max * 1024, dtpm->power_max);

		__dtpm_rebalance_weight(child);
	}
}

static void __dtpm_sub_power(struct dtpm *dtpm)
{
	struct dtpm *parent = dtpm->parent;

	while (parent) {
		parent->power_min -= dtpm->power_min;
		parent->power_max -= dtpm->power_max;
		parent->power_limit -= dtpm->power_limit;
		parent = parent->parent;
	}

	__dtpm_rebalance_weight(root);
}

static void __dtpm_add_power(struct dtpm *dtpm)
{
	struct dtpm *parent = dtpm->parent;

	while (parent) {
		parent->power_min += dtpm->power_min;
		parent->power_max += dtpm->power_max;
		parent->power_limit += dtpm->power_limit;
		parent = parent->parent;
	}

	__dtpm_rebalance_weight(root);
}

/**
 * dtpm_update_power - Update the power on the dtpm
 * @dtpm: a pointer to a dtpm structure to update
 * @power_min: a u64 representing the new power_min value
 * @power_max: a u64 representing the new power_max value
 *
 * Function to update the power values of the dtpm node specified in
 * parameter. These new values will be propagated to the tree.
 *
 * Return: zero on success, -EINVAL if the values are inconsistent
 */
int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max)
{
	int ret = 0;

	mutex_lock(&dtpm_lock);

	if (power_min == dtpm->power_min && power_max == dtpm->power_max)
		goto unlock;

	if (power_max < power_min) {
		ret = -EINVAL;
		goto unlock;
	}

	__dtpm_sub_power(dtpm);

	dtpm->power_min = power_min;
	dtpm->power_max = power_max;
	if (!test_bit(DTPM_POWER_LIMIT_FLAG, &dtpm->flags))
		dtpm->power_limit = power_max;

	__dtpm_add_power(dtpm);

unlock:
	mutex_unlock(&dtpm_lock);

	return ret;
}

/**
 * dtpm_release_zone - Cleanup when the node is released
 * @pcz: a pointer to a powercap_zone structure
 *
 * Do some housecleaning and update the weight on the tree. The
 * release will be denied if the node has children. This function must
 * be called by the specific release callback of the different
 * backends.
 *
 * Return: 0 on success, -EBUSY if there are children
 */
int dtpm_release_zone(struct powercap_zone *pcz)
{
	struct dtpm *dtpm = to_dtpm(pcz);
	struct dtpm *parent = dtpm->parent;

	mutex_lock(&dtpm_lock);

	if (!list_empty(&dtpm->children)) {
		mutex_unlock(&dtpm_lock);
		return -EBUSY;
	}

	if (parent)
		list_del(&dtpm->sibling);

	__dtpm_sub_power(dtpm);

	mutex_unlock(&dtpm_lock);

	if (dtpm->ops)
		dtpm->ops->release(dtpm);

	kfree(dtpm);

	return 0;
}

static int __get_power_limit_uw(struct dtpm *dtpm, int cid, u64 *power_limit)
{
	*power_limit = dtpm->power_limit;
	return 0;
}

static int get_power_limit_uw(struct powercap_zone *pcz,
			      int cid, u64 *power_limit)
{
	struct dtpm *dtpm = to_dtpm(pcz);
	int ret;

	mutex_lock(&dtpm_lock);
	ret = __get_power_limit_uw(dtpm, cid, power_limit);
	mutex_unlock(&dtpm_lock);

	return ret;
}

/*
 * Set the power limit on the nodes, the power limit is distributed
 * given the weight of the children.
 *
 * The dtpm node lock must be held when calling this function.
 */
static int __set_power_limit_uw(struct dtpm *dtpm, int cid, u64 power_limit)
{
	struct dtpm *child;
	int ret = 0;
	u64 power;

	/*
	 * A max power limitation means we remove the power limit,
	 * otherwise we set a constraint and flag the dtpm node.
	 */
	if (power_limit == dtpm->power_max) {
		clear_bit(DTPM_POWER_LIMIT_FLAG, &dtpm->flags);
	} else {
		set_bit(DTPM_POWER_LIMIT_FLAG, &dtpm->flags);
	}

	pr_debug("Setting power limit for '%s': %llu uW\n",
		 dtpm->zone.name, power_limit);

	/*
	 * Only leaves of the dtpm tree has ops to get/set the power
	 */
	if (dtpm->ops) {
		dtpm->power_limit = dtpm->ops->set_power_uw(dtpm, power_limit);
	} else {
		dtpm->power_limit = 0;

		list_for_each_entry(child, &dtpm->children, sibling) {

			/*
			 * Integer division rounding will inevitably
			 * lead to a different min or max value when
			 * set several times. In order to restore the
			 * initial value, we force the child's min or
			 * max power every time if the constraint is
			 * at the boundaries.
			 */
			if (power_limit == dtpm->power_max) {
				power = child->power_max;
			} else if (power_limit == dtpm->power_min) {
				power = child->power_min;
			} else {
				power = DIV_ROUND_CLOSEST_ULL(
					power_limit * child->weight, 1024);
			}

			pr_debug("Setting power limit for '%s': %llu uW\n",
				 child->zone.name, power);

			ret = __set_power_limit_uw(child, cid, power);
			if (!ret)
				ret = __get_power_limit_uw(child, cid, &power);

			if (ret)
				break;

			dtpm->power_limit += power;
		}
	}

	return ret;
}

static int set_power_limit_uw(struct powercap_zone *pcz,
			      int cid, u64 power_limit)
{
	struct dtpm *dtpm = to_dtpm(pcz);
	int ret;

	mutex_lock(&dtpm_lock);

	/*
	 * Don't allow values outside of the power range previously
	 * set when initializing the power numbers.
	 */
	power_limit = clamp_val(power_limit, dtpm->power_min, dtpm->power_max);

	ret = __set_power_limit_uw(dtpm, cid, power_limit);

	pr_debug("%s: power limit: %llu uW, power max: %llu uW\n",
		 dtpm->zone.name, dtpm->power_limit, dtpm->power_max);

	mutex_unlock(&dtpm_lock);

	return ret;
}

static const char *get_constraint_name(struct powercap_zone *pcz, int cid)
{
	return constraint_name[cid];
}

static int get_max_power_uw(struct powercap_zone *pcz, int id, u64 *max_power)
{
	struct dtpm *dtpm = to_dtpm(pcz);

	mutex_lock(&dtpm_lock);
	*max_power = dtpm->power_max;
	mutex_unlock(&dtpm_lock);

	return 0;
}

static struct powercap_zone_constraint_ops constraint_ops = {
	.set_power_limit_uw = set_power_limit_uw,
	.get_power_limit_uw = get_power_limit_uw,
	.set_time_window_us = set_time_window_us,
	.get_time_window_us = get_time_window_us,
	.get_max_power_uw = get_max_power_uw,
	.get_name = get_constraint_name,
};

static struct powercap_zone_ops zone_ops = {
	.get_max_power_range_uw = get_max_power_range_uw,
	.get_power_uw = get_power_uw,
	.release = dtpm_release_zone,
};

/**
 * dtpm_alloc - Allocate and initialize a dtpm struct
 * @name: a string specifying the name of the node
 *
 * Return: a struct dtpm pointer, NULL in case of error
 */
struct dtpm *dtpm_alloc(struct dtpm_ops *ops)
{
	struct dtpm *dtpm;

	dtpm = kzalloc(sizeof(*dtpm), GFP_KERNEL);
	if (dtpm) {
		INIT_LIST_HEAD(&dtpm->children);
		INIT_LIST_HEAD(&dtpm->sibling);
		dtpm->weight = 1024;
		dtpm->ops = ops;
	}

	return dtpm;
}

/**
 * dtpm_unregister - Unregister a dtpm node from the hierarchy tree
 * @dtpm: a pointer to a dtpm structure corresponding to the node to be removed
 *
 * Call the underlying powercap unregister function. That will call
 * the release callback of the powercap zone.
 */
void dtpm_unregister(struct dtpm *dtpm)
{
	powercap_unregister_zone(pct, &dtpm->zone);

	pr_info("Unregistered dtpm node '%s'\n", dtpm->zone.name);
}

/**
 * dtpm_register - Register a dtpm node in the hierarchy tree
 * @name: a string specifying the name of the node
 * @dtpm: a pointer to a dtpm structure corresponding to the new node
 * @parent: a pointer to a dtpm structure corresponding to the parent node
 *
 * Create a dtpm node in the tree. If no parent is specified, the node
 * is the root node of the hierarchy. If the root node already exists,
 * then the registration will fail. The powercap controller must be
 * initialized before calling this function.
 *
 * The dtpm structure must be initialized with the power numbers
 * before calling this function.
 *
 * Return: zero on success, a negative value in case of error:
 *  -EAGAIN: the function is called before the framework is initialized.
 *  -EBUSY: the root node is already inserted
 *  -EINVAL: * there is no root node yet and @parent is specified
 *           * no all ops are defined
 *           * parent have ops which are reserved for leaves
 *   Other negative values are reported back from the powercap framework
 */
int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent)
{
	struct powercap_zone *pcz;

	if (!pct)
		return -EAGAIN;

	if (root && !parent)
		return -EBUSY;

	if (!root && parent)
		return -EINVAL;

	if (parent && parent->ops)
		return -EINVAL;

	if (!dtpm)
		return -EINVAL;

	if (dtpm->ops && !(dtpm->ops->set_power_uw &&
			   dtpm->ops->get_power_uw &&
			   dtpm->ops->release))
		return -EINVAL;

	pcz = powercap_register_zone(&dtpm->zone, pct, name,
				     parent ? &parent->zone : NULL,
				     &zone_ops, MAX_DTPM_CONSTRAINTS,
				     &constraint_ops);
	if (IS_ERR(pcz))
		return PTR_ERR(pcz);

	mutex_lock(&dtpm_lock);

	if (parent) {
		list_add_tail(&dtpm->sibling, &parent->children);
		dtpm->parent = parent;
	} else {
		root = dtpm;
	}

	__dtpm_add_power(dtpm);

	pr_info("Registered dtpm node '%s' / %llu-%llu uW, \n",
		dtpm->zone.name, dtpm->power_min, dtpm->power_max);

	mutex_unlock(&dtpm_lock);

	return 0;
}

static int __init dtpm_init(void)
{
	struct dtpm_descr **dtpm_descr;

	pct = powercap_register_control_type(NULL, "dtpm", NULL);
	if (IS_ERR(pct)) {
		pr_err("Failed to register control type\n");
		return PTR_ERR(pct);
	}

	for_each_dtpm_table(dtpm_descr)
		(*dtpm_descr)->init(*dtpm_descr);

	return 0;
}
late_initcall(dtpm_init);
Loading