Commit aac09ce2 authored by Linus Torvalds
Pull thermal management updates from Zhang Rui:

 - Convert thermal documents to ReST (Mauro Carvalho Chehab)

 - Fix a cyclic dependency between thermal core and governors (Daniel
   Lezcano)

 - Fix processor_thermal_device driver to re-evaluate power limits after
   resume (Srinivas Pandruvada, Zhang Rui)

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  drivers: thermal: processor_thermal_device: Fix build warning
  docs: thermal: convert to ReST
  thermal/drivers/core: Use governor table to initialize
  thermal/drivers/core: Add init section table for self-encapsulation
  drivers: thermal: processor_thermal: Read PPCC on resume
parents c3c08f93 6c395f66
+27 −12
=======================
CPU cooling APIs How To
=======================

Written by Amit Daniel Kachhap <amit.kachhap@linaro.org>


@@ -8,40 +9,54 @@ Updated: 6 Jan 2015
Copyright (c)  2012 Samsung Electronics Co., Ltd(http://www.samsung.com)

0. Introduction
===============

The generic cpu cooling (freq clipping) provides registration/unregistration APIs
to the caller. The binding of the cooling devices to the trip point is left to
the user. The registration APIs return the cooling device pointer.

1. cpu cooling APIs
===================


1.1 cpufreq registration/unregistration APIs
--------------------------------------------

    ::

	struct thermal_cooling_device
	*cpufreq_cooling_register(struct cpumask *clip_cpus)

    This interface function registers the cpufreq cooling device with the name
    "thermal-cpufreq-%x". This API can support multiple instances of cpufreq
    cooling devices.

    clip_cpus:
	cpumask of cpus where the frequency constraints will happen.

    ::

	struct thermal_cooling_device
	*of_cpufreq_cooling_register(struct cpufreq_policy *policy)

    This interface function registers the cpufreq cooling device with
    the name "thermal-cpufreq-%x" linking it with a device tree node, in
    order to bind it via the thermal DT code. This API can support multiple
    instances of cpufreq cooling devices.

    policy:
	CPUFreq policy.

    ::

	void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)

    This interface function unregisters the "thermal-cpufreq-%x" cooling device.

    cdev: Cooling device pointer which has to be unregistered.


2. Power models
===============

The power API registration functions provide a simple power model for
CPUs.  The current power is calculated as dynamic power (static power isn't
@@ -65,7 +80,7 @@ For a given processor implementation the primary factors are:
  variation.  In pathological cases this variation can be significant,
  but typically it is of a much lesser impact than the factors above.

A high level dynamic power consumption model may then be represented as::

	Pdyn = f(run) * Voltage^2 * Frequency * Utilisation


@@ -80,7 +95,7 @@ factors. Therefore, in initial implementation that contribution is
represented as a constant coefficient.  This is a simplification
consistent with the relative contribution to overall power variation.

In this simplified representation our model becomes::


	Pdyn = Capacitance * Voltage^2 * Frequency * Utilisation
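
As a rough illustration of the simplified model above, the following Python
sketch evaluates Pdyn for two operating points. The function name, the units
and every numeric value are assumptions made up for this example; none of
them come from the kernel sources.

```python
def dynamic_power(capacitance, voltage, frequency, utilisation):
    """Evaluate Pdyn = Capacitance * Voltage^2 * Frequency * Utilisation.

    The result is in whatever unit the caller's inputs imply; the
    values used below are illustrative only.
    """
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    return capacitance * voltage ** 2 * frequency * utilisation


# Hypothetical operating points, both at 50% utilisation.
full_speed = dynamic_power(100, 1.0, 1000, 0.5)
clipped = dynamic_power(100, 0.5, 500, 0.5)
print(full_speed, clipped)
```

With these made-up numbers, halving both voltage and frequency divides the
estimate by eight, since voltage enters the model squared.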


+30 −17
========================
Kernel driver exynos_tmu
========================

Supported chips:

* ARM SAMSUNG EXYNOS4, EXYNOS5 series of SoC

  Datasheet: Not publicly available

Authors: Donggeun Kim <dg77.kim@samsung.com>
@@ -19,25 +22,32 @@ Temperature can be taken from the temperature code.
There are three equations converting from temperature to temperature code.

The three equations are:
  1. Two point trimming::

	Tc = (T - 25) * (TI2 - TI1) / (85 - 25) + TI1

  2. One point trimming::

	Tc = T + TI1 - 25

  3. No trimming::

	Tc = T + 50

  Tc:
       Temperature code
  T:
       Temperature
  TI1:
       Trimming info for 25 degree Celsius (stored at TRIMINFO register)
       Temperature code measured at 25 degree Celsius which is unchanged
  TI2:
       Trimming info for 85 degree Celsius (stored at TRIMINFO register)
       Temperature code measured at 85 degree Celsius which is unchanged
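
The three equations above can be sketched in Python as follows. This is only
an illustration under stated assumptions: the function name, the way the
trimming mode is selected (by which trimming values are supplied) and the
sample TI1/TI2 values are all hypothetical, and floating-point division is
used where a driver might well use integer arithmetic.

```python
def temp_to_code(temp_c, ti1=None, ti2=None):
    """Convert a temperature T (degrees Celsius) to a temperature code Tc.

    ti1 / ti2 are the trimming values for 25 and 85 degrees Celsius.
    Which equation is used depends on which of them are given
    (an assumption of this sketch, not documented driver behaviour).
    """
    if ti1 is not None and ti2 is not None:
        # 1. Two point trimming
        return (temp_c - 25) * (ti2 - ti1) / (85 - 25) + ti1
    if ti1 is not None:
        # 2. One point trimming
        return temp_c + ti1 - 25
    # 3. No trimming
    return temp_c + 50


# Hypothetical trimming values: code 50 at 25 degrees, code 110 at 85 degrees.
print(temp_to_code(25, ti1=50, ti2=110))
print(temp_to_code(85, ti1=50, ti2=110))
print(temp_to_code(40, ti1=50))
print(temp_to_code(40))
```

With two point trimming the result reproduces TI1 at 25 degrees and TI2 at
85 degrees, which is a quick sanity check on the formula.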


TMU (Thermal Management Unit) in EXYNOS4/5 generates an interrupt
when the temperature exceeds pre-defined levels.
The maximum number of configurable thresholds is five.
The threshold levels are defined as follows::

  Level_0: current temperature > trigger_level_0 + threshold
  Level_1: current temperature > trigger_level_1 + threshold
  Level_2: current temperature > trigger_level_2 + threshold
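
Read literally, each level condition compares the current temperature with
a trigger level plus a common threshold. A minimal Python sketch of that
comparison follows; the function name and the sample values are
hypothetical, and the code only mirrors the inequalities listed here, not
the driver's interrupt handling.

```python
MAX_LEVELS = 5  # the text states that at most five thresholds are configurable


def triggered_levels(current_temp, trigger_levels, threshold):
    """Return the indices n for which
    current_temp > trigger_levels[n] + threshold holds."""
    if len(trigger_levels) > MAX_LEVELS:
        raise ValueError("at most %d trigger levels are supported" % MAX_LEVELS)
    return [n for n, trigger in enumerate(trigger_levels)
            if current_temp > trigger + threshold]


# Hypothetical configuration: three trigger levels and a threshold of 2.
print(triggered_levels(90, [70, 85, 100], 2))
```

Note that the comparison is strict, so a temperature exactly equal to
trigger level plus threshold does not count as exceeding that level.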
@@ -54,24 +64,27 @@ it can be used to synchronize the cooling action.
TMU driver description:
-----------------------

The exynos thermal driver is structured as::

					Kernel Core thermal framework
				(thermal_core.c, step_wise.c, cpu_cooling.c)
								^
								|
								|
  TMU configuration data -----> TMU Driver  <----> Exynos Core thermal wrapper
  (exynos_tmu_data.c)	      (exynos_tmu.c)	   (exynos_thermal_common.c)
  (exynos_tmu_data.h)	      (exynos_tmu.h)	   (exynos_thermal_common.h)

a) TMU configuration data:
		This consists of TMU register offsets/bitfields
		described through structure exynos_tmu_registers. Also several
		other platform data (struct exynos_tmu_platform_data) members
		are used to configure the TMU.
b) TMU driver:
		This component initialises the TMU controller and sets different
		thresholds. It invokes the core thermal implementation with the call
		exynos_report_trigger.
c) Exynos Core thermal wrapper:
		This provides three wrapper functions to use the
		Kernel core thermal framework. They are exynos_unregister_thermal,
		exynos_register_thermal and exynos_report_trigger.
+61 −0
=====================
Exynos Emulation Mode
=====================

Copyright (C) 2012 Samsung Electronics


@@ -8,30 +9,37 @@ Written by Jonghwa Lee <jonghwa3.lee@samsung.com>
Description
-----------

Exynos 4x12 (4212, 4412) and 5 series provide an emulation mode for the
thermal management unit (TMU). Thermal emulation mode supports software
debugging of the TMU's operation. The user can set the temperature
manually from software, and the TMU will read the current temperature
from the user-supplied value instead of the sensor's value.

Enabling the CONFIG_THERMAL_EMULATION option makes this support
available. When it is enabled, a sysfs node is created as
/sys/devices/virtual/thermal/thermal_zone'zone id'/emul_temp.

The sysfs node, 'emul_temp', contains the value 0 in the initial state.
When you write a temperature to the sysfs node, it automatically
enables emulation mode and the current temperature is changed to that
value.

(Exynos also supports a user-changeable delay time that delays the
temperature change. However, this node only uses the same delay as the
real sensing time, 938us.)

Exynos emulation mode requires the value change and the enabling to be
synchronous. This means that when you want to update the delay or the
next temperature, you have to enable emulation mode at the same
time. (Or you have to keep the mode enabled.) If you don't, the update
fails and the last successful value is used repeatedly. That's why
this node only lets users change the temperature. Having just one
interface makes it simpler to use.

Disabling emulation mode only requires writing the value 0 to the sysfs node.

::

  TEMP	120 |
	    |
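
The sysfs interface described in this file could be driven from user space
roughly as sketched below. The helper names are hypothetical, the unit of
the written value is not stated in this text (millidegrees Celsius is an
assumption here), and writing to the real node normally requires root and a
kernel built with CONFIG_THERMAL_EMULATION. The ``root`` parameter exists
only so the helpers can be pointed at a scratch directory for testing.

```python
from pathlib import Path

# Directory named in the text above; the zone id is appended per zone.
SYSFS_THERMAL_ROOT = Path("/sys/devices/virtual/thermal")


def emul_temp_path(zone_id, root=SYSFS_THERMAL_ROOT):
    """Build .../thermal_zone<zone id>/emul_temp for the given zone."""
    return Path(root) / f"thermal_zone{zone_id}" / "emul_temp"


def set_emulated_temp(zone_id, value, root=SYSFS_THERMAL_ROOT):
    """Write an emulated temperature; a non-zero write enables emulation.

    The unit of `value` is an assumption (millidegrees Celsius).
    """
    emul_temp_path(zone_id, root).write_text(f"{int(value)}\n")


def disable_emulation(zone_id, root=SYSFS_THERMAL_ROOT):
    """Writing 0 disables emulation mode, as described above."""
    set_emulated_temp(zone_id, 0, root)
```

For example, ``set_emulated_temp(0, 75000)`` followed later by
``disable_emulation(0)`` would emulate a temperature on zone 0 and then
return to the real sensor value.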
+18 −0
:orphan:

=======
Thermal
=======

.. toctree::
   :maxdepth: 1

   cpu-cooling-api
   sysfs-api
   power_allocator

   exynos_thermal
   exynos_thermal_emulation
   intel_powerclamp
   nouveau_thermal
   x86_pkg_temperature_thermal
+93 −90
=======================
Intel Powerclamp Driver
=======================

By:
  - Arjan van de Ven <arjan@linux.intel.com>
  - Jacob Pan <jacob.jun.pan@linux.intel.com>

.. Contents:

	(*) Introduction
	    - Goals and Objectives


@@ -23,7 +26,6 @@ Contents:
	    - Generic Thermal Layer (sysfs)
	    - Kernel APIs (TBD)

INTRODUCTION
============


@@ -47,7 +49,6 @@ scalability, and user experience. In many cases, clear advantage is
shown over taking the CPU offline or modulating the CPU clock.

THEORY OF OPERATION
===================


@@ -57,7 +58,8 @@ Idle Injection
On modern Intel processors (Nehalem or later), package level C-state
residency is available in MSRs, thus also available to the kernel.

These MSRs are::

      #define MSR_PKG_C2_RESIDENCY      0x60D
      #define MSR_PKG_C3_RESIDENCY      0x3F8
      #define MSR_PKG_C6_RESIDENCY      0x3F9
@@ -96,6 +98,8 @@ are not masked. Tests show that the extra wakeups from scheduler tick
have a dramatic impact on the effectiveness of the powerclamp driver
on large scale systems (Westmere system with 80 processors).

::

  CPU0
		    ____________          ____________
  kidle_inject/0   |   sleep    |  mwait |  sleep     |
@@ -158,7 +162,8 @@ Compensation to each target ratio consists of two parts:
	slowing down CPU activities.

A debugfs file is provided for the user to examine compensation
progress and results, such as on a Westmere system::

  [jacob@nex01 ~]$ cat
  /sys/kernel/debug/intel_powerclamp/powerclamp_calib
  controlling cpu: 0
  controlling cpu: 0
@@ -217,9 +222,8 @@ keeps track of clamping kernel threads, even after they are migrated
to other CPUs, after a CPU offline event.

Performance Analysis
====================
This section describes the general performance data collected on
multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P).


@@ -257,11 +261,10 @@ achieve up to 40% better performance per watt. (measured by a spin
counter summed over per CPU counting threads spawned for all running
CPUs).

Usage and Interfaces
====================
The powerclamp driver is registered to the generic thermal layer as a
cooling device. Currently, it’s not bound to any thermal zones::

  jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . *
  cur_state:0
@@ -278,9 +281,9 @@ cur_state returns value -1 instead of 0 which is to avoid confusing
100% busy state with the disabled state.

Example usage:
- To inject 25% idle time::

	$ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state"

If the system is not busy and has more than 25% idle time already,
then the powerclamp driver will not start idle injection. Using Top
@@ -292,7 +295,7 @@ idle time is accounted as normal idle in that common code path is
taken as the idle task.

In this example, 24.1% idle is shown. This helps the system admin or
user determine the cause of slowdown, when a powerclamp driver is in action::

  Tasks: 197 total,   1 running, 196 sleeping,   0 stopped,   0 zombie