Commit a771ea64 authored by Linus Torvalds
Pull power management updates from Rafael Wysocki:
 "These are mostly minor improvements all over including new CPU IDs for
  the Intel RAPL driver, an Energy Model rework to use micro-Watt as the
  power unit, cpufreq fixes and cleanups, cpuidle updates, devfreq
  updates, documentation cleanups and a new version of the pm-graph
  suite of utilities.

  Specifics:

   - Make cpufreq_show_cpus() more straightforward (Viresh Kumar).

   - Drop unnecessary CPU hotplug locking from store() used by cpufreq
     sysfs attributes (Viresh Kumar).

   - Make the ACPI cpufreq driver support the boost control interface on
     Zhaoxin/Centaur processors (Tony W Wang-oc).

   - Print a warning message on attempts to free an active cpufreq
     policy which should never happen (Viresh Kumar).

   - Fix grammar in the Kconfig help text for the loongson2 cpufreq
     driver (Randy Dunlap).

   - Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
     governor (Zhao Liu).

   - Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
     cpuidle driver (Eiichi Tsukata).

   - Modify intel_idle to treat C1 and C1E as independent idle states on
     Sapphire Rapids (Artem Bityutskiy).

   - Extend support for wakeirq to callback wrappers used during system
     suspend and resume (Ulf Hansson).

   - Defer waiting for device probe before loading a hibernation image
     till the first actual device access to avoid possible deadlocks
     reported by syzbot (Tetsuo Handa).

   - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
     Helgaas).

   - Add Raptor Lake-P to the list of processors supported by the Intel
     RAPL driver (George D Sworo).

   - Add Alder Lake-N and Raptor Lake-P to the list of processors for
     which Power Limit4 is supported in the Intel RAPL driver (Sumeet
     Pawnikar).

   - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
     attempting to remove it (Hsin-Yi Wang).

   - Change the Energy Model code to represent power in micro-Watts and
     adjust its users accordingly (Lukasz Luba).

   - Add new devfreq driver for Mediatek CCI (Cache Coherent
     Interconnect) (Johnson Wang).

   - Convert the Samsung Exynos SoC Bus bindings to DT schema of
     exynos-bus.c (Krzysztof Kozlowski).

   - Address kernel-doc warnings by adding the description for unused
     function parameters in devfreq core (Mauro Carvalho Chehab).

   - Use NULL to pass a null pointer rather than zero according to the
     function prototype in imx-bus.c (Colin Ian King).

   - Print error message instead of error integer value in
     tegra30-devfreq.c (Dmitry Osipenko).

   - Add checks to prevent setting negative frequency QoS limits for
     CPUs (Shivnandan Kumar).

   - Update the pm-graph suite of utilities to the latest revision 5.9
     including multiple improvements (Todd Brandt).

   - Drop pme_interrupt reference from the PCI power management
     documentation (Mario Limonciello)"

* tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (27 commits)
  powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
  PM: QoS: Add check to make sure CPU freq is non-negative
  PM: hibernate: defer device probing when resuming from hibernation
  intel_idle: make SPR C1 and C1E be independent
  cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask
  cpufreq: loongson2: fix Kconfig "its" grammar
  pm-graph v5.9
  cpufreq: Warn users while freeing active policy
  cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
  firmware: arm_scmi: Get detailed power scale from perf
  Documentation: EM: Switch to micro-Watts scale
  PM: EM: convert power field to micro-Watts precision and align drivers
  PM / devfreq: tegra30: Add error message for devm_devfreq_add_device()
  PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero
  PM / devfreq: shut up kernel-doc warnings
  dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema
  PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver
  dt-bindings: interconnect: Add MediaTek CCI dt-bindings
  PM: domains: Ensure genpd_debugfs_dir exists before remove
  PM: runtime: Extend support for wakeirq for force_suspend|resume
  ...
parents 8fa0db3a aa727b7b
+0 −488
* Generic Exynos Bus frequency device

The Samsung Exynos SoC has many buses for data transfer between DRAM
and sub-blocks in SoC. Most Exynos SoCs share the common architecture
for buses. Generally, each bus of Exynos SoC includes a source clock
and a power line, which are able to change the clock frequency
of the bus in runtime. To monitor the usage of each bus in runtime,
the driver uses the PPMU (Platform Performance Monitoring Unit), which
is able to measure the current load of sub-blocks.

The Exynos SoC includes various sub-blocks, each of which has its own AXI bus.
Each AXI bus has its own source clock but does not necessarily have its own
power line; a power line may be shared among several sub-blocks.
Bus devices are therefore divided into two types according to the role of
each sub-block:
- parent bus device
- passive bus device

A parent bus device and its passive bus devices share the same power line.
Only the parent bus device can change the voltage of the shared power line;
the remaining (passive) bus devices follow the decision of the parent bus
device. If three blocks share the VDD_xxx power line, exactly one block
should be the parent device and the other blocks should depend on it as
passive devices.

	VDD_xxx |--- A block (parent)
		|--- B block (passive)
		|--- C block (passive)

The composition differs slightly among Exynos SoCs because each Exynos SoC
has different sub-blocks. Such differences should therefore be specified in
the devicetree file instead of in each device driver. As a result, this driver
is able to support the bus frequency for all Exynos SoCs.

Required properties for all bus devices:
- compatible: Should be "samsung,exynos-bus".
- clock-names : the name of clock used by the bus, "bus".
- clocks : phandles for clock specified in "clock-names" property.
- operating-points-v2: the OPP table including frequency/voltage information
  to support DVFS (Dynamic Voltage/Frequency Scaling) feature.

Required properties only for parent bus device:
- vdd-supply: the regulator to provide the buses with the voltage.
- devfreq-events: the devfreq-event device to monitor the current utilization
  of buses.

Required properties only for passive bus device:
- devfreq: the parent bus device.

Optional properties only for parent bus device:
- exynos,saturation-ratio: the percentage value which is used to calibrate
			the performance count against total cycle count.

Optional properties for the interconnect functionality (QoS frequency
constraints):
- #interconnect-cells: should be 0.
- interconnects: as documented in ../interconnect.txt, describes a path at the
  higher level interconnects used by this interconnect provider.
  If this interconnect provider is directly linked to a top level interconnect
  provider the property contains only one phandle. The provider extends
  the interconnect graph by linking its node to a node registered by provider
  pointed to by first phandle in the 'interconnects' property.

- samsung,data-clock-ratio: ratio of the data throughput in B/s to minimum data
   clock frequency in Hz, default value is 8 when this property is missing.
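
As a worked illustration of samsung,data-clock-ratio, the following C sketch
computes the minimum bus clock needed for a requested data throughput. The
function name, the use of 0 to mean "property missing", and the round-up
division are assumptions made for this example; it is not code taken from the
exynos-bus driver.

```c
#include <stdint.h>

/* Default ratio documented for a missing samsung,data-clock-ratio property. */
#define DEFAULT_DATA_CLOCK_RATIO 8u

/*
 * Hypothetical helper (not a kernel API): return the minimum data clock
 * frequency in Hz for a throughput given in bytes per second.
 * A ratio of 0 stands in for "property not present" in this sketch.
 */
static uint64_t min_bus_clock_hz(uint64_t throughput_bytes_per_sec,
				 uint32_t data_clock_ratio)
{
	if (data_clock_ratio == 0)
		data_clock_ratio = DEFAULT_DATA_CLOCK_RATIO;

	/* Round up (an assumption) so the clock never falls short of the request. */
	return throughput_bytes_per_sec / data_clock_ratio +
	       (throughput_bytes_per_sec % data_clock_ratio != 0);
}
```

With the default ratio of 8, a request of 800000000 B/s maps to a minimum
clock of 100000000 Hz; with a ratio of 4 the same request maps to
200000000 Hz.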

Detailed correlation between sub-blocks and power line according to Exynos SoC:
- On Exynos3250, there are two power lines, as follows:
	VDD_MIF |--- DMC

	VDD_INT |--- LEFTBUS (parent device)
		|--- PERIL
		|--- MFC
		|--- G3D
		|--- RIGHTBUS
		|--- PERIR
		|--- FSYS
		|--- LCD0
		|--- PERIR
		|--- ISP
		|--- CAM

- On Exynos4210, there is one power line, as follows:
	VDD_INT |--- DMC (parent device)
		|--- LEFTBUS
		|--- PERIL
		|--- MFC(L)
		|--- G3D
		|--- TV
		|--- LCD0
		|--- RIGHTBUS
		|--- PERIR
		|--- MFC(R)
		|--- CAM
		|--- FSYS
		|--- GPS
		|--- LCD0
		|--- LCD1

- On Exynos4x12, there are two power lines, as follows:
	VDD_MIF |--- DMC

	VDD_INT |--- LEFTBUS (parent device)
		|--- PERIL
		|--- MFC(L)
		|--- G3D
		|--- TV
		|--- IMAGE
		|--- RIGHTBUS
		|--- PERIR
		|--- MFC(R)
		|--- CAM
		|--- FSYS
		|--- GPS
		|--- LCD0
		|--- ISP

- On Exynos5422, there are two power lines, as follows:
	VDD_MIF |--- DREX 0 (parent device, DRAM EXpress controller)
	        |--- DREX 1

	VDD_INT |--- NoC_Core (parent device)
		|--- G2D
		|--- G3D
		|--- DISP1
		|--- NoC_WCORE
		|--- GSCL
		|--- MSCL
		|--- ISP
		|--- MFC
		|--- GEN
		|--- PERIS
		|--- PERIC
		|--- FSYS
		|--- FSYS2

- On Exynos5433, there is the VDD_INT power line, as follows:
	VDD_INT |--- G2D (parent device)
		|--- MSCL
		|--- GSCL
		|--- JPEG
		|--- MFC
		|--- HEVC
		|--- BUS0
		|--- BUS1
		|--- BUS2
		|--- PERIS (Fixed clock rate)
		|--- PERIC (Fixed clock rate)
		|--- FSYS  (Fixed clock rate)

Example 1:
	This example shows the AXI buses of the Exynos3250 SoC, grouped by
	power line (regulator). The MIF (Memory Interface) AXI bus is used to
	transfer data between DRAM and CPU and uses the VDD_MIF regulator.

	- MIF (Memory Interface) block
	: VDD_MIF |--- DMC (Dynamic Memory Controller)

	- INT (Internal) block
	: VDD_INT |--- LEFTBUS (parent device)
		  |--- PERIL
		  |--- MFC
		  |--- G3D
		  |--- RIGHTBUS
		  |--- FSYS
		  |--- LCD0
		  |--- PERIR
		  |--- ISP
		  |--- CAM

	- MIF bus's frequency/voltage table
	-----------------------
	|Lv| Freq   | Voltage |
	-----------------------
	|L1| 50000  |800000   |
	|L2| 100000 |800000   |
	|L3| 134000 |800000   |
	|L4| 200000 |825000   |
	|L5| 400000 |875000   |
	-----------------------

	- INT bus's frequency/voltage table
	----------------------------------------------------------
	|Block|LEFTBUS|RIGHTBUS|MCUISP |ISP    |PERIL  ||VDD_INT |
	| name|       |LCD0    |       |       |       ||        |
	|     |       |FSYS    |       |       |       ||        |
	|     |       |MFC     |       |       |       ||        |
	----------------------------------------------------------
	|Mode |*parent|passive |passive|passive|passive||        |
	----------------------------------------------------------
	|Lv   |Frequency                               ||Voltage |
	----------------------------------------------------------
	|L1   |50000  |50000   |50000  |50000  |50000  ||900000  |
	|L2   |80000  |80000   |80000  |80000  |80000  ||900000  |
	|L3   |100000 |100000  |100000 |100000 |100000 ||1000000 |
	|L4   |134000 |134000  |200000 |200000 |       ||1000000 |
	|L5   |200000 |200000  |400000 |300000 |       ||1000000 |
	----------------------------------------------------------

Example 2:
	The bus of DMC (Dynamic Memory Controller) block in exynos3250.dtsi
	is listed below:

	bus_dmc: bus_dmc {
		compatible = "samsung,exynos-bus";
		clocks = <&cmu_dmc CLK_DIV_DMC>;
		clock-names = "bus";
		operating-points-v2 = <&bus_dmc_opp_table>;
		status = "disabled";
	};

	bus_dmc_opp_table: opp_table1 {
		compatible = "operating-points-v2";
		opp-shared;

		opp-50000000 {
			opp-hz = /bits/ 64 <50000000>;
			opp-microvolt = <800000>;
		};
		opp-100000000 {
			opp-hz = /bits/ 64 <100000000>;
			opp-microvolt = <800000>;
		};
		opp-134000000 {
			opp-hz = /bits/ 64 <134000000>;
			opp-microvolt = <800000>;
		};
		opp-200000000 {
			opp-hz = /bits/ 64 <200000000>;
			opp-microvolt = <825000>;
		};
		opp-400000000 {
			opp-hz = /bits/ 64 <400000000>;
			opp-microvolt = <875000>;
		};
	};

	bus_leftbus: bus_leftbus {
		compatible = "samsung,exynos-bus";
		clocks = <&cmu CLK_DIV_GDL>;
		clock-names = "bus";
		operating-points-v2 = <&bus_leftbus_opp_table>;
		status = "disabled";
	};

	bus_rightbus: bus_rightbus {
		compatible = "samsung,exynos-bus";
		clocks = <&cmu CLK_DIV_GDR>;
		clock-names = "bus";
		operating-points-v2 = <&bus_leftbus_opp_table>;
		status = "disabled";
	};

	bus_lcd0: bus_lcd0 {
		compatible = "samsung,exynos-bus";
		clocks = <&cmu CLK_DIV_ACLK_160>;
		clock-names = "bus";
		operating-points-v2 = <&bus_leftbus_opp_table>;
		status = "disabled";
	};

	bus_fsys: bus_fsys {
		compatible = "samsung,exynos-bus";
		clocks = <&cmu CLK_DIV_ACLK_200>;
		clock-names = "bus";
		operating-points-v2 = <&bus_leftbus_opp_table>;
		status = "disabled";
	};

	bus_mcuisp: bus_mcuisp {
		compatible = "samsung,exynos-bus";
		clocks = <&cmu CLK_DIV_ACLK_400_MCUISP>;
		clock-names = "bus";
		operating-points-v2 = <&bus_mcuisp_opp_table>;
		status = "disabled";
	};

	bus_isp: bus_isp {
		compatible = "samsung,exynos-bus";
		clocks = <&cmu CLK_DIV_ACLK_266>;
		clock-names = "bus";
		operating-points-v2 = <&bus_isp_opp_table>;
		status = "disabled";
	};

	bus_peril: bus_peril {
		compatible = "samsung,exynos-bus";
		clocks = <&cmu CLK_DIV_ACLK_100>;
		clock-names = "bus";
		operating-points-v2 = <&bus_peril_opp_table>;
		status = "disabled";
	};

	bus_mfc: bus_mfc {
		compatible = "samsung,exynos-bus";
		clocks = <&cmu CLK_SCLK_MFC>;
		clock-names = "bus";
		operating-points-v2 = <&bus_leftbus_opp_table>;
		status = "disabled";
	};

	bus_leftbus_opp_table: opp_table1 {
		compatible = "operating-points-v2";
		opp-shared;

		opp-50000000 {
			opp-hz = /bits/ 64 <50000000>;
			opp-microvolt = <900000>;
		};
		opp-80000000 {
			opp-hz = /bits/ 64 <80000000>;
			opp-microvolt = <900000>;
		};
		opp-100000000 {
			opp-hz = /bits/ 64 <100000000>;
			opp-microvolt = <1000000>;
		};
		opp-134000000 {
			opp-hz = /bits/ 64 <134000000>;
			opp-microvolt = <1000000>;
		};
		opp-200000000 {
			opp-hz = /bits/ 64 <200000000>;
			opp-microvolt = <1000000>;
		};
	};

	bus_mcuisp_opp_table: opp_table2 {
		compatible = "operating-points-v2";
		opp-shared;

		opp-50000000 {
			opp-hz = /bits/ 64 <50000000>;
		};
		opp-80000000 {
			opp-hz = /bits/ 64 <80000000>;
		};
		opp-100000000 {
			opp-hz = /bits/ 64 <100000000>;
		};
		opp-200000000 {
			opp-hz = /bits/ 64 <200000000>;
		};
		opp-400000000 {
			opp-hz = /bits/ 64 <400000000>;
		};
	};

	bus_isp_opp_table: opp_table3 {
		compatible = "operating-points-v2";
		opp-shared;

		opp-50000000 {
			opp-hz = /bits/ 64 <50000000>;
		};
		opp-80000000 {
			opp-hz = /bits/ 64 <80000000>;
		};
		opp-100000000 {
			opp-hz = /bits/ 64 <100000000>;
		};
		opp-200000000 {
			opp-hz = /bits/ 64 <200000000>;
		};
		opp-300000000 {
			opp-hz = /bits/ 64 <300000000>;
		};
	};

	bus_peril_opp_table: opp_table4 {
		compatible = "operating-points-v2";
		opp-shared;

		opp-50000000 {
			opp-hz = /bits/ 64 <50000000>;
		};
		opp-80000000 {
			opp-hz = /bits/ 64 <80000000>;
		};
		opp-100000000 {
			opp-hz = /bits/ 64 <100000000>;
		};
	};


	The use case that handles the bus frequency and voltage at runtime
	in exynos3250-rinato.dts is listed below:

	&bus_dmc {
		devfreq-events = <&ppmu_dmc0_3>, <&ppmu_dmc1_3>;
		vdd-supply = <&buck1_reg>;	/* VDD_MIF */
		status = "okay";
	};

	&bus_leftbus {
		devfreq-events = <&ppmu_leftbus_3>, <&ppmu_rightbus_3>;
		vdd-supply = <&buck3_reg>;
		status = "okay";
	};

	&bus_rightbus {
		devfreq = <&bus_leftbus>;
		status = "okay";
	};

	&bus_lcd0 {
		devfreq = <&bus_leftbus>;
		status = "okay";
	};

	&bus_fsys {
		devfreq = <&bus_leftbus>;
		status = "okay";
	};

	&bus_mcuisp {
		devfreq = <&bus_leftbus>;
		status = "okay";
	};

	&bus_isp {
		devfreq = <&bus_leftbus>;
		status = "okay";
	};

	&bus_peril {
		devfreq = <&bus_leftbus>;
		status = "okay";
	};

	&bus_mfc {
		devfreq = <&bus_leftbus>;
		status = "okay";
	};

Example 3:
	An interconnect path "bus_display -- bus_leftbus -- bus_dmc" on
	Exynos4412 SoC with video mixer as an interconnect consumer device.

	soc {
		bus_dmc: bus_dmc {
			compatible = "samsung,exynos-bus";
			clocks = <&clock CLK_DIV_DMC>;
			clock-names = "bus";
			operating-points-v2 = <&bus_dmc_opp_table>;
			samsung,data-clock-ratio = <4>;
			#interconnect-cells = <0>;
		};

		bus_leftbus: bus_leftbus {
			compatible = "samsung,exynos-bus";
			clocks = <&clock CLK_DIV_GDL>;
			clock-names = "bus";
			operating-points-v2 = <&bus_leftbus_opp_table>;
			#interconnect-cells = <0>;
			interconnects = <&bus_dmc>;
		};

		bus_display: bus_display {
			compatible = "samsung,exynos-bus";
			clocks = <&clock CLK_ACLK160>;
			clock-names = "bus";
			operating-points-v2 = <&bus_display_opp_table>;
			#interconnect-cells = <0>;
			interconnects = <&bus_leftbus &bus_dmc>;
		};

		bus_dmc_opp_table: opp_table1 {
			compatible = "operating-points-v2";
			/* ... */
		};

		bus_leftbus_opp_table: opp_table3 {
			compatible = "operating-points-v2";
			/* ... */
		};

		bus_display_opp_table: opp_table4 {
			compatible = "operating-points-v2";
			/* .. */
		};

		&mixer {
			compatible = "samsung,exynos4212-mixer";
			interconnects = <&bus_display &bus_dmc>;
			/* ... */
		};
	};
+141 −0
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/interconnect/mediatek,cci.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: MediaTek Cache Coherent Interconnect (CCI) frequency and voltage scaling

maintainers:
  - Jia-Wei Chang <jia-wei.chang@mediatek.com>
  - Johnson Wang <johnson.wang@mediatek.com>

description: |
  MediaTek Cache Coherent Interconnect (CCI) is a hardware engine used by
  MT8183 and MT8186 SoCs to scale the frequency and adjust the voltage in
  hardware. It can also optimize the voltage to reduce the power consumption.

properties:
  compatible:
    enum:
      - mediatek,mt8183-cci
      - mediatek,mt8186-cci

  clocks:
    items:
      - description:
          The multiplexer for clock input of the bus.
      - description:
          A parent of "bus" clock which is used as an intermediate clock source
          when the original clock source (PLL) is under transition and not
          stable yet.

  clock-names:
    items:
      - const: cci
      - const: intermediate

  operating-points-v2: true
  opp-table: true

  proc-supply:
    description:
      Phandle of the regulator for CCI that provides the supply voltage.

  sram-supply:
    description:
      Phandle of the regulator for sram of CCI that provides the supply
      voltage. When it is present, the implementation needs to do
      "voltage tracking" to step by step scale up/down Vproc and Vsram to fit
      SoC specific needs. When absent, the voltage scaling flow is handled by
      hardware, hence no software "voltage tracking" is needed.

required:
  - compatible
  - clocks
  - clock-names
  - operating-points-v2
  - proc-supply

additionalProperties: false

examples:
  - |
    #include <dt-bindings/clock/mt8183-clk.h>
    cci: cci {
        compatible = "mediatek,mt8183-cci";
        clocks = <&mcucfg CLK_MCU_BUS_SEL>,
                 <&topckgen CLK_TOP_ARMPLL_DIV_PLL1>;
        clock-names = "cci", "intermediate";
        operating-points-v2 = <&cci_opp>;
        proc-supply = <&mt6358_vproc12_reg>;
    };

    cci_opp: opp-table-cci {
        compatible = "operating-points-v2";
        opp-shared;
        opp2_00: opp-273000000 {
            opp-hz = /bits/ 64 <273000000>;
            opp-microvolt = <650000>;
        };
        opp2_01: opp-338000000 {
            opp-hz = /bits/ 64 <338000000>;
            opp-microvolt = <687500>;
        };
        opp2_02: opp-403000000 {
            opp-hz = /bits/ 64 <403000000>;
            opp-microvolt = <718750>;
        };
        opp2_03: opp-463000000 {
            opp-hz = /bits/ 64 <463000000>;
            opp-microvolt = <756250>;
        };
        opp2_04: opp-546000000 {
            opp-hz = /bits/ 64 <546000000>;
            opp-microvolt = <800000>;
        };
        opp2_05: opp-624000000 {
            opp-hz = /bits/ 64 <624000000>;
            opp-microvolt = <818750>;
        };
        opp2_06: opp-689000000 {
            opp-hz = /bits/ 64 <689000000>;
            opp-microvolt = <850000>;
        };
        opp2_07: opp-767000000 {
            opp-hz = /bits/ 64 <767000000>;
            opp-microvolt = <868750>;
        };
        opp2_08: opp-845000000 {
            opp-hz = /bits/ 64 <845000000>;
            opp-microvolt = <893750>;
        };
        opp2_09: opp-871000000 {
            opp-hz = /bits/ 64 <871000000>;
            opp-microvolt = <906250>;
        };
        opp2_10: opp-923000000 {
            opp-hz = /bits/ 64 <923000000>;
            opp-microvolt = <931250>;
        };
        opp2_11: opp-962000000 {
            opp-hz = /bits/ 64 <962000000>;
            opp-microvolt = <943750>;
        };
        opp2_12: opp-1027000000 {
            opp-hz = /bits/ 64 <1027000000>;
            opp-microvolt = <975000>;
        };
        opp2_13: opp-1092000000 {
            opp-hz = /bits/ 64 <1092000000>;
            opp-microvolt = <1000000>;
        };
        opp2_14: opp-1144000000 {
            opp-hz = /bits/ 64 <1144000000>;
            opp-microvolt = <1025000>;
        };
        opp2_15: opp-1196000000 {
            opp-hz = /bits/ 64 <1196000000>;
            opp-microvolt = <1050000>;
        };
    };
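
The "voltage tracking" mentioned in the sram-supply description can be
pictured with the C sketch below. It is a model under stated assumptions, not
the MediaTek driver: the struct, the function name, the rule that Vsram must
stay between min_shift_uv and max_shift_uv above Vproc, and the choice of
Vproc + min_shift_uv as the final Vsram are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical model of the two supplies, in microvolts. */
struct rails {
	int vproc_uv;
	int vsram_uv;
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/*
 * Move Vproc to target_vproc_uv step by step while keeping
 * min_shift_uv <= Vsram - Vproc <= max_shift_uv after every change.
 * Returns the number of steps taken, or -1 on invalid input.
 */
static int track_voltage(struct rails *r, int target_vproc_uv,
			 int min_shift_uv, int max_shift_uv)
{
	int target_vsram_uv, diff, steps = 0;

	if (r == NULL || min_shift_uv < 0 || min_shift_uv >= max_shift_uv)
		return -1;
	diff = r->vsram_uv - r->vproc_uv;
	if (diff < min_shift_uv || diff > max_shift_uv)
		return -1;

	target_vsram_uv = target_vproc_uv + min_shift_uv;
	while (r->vproc_uv != target_vproc_uv || r->vsram_uv != target_vsram_uv) {
		if (target_vproc_uv >= r->vproc_uv) {
			/* Going up (or settling): move Vsram first, Vproc follows. */
			r->vsram_uv = min_int(target_vsram_uv,
					      r->vproc_uv + max_shift_uv);
			r->vproc_uv = min_int(target_vproc_uv,
					      r->vsram_uv - min_shift_uv);
		} else {
			/* Going down: lower Vproc first, Vsram follows. */
			r->vproc_uv = max_int(target_vproc_uv,
					      r->vsram_uv - max_shift_uv);
			r->vsram_uv = max_int(target_vsram_uv,
					      r->vproc_uv + min_shift_uv);
		}
		steps++;
	}
	return steps;
}
```

Starting from Vproc = 650000 uV and an assumed Vsram = 750000 uV, with assumed
limits of 100000 uV and 200000 uV, reaching Vproc = 1050000 uV takes four
steps in this model, and Vproc rises by at most 100000 uV per step.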
+290 −0

File added.

Preview size limit exceeded, changes collapsed.

+7 −7
@@ -20,20 +20,20 @@ possible source of information on its own, the EM framework intervenes as an
abstraction layer which standardizes the format of power cost tables in the
kernel, hence enabling to avoid redundant work.

-The power values might be expressed in milli-Watts or in an 'abstract scale'.
+The power values might be expressed in micro-Watts or in an 'abstract scale'.
Multiple subsystems might use the EM and it is up to the system integrator to
check that the requirements for the power value scale types are met. An example
can be found in the Energy-Aware Scheduler documentation
Documentation/scheduler/sched-energy.rst. For some subsystems like thermal or
powercap power values expressed in an 'abstract scale' might cause issues.
These subsystems are more interested in estimation of power used in the past,
-thus the real milli-Watts might be needed. An example of these requirements can
+thus the real micro-Watts might be needed. An example of these requirements can
be found in the Intelligent Power Allocation in
Documentation/driver-api/thermal/power_allocator.rst.
Kernel subsystems might implement automatic detection to check whether EM
registered devices have inconsistent scale (based on EM internal flag).
Important thing to keep in mind is that when the power values are expressed in
-an 'abstract scale' deriving real energy in milli-Joules would not be possible.
+an 'abstract scale' deriving real energy in micro-Joules would not be possible.

The figure below depicts an example of drivers (Arm-specific here, but the
approach is applicable to any architecture) providing power costs to the EM
@@ -98,7 +98,7 @@ Drivers are expected to register performance domains into the EM framework by
calling the following API::

  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
-		struct em_data_callback *cb, cpumask_t *cpus, bool milliwatts);
+		struct em_data_callback *cb, cpumask_t *cpus, bool microwatts);

Drivers must provide a callback function returning <frequency, power> tuples
for each performance state. The callback function provided by the driver is free
@@ -106,10 +106,10 @@ to fetch data from any relevant location (DT, firmware, ...), and by any mean
deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
performance domains using cpumask. For other devices than CPUs the last
argument must be set to NULL.
-The last argument 'milliwatts' is important to set with correct value. Kernel
+The last argument 'microwatts' is important to set with correct value. Kernel
subsystems which use EM might rely on this flag to check if all EM devices use
the same scale. If there are different scales, these subsystems might decide
-to: return warning/error, stop working or panic.
+to return warning/error, stop working or panic.
See Section 3. for an example of driver implementing this
callback, or Section 2.4 for further documentation on this API
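
To make the unit change concrete, here is a small user-space C sketch of a
callback that hands out <frequency, power> tuples with power in micro-Watts.
It only models the idea: the struct, the table values, and the function names
are invented for illustration and are not the kernel's Energy Model API.

```c
#include <stddef.h>
#include <stdint.h>

/* One <frequency, power> tuple, with power expressed in micro-Watts. */
struct perf_state {
	unsigned long freq_khz;
	uint64_t power_uw;
};

/* Hypothetical legacy data that a driver still keeps in milli-Watts. */
static const struct {
	unsigned long freq_khz;
	uint64_t power_mw;
} legacy_table[] = {
	{  500000, 120 },
	{ 1000000, 310 },
	{ 1500000, 640 },
};

#define NR_STATES (sizeof(legacy_table) / sizeof(legacy_table[0]))

/* 1 mW = 1000 uW. */
static uint64_t mw_to_uw(uint64_t mw)
{
	return mw * 1000;
}

/*
 * Model of a driver callback: fill in the tuple for performance state idx.
 * Returns 0 on success, -1 if idx is out of range or out is NULL.
 */
static int get_state(size_t idx, struct perf_state *out)
{
	if (out == NULL || idx >= NR_STATES)
		return -1;

	out->freq_khz = legacy_table[idx].freq_khz;
	out->power_uw = mw_to_uw(legacy_table[idx].power_mw);
	return 0;
}
```

The finer unit lets a value such as 0.5 mW be stored as the integer 500
instead of being rounded to 0 or 1.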

@@ -137,7 +137,7 @@ The .get_cost() allows to provide the 'cost' values which reflect the
efficiency of the CPUs. This would allow to provide EAS information which
has different relation than what would be forced by the EM internal
formulas calculating 'cost' values. To register an EM for such platform, the
-driver must set the flag 'milliwatts' to 0, provide .get_power() callback
+driver must set the flag 'microwatts' to 0, provide .get_power() callback
and provide .get_cost() callback. The EM framework would handle such platform
properly during registration. A flag EM_PERF_DOMAIN_ARTIFICIAL is set for such
platform. Special care should be taken by other frameworks which are using EM
+1 −1
@@ -315,7 +315,7 @@ that these callbacks operate on::
					   configuration space */
	unsigned int	pme_support:5;	/* Bitmask of states from which PME#
					   can be generated */
	unsigned int	pme_interrupt:1;/* Is native PCIe PME signaling used? */
	unsigned int	pme_poll:1;	/* Poll device's PME status bit */
	unsigned int	d1_support:1;	/* Low power state D1 is supported */
	unsigned int	d2_support:1;	/* Low power state D2 is supported */
	unsigned int	no_d1d2:1;	/* D1 and D2 are forbidden */