Commit 0cfd8703 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull power management updates from Rafael Wysocki:
 "These update several cpufreq drivers and the cpufreq core, add sysfs
  interface for exposing the time really spent in the platform low-power
  state during suspend-to-idle, update devfreq (core and drivers) and
  the pm-graph suite of tools and clean up code.

  Specifics:

   - Fix the frequency unit in cpufreq_verify_current_freq checks()
     Sanjay Chandrashekara)

   - Make mode_state_machine in amd-pstate static (Tom Rix)

   - Make the cpufreq core require drivers with target_index() to set
     freq_table (Viresh Kumar)

   - Fix typo in the ARM_BRCMSTB_AVS_CPUFREQ Kconfig entry (Jingyu Wang)

   - Use of_property_read_bool() for boolean properties in the pmac32
     cpufreq driver (Rob Herring)

   - Make the cpufreq sysfs interface return proper error codes on
     obviously invalid input (qinyu)

   - Add guided autonomous mode support to the AMD P-state driver (Wyes
     Karny)

   - Make the Intel P-state driver enable HWP IO boost on all server
     platforms (Srinivas Pandruvada)

   - Add opp and bandwidth support to tegra194 cpufreq driver (Sumit
     Gupta)

   - Use of_property_present() for testing DT property presence (Rob
     Herring)

   - Remove MODULE_LICENSE in non-modules (Nick Alcock)

   - Add SM7225 to cpufreq-dt-platdev blocklist (Luca Weiss)

   - Optimizations and fixes for qcom-cpufreq-hw driver (Krzysztof
     Kozlowski, Konrad Dybcio, and Bjorn Andersson)

   - DT binding updates for qcom-cpufreq-hw driver (Konrad Dybcio and
     Bartosz Golaszewski)

   - Updates and fixes for mediatek driver (Jia-Wei Chang and
     AngeloGioacchino Del Regno)

   - Use of_property_present() for testing DT property presence in the
     cpuidle code (Rob Herring)

   - Drop unnecessary (void *) conversions from the PM core (Li zeming)

   - Add sysfs files to represent time spent in a platform sleep state
     during suspend-to-idle and make AMD and Intel PMC drivers use them
     Mario Limonciello)

   - Use of_property_present() for testing DT property presence (Rob
     Herring)

   - Add set_required_opps() callback to the 'struct opp_table', to make
     the code paths cleaner (Viresh Kumar)

   - Update the pm-graph siute of utilities to v5.11 with the following
     changes:
       * New script which allows users to install the latest pm-graph
         from the upstream github repo.
       * Update all the dmesg suspend/resume PM print formats to be able
         to process recent timelines using dmesg only.
       * Add ethtool output to the log for the system's ethernet device
         if ethtool exists.
       * Make the tool more robustly handle events where mangled dmesg
         or ftrace outputs do not include all the requisite data.

   - Make the sleepgraph utility recognize "CPU killed" messages (Xueqin
     Luo)

   - Remove unneeded SRCU selection in Kconfig because it's always set
     from devfreq core (Paul E. McKenney)

   - Drop of_match_ptr() macro from exynos-bus.c because this driver is
     always using the DT table for driver probe (Krzysztof Kozlowski)

   - Use the preferred of_property_present() instead of the low-level
     of_get_property() on exynos-bus.c (Rob Herring)

   - Use devm_platform_get_and_ioream_resource() in exyno-ppmu.c (Yang
     Li)"

* tag 'pm-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (44 commits)
  platform/x86/intel/pmc: core: Report duration of time in HW sleep state
  platform/x86/intel/pmc: core: Always capture counters on suspend
  platform/x86/amd: pmc: Report duration of time in hw sleep state
  PM: Add sysfs files to represent time spent in hardware sleep state
  cpufreq: use correct unit when verify cur freq
  cpufreq: tegra194: add OPP support and set bandwidth
  cpufreq: amd-pstate: Make varaiable mode_state_machine static
  PM: core: Remove unnecessary (void *) conversions
  cpufreq: drivers with target_index() must set freq_table
  PM / devfreq: exynos-ppmu: Use devm_platform_get_and_ioremap_resource()
  OPP: Move required opps configuration to specialized callback
  OPP: Handle all genpd cases together in _set_required_opps()
  cpufreq: qcom-cpufreq-hw: Revert adding cpufreq qos
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCM2290
  dt-bindings: cpufreq: cpufreq-qcom-hw: Sanitize data per compatible
  dt-bindings: cpufreq: cpufreq-qcom-hw: Allow just 1 frequency domain
  cpufreq: Add SM7225 to cpufreq-dt-platdev blocklist
  cpufreq: qcom-cpufreq-hw: fix double IO unmap and resource release on exit
  cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623
  cpufreq: mediatek: raise proc/sram max voltage for MT8516
  ...
parents 793582ff d3f2c402
Loading
Loading
Loading
Loading
+29 −0
Original line number Diff line number Diff line
@@ -413,6 +413,35 @@ Description:
		The /sys/power/suspend_stats/last_failed_step file contains
		the last failed step in the suspend/resume path.

What:		/sys/power/suspend_stats/last_hw_sleep
Date:		June 2023
Contact:	Mario Limonciello <mario.limonciello@amd.com>
Description:
		The /sys/power/suspend_stats/last_hw_sleep file
		contains the duration of time spent in a hardware sleep
		state in the most recent system suspend-resume cycle.
		This number is measured in microseconds.

What:		/sys/power/suspend_stats/total_hw_sleep
Date:		June 2023
Contact:	Mario Limonciello <mario.limonciello@amd.com>
Description:
		The /sys/power/suspend_stats/total_hw_sleep file
		contains the aggregate of time spent in a hardware sleep
		state since the kernel was booted. This number
		is measured in microseconds.

What:		/sys/power/suspend_stats/max_hw_sleep
Date:		June 2023
Contact:	Mario Limonciello <mario.limonciello@amd.com>
Description:
		The /sys/power/suspend_stats/max_hw_sleep file
		contains the maximum amount of time that the hardware can
		report for time spent in a hardware sleep state. When sleep
		cycles are longer than this time, the values for
		'total_hw_sleep' and 'last_hw_sleep' may not be accurate.
		This number is measured in microseconds.

What:		/sys/power/sync_on_suspend
Date:		October 2019
Contact:	Jonas Meurer <jonas@freesources.org>
+23 −17
Original line number Diff line number Diff line
@@ -339,6 +339,29 @@
			             This mode requires kvm-amd.avic=1.
			             (Default when IOMMU HW support is present.)

	amd_pstate=	[X86]
			disable
			  Do not enable amd_pstate as the default
			  scaling driver for the supported processors
			passive
			  Use amd_pstate with passive mode as a scaling driver.
			  In this mode autonomous selection is disabled.
			  Driver requests a desired performance level and platform
			  tries to match the same performance level if it is
			  satisfied by guaranteed performance level.
			active
			  Use amd_pstate_epp driver instance as the scaling driver,
			  driver provides a hint to the hardware if software wants
			  to bias toward performance (0x0) or energy efficiency (0xff)
			  to the CPPC firmware. then CPPC power algorithm will
			  calculate the runtime workload and adjust the realtime cores
			  frequency.
			guided
			  Activate guided autonomous mode. Driver requests minimum and
			  maximum performance level and the platform autonomously
			  selects a performance level in this range and appropriate
			  to the current workload.

	amijoy.map=	[HW,JOY] Amiga joystick support
			Map of devices attached to JOY0DAT and JOY1DAT
			Format: <a>,<b>
@@ -7062,20 +7085,3 @@
				xmon commands.
			off	xmon is disabled.
	amd_pstate=	[X86]
			disable
			  Do not enable amd_pstate as the default
			  scaling driver for the supported processors
			passive
			  Use amd_pstate as a scaling driver, driver requests a
			  desired performance on this abstract scale and the power
			  management firmware translates the requests into actual
			  hardware states (core frequency, data fabric and memory
			  clocks etc.)
			active
			  Use amd_pstate_epp driver instance as the scaling driver,
			  driver provides a hint to the hardware if software wants
			  to bias toward performance (0x0) or energy efficiency (0xff)
			  to the CPPC firmware. then CPPC power algorithm will
			  calculate the runtime workload and adjust the realtime cores
			  frequency.
+24 −7
Original line number Diff line number Diff line
@@ -303,13 +303,18 @@ efficiency frequency management method on AMD processors.
AMD Pstate Driver Operation Modes
=================================

``amd_pstate`` CPPC has two operation modes: CPPC Autonomous(active) mode and
CPPC non-autonomous(passive) mode.
active mode and passive mode can be chosen by different kernel parameters.
When in Autonomous mode, CPPC ignores requests done in the Desired Performance
Target register and takes into account only the values set to the Minimum requested
performance, Maximum requested performance, and Energy Performance Preference
registers. When Autonomous is disabled, it only considers the Desired Performance Target.
``amd_pstate`` CPPC has 3 operation modes: autonomous (active) mode,
non-autonomous (passive) mode and guided autonomous (guided) mode.
Active/passive/guided mode can be chosen by different kernel parameters.

- In autonomous mode, platform ignores the desired performance level request
  and takes into account only the values set to the minimum, maximum and energy
  performance preference registers.
- In non-autonomous mode, platform gets desired performance level
  from OS directly through Desired Performance Register.
- In guided-autonomous mode, platform sets operating performance level
  autonomously according to the current workload and within the limits set by
  OS through min and max performance registers.

Active Mode
------------
@@ -338,6 +343,15 @@ to the Performance Reduction Tolerance register. Above the nominal performance l
processor must provide at least nominal performance requested and go higher if current
operating conditions allow.

Guided Mode
-----------

``amd_pstate=guided``

If ``amd_pstate=guided`` is passed to kernel command line option then this mode
is activated.  In this mode, driver requests minimum and maximum performance
level and the platform autonomously selects a performance level in this range
and appropriate to the current workload.

User Space Interface in ``sysfs`` - General
===========================================
@@ -358,6 +372,9 @@ control its functionality at the system level. They are located in the
	"passive"
		The driver is functional and in the ``passive mode``

	"guided"
		The driver is functional and in the ``guided mode``

	"disable"
		The driver is unregistered and not functional now.

+116 −3
Original line number Diff line number Diff line
@@ -20,12 +20,20 @@ properties:
    oneOf:
      - description: v1 of CPUFREQ HW
        items:
          - enum:
              - qcom,qcm2290-cpufreq-hw
              - qcom,sc7180-cpufreq-hw
              - qcom,sdm845-cpufreq-hw
              - qcom,sm6115-cpufreq-hw
              - qcom,sm6350-cpufreq-hw
              - qcom,sm8150-cpufreq-hw
          - const: qcom,cpufreq-hw

      - description: v2 of CPUFREQ HW (EPSS)
        items:
          - enum:
              - qcom,qdu1000-cpufreq-epss
              - qcom,sa8775p-cpufreq-epss
              - qcom,sc7280-cpufreq-epss
              - qcom,sc8280xp-cpufreq-epss
              - qcom,sm6375-cpufreq-epss
@@ -36,14 +44,14 @@ properties:
          - const: qcom,cpufreq-epss

  reg:
    minItems: 2
    minItems: 1
    items:
      - description: Frequency domain 0 register region
      - description: Frequency domain 1 register region
      - description: Frequency domain 2 register region

  reg-names:
    minItems: 2
    minItems: 1
    items:
      - const: freq-domain0
      - const: freq-domain1
@@ -85,6 +93,111 @@ required:

additionalProperties: false

allOf:
  - if:
      properties:
        compatible:
          contains:
            enum:
              - qcom,qcm2290-cpufreq-hw
    then:
      properties:
        reg:
          minItems: 1
          maxItems: 1

        reg-names:
          minItems: 1
          maxItems: 1

        interrupts:
          minItems: 1
          maxItems: 1

        interrupt-names:
          minItems: 1

  - if:
      properties:
        compatible:
          contains:
            enum:
              - qcom,qdu1000-cpufreq-epss
              - qcom,sc7180-cpufreq-hw
              - qcom,sc8280xp-cpufreq-epss
              - qcom,sdm845-cpufreq-hw
              - qcom,sm6115-cpufreq-hw
              - qcom,sm6350-cpufreq-hw
              - qcom,sm6375-cpufreq-epss
    then:
      properties:
        reg:
          minItems: 2
          maxItems: 2

        reg-names:
          minItems: 2
          maxItems: 2

        interrupts:
          minItems: 2
          maxItems: 2

        interrupt-names:
          minItems: 2

  - if:
      properties:
        compatible:
          contains:
            enum:
              - qcom,sc7280-cpufreq-epss
              - qcom,sm8250-cpufreq-epss
              - qcom,sm8350-cpufreq-epss
              - qcom,sm8450-cpufreq-epss
              - qcom,sm8550-cpufreq-epss
    then:
      properties:
        reg:
          minItems: 3
          maxItems: 3

        reg-names:
          minItems: 3
          maxItems: 3

        interrupts:
          minItems: 3
          maxItems: 3

        interrupt-names:
          minItems: 3

  - if:
      properties:
        compatible:
          contains:
            enum:
              - qcom,sm8150-cpufreq-hw
    then:
      properties:
        reg:
          minItems: 3
          maxItems: 3

        reg-names:
          minItems: 3
          maxItems: 3

        # On some SoCs the Prime core shares the LMH irq with Big cores
        interrupts:
          minItems: 2
          maxItems: 2

        interrupt-names:
          minItems: 2


examples:
  - |
    #include <dt-bindings/clock/qcom,gcc-sdm845.h>
@@ -235,7 +348,7 @@ examples:
      #size-cells = <1>;

      cpufreq@17d43000 {
        compatible = "qcom,cpufreq-hw";
        compatible = "qcom,sdm845-cpufreq-hw", "qcom,cpufreq-hw";
        reg = <0x17d43000 0x1400>, <0x17d45800 0x1400>;
        reg-names = "freq-domain0", "freq-domain1";

+111 −7
Original line number Diff line number Diff line
@@ -1433,6 +1433,102 @@ int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable)
}
EXPORT_SYMBOL_GPL(cppc_set_epp_perf);

/**
 * cppc_get_auto_sel_caps - Read autonomous selection register.
 * @cpunum : CPU from which to read register.
 * @perf_caps : struct where autonomous selection register value is updated.
 */
int cppc_get_auto_sel_caps(int cpunum, struct cppc_perf_caps *perf_caps)
{
	struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
	struct cpc_register_resource *auto_sel_reg;
	u64  auto_sel;

	if (!cpc_desc) {
		pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
		return -ENODEV;
	}

	auto_sel_reg = &cpc_desc->cpc_regs[AUTO_SEL_ENABLE];

	if (!CPC_SUPPORTED(auto_sel_reg))
		pr_warn_once("Autonomous mode is not unsupported!\n");

	if (CPC_IN_PCC(auto_sel_reg)) {
		int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
		struct cppc_pcc_data *pcc_ss_data = NULL;
		int ret = 0;

		if (pcc_ss_id < 0)
			return -ENODEV;

		pcc_ss_data = pcc_data[pcc_ss_id];

		down_write(&pcc_ss_data->pcc_lock);

		if (send_pcc_cmd(pcc_ss_id, CMD_READ) >= 0) {
			cpc_read(cpunum, auto_sel_reg, &auto_sel);
			perf_caps->auto_sel = (bool)auto_sel;
		} else {
			ret = -EIO;
		}

		up_write(&pcc_ss_data->pcc_lock);

		return ret;
	}

	return 0;
}
EXPORT_SYMBOL_GPL(cppc_get_auto_sel_caps);

/**
 * cppc_set_auto_sel - Write autonomous selection register.
 * @cpu    : CPU to which to write register.
 * @enable : the desired value of autonomous selection resiter to be updated.
 */
int cppc_set_auto_sel(int cpu, bool enable)
{
	int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
	struct cpc_register_resource *auto_sel_reg;
	struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
	struct cppc_pcc_data *pcc_ss_data = NULL;
	int ret = -EINVAL;

	if (!cpc_desc) {
		pr_debug("No CPC descriptor for CPU:%d\n", cpu);
		return -ENODEV;
	}

	auto_sel_reg = &cpc_desc->cpc_regs[AUTO_SEL_ENABLE];

	if (CPC_IN_PCC(auto_sel_reg)) {
		if (pcc_ss_id < 0) {
			pr_debug("Invalid pcc_ss_id\n");
			return -ENODEV;
		}

		if (CPC_SUPPORTED(auto_sel_reg)) {
			ret = cpc_write(cpu, auto_sel_reg, enable);
			if (ret)
				return ret;
		}

		pcc_ss_data = pcc_data[pcc_ss_id];

		down_write(&pcc_ss_data->pcc_lock);
		/* after writing CPC, transfer the ownership of PCC to platform */
		ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
		up_write(&pcc_ss_data->pcc_lock);
	} else {
		ret = -ENOTSUPP;
		pr_debug("_CPC in PCC is not supported\n");
	}

	return ret;
}
EXPORT_SYMBOL_GPL(cppc_set_auto_sel);

/**
 * cppc_set_enable - Set to enable CPPC on the processor by writing the
 * Continuous Performance Control package EnableRegister field.
@@ -1488,7 +1584,7 @@ EXPORT_SYMBOL_GPL(cppc_set_enable);
int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
{
	struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
	struct cpc_register_resource *desired_reg;
	struct cpc_register_resource *desired_reg, *min_perf_reg, *max_perf_reg;
	int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
	struct cppc_pcc_data *pcc_ss_data = NULL;
	int ret = 0;
@@ -1499,6 +1595,8 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
	}

	desired_reg = &cpc_desc->cpc_regs[DESIRED_PERF];
	min_perf_reg = &cpc_desc->cpc_regs[MIN_PERF];
	max_perf_reg = &cpc_desc->cpc_regs[MAX_PERF];

	/*
	 * This is Phase-I where we want to write to CPC registers
@@ -1507,7 +1605,7 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
	 * Since read_lock can be acquired by multiple CPUs simultaneously we
	 * achieve that goal here
	 */
	if (CPC_IN_PCC(desired_reg)) {
	if (CPC_IN_PCC(desired_reg) || CPC_IN_PCC(min_perf_reg) || CPC_IN_PCC(max_perf_reg)) {
		if (pcc_ss_id < 0) {
			pr_debug("Invalid pcc_ss_id\n");
			return -ENODEV;
@@ -1530,13 +1628,19 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
		cpc_desc->write_cmd_status = 0;
	}

	cpc_write(cpu, desired_reg, perf_ctrls->desired_perf);

	/*
	 * Skip writing MIN/MAX until Linux knows how to come up with
	 * useful values.
	 * Only write if min_perf and max_perf not zero. Some drivers pass zero
	 * value to min and max perf, but they don't mean to set the zero value,
	 * they just don't want to write to those registers.
	 */
	cpc_write(cpu, desired_reg, perf_ctrls->desired_perf);
	if (perf_ctrls->min_perf)
		cpc_write(cpu, min_perf_reg, perf_ctrls->min_perf);
	if (perf_ctrls->max_perf)
		cpc_write(cpu, max_perf_reg, perf_ctrls->max_perf);

	if (CPC_IN_PCC(desired_reg))
	if (CPC_IN_PCC(desired_reg) || CPC_IN_PCC(min_perf_reg) || CPC_IN_PCC(max_perf_reg))
		up_read(&pcc_ss_data->pcc_lock);	/* END Phase-I */
	/*
	 * This is Phase-II where we transfer the ownership of PCC to Platform
@@ -1584,7 +1688,7 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
	 * case during a CMD_READ and if there are pending writes it delivers
	 * the write command before servicing the read command
	 */
	if (CPC_IN_PCC(desired_reg)) {
	if (CPC_IN_PCC(desired_reg) || CPC_IN_PCC(min_perf_reg) || CPC_IN_PCC(max_perf_reg)) {
		if (down_write_trylock(&pcc_ss_data->pcc_lock)) {/* BEGIN Phase-II */
			/* Update only if there are pending write commands */
			if (pcc_ss_data->pending_pcc_write_cmd)
Loading