Commit 02b82b02 authored by Linus Torvalds
Pull power management updates from Rafael Wysocki:
 "These are mostly fixes and cleanups all over the code and a new piece
  of documentation for Intel uncore frequency scaling.

  Functionality-wise, the intel_idle driver will support Sapphire Rapids
  Xeons natively now (with some extra facilities for controlling
  C-states more precisely on those systems), virtual guests will take
  the ACPI S4 hardware signature into account by default, the
  intel_pstate driver will take the default EPP value from the firmware,
  cpupower utility will support the AMD P-state driver added in the
  previous cycle, and there is a new tracer utility for that driver.

  Specifics:

   - Allow device_pm_check_callbacks() to be called from interrupt
     context without issues (Dmitry Baryshkov).

   - Modify devm_pm_runtime_enable() to automatically handle
     pm_runtime_dont_use_autosuspend() at driver exit time (Douglas
     Anderson).

   - Make the schedutil cpufreq governor use to_gov_attr_set() instead
     of open coding it (Kevin Hao).

   - Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the
     cpufreq longhaul driver (Rafael Wysocki).

   - Unify show() and store() naming in cpufreq and make it use
     __ATTR_XX (Lianjie Zhang).

   - Make the intel_pstate driver use the EPP value set by the firmware
     by default (Srinivas Pandruvada).

   - Re-order the init checks in the powernow-k8 cpufreq driver (Mario
     Limonciello).

   - Make the ACPI processor idle driver check for architectural support
     for LPI to avoid using it on x86 by mistake (Mario Limonciello).

   - Add Sapphire Rapids Xeon support to the intel_idle driver (Artem
     Bityutskiy).

   - Add 'preferred_cstates' module argument to the intel_idle driver to
     work around C1 and C1E handling issue on Sapphire Rapids (Artem
     Bityutskiy).

   - Add core C6 optimization on Sapphire Rapids to the intel_idle
     driver (Artem Bityutskiy).

   - Optimize the haltpoll cpuidle driver a bit (Li RongQing).

   - Remove leftover text from intel_idle() kerneldoc comment and fix up
     white space in intel_idle (Rafael Wysocki).

   - Fix load_image_and_restore() error path (Ye Bin).

   - Fix typos in comments in the system wakeup handling code (Tom Rix).

   - Clean up non-kernel-doc comments in hibernation code (Jiapeng
     Chong).

   - Fix __setup handler error handling in system-wide suspend and
     hibernation core code (Randy Dunlap).

   - Add device name to suspend_report_result() (Youngjin Jang).

   - Make virtual guests honour ACPI S4 hardware signature by default
     (David Woodhouse).

   - Block power off of a parent PM domain unless child is in deepest
     state (Ulf Hansson).

   - Use dev_err_probe() to simplify error handling for generic PM
     domains (Ahmad Fatoum).

   - Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).

   - Document Intel uncore frequency scaling (Srinivas Pandruvada).

   - Add DTPM hierarchy description (Daniel Lezcano).

   - Change the locking scheme in DTPM (Daniel Lezcano).

   - Fix dtpm_cpu cleanup at exit time and missing virtual DTPM pointer
     release (Daniel Lezcano).

   - Make dtpm_node_callback[] static (kernel test robot).

   - Fix spelling mistake "initialze" -> "initialize" in
     dtpm_create_hierarchy() (Colin Ian King).

   - Add tracer tool for the amd-pstate driver (Jinzhou Su).

   - Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).

   - Add AMD P-State support to the cpupower utility (Huang Rui)"

* tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (58 commits)
  cpufreq: powernow-k8: Re-order the init checks
  cpuidle: intel_idle: Drop redundant backslash at line end
  cpuidle: intel_idle: Update intel_idle() kerneldoc comment
  PM: hibernate: Honour ACPI hardware signature by default for virtual guests
  cpufreq: intel_pstate: Use firmware default EPP
  cpufreq: unify show() and store() naming and use __ATTR_XX
  PM: core: keep irq flags in device_pm_check_callbacks()
  cpuidle: haltpoll: Call cpuidle_poll_state_init() later
  Documentation: amd-pstate: add tracer tool introduction
  tools/power/x86/amd_pstate_tracer: Add tracer tool for AMD P-state
  tools/power/x86/intel_pstate_tracer: make tracer as a module
  cpufreq: amd-pstate: Add more tracepoint for AMD P-State module
  PM: sleep: Add device name to suspend_report_result()
  turbostat: fix PC6 displaying on some systems
  intel_idle: add core C6 optimization for SPR
  intel_idle: add 'preferred_cstates' module argument
  intel_idle: add SPR support
  PM: runtime: Have devm_pm_runtime_enable() handle pm_runtime_dont_use_autosuspend()
  ACPI: processor idle: Check for architectural support for LPI
  cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
  ...
parents 242ba665 ec3d8b83
+26 −0
@@ -369,6 +369,32 @@ governor (for the policies it is attached to), or by the ``CPUFreq`` core (for t
policies with other scaling governors).


Tracer Tool
-------------

``amd_pstate_trace.py`` can record and parse ``amd-pstate`` trace logs, then
generate performance plots. This utility can be used to debug and tune the
performance of the ``amd-pstate`` driver. The tracer tool needs to import the
intel pstate tracer.

The tracer tool is located in ``linux/tools/power/x86/amd_pstate_tracer``. It
can be used in two ways. If a trace file is available, parse the file directly
with the command ::

 ./amd_pstate_trace.py [-c cpus] -t <trace_file> -n <test_name>

Or generate a trace file with root privileges, then parse and plot it with the command ::

 sudo ./amd_pstate_trace.py [-c cpus] -n <test_name> -i <interval> [-m kbytes]

The test results can be found in ``results/test_name``. The following is an
example of part of the output. ::

 common_cpu  common_secs  common_usecs  min_perf  des_perf  max_perf  freq    mperf   apef    tsc       load   duration_ms  sample_num  elapsed_time  common_comm
 CPU_005     712          116384        39        49        166       0.7565  9645075 2214891 38431470  25.1   11.646       469         2.496         kworker/5:0-40
 CPU_006     712          116408        39        49        166       0.6769  8950227 1839034 37192089  24.06  11.272       470         2.496         kworker/6:0-1264
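
A table like the one above could be loaded for further analysis with a short
script. The following is a minimal sketch, not part of the tracer tool: the
helper name ``parse_tracer_table`` is hypothetical, and it assumes the columns
are whitespace-separated exactly as displayed above (the files written under
``results/test_name`` may use a different layout).

```python
def parse_tracer_table(text):
    """Parse a whitespace-aligned table into a list of dicts keyed by column name."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Limit the split so a task name containing spaces stays in the last column.
        fields = line.split(None, len(header) - 1)
        if len(fields) != len(header):
            continue  # skip malformed or truncated rows
        rows.append(dict(zip(header, fields)))
    return rows


# Sample copied from the documentation excerpt above.
SAMPLE = """
 common_cpu  common_secs  common_usecs  min_perf  des_perf  max_perf  freq    mperf   apef    tsc       load   duration_ms  sample_num  elapsed_time  common_comm
 CPU_005     712          116384        39        49        166       0.7565  9645075 2214891 38431470  25.1   11.646       469         2.496         kworker/5:0-40
 CPU_006     712          116408        39        49        166       0.6769  8950227 1839034 37192089  24.06  11.272       470         2.496         kworker/6:0-1264
"""

rows = parse_tracer_table(SAMPLE)
for row in rows:
    # All values are kept as strings; convert only the columns that are needed.
    print(row["common_cpu"], float(row["freq"]), float(row["load"]))
```

Values are left as strings so that the caller decides which columns to convert
to numbers.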


Reference
===========

+60 −0
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

==============================
Intel Uncore Frequency Scaling
==============================

:Copyright: |copy| 2022 Intel Corporation

:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

Introduction
------------

The uncore can consume a significant amount of power in Intel's Xeon servers based
on the workload characteristics. To optimize the total power and improve overall
performance, SoCs have internal algorithms for scaling uncore frequency. These
algorithms monitor workload usage of uncore and set a desirable frequency.

It is possible that users have different expectations of uncore performance and
want to have control over it. The objective is similar to allowing users to set
the scaling min/max frequencies via cpufreq sysfs to improve CPU performance.
Users may have some latency-sensitive workloads where they do not want any
change to uncore frequency. Also, users may have workloads which require
different core and uncore performance at distinct phases and they may want to
use both cpufreq and the uncore scaling interface to distribute power and
improve overall performance.

Sysfs Interface
---------------

To control uncore frequency, a sysfs interface is provided in the directory:
``/sys/devices/system/cpu/intel_uncore_frequency/``.

There is one directory for each package and die combination, because the scope
of uncore scaling control is per die in SoCs with multiple dies per package and
per package in SoCs with a single die per package. The directory name
represents the scope of control. For example, ``package_00_die_00`` is for
package id 0 and die 0.

Each ``package_*_die_*`` directory contains the following attributes:

``initial_max_freq_khz``
	Out of reset, this attribute represents the maximum possible frequency.
	This is a read-only attribute. If users adjust max_freq_khz,
	they can always go back to maximum using the value from this attribute.

``initial_min_freq_khz``
	Out of reset, this attribute represents the minimum possible frequency.
	This is a read-only attribute. If users adjust min_freq_khz,
	they can always go back to minimum using the value from this attribute.

``max_freq_khz``
	This attribute is used to set the maximum uncore frequency.

``min_freq_khz``
	This attribute is used to set the minimum uncore frequency.

``current_freq_khz``
	This attribute is used to get the current uncore frequency.
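
The attributes above could be read from user space with a short script. The
following is a minimal sketch, not part of the kernel tree: the helper names
``read_uncore_limits`` and ``restore_initial_limits`` are hypothetical, and the
code assumes each attribute file holds a single integer value in kHz.

```python
import re
from pathlib import Path

# Sysfs directory named in the documentation above.
UNCORE_ROOT = Path("/sys/devices/system/cpu/intel_uncore_frequency")

# Attribute names listed in the documentation above.
ATTRIBUTES = (
    "initial_max_freq_khz",
    "initial_min_freq_khz",
    "max_freq_khz",
    "min_freq_khz",
    "current_freq_khz",
)

# Directory naming scheme, for example "package_00_die_00".
_SCOPE_RE = re.compile(r"^package_(\d+)_die_(\d+)$")


def read_uncore_limits(root=UNCORE_ROOT):
    """Return {(package, die): {attribute: kHz}} for every scope under root.

    Hypothetical helper: attributes that are missing or unreadable are skipped.
    """
    result = {}
    root = Path(root)
    if not root.is_dir():
        return result  # interface not present on this system
    for entry in sorted(root.iterdir()):
        match = _SCOPE_RE.match(entry.name)
        if match is None or not entry.is_dir():
            continue
        values = {}
        for name in ATTRIBUTES:
            try:
                values[name] = int((entry / name).read_text().strip())
            except (OSError, ValueError):
                continue
        result[(int(match.group(1)), int(match.group(2)))] = values
    return result


def restore_initial_limits(scope_dir):
    """Copy the initial_* values back into max_freq_khz and min_freq_khz.

    Writing these attributes on a real system requires root privileges.
    """
    scope_dir = Path(scope_dir)
    for initial, target in (
        ("initial_max_freq_khz", "max_freq_khz"),
        ("initial_min_freq_khz", "min_freq_khz"),
    ):
        value = (scope_dir / initial).read_text().strip()
        (scope_dir / target).write_text(value)
```

Calling ``read_uncore_limits()`` with no argument inspects the real sysfs
directory and returns an empty dictionary when the interface is absent. Passing
a different ``root`` allows the helpers to be exercised against a scratch
directory.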
+1 −0
@@ -15,3 +15,4 @@ Working-State Power Management
   cpufreq_drivers
   intel_epb
   intel-speed-select
   intel_uncore_frequency_scaling
+1 −0
@@ -1002,6 +1002,7 @@ L: linux-pm@vger.kernel.org
S:	Supported
F:	Documentation/admin-guide/pm/amd-pstate.rst
F:	drivers/cpufreq/amd-pstate*
F:	tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py
AMD PTDMA DRIVER
M:	Sanjay R Mehta <sanju.mehta@amd.com>
+3 −3
@@ -54,6 +54,9 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
	struct acpi_lpi_state *lpi;
	struct acpi_processor *pr = per_cpu(processors, cpu);

	if (unlikely(!pr || !pr->flags.has_lpi))
		return -EINVAL;

	/*
	 * If the PSCI cpu_suspend function hook has not been initialized
	 * idle states must not be enabled, so bail out
@@ -61,9 +64,6 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
	if (!psci_ops.cpu_suspend)
		return -EOPNOTSUPP;

	if (unlikely(!pr || !pr->flags.has_lpi))
		return -EINVAL;

	count = pr->power.count - 1;
	if (count <= 0)
		return -ENODEV;