Commit aa8c3db4 authored by Linus Torvalds

Merge tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

 - Add support for a new AMD feature called slow memory bandwidth
   allocation. Its goal is to control resource allocation in external
   slow memory which is connected to the machine through, for example,
   CXL devices, accelerators, etc.

* tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix a silly -Wunused-but-set-variable warning
  Documentation/x86: Update resctrl.rst for new features
  x86/resctrl: Add interface to write mbm_local_bytes_config
  x86/resctrl: Add interface to write mbm_total_bytes_config
  x86/resctrl: Add interface to read mbm_local_bytes_config
  x86/resctrl: Add interface to read mbm_total_bytes_config
  x86/resctrl: Support monitor configuration
  x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
  x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  x86/resctrl: Include new features in command line options
  x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
parents 1adce1b9 793207ba
+1 −1
@@ -5221,7 +5221,7 @@
	rdt=		[HW,X86,RDT]
			Turn on/off individual RDT features. List is:
			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba.
+			mba, smba, bmec.
			E.g. to turn on cmt and turn off mba use:
				rdt=cmt,!mba

+145 −2
@@ -17,14 +17,21 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
flag bits:

-=============================================	================================
+===============================================	================================
RDT (Resource Director Technology) Allocation	"rdt_a"
CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
MBA (Memory Bandwidth Allocation)		"mba"
-=============================================	================================
+SMBA (Slow Memory Bandwidth Allocation)         ""
+BMEC (Bandwidth Monitoring Event Configuration) ""
+===============================================	================================

Historically, new features were made visible by default in /proc/cpuinfo. This
resulted in the feature flags becoming hard to parse by humans. Adding a new
flag to /proc/cpuinfo should be avoided if user space can obtain information
about the feature from resctrl's info directory.

To use the feature mount the file system::

@@ -161,6 +168,83 @@ with the following files:
"mon_features":
		Lists the monitoring events if
		monitoring is enabled for the resource.
		Example::

			# cat /sys/fs/resctrl/info/L3_MON/mon_features
			llc_occupancy
			mbm_total_bytes
			mbm_local_bytes

		If the system supports Bandwidth Monitoring Event
		Configuration (BMEC), then the bandwidth events will
		be configurable. The output will be::

			# cat /sys/fs/resctrl/info/L3_MON/mon_features
			llc_occupancy
			mbm_total_bytes
			mbm_total_bytes_config
			mbm_local_bytes
			mbm_local_bytes_config

"mbm_total_bytes_config", "mbm_local_bytes_config":
	Read/write files containing the configuration for the mbm_total_bytes
	and mbm_local_bytes events, respectively, when the Bandwidth
	Monitoring Event Configuration (BMEC) feature is supported.
	The event configuration settings are domain specific and affect
	all the CPUs in the domain. When either event configuration is
	changed, the bandwidth counters for all RMIDs of both events
	(mbm_total_bytes as well as mbm_local_bytes) are cleared for that
	domain. The next read for every RMID will report "Unavailable"
	and subsequent reads will report the valid value.

	Following are the types of events supported:

	====    ========================================================
	Bits    Description
	====    ========================================================
	6       Dirty Victims from the QOS domain to all types of memory
	5       Reads to slow memory in the non-local NUMA domain
	4       Reads to slow memory in the local NUMA domain
	3       Non-temporal writes to non-local NUMA domain
	2       Non-temporal writes to local NUMA domain
	1       Reads to memory in the non-local NUMA domain
	0       Reads to memory in the local NUMA domain
	====    ========================================================

	By default, the mbm_total_bytes configuration is set to 0x7f to count
	all the event types and the mbm_local_bytes configuration is set to
	0x15 to count all the local memory events.

	Examples:

	* To view the current configuration:
	  ::

	    # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	    0=0x7f;1=0x7f;2=0x7f;3=0x7f

	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
	    0=0x15;1=0x15;2=0x15;3=0x15

	* To change the mbm_total_bytes to count only reads on domain 0,
	  bits 0, 1, 4 and 5 need to be set, which is 110011b in binary
	  (in hexadecimal 0x33):
	  ::

	    # echo  "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

	    # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
	    0=0x33;1=0x7f;2=0x7f;3=0x7f

	* To change the mbm_local_bytes to count all the slow memory reads on
	  domain 0 and 1, bits 4 and 5 need to be set, which is 110000b
	  in binary (in hexadecimal 0x30):
	  ::

	    # echo  "0=0x30;1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
	    0=0x30;1=0x30;2=0x15;3=0x15
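
The masks in the examples above can be derived from the bit table instead of
being worked out by hand. The following is a minimal sketch, not part of the
patch: the event names, `bmec_mask()` and `config_line()` are hypothetical
helpers introduced for illustration, and only the bit positions and the
`domain=mask` format come from the documentation text.

```python
# Bit positions are taken from the "Bits / Description" table above.
# The dictionary keys are hypothetical labels, not kernel identifiers.
BMEC_EVENTS = {
    "local_reads": 0,        # Reads to memory in the local NUMA domain
    "remote_reads": 1,       # Reads to memory in the non-local NUMA domain
    "local_nt_writes": 2,    # Non-temporal writes to local NUMA domain
    "remote_nt_writes": 3,   # Non-temporal writes to non-local NUMA domain
    "local_slow_reads": 4,   # Reads to slow memory in the local NUMA domain
    "remote_slow_reads": 5,  # Reads to slow memory in the non-local NUMA domain
    "dirty_victims": 6,      # Dirty Victims from the QOS domain to all memory
}


def bmec_mask(*events):
    """OR together the bits for the named event types."""
    mask = 0
    for name in events:
        mask |= 1 << BMEC_EVENTS[name]
    return mask


def config_line(per_domain):
    """Format {domain_id: mask} as 'id=0x..;id=0x..', as the config files expect."""
    return ";".join(f"{dom}={mask:#x}" for dom, mask in sorted(per_domain.items()))


reads_only = bmec_mask("local_reads", "remote_reads",
                       "local_slow_reads", "remote_slow_reads")
slow_reads = bmec_mask("local_slow_reads", "remote_slow_reads")

print(config_line({0: reads_only}))                # string to write for domain 0
print(config_line({0: slow_reads, 1: slow_reads})) # string to write for domains 0 and 1
```

The same helper reproduces the documented defaults: all seven bits give 0x7f,
and the three local events (bits 0, 2 and 4) give 0x15.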

"max_threshold_occupancy":
		Read/write file provides the largest value (in
@@ -464,6 +548,25 @@ Memory bandwidth domain is L3 cache.

	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...

Slow Memory Bandwidth Allocation (SMBA)
---------------------------------------
AMD hardware supports Slow Memory Bandwidth Allocation (SMBA).
CXL.memory is the only supported "slow" memory device. With the
support of SMBA, the hardware enables bandwidth allocation on
the slow memory devices. If there are multiple such devices in
the system, the throttling logic groups all the slow sources
together and applies the limit on them as a whole.

The presence of SMBA (with CXL.memory) is independent of the presence of
slow memory devices. If there are no such devices on the system, then
configuring SMBA will have no impact on the performance of the system.

The bandwidth domain for slow memory is L3 cache. Its schemata file
is formatted as:
::

	SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...

Reading/writing the schemata file
---------------------------------
Reading the schemata file will show the state of all resources
@@ -479,6 +582,46 @@ which you wish to change. E.g.
  L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
  L3CODE:0=fffff;1=fffff;2=fffff;3=fffff

Reading/writing the schemata file (on AMD systems)
--------------------------------------------------
Reading the schemata file will show the current bandwidth limit on all
domains. The allocated resources are in multiples of one eighth GB/s.
When writing to the file, you need to specify the cache id for which you
wish to configure the bandwidth limit.

For example, to allocate a 2GB/s limit (16 units) on cache id 1:

::

  # cat schemata
    MB:0=2048;1=2048;2=2048;3=2048
    L3:0=ffff;1=ffff;2=ffff;3=ffff

  # echo "MB:1=16" > schemata
  # cat schemata
    MB:0=2048;1=  16;2=2048;3=2048
    L3:0=ffff;1=ffff;2=ffff;3=ffff

Reading/writing the schemata file (on AMD systems) with SMBA feature
--------------------------------------------------------------------
Reading and writing the schemata file is the same as without SMBA, as
described in the section above.

For example, to allocate an 8GB/s limit (64 units) on cache id 1:

::

  # cat schemata
    SMBA:0=2048;1=2048;2=2048;3=2048
      MB:0=2048;1=2048;2=2048;3=2048
      L3:0=ffff;1=ffff;2=ffff;3=ffff

  # echo "SMBA:1=64" > schemata
  # cat schemata
    SMBA:0=2048;1=  64;2=2048;3=2048
      MB:0=2048;1=2048;2=2048;3=2048
      L3:0=ffff;1=ffff;2=ffff;3=ffff
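
The schemata values in the two AMD examples follow from the stated unit of one
eighth GB/s. As a rough sketch of that arithmetic (the function names are
hypothetical, and rejecting limits that are not a whole number of units is an
assumption of this sketch, not documented kernel behaviour):

```python
def gbps_to_schemata_units(gb_per_s):
    """Convert a limit in GB/s to schemata units of one eighth GB/s."""
    units = gb_per_s * 8
    # Assumption: only whole, non-negative unit counts are accepted here.
    if units < 0 or units != int(units):
        raise ValueError(f"{gb_per_s} GB/s is not a multiple of 1/8 GB/s")
    return int(units)


def schemata_line(resource, cache_id, gb_per_s):
    """Build the string to write to the schemata file, e.g. 'MB:1=16'."""
    return f"{resource}:{cache_id}={gbps_to_schemata_units(gb_per_s)}"


print(schemata_line("MB", 1, 2))    # the 2GB/s example
print(schemata_line("SMBA", 1, 8))  # the 8GB/s example
```

Read the other way, the value 2048 shown for the untouched domains corresponds
to 2048 / 8 = 256 GB/s.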

Cache Pseudo-Locking
====================
CAT enables a user to specify the amount of cache space that an
+2 −0
@@ -307,6 +307,8 @@
#define X86_FEATURE_SGX_EDECCSSA	(11*32+18) /* "" SGX EDECCSSA user leaf function */
#define X86_FEATURE_CALL_DEPTH		(11*32+19) /* "" Call depth tracking for RSB stuffing */
#define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
#define X86_FEATURE_SMBA		(11*32+21) /* "" Slow Memory Bandwidth Allocation */
#define X86_FEATURE_BMEC		(11*32+22) /* "" Bandwidth Monitoring Event Configuration */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
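
Each define in this hunk is written as `word*32 + bit`, so the two new flags
sit at bits 21 and 22 of capability word 11. A small sketch of that decoding
follows; `decode_feature()` is a hypothetical helper, and the word/bit reading
is inferred from the form of the defines and the "word 12" comment above.

```python
def decode_feature(value):
    """Split an X86_FEATURE_* value into (capability word, bit within the word)."""
    return divmod(value, 32)


# Values copied from the defines added in this hunk.
X86_FEATURE_SMBA = 11 * 32 + 21
X86_FEATURE_BMEC = 11 * 32 + 22

for name, value in (("SMBA", X86_FEATURE_SMBA), ("BMEC", X86_FEATURE_BMEC)):
    word, bit = decode_feature(value)
    print(f"{name}: word {word}, bit {bit}")
```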
+2 −0
@@ -1084,6 +1084,8 @@

/* - AMD: */
#define MSR_IA32_MBA_BW_BASE		0xc0000200
#define MSR_IA32_SMBA_BW_BASE		0xc0000280
#define MSR_IA32_EVT_CFG_BASE		0xc0000400

/* MSR_IA32_VMX_MISC bits */
#define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
+2 −0
@@ -68,6 +68,8 @@ static const struct cpuid_dep cpuid_deps[] = {
	{ X86_FEATURE_CQM_OCCUP_LLC,		X86_FEATURE_CQM_LLC   },
	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
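
The two new rows make BMEC depend on both MBM events. The sketch below models
one reading of such a table, namely that clearing the right-hand feature also
clears every feature listed to its left, transitively. That reading is an
assumption of this sketch and is not stated in the hunk itself; the feature
names are shortened and `features_cleared_with()` is a hypothetical helper.

```python
# Rows mirror the { feature, depends-on } pairs visible in the hunk above.
CPUID_DEPS = [
    ("CQM_OCCUP_LLC", "CQM_LLC"),
    ("CQM_MBM_TOTAL", "CQM_LLC"),
    ("CQM_MBM_LOCAL", "CQM_LLC"),
    ("BMEC", "CQM_MBM_TOTAL"),
    ("BMEC", "CQM_MBM_LOCAL"),
]


def features_cleared_with(feature, deps=CPUID_DEPS):
    """Return the set of features that go away when `feature` is cleared."""
    cleared = {feature}
    changed = True
    while changed:  # repeat until no further dependents are found
        changed = False
        for dependent, prerequisite in deps:
            if prerequisite in cleared and dependent not in cleared:
                cleared.add(dependent)
                changed = True
    return cleared


print(sorted(features_cleared_with("CQM_MBM_LOCAL")))
print(sorted(features_cleared_with("CQM_LLC")))
```

Under this model, losing either MBM event is enough to drop BMEC, which is why
the table carries two rows for it.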