!580 Intel: Recover two microcode interfaces when support In Field Scan(IFS) multi-blob images (6dc4499a) · Commits · EulixOS / Software / Kernel

Documentation/admin-guide/sysctl/kernel.rst

+1 −1

Original line number	Diff line number	Diff line
		@@ -1352,7 +1352,7 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
		====== ===== ==============================================================
		1 `(P)` proprietary module was loaded
		2 `(F)` module was force loaded
		4 `(S)` kernel running on an out of specification system
		4 `(S)` SMP kernel oops on an officially SMP incapable processor
		8 `(R)` module was force unloaded
		16 `(M)` processor reported a Machine Check Exception (MCE)
		32 `(B)` bad page referenced or some unexpected page flags
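
The values in this table are bit flags that are ORed together, so a taint mask can be decoded by testing each bit in turn. The following is a minimal sketch, assuming only the six flags visible in this hunk (values 1 through 32); `decode_taint` and `taint_letters` are hypothetical names for illustration, not kernel functions.

```c
#include <stddef.h>

/* Taint letters for bits 0..5, in the order listed in the hunk above. */
static const char taint_letters[] = { 'P', 'F', 'S', 'R', 'M', 'B' };

/*
 * Write the letter of every set bit (among the first six) into out and
 * NUL-terminate it. Returns the number of letters written.
 * out must have room for at least 7 bytes.
 */
size_t decode_taint(unsigned long mask, char *out)
{
    size_t n = 0;

    for (size_t bit = 0; bit < sizeof(taint_letters); bit++) {
        if (mask & (1UL << bit))
            out[n++] = taint_letters[bit];
    }
    out[n] = '\0';
    return n;
}
```

For example, a mask of 5 has bits 0 and 2 set, so it decodes to the letters P and S.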

Documentation/admin-guide/tainted-kernels.rst

+5 −24

		@@ -84,7 +84,7 @@ Bit Log Number Reason that got the kernel tainted
		=== === ====== ========================================================
		0 G/P 1 proprietary module was loaded
		1 _/F 2 module was force loaded
		2 _/S 4 kernel running on an out of specification system
		2 _/S 4 SMP kernel oops on an officially SMP incapable processor
		3 _/R 8 module was force unloaded
		4 _/M 16 processor reported a Machine Check Exception (MCE)
		5 _/B 32 bad page referenced or some unexpected page flags
		@@ -116,29 +116,10 @@ More detailed explanation for tainting
		1) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all
		modules were loaded normally.

		2) ``S`` if the kernel is running on a processor or system that is out of
		specification: hardware has been put into an unsupported configuration,
		therefore proper execution cannot be guaranteed.
		Kernel will be tainted if, for example:

		- on x86: PAE is forced through forcepae on intel CPUs (such as Pentium M)
		which do not report PAE but may have a functional implementation, an SMP
		kernel is running on non officially capable SMP Athlon CPUs, MSRs are
		being poked at from userspace.
		- on arm: kernel running on certain CPUs (such as Keystone 2) without
		having certain kernel features enabled.
		- on arm64: there are mismatched hardware features between CPUs, the
		bootloader has booted CPUs in different modes.
		- certain drivers are being used on non supported architectures (such as
		scsi/snic on something else than x86_64, scsi/ips on non
		x86/x86_64/itanium, have broken firmware settings for the
		irqchip/irq-gic on arm64 ...).
		- x86/x86_64: Microcode late loading is dangerous and will result in
		tainting the kernel. It requires that all CPUs rendezvous to make sure
		the update happens when the system is as quiescent as possible. However,
		a higher priority MCE/SMI/NMI can move control flow away from that
		rendezvous and interrupt the update, which can be detrimental to the
		machine.
		2) ``S`` if the oops occurred on an SMP kernel running on hardware that
		hasn't been certified as safe to run multiprocessor.
		Currently this occurs only on various Athlons that are not
		SMP capable.

		3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
		modules were unloaded normally.

Documentation/x86/microcode.rst

+9 −107

		@@ -6,7 +6,6 @@ The Linux Microcode Loader

		:Authors: - Fenghua Yu <fenghua.yu@intel.com>
		- Borislav Petkov <bp@suse.de>
		- Ashok Raj <ashok.raj@intel.com>

		The kernel has a x86 microcode loading facility which is supposed to
		provide microcode loading methods in the OS. Potential use cases are
		@@ -93,8 +92,15 @@ vendor's site.
		Late loading
		============

		You simply install the microcode packages your distro supplies and
		run::
		There are two legacy user space interfaces to load microcode, either through
		/dev/cpu/microcode or through /sys/devices/system/cpu/microcode/reload file
		in sysfs.

		The /dev/cpu/microcode method is deprecated because it needs a special
		userspace tool for that.

		The easier method is simply installing the microcode packages your distro
		supplies and running::

		# echo 1 > /sys/devices/system/cpu/microcode/reload
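
The same request can be issued from a program instead of the shell. The following is a minimal sketch, assuming the sysfs path quoted above exists on the running kernel and the caller has root privileges; `trigger_microcode_reload` is a hypothetical helper name, not part of any kernel or libc API.

```c
#include <stdio.h>

/* Path taken from the documentation above; writing to it needs root. */
#define MICROCODE_RELOAD_PATH "/sys/devices/system/cpu/microcode/reload"

/*
 * Write "1" to the given file, mirroring `echo 1 > path`.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int trigger_microcode_reload(const char *path)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs("1\n", f) >= 0;
    if (fclose(f) != 0)  /* buffered write errors surface here */
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would pass `MICROCODE_RELOAD_PATH` and check the return value; taking the path as a parameter keeps the helper testable against an ordinary file.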

		@@ -104,110 +110,6 @@ The loading mechanism looks for microcode blobs in
		/lib/firmware/{intel-ucode,amd-ucode}. The default distro installation
		packages already put them there.

		Since kernel 5.19, late loading is not enabled by default.

		The /dev/cpu/microcode method has been removed in 5.19.

		Why is late loading dangerous?
		==============================

		Synchronizing all CPUs
		----------------------

		The microcode engine which receives the microcode update is shared
		between the two logical threads in a SMT system. Therefore, when
		the update is executed on one SMT thread of the core, the sibling
		"automatically" gets the update.

		Since the microcode can "simulate" MSRs too, while the microcode update
		is in progress, those simulated MSRs transiently cease to exist. This
		can result in unpredictable results if the SMT sibling thread happens to
		be in the middle of an access to such an MSR. The usual observation is
		that such MSR accesses cause #GPs to be raised to signal that former are
		not present.

		The disappearing MSRs are just one common issue which is being observed.
		Any other instruction that's being patched and gets concurrently
		executed by the other SMT sibling, can also result in similar,
		unpredictable behavior.

		To eliminate this case, a stop_machine()-based CPU synchronization was
		introduced as a way to guarantee that all logical CPUs will not execute
		any code but just wait in a spin loop, polling an atomic variable.
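
The spin-on-an-atomic rendezvous described in this paragraph can be illustrated in userspace. The following is a simplified analogue using C11 atomics, not the kernel's stop_machine() code; `rendezvous_wait` is a hypothetical name, and a real implementation would also need timeouts and interrupt handling.

```c
#include <stdatomic.h>

/*
 * Announce arrival by incrementing the shared counter, then spin until
 * all `total` participants have arrived. While spinning, the caller
 * executes nothing except the poll of the atomic variable.
 */
void rendezvous_wait(atomic_int *arrived, int total)
{
    atomic_fetch_add(arrived, 1);
    while (atomic_load(arrived) < total)
        ; /* busy-wait: poll the atomic counter */
}
```

Each participating thread would call `rendezvous_wait` with the same counter, and no caller returns until the last one has checked in.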

		While this took care of device or external interrupts, IPIs including
		LVT ones, such as CMCI etc, it cannot address other special interrupts
		that can't be shut off. Those are Machine Check (#MC), System Management
		(#SMI) and Non-Maskable interrupts (#NMI).

		Machine Checks
		--------------

		Machine Checks (#MC) are non-maskable. There are two kinds of MCEs:
		fatal unrecoverable MCEs and recoverable MCEs. While unrecoverable
		errors are fatal, recoverable errors that happen in kernel context
		are also treated as fatal by the kernel.

		On certain Intel machines, MCEs are also broadcast to all threads in a
		system. If one thread is in the middle of executing WRMSR, a MCE will be
		taken at the end of the flow. Either way, they will wait for the thread
		performing the wrmsr(0x79) to rendezvous in the MCE handler and shutdown
		eventually if any of the threads in the system fail to check in to the
		MCE rendezvous.

		To be paranoid and get predictable behavior, the OS can choose to set
		MCG_STATUS.MCIP. Since MCEs can be at most one in a system, if an
		MCE was signaled, the above condition will promote to a system reset
		automatically. OS can turn off MCIP at the end of the update for that
		core.

		System Management Interrupt
		---------------------------

		SMIs are also broadcast to all CPUs in the platform. Microcode update
		requests exclusive access to the core before writing to MSR 0x79. So if
		it does happen such that, one thread is in WRMSR flow, and the 2nd got
		an SMI, that thread will be stopped in the first instruction in the SMI
		handler.

		Since the secondary thread is stopped in the first instruction in SMI,
		there is very little chance that it would be in the middle of executing
		an instruction being patched. Plus OS has no way to stop SMIs from
		happening.

		Non-Maskable Interrupts
		-----------------------

		When thread0 of a core is doing the microcode update, if thread1 is
		pulled into NMI, that can cause unpredictable behavior due to the
		reasons above.

		OS can choose a variety of methods to avoid running into this situation.


		Is the microcode suitable for late loading?
		-------------------------------------------

		Late loading is done when the system is fully operational and running
		real workloads. Late loading behavior depends on what the base patch on
		the CPU is before upgrading to the new patch.

		This is true for Intel CPUs.

		Consider, for example, a CPU has patch level 1 and the update is to
		patch level 3.

		Between patch1 and patch3, patch2 might have deprecated a software-visible
		feature.

		This is unacceptable if software is even potentially using that feature.
		For instance, say MSR_X is no longer available after an update,
		accessing that MSR will cause a #GP fault.

		Basically there is no way to declare a new microcode update suitable
		for late-loading. This is another one of the problems that caused late
		loading to be not enabled by default.

		Builtin microcode
		=================

arch/x86/Kconfig

+8 −7

		@@ -1336,16 +1336,17 @@ config MICROCODE_AMD
		If you select this option, microcode patch loading support for AMD
		processors will be enabled.

		config MICROCODE_LATE_LOADING
		bool "Late microcode loading (DANGEROUS)"
		config MICROCODE_OLD_INTERFACE
		bool "Ancient loading interface (DEPRECATED)"
		default n
		depends on MICROCODE
		help
		Loading microcode late, when the system is up and executing instructions
		is a tricky business and should be avoided if possible. Just the sequence
		of synchronizing all cores and SMT threads is one fragile dance which does
		not guarantee that cores might not softlock after the loading. Therefore,
		use this at your own risk. Late loading taints the kernel too.
		DO NOT USE THIS! This is the ancient /dev/cpu/microcode interface
		which was used by userspace tools like iucode_tool and microcode.ctl.
		It is inadequate because it runs too late to be able to properly
		load microcode on a machine and it needs special tools. Instead, you
		should've switched to the early loading method with the initrd or
		builtin microcode by now: Documentation/x86/microcode.rst

		config X86_MSR
		tristate "/dev/cpu/*/msr - Model-specific register support"

arch/x86/include/asm/microcode.h

+6 −1

		@@ -33,7 +33,11 @@ enum ucode_state {
		};

		struct microcode_ops {
		enum ucode_state (*request_microcode_fw) (int cpu, struct device *);
		enum ucode_state (*request_microcode_user) (int cpu,
		const void __user *buf, size_t size);

		enum ucode_state (*request_microcode_fw) (int cpu, struct device *,
		bool refresh_fw);

		void (*microcode_fini_cpu) (int cpu);

		@@ -49,6 +53,7 @@ struct microcode_ops {

		struct ucode_cpu_info {
		struct cpu_signature cpu_sig;
		int valid;
		void *mc;
		};
		extern struct ucode_cpu_info ucode_cpu_info[];
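
To show how an ops table with these two callbacks might be filled in and dispatched through, here is a simplified userspace mock. It is a sketch under stated assumptions, not the kernel's code: the struct is reduced to the two callbacks in this hunk, the kernel's `__user` annotation is omitted, the `enum ucode_state` values are assumed, and `demo_request_user`, `demo_request_fw`, and `load_from_user` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

struct device; /* opaque in this mock */

/* Assumed result codes; the kernel header defines the real enum. */
enum ucode_state { UCODE_OK = 0, UCODE_NEW, UCODE_UPDATED, UCODE_NFOUND, UCODE_ERROR };

/* Reduced mock of the ops table: only the two callbacks from the hunk. */
struct microcode_ops {
    enum ucode_state (*request_microcode_user)(int cpu, const void *buf,
                                               size_t size);
    enum ucode_state (*request_microcode_fw)(int cpu, struct device *dev,
                                             bool refresh_fw);
};

/* Hypothetical vendor stub: accept any non-empty buffer. */
static enum ucode_state demo_request_user(int cpu, const void *buf, size_t size)
{
    (void)cpu;
    return (buf && size > 0) ? UCODE_NEW : UCODE_ERROR;
}

/* Hypothetical vendor stub: pretend no firmware file was found. */
static enum ucode_state demo_request_fw(int cpu, struct device *dev,
                                        bool refresh_fw)
{
    (void)cpu; (void)dev; (void)refresh_fw;
    return UCODE_NFOUND;
}

static const struct microcode_ops demo_ops = {
    .request_microcode_user = demo_request_user,
    .request_microcode_fw   = demo_request_fw,
};

/* Dispatch through the table, treating a missing callback as an error. */
enum ucode_state load_from_user(const struct microcode_ops *ops, int cpu,
                                const void *buf, size_t size)
{
    if (!ops || !ops->request_microcode_user)
        return UCODE_ERROR;
    return ops->request_microcode_user(cpu, buf, size);
}
```

Routing every request through the table lets one loader core serve several vendors, and the NULL check covers builds where a callback is left unset.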