Commit 57c78a23 authored by Linus Torvalds
Pull arm64 updates from Catalin Marinas:

 - Support for 32-bit tasks on asymmetric AArch32 systems (on top of the
   scheduler changes merged via the tip tree).

 - More entry.S clean-ups and conversion to C.

 - MTE updates: allow a preferred tag checking mode to be set per CPU
   (the overhead of synchronous mode is smaller for some CPUs than
   others); optimisations for kernel entry/exit path; optionally disable
   MTE on the kernel command line.

 - Kselftest improvements for SVE and signal handling, PtrAuth.

 - Fix unlikely race where a TLBI could use stale ASID on an ASID
   roll-over (found by inspection).

 - Miscellaneous fixes: disable trapping of PMSNEVFR_EL1 to higher
   exception levels; drop unnecessary sigdelsetmask() call in the
   signal32 handling; remove BUG_ON when failing to allocate SVE state
   (just signal the process); SYM_CODE annotations.

 - Other trivial clean-ups: use macros instead of magic numbers, remove
   redundant returns, typos.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (56 commits)
  arm64: Do not trap PMSNEVFR_EL1
  arm64: mm: fix comment typo of pud_offset_phys()
  arm64: signal32: Drop pointless call to sigdelsetmask()
  arm64/sve: Better handle failure to allocate SVE register storage
  arm64: Document the requirement for SCR_EL3.HCE
  arm64: head: avoid over-mapping in map_memory
  arm64/sve: Add a comment documenting the binutils needed for SVE asm
  arm64/sve: Add some comments for sve_save/load_state()
  kselftest/arm64: signal: Add a TODO list for signal handling tests
  kselftest/arm64: signal: Add test case for SVE register state in signals
  kselftest/arm64: signal: Verify that signals can't change the SVE vector length
  kselftest/arm64: signal: Check SVE signal frame shows expected vector length
  kselftest/arm64: signal: Support signal frames with SVE register data
  kselftest/arm64: signal: Add SVE to the set of features we can check for
  arm64: replace in_irq() with in_hardirq()
  kselftest/arm64: pac: Fix skipping of tests on systems without PAC
  Documentation: arm64: describe asymmetric 32-bit support
  arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores
  arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0
  arm64: Advertise CPUs capable of running 32-bit applications in sysfs
  ...
parents bcfeebbf 65266a7c
+26 −0
@@ -494,6 +494,15 @@ Description: AArch64 CPU registers
		'identification' directory exposes the CPU ID registers for
		identifying model and revision of the CPU.

What:		/sys/devices/system/cpu/aarch32_el0
Date:		May 2021
Contact:	Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org>
Description:	Identifies the subset of CPUs in the system that can execute
		AArch32 (32-bit ARM) applications. If present, the same format as
		/sys/devices/system/cpu/{offline,online,possible,present} is used.
		If absent, then all or none of the CPUs can execute AArch32
		applications and execve() will behave accordingly.
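The files named above hold a comma-separated list of CPU numbers and inclusive ranges, for example `0-3,6`. The following is a minimal Python sketch of reading `aarch32_el0` under that assumption. The helper names `parse_cpulist` and `read_aarch32_cpus` are hypothetical, and returning `None` for an absent file is an illustrative choice, not kernel behaviour.

```python
from pathlib import Path
from typing import Optional, Set

AARCH32_EL0 = Path("/sys/devices/system/cpu/aarch32_el0")


def parse_cpulist(text: str) -> Set[int]:
    """Parse a CPU list string such as "0-3,6" into a set of CPU numbers."""
    cpus: Set[int] = set()
    for chunk in text.strip().split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # an empty file means no CPUs are listed
        if "-" in chunk:
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))  # ranges are inclusive
        else:
            cpus.add(int(chunk))
    return cpus


def read_aarch32_cpus(path: Path = AARCH32_EL0) -> Optional[Set[int]]:
    """Return the advertised 32-bit-capable CPUs, or None if the file is absent."""
    try:
        return parse_cpulist(path.read_text())
    except FileNotFoundError:
        # Absent file: all or none of the CPUs can run AArch32 applications.
        return None
```

A caller that gets `None` cannot tell the "all" case from the "none" case from this file alone, which matches the description above.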

What:		/sys/devices/system/cpu/cpu#/cpu_capacity
Date:		December 2016
Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
@@ -640,3 +649,20 @@ Description: SPURR ticks for cpuX when it was idle.

		This sysfs interface exposes the number of SPURR ticks
		for cpuX when it was idle.

What: 		/sys/devices/system/cpu/cpuX/mte_tcf_preferred
Date:		July 2021
Contact:	Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org>
Description:	Preferred MTE tag checking mode

		When a user program specifies more than one MTE tag checking
		mode, this sysfs node is used to specify which mode should
		be preferred when scheduling a task on that CPU. Possible
		values:

		================  ==============================================
		"sync"	  	  Prefer synchronous mode
		"async"	  	  Prefer asynchronous mode
		================  ==============================================

		See also: Documentation/arm64/memory-tagging-extension.rst
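As an illustration only, the per-CPU node could be driven from a small Python helper like the one below. The function names and the `sysfs_root` parameter are hypothetical (the latter exists so the code can be exercised against a scratch directory). Writing to the real node is assumed to need sufficient privileges on a kernel that provides it.

```python
from pathlib import Path

# The two values listed in the table above.
VALID_MODES = ("sync", "async")


def mte_tcf_preferred_path(cpu: int, sysfs_root: str = "/sys") -> Path:
    """Build the path of the per-CPU mte_tcf_preferred node."""
    return Path(sysfs_root) / "devices/system/cpu" / f"cpu{cpu}" / "mte_tcf_preferred"


def set_mte_tcf_preferred(cpu: int, mode: str, sysfs_root: str = "/sys") -> None:
    """Write the preferred tag checking mode for one CPU."""
    if mode not in VALID_MODES:
        # Reject unknown strings before touching sysfs.
        raise ValueError(f"unsupported mode {mode!r}; expected one of {VALID_MODES}")
    mte_tcf_preferred_path(cpu, sysfs_root).write_text(mode)
```

For example, `set_mte_tcf_preferred(0, "sync")` would request synchronous mode as the preference for CPU 0.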
+14 −0
@@ -287,6 +287,17 @@
			do not want to use tracing_snapshot_alloc() as it needs
			to be done where GFP_KERNEL allocations are allowed.

	allow_mismatched_32bit_el0 [ARM64]
			Allow execve() of 32-bit applications and setting of the
			PER_LINUX32 personality on systems where only a strict
			subset of the CPUs support 32-bit EL0. When this
			parameter is present, the set of CPUs supporting 32-bit
			EL0 is indicated by /sys/devices/system/cpu/aarch32_el0
			and hot-unplug operations may be restricted.

			See Documentation/arm64/asymmetric-32bit.rst for more
			information.

	amd_iommu=	[HW,X86-64]
			Pass parameters to the AMD IOMMU driver in the system.
			Possible values are:
@@ -380,6 +391,9 @@
	arm64.nopauth	[ARM64] Unconditionally disable Pointer Authentication
			support

	arm64.nomte	[ARM64] Unconditionally disable Memory Tagging Extension
			support

	ataflop=	[HW,M68k]

	atarimouse=	[HW,MOUSE] Atari Mouse
+155 −0
======================
Asymmetric 32-bit SoCs
======================

Author: Will Deacon <will@kernel.org>

This document describes the impact of asymmetric 32-bit SoCs on the
execution of 32-bit (``AArch32``) applications.

Date: 2021-05-17

Introduction
============

Some Armv9 SoCs suffer from a big.LITTLE misfeature where only a subset
of the CPUs are capable of executing 32-bit user applications. On such
a system, Linux by default treats the asymmetry as a "mismatch" and
disables support for both the ``PER_LINUX32`` personality and
``execve(2)`` of 32-bit ELF binaries, with the latter returning
``-ENOEXEC``. If the mismatch is detected during late onlining of a
64-bit-only CPU, then the onlining operation fails and the new CPU is
unavailable for scheduling.

Surprisingly, these SoCs have been produced with the intention of
running legacy 32-bit binaries. Unsurprisingly, that doesn't work very
well with the default behaviour of Linux.

It seems inevitable that future SoCs will drop 32-bit support
altogether, so if you're stuck in the unenviable position of needing to
run 32-bit code on one of these transitionary platforms then you would
be wise to consider alternatives such as recompilation, emulation or
retirement. If none of those options is practical, then read on.

Enabling kernel support
=======================

Since the kernel support is not completely transparent to userspace,
allowing 32-bit tasks to run on an asymmetric 32-bit system requires an
explicit "opt-in" and can be enabled by passing the
``allow_mismatched_32bit_el0`` parameter on the kernel command-line.

For the remainder of this document we will refer to an *asymmetric
system* to mean an asymmetric 32-bit SoC running Linux with this kernel
command-line option enabled.

Userspace impact
================

32-bit tasks running on an asymmetric system behave in mostly the same
way as on a homogeneous system, with a few key differences relating to
CPU affinity.

sysfs
-----

The subset of CPUs capable of running 32-bit tasks is described in
``/sys/devices/system/cpu/aarch32_el0`` and is documented further in
``Documentation/ABI/testing/sysfs-devices-system-cpu``.

**Note:** CPUs are advertised by this file as they are detected and so
late-onlining of 32-bit-capable CPUs can result in the file contents
being modified by the kernel at runtime. Once advertised, CPUs are never
removed from the file.

``execve(2)``
-------------

On a homogeneous system, the CPU affinity of a task is preserved across
``execve(2)``. This is not always possible on an asymmetric system,
specifically when the new program being executed is 32-bit yet the
affinity mask contains 64-bit-only CPUs. In this situation, the kernel
determines the new affinity mask as follows:

  1. If the 32-bit-capable subset of the affinity mask is not empty,
     then the affinity is restricted to that subset and the old affinity
     mask is saved. This saved mask is inherited over ``fork(2)`` and
     preserved across ``execve(2)`` of 32-bit programs.

     **Note:** This step does not apply to ``SCHED_DEADLINE`` tasks.
     See `SCHED_DEADLINE`_.

  2. Otherwise, the cpuset hierarchy of the task is walked until an
     ancestor is found containing at least one 32-bit-capable CPU. The
     affinity of the task is then changed to match the 32-bit-capable
     subset of the cpuset determined by the walk.

  3. On failure (i.e. out of memory), the affinity is changed to the set
     of all 32-bit-capable CPUs of which the kernel is aware.
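The three steps above can be modelled with plain sets. This is a rough sketch for reasoning about the rules, not the kernel implementation. The function name, its parameters, and the ordering of `cpuset_ancestry` (the task's own cpuset first, then each ancestor towards the root) are assumptions made for illustration. The model has no allocation failure, so step 3 is reached only when the walk finds nothing.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

CpuMask = FrozenSet[int]


def affinity_for_32bit_exec(
    affinity: CpuMask,
    capable: CpuMask,
    cpuset_ancestry: Iterable[CpuMask],
    is_deadline: bool = False,
) -> Tuple[CpuMask, Optional[CpuMask]]:
    """Return (new_affinity, saved_mask) for execve() of a 32-bit program."""
    # Step 1: restrict to the 32-bit-capable subset and remember the old mask.
    # This step does not apply to SCHED_DEADLINE tasks.
    subset = affinity & capable
    if subset and not is_deadline:
        return subset, affinity

    # Step 2: walk the cpuset hierarchy until one contains a capable CPU.
    for allowed in cpuset_ancestry:
        subset = allowed & capable
        if subset:
            return subset, None

    # Step 3: fall back to every 32-bit-capable CPU the kernel knows about.
    # (Stands in for the failure case; this model cannot run out of memory.)
    return capable, None
```

With `capable = frozenset({4, 5, 6, 7})`, a task whose affinity is `{0, 1, 4}` ends up on `{4}` with `{0, 1, 4}` saved, whereas a task pinned to `{0, 1}` takes its new affinity from the first cpuset in the walk that contains a capable CPU.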

A subsequent ``execve(2)`` of a 64-bit program by the 32-bit task will
invalidate the affinity mask saved in (1) and attempt to restore the CPU
affinity of the task using the saved mask if it was previously valid.
This restoration may fail due to intervening changes to the deadline
policy or cpuset hierarchy, in which case the ``execve(2)`` continues
with the affinity unchanged.

Calls to ``sched_setaffinity(2)`` for a 32-bit task will consider only
the 32-bit-capable CPUs of the requested affinity mask. On success, the
affinity for the task is updated and any saved mask from a prior
``execve(2)`` is invalidated.
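The interaction between the saved mask, a later 64-bit ``execve(2)`` and ``sched_setaffinity(2)`` can be sketched as a tiny state model. The class below is hypothetical and exists only to make the rules concrete. In particular, treating a request with no 32-bit-capable CPUs as a failure that leaves the state untouched is an assumption, and `restore_ok` stands in for the intervening deadline-policy or cpuset changes mentioned above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Task32Model:
    """Toy model of a 32-bit task's affinity state on an asymmetric system."""

    affinity: FrozenSet[int]
    saved_mask: Optional[FrozenSet[int]] = None

    def setaffinity(self, requested: FrozenSet[int], capable: FrozenSet[int]) -> bool:
        """Model sched_setaffinity(2): only 32-bit-capable CPUs are considered."""
        effective = requested & capable
        if not effective:
            return False  # assumption: the call fails and nothing changes
        self.affinity = effective
        self.saved_mask = None  # success invalidates any saved mask
        return True

    def exec_64bit(self, restore_ok: bool = True) -> None:
        """Model execve(2) of a 64-bit program by the 32-bit task."""
        saved, self.saved_mask = self.saved_mask, None  # always invalidated
        if saved is not None and restore_ok:
            self.affinity = saved  # restoration succeeded
        # Otherwise execve() continues with the affinity unchanged.
```

In this model, a task that calls `setaffinity()` successfully and then executes a 64-bit program keeps the affinity it asked for, because there is no longer a saved mask to restore.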

``SCHED_DEADLINE``
------------------

Explicit admission of a 32-bit deadline task to the default root domain
(e.g. by calling ``sched_setattr(2)``) is rejected on an asymmetric
32-bit system unless admission control is disabled by writing -1 to
``/proc/sys/kernel/sched_rt_runtime_us``.
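A quick way to check which case applies on a given host is to read that file back. The sketch below uses hypothetical helper names and tests only for the ``-1`` value described above.

```python
from pathlib import Path

SCHED_RT_RUNTIME_US = Path("/proc/sys/kernel/sched_rt_runtime_us")


def admission_control_disabled(raw: str) -> bool:
    """True if the sysctl text holds -1, the value described above."""
    return int(raw.strip()) == -1


def host_admission_control_disabled(path: Path = SCHED_RT_RUNTIME_US) -> bool:
    """Read the sysctl from the running system and interpret it."""
    return admission_control_disabled(path.read_text())
```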

``execve(2)`` of a 32-bit program from a 64-bit deadline task will
return ``-ENOEXEC`` if the root domain for the task contains any
64-bit-only CPUs and admission control is enabled. Concurrent offlining
of 32-bit-capable CPUs may still necessitate the procedure described in
`execve(2)`_, in which case step (1) is skipped and a warning is
emitted on the console.

**Note:** It is recommended that a set of 32-bit-capable CPUs is placed
into a separate root domain if ``SCHED_DEADLINE`` is to be used with
32-bit tasks on an asymmetric system. Failure to do so is likely to
result in missed deadlines.

Cpusets
-------

The affinity of a 32-bit task on an asymmetric system may include CPUs
that are not explicitly allowed by the cpuset to which it is attached.
This can occur as a result of the following two situations:

  - A 64-bit task attached to a cpuset which allows only 64-bit CPUs
    executes a 32-bit program.

  - All of the 32-bit-capable CPUs allowed by a cpuset containing a
    32-bit task are offlined.

In both of these cases, the new affinity is calculated according to step
(2) of the process described in `execve(2)`_ and the cpuset hierarchy is
unchanged irrespective of the cgroup version.

CPU hotplug
-----------

On an asymmetric system, the first detected 32-bit-capable CPU is
prevented from being offlined by userspace and any such attempt will
return ``-EPERM``. Note that suspend is still permitted even if the
primary CPU (i.e. CPU 0) is 64-bit-only.

KVM
---

Although KVM will not advertise 32-bit EL0 support to any vCPUs on an
asymmetric system, a broken guest at EL1 could still attempt to execute
32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit
mode will return to host userspace with an ``exit_reason`` of
``KVM_EXIT_FAIL_ENTRY``, and the vCPU will remain non-runnable until
successfully re-initialised by a subsequent ``KVM_ARM_VCPU_INIT`` operation.
+33 −4
@@ -207,11 +207,18 @@ Before jumping into the kernel, the following conditions must be met:
  software at a higher exception level to prevent execution in an UNKNOWN
  state.

  For all systems:
  - If EL3 is present:

    - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
      executing on.
    - The value of SCR_EL3.FIQ must be the same as the one present at boot
      time whenever the kernel is executing.

  - If EL3 is present and the kernel is entered at EL2:

    - SCR_EL3.HCE (bit 8) must be initialised to 0b1.

  For systems with a GICv3 interrupt controller to be used in v3 mode:
  - If EL3 is present:

@@ -311,6 +318,28 @@ Before jumping into the kernel, the following conditions must be met:
    - ZCR_EL2.LEN must be initialised to the same value for all CPUs the
      kernel will execute on.

  For CPUs with the Scalable Matrix Extension (FEAT_SME):

  - If EL3 is present:

    - CPTR_EL3.ESM (bit 12) must be initialised to 0b1.

    - SCR_EL3.EnTP2 (bit 41) must be initialised to 0b1.

    - SMCR_EL3.LEN must be initialised to the same value for all CPUs the
      kernel will execute on.

  - If the kernel is entered at EL1 and EL2 is present:

    - CPTR_EL2.TSM (bit 12) must be initialised to 0b0.

    - CPTR_EL2.SMEN (bits 25:24) must be initialised to 0b11.

    - SCTLR_EL2.EnTP2 (bit 60) must be initialised to 0b1.

    - SMCR_EL2.LEN must be initialised to the same value for all CPUs the
      kernel will execute on.
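The EL2 bit-field requirements listed above for FEAT_SME can be checked mechanically. The Python sketch below assumes the raw register values were captured by some other means (for example a firmware debug log), since these registers are not readable from userspace. The helper names are hypothetical, and the per-CPU consistency of `SMCR_EL2.LEN` is not covered.

```python
from typing import List


def field(value: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) from a register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def sme_el2_violations(cptr_el2: int, sctlr_el2: int) -> List[str]:
    """List the SME requirements for entry at EL1 that the values violate."""
    problems: List[str] = []
    if field(cptr_el2, 12, 12) != 0b0:
        problems.append("CPTR_EL2.TSM (bit 12) must be 0b0")
    if field(cptr_el2, 25, 24) != 0b11:
        problems.append("CPTR_EL2.SMEN (bits 25:24) must be 0b11")
    if field(sctlr_el2, 60, 60) != 0b1:
        problems.append("SCTLR_EL2.EnTP2 (bit 60) must be 0b1")
    return problems
```

An empty list means the supplied values satisfy the three bit-field conditions.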

The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs.  All CPUs must
enter the kernel in the same exception level.  Where the values documented
+1 −0
@@ -10,6 +10,7 @@ ARM64 Architecture
    acpi_object_usage
    amu
    arm-acpi
    asymmetric-32bit
    booting
    cpu-feature-registers
    elf_hwcaps