Commit 9244724f authored by Linus Torvalds
Pull SMP updates from Thomas Gleixner:
 "A large update for SMP management:

   - Parallel CPU bringup

     The reason why people are interested in parallel bringup is to
     shorten the (kexec) reboot time of cloud servers to reduce the
     downtime of the VM tenants.

     The current fully serialized bringup does the following per AP:

       1) Prepare callbacks (allocate, initialize, create threads)
       2) Kick the AP alive (e.g. INIT/SIPI on x86)
       3) Wait for the AP to report alive state
       4) Let the AP continue through the atomic bringup
       5) Let the AP run the threaded bringup to full online state

     There are two significant delays:

       #3 The time for an AP to report alive state in start_secondary()
          on x86 has been measured in the range between 350us and 3.5ms
          depending on vendor and CPU type, BIOS microcode size etc.

       #4 The atomic bringup does the microcode update. This has been
          measured to take up to ~8ms on the primary threads depending
          on the microcode patch size to apply.

     On a two socket SKL server with 56 cores (112 threads) the boot CPU
     spends on current mainline about 800ms busy waiting for the APs to
     come up and apply microcode. That's more than 80% of the actual
     onlining procedure.

     This can be reduced significantly by splitting the bringup
     mechanism into two parts:

       1) Run the prepare callbacks and kick the AP alive for each AP
          which needs to be brought up.

          The APs wake up, do their firmware initialization and run the
          low level kernel startup code including microcode loading in
          parallel up to the first synchronization point. (#1 and #2
          above)

       2) Run the rest of the bringup code strictly serialized per CPU
          (#3 - #5 above) as it's done today.

          Parallelizing that stage of the CPU bringup might be possible
          in theory, but it's questionable whether the required surgery
          would be justified for a pretty small gain.

     If the system is large enough the first AP is already waiting at
     the first synchronization point when the boot CPU has finished the
     wake-up of the last AP. That reduces the AP bringup time on that
     SKL from ~800ms to ~80ms, i.e. by a factor of ~10.

     The actual gain varies wildly depending on the system, CPU,
     microcode patch size and other factors. There are some
     opportunities to reduce the overhead further, but that needs some
     deep surgery in the x86 CPU bringup code.

     For now this is only enabled on x86, but the core functionality
     obviously works for all SMP capable architectures.

   - Enhancements for SMP function call tracing so it is possible to
     locate the scheduling and the actual execution points. That allows
     measuring IPI delivery time precisely"

* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  trace,smp: Add tracepoints for scheduling remotelly called functions
  trace,smp: Add tracepoints around remotelly called functions
  MAINTAINERS: Add CPU HOTPLUG entry
  x86/smpboot: Fix the parallel bringup decision
  x86/realmode: Make stack lock work in trampoline_compat()
  x86/smp: Initialize cpu_primary_thread_mask late
  cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
  x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
  x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
  x86/smpboot: Support parallel startup of secondary CPUs
  x86/smpboot: Implement a bit spinlock to protect the realmode stack
  x86/apic: Save the APIC virtual base address
  cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
  x86/apic: Provide cpu_primary_thread mask
  x86/smpboot: Enable split CPU startup
  cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
  cpu/hotplug: Reset task stack state in _cpu_up()
  cpu/hotplug: Remove unused state functions
  riscv: Switch to hotplug core state synchronization
  parisc: Switch to hotplug core state synchronization
  ...
parents 7cffdbe3 bf5a8c26
+6 −14
@@ -818,20 +818,6 @@
			Format:
			<first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]

	cpu0_hotplug	[X86] Turn on CPU0 hotplug feature when
			CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
			Some features depend on CPU0. Known dependencies are:
			1. Resume from suspend/hibernate depends on CPU0.
			Suspend/hibernate will fail if CPU0 is offline and you
			need to online CPU0 before suspend/hibernate.
			2. PIC interrupts also depend on CPU0. CPU0 can't be
			removed if a PIC interrupt is detected.
			It's said poweroff/reboot may depend on CPU0 on some
			machines although I haven't seen such issues so far
			after CPU0 is offline on a few tested machines.
			If the dependencies are under your control, you can
			turn on cpu0_hotplug.

	cpuidle.off=1	[CPU_IDLE]
			disable the cpuidle sub-system

@@ -852,6 +838,12 @@
			on every CPU online, such as boot, and resume from suspend.
			Default: 10000

	cpuhp.parallel=
			[SMP] Enable/disable parallel bringup of secondary CPUs
			Format: <bool>
			Default is enabled if CONFIG_HOTPLUG_PARALLEL=y. Otherwise
			the parameter has no effect.

	crash_kexec_post_notifiers
			Run kdump after running panic-notifiers and dumping
			kmsg. This only for the users who doubt kdump always
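
To illustrate the cpuhp.parallel= entry added in the hunk above, here is a minimal userspace sketch that looks for the parameter in a command line string such as the contents of /proc/cmdline. The function name and the return convention are hypothetical, and only a small set of boolean spellings (1/0, y/n, on/off) is handled; the kernel's own boolean parsing may accept more. A return value of -1 means the parameter is absent or unparsable, in which case the documented default applies.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if cpuhp.parallel is set to a true value,
 * 0 if it is set to a false value, and -1 if it is absent or unparsable.
 */
int cpuhp_parallel_from_cmdline(const char *cmdline)
{
	static const char key[] = "cpuhp.parallel=";
	const size_t key_len = sizeof(key) - 1;
	const char *p = cmdline;

	assert(cmdline != NULL);
	while ((p = strstr(p, key)) != NULL) {
		/* Only accept a match at the start of a parameter. */
		if (p == cmdline || p[-1] == ' ') {
			const char *v = p + key_len;

			switch (v[0]) {
			case '1': case 'y': case 'Y':
				return 1;
			case '0': case 'n': case 'N':
				return 0;
			case 'o':	/* "on" or "off" */
				if (v[1] == 'n' || v[1] == 'N')
					return 1;
				if (v[1] == 'f' || v[1] == 'F')
					return 0;
				return -1;
			default:
				return -1;
			}
		}
		p += key_len;
	}
	return -1;
}
```

For example, a command line of "root=/dev/sda1 cpuhp.parallel=0 quiet" yields 0, while a command line without the parameter yields -1.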
+2 −11
@@ -127,17 +127,8 @@ bring CPU4 back online::
 $ echo 1 > /sys/devices/system/cpu/cpu4/online
 smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs. CPU0 is often special
and excluded from CPU hotplug. On X86 the kernel option
*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
shutdown CPU0. Alternatively the kernel command option *cpu0_hotplug* can be
used. Some known dependencies of CPU0:

* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline.
* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.

Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
on CPU0.
The CPU is usable again. This should work on all CPUs, but CPU0 is often special
and excluded from CPU hotplug.

The CPU hotplug coordination
============================
+12 −0
@@ -5344,6 +5344,18 @@ F: include/linux/sched/cpufreq.h
F:	kernel/sched/cpufreq*.c
F:	tools/testing/selftests/cpufreq/
CPU HOTPLUG
M:	Thomas Gleixner <tglx@linutronix.de>
M:	Peter Zijlstra <peterz@infradead.org>
L:	linux-kernel@vger.kernel.org
S:	Maintained
T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp/core
F:	kernel/cpu.c
F:	kernel/smpboot.*
F:	include/linux/cpu.h
F:	include/linux/cpuhotplug.h
F:	include/linux/smpboot.h
CPU IDLE TIME MANAGEMENT FRAMEWORK
M:	"Rafael J. Wysocki" <rafael@kernel.org>
M:	Daniel Lezcano <daniel.lezcano@linaro.org>
+23 −0
@@ -34,6 +34,29 @@ config ARCH_HAS_SUBPAGE_FAULTS
config HOTPLUG_SMT
	bool

# Selected by HOTPLUG_CORE_SYNC_DEAD or HOTPLUG_CORE_SYNC_FULL
config HOTPLUG_CORE_SYNC
	bool

# Basic CPU dead synchronization selected by architecture
config HOTPLUG_CORE_SYNC_DEAD
	bool
	select HOTPLUG_CORE_SYNC

# Full CPU synchronization with alive state selected by architecture
config HOTPLUG_CORE_SYNC_FULL
	bool
	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
	select HOTPLUG_CORE_SYNC

config HOTPLUG_SPLIT_STARTUP
	bool
	select HOTPLUG_CORE_SYNC_FULL

config HOTPLUG_PARALLEL
	bool
	select HOTPLUG_SPLIT_STARTUP

config GENERIC_ENTRY
	bool
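
The select chain among the HOTPLUG_* symbols in the hunk above can be followed by hand, or modelled as a small fixed-point computation. The sketch below is not how Kconfig is implemented; the enum names and resolve_selects() are hypothetical and only mirror the select lines shown in the hunk, including the "if HOTPLUG_CPU" condition on HOTPLUG_CORE_SYNC_FULL.

```c
#include <assert.h>
#include <stdbool.h>

/* One bit per symbol from the hunk; the names are illustrative only. */
enum {
	CORE_SYNC      = 1u << 0,
	CORE_SYNC_DEAD = 1u << 1,
	CORE_SYNC_FULL = 1u << 2,
	SPLIT_STARTUP  = 1u << 3,
	PARALLEL       = 1u << 4,
};

/* Apply the select rules repeatedly until no further symbol gets enabled. */
unsigned int resolve_selects(unsigned int enabled, bool hotplug_cpu)
{
	unsigned int prev;

	do {
		prev = enabled;
		if (enabled & PARALLEL)
			enabled |= SPLIT_STARTUP;
		if (enabled & SPLIT_STARTUP)
			enabled |= CORE_SYNC_FULL;
		if (enabled & CORE_SYNC_FULL) {
			enabled |= CORE_SYNC;
			if (hotplug_cpu)	/* select ... if HOTPLUG_CPU */
				enabled |= CORE_SYNC_DEAD;
		}
		if (enabled & CORE_SYNC_DEAD)
			enabled |= CORE_SYNC;
	} while (enabled != prev);

	return enabled;
}
```

Starting from PARALLEL alone with HOTPLUG_CPU set, all five symbols end up enabled; without HOTPLUG_CPU, CORE_SYNC_DEAD stays off.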

+1 −0
@@ -125,6 +125,7 @@ config ARM
	select HAVE_SYSCALL_TRACEPOINTS
	select HAVE_UID16
	select HAVE_VIRT_CPU_ACCOUNTING_GEN
	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
	select IRQ_FORCED_THREADING
	select MODULES_USE_ELF_REL
	select NEED_DMA_MAP_STATE