Commit 3fe2f744 authored by Linus Torvalds

Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Cleanups for SCHED_DEADLINE

 - Tracing updates/fixes

 - CPU Accounting fixes

 - First wave of changes to optimize the overhead of the scheduler
   build, from the fast-headers tree - including placeholder *_api.h
   headers for later header split-ups.

 - Preempt-dynamic using static_branch() for ARM64

 - Isolation housekeeping mask rework; preparatory for further changes

 - NUMA-balancing: deal with CPU-less nodes

 - NUMA-balancing: tune systems that have multiple LLC cache domains per
   node (e.g. AMD)

 - Updates to RSEQ UAPI in preparation for glibc usage

 - Lots of RSEQ/selftests, for same

 - Add Suren as PSI co-maintainer

* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
  sched/headers: ARM needs asm/paravirt_api_clock.h too
  sched/numa: Fix boot crash on arm64 systems
  headers/prep: Fix header to build standalone: <linux/psi.h>
  sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
  cgroup: Fix suspicious rcu_dereference_check() usage warning
  sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
  sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
  sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
  sched/deadline,rt: Remove unused functions for !CONFIG_SMP
  sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
  sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
  sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
  sched/deadline: Remove unused def_dl_bandwidth
  sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
  sched/tracing: Don't re-read p->state when emitting sched_switch event
  sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
  sched/cpuacct: Remove redundant RCU read lock
  sched/cpuacct: Optimize away RCU read lock
  sched/cpuacct: Fix charge percpu cpuusage
  sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
  ...
parents ebd326ce ffea9fb3
+1 −45
@@ -609,51 +609,7 @@ be migrated to a local memory node.
The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality but there is no universal
guarantee. If the target workload is already bound to NUMA nodes then this
feature should be disabled. Otherwise, if the system overhead from the
feature is too high then the rate the kernel samples for NUMA hinting
faults may be controlled by the `numa_balancing_scan_period_min_ms,
numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.


numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
===============================================================================================================================


Automatic NUMA balancing scans a task's address space and unmaps pages to
detect if pages are properly placed or if the data should be migrated to a
memory node local to where the task is running.  Every "scan delay" the task
scans the next "scan size" number of pages in its address space. When the
end of the address space is reached the scanner restarts from the beginning.

In combination, the "scan delay" and "scan size" determine the scan rate.
When "scan delay" decreases, the scan rate increases.  The scan delay and
hence the scan rate of every task is adaptive and depends on historical
behaviour. If pages are properly placed then the scan delay increases,
otherwise the scan delay decreases.  The "scan size" is not adaptive but
the higher the "scan size", the higher the scan rate.

Higher scan rates incur higher system overhead as page faults must be
trapped and potentially data must be migrated. However, the higher the scan
rate, the more quickly a task's memory is migrated to a local node if the
workload pattern changes, which minimises the performance impact of remote
memory accesses. These sysctls control the thresholds for scan delays and
the number of pages scanned.

``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
scan a task's virtual memory. It effectively controls the maximum scanning
rate for each task.

``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
when it initially forks.

``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
scan a task's virtual memory. It effectively controls the minimum scanning
rate for each task.

``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
scanned for a given scan.

feature should be disabled.

oops_all_cpu_backtrace
======================
+1 −0
@@ -18,6 +18,7 @@ Linux Scheduler
    sched-nice-design
    sched-rt-group
    sched-stats
    sched-debug

    text_files

+54 −0
=================
Scheduler debugfs
=================

Booting a kernel with CONFIG_SCHED_DEBUG=y will give access to
scheduler-specific debug files under /sys/kernel/debug/sched. Some of
those files are described below.

numa_balancing
==============

The `numa_balancing` directory holds files that control the NUMA
balancing feature.  If the system overhead from the feature is too
high then the rate at which the kernel samples for NUMA hinting
faults may be controlled by the `scan_period_min_ms, scan_delay_ms,
scan_period_max_ms, scan_size_mb` files.


scan_period_min_ms, scan_delay_ms, scan_period_max_ms, scan_size_mb
-------------------------------------------------------------------

Automatic NUMA balancing scans a task's address space and unmaps pages to
detect if pages are properly placed or if the data should be migrated to a
memory node local to where the task is running.  Every "scan delay" the task
scans the next "scan size" number of pages in its address space. When the
end of the address space is reached the scanner restarts from the beginning.

In combination, the "scan delay" and "scan size" determine the scan rate.
When "scan delay" decreases, the scan rate increases.  The scan delay and
hence the scan rate of every task is adaptive and depends on historical
behaviour. If pages are properly placed then the scan delay increases,
otherwise the scan delay decreases.  The "scan size" is not adaptive but
the higher the "scan size", the higher the scan rate.

Higher scan rates incur higher system overhead as page faults must be
trapped and potentially data must be migrated. However, the higher the scan
rate, the more quickly a task's memory is migrated to a local node if the
workload pattern changes, which minimises the performance impact of remote
memory accesses. These files control the thresholds for scan delays and
the number of pages scanned.

``scan_period_min_ms`` is the minimum time in milliseconds to scan a
task's virtual memory. It effectively controls the maximum scanning
rate for each task.

``scan_delay_ms`` is the starting "scan delay" used for a task when it
initially forks.

``scan_period_max_ms`` is the maximum time in milliseconds to scan a
task's virtual memory. It effectively controls the minimum scanning
rate for each task.

``scan_size_mb`` is how many megabytes worth of pages are scanned for
a given scan.
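
Not part of the patch, but a rough illustration of how these files might be
driven from user space: the sketch below assumes CONFIG_SCHED_DEBUG=y,
debugfs mounted at /sys/kernel/debug, and root privileges for writes; the
read_knob()/write_knob() helpers are invented for this example.

/* Illustrative only: query and adjust the NUMA-balancing debugfs knobs. */
#include <stdio.h>

#define NB_DIR "/sys/kernel/debug/sched/numa_balancing/"

/* Read a single integer value from one of the numa_balancing files. */
static long read_knob(const char *name)
{
	char path[256];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), NB_DIR "%s", name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* Write a new value to one of the files (needs root). */
static int write_knob(const char *name, long val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), NB_DIR "%s", name);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%ld\n", val);
	return fclose(f);
}

int main(void)
{
	long min = read_knob("scan_period_min_ms");

	printf("scan_period_min_ms = %ld\n", min);
	/* Lower the maximum scan rate by doubling the minimum scan period. */
	if (min > 0)
		write_knob("scan_period_min_ms", min * 2);
	return 0;
}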
+1 −0
@@ -15566,6 +15566,7 @@ F: drivers/net/ppp/pptp.c
PRESSURE STALL INFORMATION (PSI)
M:	Johannes Weiner <hannes@cmpxchg.org>
M:	Suren Baghdasaryan <surenb@google.com>
S:	Maintained
F:	include/linux/psi*
F:	kernel/sched/psi.c
+33 −4
@@ -1293,12 +1293,41 @@ config HAVE_STATIC_CALL_INLINE

config HAVE_PREEMPT_DYNAMIC
	bool

config HAVE_PREEMPT_DYNAMIC_CALL
	bool
	depends on HAVE_STATIC_CALL
	depends on GENERIC_ENTRY
	select HAVE_PREEMPT_DYNAMIC
	help
	   An architecture should select this if it can handle the preemption
	   model being selected at boot time using static calls.

	   Where an architecture selects HAVE_STATIC_CALL_INLINE, any call to a
	   preemption function will be patched directly.

	   Where an architecture does not select HAVE_STATIC_CALL_INLINE, any
	   call to a preemption function will go through a trampoline, and the
	   trampoline will be patched.

	   It is strongly advised to support inline static call to avoid any
	   overhead.

config HAVE_PREEMPT_DYNAMIC_KEY
	bool
	depends on HAVE_ARCH_JUMP_LABEL && CC_HAS_ASM_GOTO
	select HAVE_PREEMPT_DYNAMIC
	help
	   Select this if the architecture support boot time preempt setting
	   on top of static calls. It is strongly advised to support inline
	   static call to avoid any overhead.
	   An architecture should select this if it can handle the preemption
	   model being selected at boot time using static keys.

	   Each preemption function will be given an early return based on a
	   static key. This should have slightly lower overhead than non-inline
	   static calls, as this effectively inlines each trampoline into the
	   start of its callee. This may avoid redundant work, and may
	   integrate better with CFI schemes.

	   This will have greater overhead than using inline static calls as
	   the call to the preemption function cannot be entirely elided.

config ARCH_WANT_LD_ORPHAN_WARN
	bool
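
Not part of the patch, but a rough sketch of the difference between the two
options above, using the kernel's existing static-call and static-key
primitives. The identifiers my_preempt_schedule, my_preempt_nop,
my_dynamic_preempt and sk_dynamic_my_preempt are invented for illustration;
the real PREEMPT_DYNAMIC wiring in kernel/sched differs in detail.

/* Kernel-internal sketch; compiles only inside a kernel tree. */
#include <linux/jump_label.h>
#include <linux/static_call.h>

static void my_preempt_schedule(void);
static void my_preempt_nop(void) { }

/*
 * HAVE_PREEMPT_DYNAMIC_CALL flavour: each call site goes through a
 * static call (a patched trampoline, or a directly patched call site
 * with HAVE_STATIC_CALL_INLINE).  Switching the preemption model
 * rewrites the target of the call.
 */
DEFINE_STATIC_CALL(my_dynamic_preempt, my_preempt_schedule);

static void call_site(void)
{
	static_call(my_dynamic_preempt)();	/* target patched at boot */
}

/*
 * HAVE_PREEMPT_DYNAMIC_KEY flavour: the function is always called, but
 * takes a static-key guarded early return when preemption is switched
 * off, so the call itself is never elided.
 */
DEFINE_STATIC_KEY_TRUE(sk_dynamic_my_preempt);

static void my_preempt_schedule(void)
{
	if (!static_branch_unlikely(&sk_dynamic_my_preempt))
		return;				/* e.g. preempt=none */
	/* ... actual preemption work ... */
}

/* Boot-time selection of a non-preemptible model might then do: */
static void select_none_model(void)
{
	static_call_update(my_dynamic_preempt, my_preempt_nop);	/* call variant */
	static_branch_disable(&sk_dynamic_my_preempt);		/* key variant */
}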