Commit 14726903 authored by Linus Torvalds

Merge branch 'akpm' (patches from Andrew)

Merge misc updates from Andrew Morton:
 "173 patches.

  Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
  pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
  bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
  hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
  oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...
parents a9c9a6f7 d5fffc5a
+24 −0
What:		/sys/kernel/mm/numa/
Date:		June 2021
Contact:	Linux memory management mailing list <linux-mm@kvack.org>
Description:	Interface for NUMA

What:		/sys/kernel/mm/numa/demotion_enabled
Date:		June 2021
Contact:	Linux memory management mailing list <linux-mm@kvack.org>
Description:	Enable/disable demoting pages during reclaim

		Page migration during reclaim is intended for systems
		with tiered memory configurations.  These systems have
		multiple types of memory with varied performance
		characteristics instead of plain NUMA systems where
		the same kind of memory is found at varied distances.
		Allowing page migration during reclaim enables these
		systems to migrate pages from fast tiers to slow tiers
		when the fast tier is under pressure.  This migration
		is performed before swap.  It may move data to a NUMA
		node that does not fall into the cpuset of the
		allocating process which might be construed to violate
		the guarantees of cpusets.  This should not be enabled
		on systems which need strict cpuset location
		guarantees.
+11 −4
@@ -245,6 +245,13 @@ MPOL_INTERLEAVED
	address range or file.  During system boot up, the temporary
	interleaved system default policy works in this mode.

MPOL_PREFERRED_MANY
	This mode specifies that the allocation should preferably be
	satisfied from the nodemask specified in the policy.  If there is
	memory pressure on all nodes in the nodemask, the allocation
	can fall back to all existing NUMA nodes.  This is effectively
	MPOL_PREFERRED allowed for a mask rather than a single node.

NUMA memory policy supports the following optional mode flags:

MPOL_F_STATIC_NODES
@@ -253,10 +260,10 @@ MPOL_F_STATIC_NODES
	nodes changes after the memory policy has been defined.

	Without this flag, any time a mempolicy is rebound because of a
	change in the set of allowed nodes, the preferred nodemask (Preferred
	Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
	remapped to the new set of allowed nodes.  This may result in nodes
	being used that were previously undesired.

	With this flag, if the user-specified nodes overlap with the
	nodes allowed by the task's cpuset, then the memory policy is
+2 −1
@@ -118,7 +118,8 @@ compaction_proactiveness

This tunable takes a value in the range [0, 100] with a default value of
20. This tunable determines how aggressively compaction is done in the
background. Writing a non-zero value to this tunable immediately
triggers a proactive compaction pass. Setting it to 0 disables
proactive compaction.

Note that compaction has a non-trivial system-wide impact as pages
belonging to different processes are moved around, which could also lead
+37 −49
@@ -271,10 +271,15 @@ maps this page at its virtual address.

  ``void flush_dcache_page(struct page *page)``

	This routine must be called when:

	  a) the kernel wrote to a page that is in the page cache
	     and / or in high memory
	  b) the kernel is about to read from a page cache page and user space
	     shared/writable mappings of this page potentially exist.  Note
	     that {get,pin}_user_pages{_fast} already call flush_dcache_page
	     on any page found in the user address space and thus driver
	     code rarely needs to take this into account.

	.. note::

@@ -284,38 +289,34 @@ maps this page at its virtual address.
	      handling vfs symlinks in the page cache need not call
	      this interface at all.

	The phrase "kernel writes to a page cache page" means, specifically,
	that the kernel executes store instructions that dirty data in that
	page at the page->virtual mapping of that page.  It is important to
	flush here to handle D-cache aliasing, to make sure these kernel stores
	are visible to user space mappings of that page.

	The corollary case is just as important, if there are users which have
	shared+writable mappings of this file, we must make sure that kernel
	reads of these pages will see the most recent stores done by the user.

	If D-cache aliasing is not an issue, this routine may simply be defined
	as a nop on that architecture.

	There is a bit set aside in page->flags (PG_arch_1) as "architecture
	private".  The kernel guarantees that, for pagecache pages, it will
	clear this bit when such a page first enters the pagecache.

	This allows these interfaces to be implemented much more efficiently.
	It allows one to "defer" (perhaps indefinitely) the actual flush if
	there are currently no user processes mapping this page.  See sparc64's
	flush_dcache_page and update_mmu_cache implementations for an example
	of how to go about doing this.

	The idea is, first at flush_dcache_page() time, if page_file_mapping()
	returns a mapping, and mapping_mapped on that mapping returns %false,
	just mark the architecture private page flag bit.  Later, in
	update_mmu_cache(), a check is made of this flag bit, and if set the
	flush is done and the flag bit is cleared.

	.. important::

@@ -351,19 +352,6 @@ maps this page at its virtual address.
	architectures).  For incoherent architectures, it should flush
	the cache of the page at vmaddr.


  ``void flush_icache_range(unsigned long start, unsigned long end)``

  	When the kernel stores into addresses that it will execute
+8 −5
@@ -181,9 +181,16 @@ By default, KASAN prints a bug report only for the first invalid memory access.
With ``kasan_multi_shot``, KASAN prints a report on every invalid access. This
effectively disables ``panic_on_warn`` for KASAN reports.

Alternatively, independent of ``panic_on_warn`` the ``kasan.fault=`` boot
parameter can be used to control panic and reporting behaviour:

- ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN
  report or also panic the kernel (default: ``report``). The panic happens even
  if ``kasan_multi_shot`` is enabled.

Hardware tag-based KASAN mode (see the section about various modes below) is
intended for use in production as a security mitigation. Therefore, it supports
additional boot parameters that allow disabling KASAN or controlling features:

- ``kasan=off`` or ``=on`` controls whether KASAN is enabled (default: ``on``).

@@ -199,10 +206,6 @@ boot parameters that allow disabling KASAN or controlling its features.
- ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack
  traces collection (default: ``on``).

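As an illustrative (hypothetical) kernel command line combining the parameters above for a hardened production configuration:

```
kasan=on kasan.stacktrace=off kasan.fault=panic
```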

Implementation details
----------------------
