Commit bf5c2726 authored by Ben Widawsky, committed by Ma Wupeng

mm/mempolicy: advertise new MPOL_PREFERRED_MANY

mainline inclusion
from mainline-v5.15-rc1
commit a38a59fd
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6I1Z2
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a38a59fdfa10be55d08e4530923d950e739ac6a2

--------------------------------

Adds a new mode to the existing mempolicy modes, MPOL_PREFERRED_MANY.

MPOL_PREFERRED_MANY will be adequately documented in the internal
admin-guide with this patch.  Eventually, the man pages for mbind(2),
get_mempolicy(2), set_mempolicy(2) and numactl(8) will also have text
about this mode.  Those shall contain the canonical reference.

NUMA systems continue to become more prevalent.  New technologies like
PMEM make finer-grained control over memory access patterns increasingly
desirable.  MPOL_PREFERRED_MANY allows userspace to specify a set of nodes
that will be tried first when performing allocations.  If those
allocations fail, all remaining nodes will be tried.  It's a
straightforward API which solves many of the presumptive needs of system
administrators wanting to optimize workloads on such machines.  The mode
will work either per VMA or per thread.

[Michal Hocko: refine kernel doc for MPOL_PREFERRED_MANY]

Link: https://lore.kernel.org/r/20200630212517.308045-13-ben.widawsky@intel.com
Link: https://lkml.kernel.org/r/1627970362-61305-5-git-send-email-feng.tang@intel.com


Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
parent 27a782f7
+11 −4
@@ -245,6 +245,13 @@ MPOL_INTERLEAVED
 	address range or file.  During system boot up, the temporary
 	interleaved system default policy works in this mode.
 
+MPOL_PREFERRED_MANY
+	This mode specifices that the allocation should be preferrably
+	satisfied from the nodemask specified in the policy. If there is
+	a memory pressure on all nodes in the nodemask, the allocation
+	can fall back to all existing numa nodes. This is effectively
+	MPOL_PREFERRED allowed for a mask rather than a single node.
+
 NUMA memory policy supports the following optional mode flags:
 
 MPOL_F_STATIC_NODES
@@ -253,10 +260,10 @@ MPOL_F_STATIC_NODES
 	nodes changes after the memory policy has been defined.
 
 	Without this flag, any time a mempolicy is rebound because of a
-	change in the set of allowed nodes, the node (Preferred) or
-	nodemask (Bind, Interleave) is remapped to the new set of
-	allowed nodes.  This may result in nodes being used that were
-	previously undesired.
+        change in the set of allowed nodes, the preferred nodemask (Preferred
+        Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
+        remapped to the new set of allowed nodes.  This may result in nodes
+        being used that were previously undesired.
 
 	With this flag, if the user-specified nodes overlap with the
 	nodes allowed by the task's cpuset, then the memory policy is
+1 −6
@@ -1496,12 +1496,7 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
 	*flags = *mode & MPOL_MODE_FLAGS;
 	*mode &= ~MPOL_MODE_FLAGS;
 
-	/*
-	 * The check should be 'mode >= MPOL_MAX', but as 'prefer_many'
-	 * is not fully implemented, don't permit it to be used for now,
-	 * and the logic will be restored in following patch
-	 */
-	if ((unsigned int)(*mode) >=  MPOL_PREFERRED_MANY)
+	if ((unsigned int)(*mode) >=  MPOL_MAX)
 		return -EINVAL;
 	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
 		return -EINVAL;