Commit a66b953d authored by Stefan Roesch, committed by Jinjiang Tu

mm/ksm: support fork/exec for prctl

mainline inclusion
from mainline-v6.7-rc1
commit 3c6f33b7273a7e2f2b2497b62c8400bd957b2fbe
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9GT87

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c6f33b7273a7e2f2b2497b62c8400bd957b2fbe

--------------------------------

Patch series "mm/ksm: add fork-exec support for prctl", v4.

A process can enable KSM with the prctl system call.  When the process is
forked, the KSM flag is inherited by the child process.  However, if the
process executes an exec system call directly after the fork, the KSM
setting is cleared.  This patch series addresses this problem.

1) Change the mask in coredump.h for execing a new process
2) Add a new test case in ksm_functional_tests
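
As a rough userspace model of step 1, the sketch below assumes that a new
mm keeps only those flag bits of the old mm that are also set in
MMF_INIT_MASK (a plain AND). The bit numbers come from the hunk in this
commit; INIT_MASK_BEFORE, INIT_MASK_AFTER and the helper functions are
hypothetical names for illustration, the dumpable and dump-filter bits are
left out, and 1UL is used instead of the header's plain 1 because the model
stores flags in an unsigned long.

```c
#include <assert.h>
#include <stdio.h>

/* Bit numbers copied from the hunk in include/linux/sched/coredump.h. */
#define MMF_DISABLE_THP		24
#define MMF_DISABLE_THP_MASK	(1UL << MMF_DISABLE_THP)
#define MMF_VM_MERGE_ANY	29
#define MMF_VM_MERGE_ANY_MASK	(1UL << MMF_VM_MERGE_ANY)

/*
 * Simplified stand-ins for MMF_INIT_MASK before and after the patch.
 * The dumpable and dump-filter bits are omitted to keep the model small.
 */
#define INIT_MASK_BEFORE	(MMF_DISABLE_THP_MASK)
#define INIT_MASK_AFTER		(MMF_DISABLE_THP_MASK | MMF_VM_MERGE_ANY_MASK)

/* Hypothetical helper: flags a new mm keeps (assumed: old flags AND mask). */
static unsigned long inherited_flags(unsigned long old_flags,
				     unsigned long init_mask)
{
	return old_flags & init_mask;
}

/* Hypothetical helper: 1 if the process-wide KSM bit is set, else 0. */
static int has_merge_any(unsigned long flags)
{
	return (flags & MMF_VM_MERGE_ANY_MASK) != 0;
}

/* Walk through both masks for a process that enabled KSM via prctl. */
static void show_mask_effect(void)
{
	unsigned long old_flags = MMF_VM_MERGE_ANY_MASK;

	assert(has_merge_any(old_flags));
	printf("before patch: merge_any=%d\n",
	       has_merge_any(inherited_flags(old_flags, INIT_MASK_BEFORE)));
	printf("after patch:  merge_any=%d\n",
	       has_merge_any(inherited_flags(old_flags, INIT_MASK_AFTER)));
}
```

Calling show_mask_effect() prints merge_any=0 for the old mask and
merge_any=1 for the new one, which is the behavior change the series
describes for exec.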

This patch (of 2):

Today we have two ways to enable KSM:

1) madvise system call
   This enables KSM for a memory region for a long time.

2) prctl system call
   This is a recent addition to enable KSM for the complete process.
   In addition, when a process is forked, the KSM setting is inherited.

This change only affects the second case.
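
For orientation, the two ways listed above might look like this from
userspace. This is a minimal sketch, not code from the patch: the wrapper
names are hypothetical, the fallback constants are only used when the
installed headers predate the prctl interface, and both calls are expected
to fail with -1 on kernels built without KSM or without the prctl support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers; values as in the Linux uapi headers. */
#ifndef MADV_MERGEABLE
#define MADV_MERGEABLE 12
#endif
#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68
#endif

/* Way 1 (hypothetical wrapper): opt one mapped region in. Returns 0 or -1. */
static int ksm_enable_region(void *addr, size_t len)
{
	assert(addr != NULL && len > 0);
	return madvise(addr, len, MADV_MERGEABLE);
}

/* Way 2 (hypothetical wrapper): opt the whole process in. Returns 0 or -1. */
static int ksm_enable_process(void)
{
	return prctl(PR_SET_MEMORY_MERGE, 1UL, 0UL, 0UL, 0UL);
}

/* Query the process-wide setting: 1 enabled, 0 disabled, -1 unsupported. */
static int ksm_process_enabled(void)
{
	return prctl(PR_GET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL);
}
```

A caller that wants process-wide merging would call ksm_enable_process()
once early on and could confirm the result with ksm_process_enabled();
ksm_enable_region() has to be repeated for every region of interest.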

One of the use cases for (2) was to support the ability to enable
KSM for cgroups. This allows systemd to enable KSM for the seed
process. By enabling it in the seed process all child processes inherit
the setting.

This works correctly when the process is forked. However, it doesn't
support the fork/exec workflow.
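
The fork half of that statement can be observed from userspace with a
sketch like the one below. It is an illustration under assumptions, not
part of the patch: ksm_state_in_forked_child() is a hypothetical helper,
and the fallback constants only apply when the installed headers lack
them. Checking the exec half would additionally need a helper program that
queries PR_GET_MEMORY_MERGE after being exec'ed, which is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older userspace headers; values as in the Linux uapi headers. */
#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68
#endif

/*
 * Hypothetical helper: fork, let the child query its own KSM setting and
 * report it through its exit status.
 * Returns 1 (enabled), 0 (disabled) or -1 (fork/wait failed or unsupported).
 */
static int ksm_state_in_forked_child(void)
{
	int status, code;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: exit code 2 stands for "query failed". */
		int state = prctl(PR_GET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL);

		_exit(state == 1 ? 1 : (state == 0 ? 0 : 2));
	}

	/* Parent: only reached with a valid child pid. */
	assert(pid > 0);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	code = WEXITSTATUS(status);
	return code == 2 ? -1 : code;
}
```

If the parent has enabled KSM with PR_SET_MEMORY_MERGE beforehand, the
helper is expected to report 1, matching the inheritance on fork described
above.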

From the previous cover letter:

....
Use case 3:
With the madvise call sharing opportunities are only enabled for the
current process: it is a workload-local decision. A considerable number
of sharing opportunities may exist across multiple workloads or jobs
(if they are part of the same security domain). Only a higher level
entity like a job scheduler or container can know for certain if it's
running one or more instances of a job. That job scheduler however
doesn't have the necessary internal workload knowledge to make targeted
madvise calls.
....

In addition, it can also be a bit surprising that fork keeps the KSM
setting and fork/exec does not.

Link: https://lkml.kernel.org/r/20230922211141.320789-1-shr@devkernel.io
Link: https://lkml.kernel.org/r/20230922211141.320789-2-shr@devkernel.io


Signed-off-by: Stefan Roesch <shr@devkernel.io>
Fixes: d7597f59 ("mm: add new api to enable ksm per process")
Reviewed-by: David Hildenbrand <david@redhat.com>
Reported-by: Carl Klemm <carl@uvos.xyz>
Tested-by: Carl Klemm <carl@uvos.xyz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	include/linux/sched/coredump.h
[Context conflicts.]
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
parent 8d7f4806
+4 −2
@@ -70,13 +70,15 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_UNSTABLE		22	/* mm is unstable for copy_from_user */
#define MMF_HUGE_ZERO_PAGE	23      /* mm has ever used the global huge zero page */
#define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
#define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
#define MMF_OOM_VICTIM		25	/* mm is the oom victim */
#define MMF_OOM_REAP_QUEUED	26	/* mm was queued for oom_reaper */
#define MMF_MULTIPROCESS	27	/* mm is shared between processes */
#define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
#define MMF_VM_MERGE_ANY	29
#define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)

#define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
				 MMF_DISABLE_THP_MASK)
				 MMF_DISABLE_THP_MASK | MMF_VM_MERGE_ANY_MASK)

#define MMF_VM_MERGE_ANY	29
#endif /* _LINUX_SCHED_COREDUMP_H */