Commit f56caeda authored by Linus Torvalds

Merge branch 'akpm' (patches from Andrew)

Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
parents a33f5c38 76fd0285
+4 −0
@@ -29,12 +29,14 @@ Brief summary of control files::
 hugetlb.<hugepagesize>.max_usage_in_bytes             # show max "hugepagesize" hugetlb  usage recorded
 hugetlb.<hugepagesize>.usage_in_bytes                 # show current usage for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.failcnt                        # show the number of allocation failures due to HugeTLB usage limit
 hugetlb.<hugepagesize>.numa_stat                      # show the numa information of the hugetlb memory charged to this cgroup

For a system supporting three hugepage sizes (64k, 32M and 1G), the control
files include::

  hugetlb.1GB.limit_in_bytes
  hugetlb.1GB.max_usage_in_bytes
  hugetlb.1GB.numa_stat
  hugetlb.1GB.usage_in_bytes
  hugetlb.1GB.failcnt
  hugetlb.1GB.rsvd.limit_in_bytes
@@ -43,6 +45,7 @@ files include::
  hugetlb.1GB.rsvd.failcnt
  hugetlb.64KB.limit_in_bytes
  hugetlb.64KB.max_usage_in_bytes
  hugetlb.64KB.numa_stat
  hugetlb.64KB.usage_in_bytes
  hugetlb.64KB.failcnt
  hugetlb.64KB.rsvd.limit_in_bytes
@@ -51,6 +54,7 @@ files include::
  hugetlb.64KB.rsvd.failcnt
  hugetlb.32MB.limit_in_bytes
  hugetlb.32MB.max_usage_in_bytes
  hugetlb.32MB.numa_stat
  hugetlb.32MB.usage_in_bytes
  hugetlb.32MB.failcnt
  hugetlb.32MB.rsvd.limit_in_bytes
+11 −0
@@ -1268,6 +1268,9 @@ PAGE_SIZE multiple when read back.
		The number of processes belonging to this cgroup
		killed by any kind of OOM killer.

          oom_group_kill
                The number of times a group OOM has occurred.

  memory.events.local
	Similar to memory.events but the fields in the file are local
	to the cgroup i.e. not hierarchical. The file modified event
@@ -1311,6 +1314,9 @@ PAGE_SIZE multiple when read back.
	  sock (npn)
		Amount of memory used in network transmission buffers

	  vmalloc (npn)
		Amount of memory used for vmap backed memory.

	  shmem
		Amount of cached filesystem data that is swap-backed,
		such as tmpfs, shm segments, shared anonymous mmap()s
@@ -2260,6 +2266,11 @@ HugeTLB Interface Files
	are local to the cgroup i.e. not hierarchical. The file modified event
	generated on this file reflects only the local events.

  hugetlb.<hugepagesize>.numa_stat
	Similar to memory.numa_stat, it shows the NUMA information of the
	hugetlb pages of <hugepagesize> in this cgroup.  Only active, in-use
	hugetlb pages are included.  The per-node values are in bytes.

Misc
----

+25 −0
@@ -208,6 +208,31 @@ PID of the DAMON thread.
If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread.  Else,
-1.

nr_reclaim_tried_regions
------------------------

Number of memory regions that DAMON_RECLAIM tried to reclaim.

bytes_reclaim_tried_regions
---------------------------

Total bytes of memory regions that DAMON_RECLAIM tried to reclaim.

nr_reclaimed_regions
--------------------

Number of memory regions that DAMON_RECLAIM successfully reclaimed.

bytes_reclaimed_regions
-----------------------

Total bytes of memory regions that DAMON_RECLAIM successfully reclaimed.

nr_quota_exceeds
----------------

Number of times that the time/space quota limits have been exceeded.
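
As a rough illustration of how these counters might be combined, the Python
sketch below computes the fraction of tried bytes that were actually reclaimed.
Only the parameter names come from this document; the
``reclaim_success_ratio`` helper and the sample values are hypothetical.

```python
def reclaim_success_ratio(stats):
    """Return reclaimed bytes divided by tried bytes (0.0 if nothing was tried)."""
    tried = stats["bytes_reclaim_tried_regions"]
    reclaimed = stats["bytes_reclaimed_regions"]
    if tried == 0:
        # Avoid dividing by zero before DAMON_RECLAIM has tried anything.
        return 0.0
    return reclaimed / tried


# Hypothetical counter values, for illustration only.
sample_stats = {
    "nr_reclaim_tried_regions": 100,
    "bytes_reclaim_tried_regions": 409600,
    "nr_reclaimed_regions": 25,
    "bytes_reclaimed_regions": 102400,
    "nr_quota_exceeds": 3,
}

ratio = reclaim_success_ratio(sample_stats)
```

A low ratio together with a growing ``nr_quota_exceeds`` may suggest that the
quota, rather than the access pattern, is what limits reclamation.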

Example
=======

+176 −49
@@ -7,37 +7,40 @@ Detailed Usages
DAMON provides below three interfaces for different users.

- *DAMON user space tool.*
  This is for privileged people such as system administrators who want a
  just-working human-friendly interface.  Using this, users can use the DAMON’s
  major features in a human-friendly way.  It may not be highly tuned for
  special cases, though.  It supports both virtual and physical address spaces
  monitoring.
  `This <https://github.com/awslabs/damo>`_ is for privileged people such as
  system administrators who want a just-working human-friendly interface.
  Using this, users can use the DAMON’s major features in a human-friendly way.
  It may not be highly tuned for special cases, though.  It supports both
  virtual and physical address spaces monitoring.  For more detail, please
  refer to its `usage document
  <https://github.com/awslabs/damo/blob/next/USAGE.md>`_.
- *debugfs interface.*
  This is for privileged user space programmers who want more optimized use of
  DAMON.  Using this, users can use DAMON’s major features by reading
  from and writing to special debugfs files.  Therefore, you can write and use
  your personalized DAMON debugfs wrapper programs that reads/writes the
  debugfs files instead of you.  The DAMON user space tool is also a reference
  implementation of such programs.  It supports both virtual and physical
  address spaces monitoring.
  :ref:`This <debugfs_interface>` is for privileged user space programmers who
  want more optimized use of DAMON.  Using this, users can use DAMON’s major
  features by reading from and writing to special debugfs files.  Therefore,
  you can write and use your personalized DAMON debugfs wrapper programs that
  reads/writes the debugfs files instead of you.  The `DAMON user space tool
  <https://github.com/awslabs/damo>`_ is one example of such programs.  It
  supports both virtual and physical address spaces monitoring.  Note that this
  interface provides only simple :ref:`statistics <damos_stats>` for the
  monitoring results.  For detailed monitoring results, DAMON provides a
  :ref:`tracepoint <tracepoint>`.
- *Kernel Space Programming Interface.*
  This is for kernel space programmers.  Using this, users can utilize every
  feature of DAMON most flexibly and efficiently by writing kernel space
  DAMON application programs for you.  You can even extend DAMON for various
  address spaces.
  :doc:`This </vm/damon/api>` is for kernel space programmers.  Using this,
  users can utilize every feature of DAMON most flexibly and efficiently by
  writing kernel space DAMON application programs for you.  You can even extend
  DAMON for various address spaces.  For detail, please refer to the interface
  :doc:`document </vm/damon/api>`.

Nevertheless, you could write your own user space tool using the debugfs
interface.  A reference implementation is available at
https://github.com/awslabs/damo.  If you are a kernel programmer, you could
refer to :doc:`/vm/damon/api` for the kernel space programming interface.  For
the reason, this document describes only the debugfs interface

.. _debugfs_interface:

debugfs Interface
=================

DAMON exports five files, ``attrs``, ``target_ids``, ``init_regions``,
``schemes`` and ``monitor_on`` under its debugfs directory,
``<debugfs>/damon/``.
DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` and
``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.


Attributes
@@ -131,24 +134,38 @@ Schemes

For usual DAMON-based data access aware memory management optimizations, users
would simply want the system to apply a memory management action to a memory
region of a specific size having a specific access frequency for a specific
time.  DAMON receives such formalized operation schemes from the user and
applies those to the target processes.  It also counts the total number and
size of regions that each scheme is applied.  This statistics can be used for
online analysis or tuning of the schemes.
region of a specific access pattern.  DAMON receives such formalized operation
schemes from the user and applies those to the target processes.

Users can get and set the schemes by reading from and writing to ``schemes``
debugfs file.  Reading the file also shows the statistics of each scheme.  To
the file, each of the schemes should be represented in each line in below form:
the file, each of the schemes should be represented in each line in below
form::

    <target access pattern> <action> <quota> <watermarks>

You can disable schemes by simply writing an empty string to the file.

Target Access Pattern
~~~~~~~~~~~~~~~~~~~~~

The ``<target access pattern>`` is constructed with three ranges in below
form::

    min-size max-size min-acc max-acc min-age max-age

    min-size max-size min-acc max-acc min-age max-age action
Specifically, bytes for the size of regions (``min-size`` and ``max-size``),
number of monitored accesses per aggregate interval for access frequency
(``min-acc`` and ``max-acc``), number of aggregate intervals for the age of
regions (``min-age`` and ``max-age``) are specified.  Note that the ranges are
closed interval.

Note that the ranges are closed interval.  Bytes for the size of regions
(``min-size`` and ``max-size``), number of monitored accesses per aggregate
interval for access frequency (``min-acc`` and ``max-acc``), number of
aggregate intervals for the age of regions (``min-age`` and ``max-age``), and a
predefined integer for memory management actions should be used.  The supported
numbers and their meanings are as below.
Action
~~~~~~

The ``<action>`` is a predefined integer for memory management actions, which
DAMON will apply to the regions having the target access pattern.  The
supported numbers and their meanings are as below.

 - 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``
 - 1: Call ``madvise()`` for the region with ``MADV_COLD``
@@ -157,20 +174,82 @@ numbers and their meanings are as below.
 - 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
 - 5: Do nothing but count the statistics

You can disable schemes by simply writing an empty string to the file.  For
example, below commands applies a scheme saying "If a memory region of size in
[4KiB, 8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
interval in [10, 20], page out the region", check the entered scheme again, and
finally remove the scheme. ::
Quota
~~~~~

    # cd <debugfs>/damon
    # echo "4096 8192    0 5    10 20    2" > schemes
    # cat schemes
    4096 8192 0 5 10 20 2 0 0
    # echo > schemes
Optimal ``target access pattern`` for each ``action`` is workload dependent, so
it is not easy to find.  Worse yet, a scheme that applies its action too
aggressively can cause severe overhead.  To avoid such overhead, users can limit
the time and size quota for the scheme via the ``<quota>`` in below form::

    <ms> <sz> <reset interval> <priority weights>

This makes DAMON try to use only up to ``<ms>`` milliseconds for applying
the action to memory regions of the ``target access pattern`` within the
``<reset interval>`` milliseconds, and to apply the action to only up to
``<sz>`` bytes of memory regions within the ``<reset interval>``.  Setting both
``<ms>`` and ``<sz>`` zero disables the quota limits.

When the quota limit is expected to be exceeded, DAMON prioritizes found memory
regions of the ``target access pattern`` based on their size, access frequency,
and age.  For personalized prioritization, users can set the weights for the
three properties in ``<priority weights>`` in below form::

    <size weight> <access frequency weight> <age weight>

Watermarks
~~~~~~~~~~

Some schemes would need to run based on the current value of a specific system
metric, such as the free memory ratio.  For such cases, users can specify
watermarks for the condition in below form::

    <metric> <check interval> <high mark> <middle mark> <low mark>

``<metric>`` is a predefined integer for the metric to be checked.  The
supported numbers and their meanings are as below.

 - 0: Ignore the watermarks
 - 1: System's free memory rate (per thousand)

The value of the metric is checked every ``<check interval>`` microseconds.

If the value is higher than ``<high mark>`` or lower than ``<low mark>``, the
scheme is deactivated.  If the value is lower than ``<middle mark>``, the
scheme is activated.
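
The activation rule above can be sketched in Python as follows.  This is only
an illustration, not the kernel implementation: the ``scheme_is_active`` helper
is hypothetical, and it assumes that a value between the middle and high marks
leaves the previous state unchanged, which the text above does not state.

```python
def scheme_is_active(was_active, value, high, mid, low):
    """Return whether the scheme should be active for the given metric value."""
    if value > high or value < low:
        # Outside the [low, high] band: the scheme is deactivated.
        return False
    if value < mid:
        # Inside the band and below the middle mark: the scheme is activated.
        return True
    # Assumption: between the middle and high marks the state is kept as-is.
    return was_active


# Marks in per-thousand units: high=600, middle=500, low=300.
state = False
state = scheme_is_active(state, 550, 600, 500, 300)   # stays inactive
state_after_drop = scheme_is_active(state, 450, 600, 500, 300)
state_after_rise = scheme_is_active(state_after_drop, 550, 600, 500, 300)
state_too_high = scheme_is_active(state_after_rise, 650, 600, 500, 300)
state_too_low = scheme_is_active(True, 250, 600, 500, 300)
```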

.. _damos_stats:

Statistics
~~~~~~~~~~

DAMON also counts the total number and bytes of the regions to which it tried
to apply each scheme, the same two numbers for the regions to which the scheme
was successfully applied, and the number of times the quota limit was exceeded.
These statistics can be used for online analysis or tuning of the schemes.

The statistics can be shown by reading the ``schemes`` file.  Reading the file
will show each scheme you entered in each line, and the five numbers for the
statistics will be added at the end of each line.
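
A wrapper program might split such a line into named fields.  The Python sketch
below assumes each line holds the 18 input fields described above followed by
the five statistics, in the order they are listed in this section.  The field
names and the sample statistics values are hypothetical.

```python
# Hypothetical field names; the order follows the description in this section.
FIELDS = [
    "min_size", "max_size", "min_acc", "max_acc", "min_age", "max_age",
    "action",
    "quota_ms", "quota_sz", "quota_reset_interval",
    "weight_size", "weight_acc", "weight_age",
    "wmark_metric", "wmark_interval", "wmark_high", "wmark_mid", "wmark_low",
    "nr_tried", "sz_tried", "nr_applied", "sz_applied", "nr_quota_exceeds",
]


def parse_scheme_line(line):
    """Split one line read from the schemes file into a dict of integers."""
    values = [int(token) for token in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


# Sample line; the last five numbers are made-up statistics.
sample_line = ("4096 8192 0 5 10 20 2 10 1073741824 1000 0 0 100 "
               "1 5000000 600 500 300 12 49152 3 12288 1")
parsed = parse_scheme_line(sample_line)
```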

The last two integers in the 4th line of above example is the total number and
the total size of the regions that the scheme is applied.
Example
~~~~~~~

The commands below apply a scheme saying "If a memory region of size in [4KiB,
8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
interval in [10, 20], page out the region.  For the paging out, use only up to
10ms per second, and also don't page out more than 1GiB per second.  Under the
limitation, page out memory regions having longer age first.  Also, check the
free memory rate of the system every 5 seconds, start the monitoring and paging
out when the free memory rate becomes lower than 50%, but stop it if the free
memory rate becomes larger than 60%, or lower than 30%"::

    # cd <debugfs>/damon
    # scheme="4096 8192  0 5    10 20    2"  # target access pattern and action
    # scheme+=" 10 $((1024*1024*1024)) 1000" # quotas
    # scheme+=" 0 0 100"                     # prioritization weights
    # scheme+=" 1 5000000 600 500 300"       # watermarks
    # echo "$scheme" > schemes
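
A wrapper program could assemble the same scheme from named parts before
writing it to the ``schemes`` file.  The Python sketch below only builds the
string; the ``build_scheme`` helper and its parameter names are hypothetical,
while the values mirror the shell example above.

```python
def build_scheme(pattern, action, quota, weights, watermarks):
    """Join the scheme parts into the single-line form the schemes file takes."""
    fields = (list(pattern) + [action] + list(quota)
              + list(weights) + list(watermarks))
    return " ".join(str(field) for field in fields)


scheme = build_scheme(
    pattern=(4096, 8192, 0, 5, 10, 20),        # size, access, and age ranges
    action=2,                                  # page out, as in the example
    quota=(10, 1024 ** 3, 1000),               # 10ms and 1GiB per 1000ms
    weights=(0, 0, 100),                       # prioritize by age only
    watermarks=(1, 5000000, 600, 500, 300),    # free memory rate, 5s interval
)
```

Keeping the parts separate makes it harder to misplace a field in the
18-number line.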


Turning On/Off
@@ -195,6 +274,54 @@ the monitoring is turned on. If you write to the files while DAMON is running,
an error code such as ``-EBUSY`` will be returned.


Monitoring Thread PID
---------------------

DAMON performs the requested monitoring with a kernel thread called
``kdamond``.  You can get the pid of the thread by reading the ``kdamond_pid``
file.  When the monitoring is turned off, reading the file returns ``none``. ::

    # cd <debugfs>/damon
    # cat monitor_on
    off
    # cat kdamond_pid
    none
    # echo on > monitor_on
    # cat kdamond_pid
    18594


Using Multiple Monitoring Threads
---------------------------------

One ``kdamond`` thread is created for each monitoring context.  For use cases
that require multiple ``kdamond`` threads, you can create and remove monitoring
contexts using the ``mk_contexts`` and ``rm_contexts`` files.

Writing the name of the new context to the ``mk_contexts`` file creates a
directory of the name on the DAMON debugfs directory.  The directory will have
DAMON debugfs files for the context. ::

    # cd <debugfs>/damon
    # ls foo
    ls: cannot access 'foo': No such file or directory
    # echo foo > mk_contexts
    # ls foo
    attrs  init_regions  kdamond_pid  schemes  target_ids

If the context is not needed anymore, you can remove it and the corresponding
directory by putting the name of the context to the ``rm_contexts`` file. ::

    # echo foo > rm_contexts
    # ls foo
    ls: cannot access 'foo': No such file or directory

Note that ``mk_contexts``, ``rm_contexts``, and ``monitor_on`` files are in the
root directory only.


.. _tracepoint:

Tracepoint for Monitoring Results
=================================

+15 −1
@@ -408,7 +408,7 @@ follows:
Memory Policy APIs
==================

Linux supports 3 system calls for controlling memory policy.  These APIs
Linux supports 4 system calls for controlling memory policy.  These APIs
always affect only the calling task, the calling task's address space, or
some shared object mapped into the calling task's address space.

@@ -460,6 +460,20 @@ requested via the 'flags' argument.

See the mbind(2) man page for more details.

Set home node for a Range of Task's Address Space::

	long sys_set_mempolicy_home_node(unsigned long start, unsigned long len,
					 unsigned long home_node,
					 unsigned long flags);

sys_set_mempolicy_home_node sets the home node for a VMA policy present in the
task's address range. The system call updates the home node only for the existing
mempolicy range. Other address ranges are ignored. Page allocations will come
from the NUMA node closest to the home node. Specifying the home node overrides
the default allocation policy, which allocates memory close to the local node of
the executing CPU.


Memory Policy Command Line Interface
====================================
