Commit b96a3e91 authored by Linus Torvalds

Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in
   add_to_avail_list")

 - Peter Xu has a series ("mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP. It
   also speeds up GUP handling of transparent hugepages.

 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").

 - Sergey Senozhatsky has improved the performance of zsmalloc during
   compaction ("zsmalloc: small compaction improvements").

 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").

 - xu xin has done some work on KSM's handling of zero pages. These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support
   tracking KSM-placed zero-pages").

 - Jeff Xu has fixed the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").

 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with
   UFFD").

 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").

 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").

 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").

 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").

 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").

 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap"). And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").

 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

 - Baoquan He has converted some architectures to use the
   GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
   architectures to take GENERIC_IOREMAP way").

 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").

 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep"). Liam also developed some efficiency
   improvements ("Reduce preallocations for maple tree").

 - Cleanup and optimization to the secondary IOMMU TLB invalidation,
   from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").

 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").

 - Kemeng Shi provides some maintenance work on the compaction code
   ("Two minor cleanups for compaction").

 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
   most file-backed faults under the VMA lock").

 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").

 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").

 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").

 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").

 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").

 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").

 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").

 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
   memmap on memory feature on ppc64").

 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock
   migratetype").

 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").

 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").

 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").

 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").

 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").

 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").

 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").

 - pagetable handling cleanups from Matthew Wilcox ("New page table
   range API").

 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").

 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").

 - Matthew Wilcox has also done some maintenance work on the MM
   subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow ->huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio->swap instead of page->private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page->private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
parents 651a00bc 52ae298e
+36 −4
@@ -29,8 +29,10 @@ Description: Writing 'on' or 'off' to this file makes the kdamond starts or
		file updates contents of schemes stats files of the kdamond.
		Writing 'update_schemes_tried_regions' to the file updates
		contents of 'tried_regions' directory of every scheme directory
		of this kdamond.  Writing 'clear_schemes_tried_regions' to the
		file removes contents of the 'tried_regions' directory.
		of this kdamond.  Writing 'update_schemes_tried_bytes' to the
		file updates only '.../tried_regions/total_bytes' files of this
		kdamond.  Writing 'clear_schemes_tried_regions' to the file
		removes contents of the 'tried_regions' directory.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/pid
Date:		Mar 2022
@@ -269,8 +271,10 @@ What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/
Date:		Dec 2022
Contact:	SeongJae Park <sj@kernel.org>
Description:	Writing to and reading from this file sets and gets the type of
		the memory of the interest.  'anon' for anonymous pages, or
		'memcg' for specific memory cgroup can be written and read.
		the memory of the interest.  'anon' for anonymous pages,
		'memcg' for specific memory cgroup, 'addr' for address range
		(an open-ended interval), or 'target' for DAMON monitoring
		target can be written and read.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/memcg_path
Date:		Dec 2022
@@ -279,6 +283,27 @@ Description: If 'memcg' is written to the 'type' file, writing to and
		reading from this file sets and gets the path to the memory
		cgroup of the interest.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/addr_start
Date:		Jul 2023
Contact:	SeongJae Park <sj@kernel.org>
Description:	If 'addr' is written to the 'type' file, writing to or reading
		from this file sets or gets the start address of the address
		range for the filter.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/addr_end
Date:		Jul 2023
Contact:	SeongJae Park <sj@kernel.org>
Description:	If 'addr' is written to the 'type' file, writing to or reading
		from this file sets or gets the end address of the address
		range for the filter.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/target_idx
Date:		Dec 2022
Contact:	SeongJae Park <sj@kernel.org>
Description:	If 'target' is written to the 'type' file, writing to or
		reading from this file sets or gets the index of the DAMON
		monitoring target of the interest.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/matching
Date:		Dec 2022
Contact:	SeongJae Park <sj@kernel.org>
@@ -317,6 +342,13 @@ Contact: SeongJae Park <sj@kernel.org>
Description:	Reading this file returns the number of the exceed events of
		the scheme's quotas.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/total_bytes
Date:		Jul 2023
Contact:	SeongJae Park <sj@kernel.org>
Description:	Reading this file returns the total amount of memory that
		corresponding DAMON-based Operation Scheme's action has tried
		to be applied.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/start
Date:		Oct 2022
Contact:	SeongJae Park <sj@kernel.org>
+2 −2
@@ -10,7 +10,7 @@ Description:
		dropping it if possible. The kernel will then be placed
		on the bad page list and never be reused.

		The offlining is done in kernel specific granuality.
		The offlining is done in kernel specific granularity.
		Normally it's the base page size of the kernel, but
		this might change.

@@ -35,7 +35,7 @@ Description:
		to access this page assuming it's poisoned by the
		hardware.

		The offlining is done in kernel specific granuality.
		The offlining is done in kernel specific granularity.
		Normally it's the base page size of the kernel, but
		this might change.

+0 −2
@@ -92,8 +92,6 @@ Brief summary of control files.
 memory.oom_control		     set/show oom controls.
 memory.numa_stat		     show the number of memory usage per numa
				     node
 memory.kmem.limit_in_bytes          This knob is deprecated and writing to
                                     it will return -ENOTSUPP.
 memory.kmem.usage_in_bytes          show current kernel memory allocation
 memory.kmem.failcnt                 show the number of kernel memory usage
				     hits limits
+4 −10
@@ -141,8 +141,8 @@ nodemask_t
The size of a nodemask_t type. Used to compute the number of online
nodes.

(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head)
-------------------------------------------------------------------------------------------------
(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_head)
----------------------------------------------------------------------------------

User-space tools compute their values based on the offset of these
variables. The variables are used when excluding unnecessary pages.
@@ -325,8 +325,8 @@ NR_FREE_PAGES
On linux-2.6.21 or later, the number of free pages is in
vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.

PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask
------------------------------------------------------------------------------
PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PG_hugetlb
-----------------------------------------------------------------------------------------

Page attributes. These flags are used to filter various unnecessary for
dumping pages.
@@ -338,12 +338,6 @@ More page attributes. These flags are used to filter various unnecessary for
dumping pages.


HUGETLB_PAGE_DTOR
-----------------

The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
excludes these pages.

x86_64
======

+50 −26
@@ -87,7 +87,7 @@ comma (","). ::
    │ │ │ │ │ │ │ filters/nr_filters
    │ │ │ │ │ │ │ │ 0/type,matching,memcg_id
    │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
    │ │ │ │ │ │ │ tried_regions/
    │ │ │ │ │ │ │ tried_regions/total_bytes
    │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age
    │ │ │ │ │ │ │ │ ...
    │ │ │ │ │ │ ...
@@ -127,14 +127,18 @@ in the state. Writing ``commit`` to the ``state`` file makes kdamond reads the
user inputs in the sysfs files except ``state`` file again.  Writing
``update_schemes_stats`` to ``state`` file updates the contents of stats files
for each DAMON-based operation scheme of the kdamond.  For details of the
stats, please refer to :ref:`stats section <sysfs_schemes_stats>`.  Writing
``update_schemes_tried_regions`` to ``state`` file updates the DAMON-based
operation scheme action tried regions directory for each DAMON-based operation
scheme of the kdamond.  Writing ``clear_schemes_tried_regions`` to ``state``
file clears the DAMON-based operating scheme action tried regions directory for
each DAMON-based operation scheme of the kdamond.  For details of the
DAMON-based operation scheme action tried regions directory, please refer to
:ref:`tried_regions section <sysfs_schemes_tried_regions>`.
stats, please refer to :ref:`stats section <sysfs_schemes_stats>`.

Writing ``update_schemes_tried_regions`` to ``state`` file updates the
DAMON-based operation scheme action tried regions directory for each
DAMON-based operation scheme of the kdamond.  Writing
``update_schemes_tried_bytes`` to ``state`` file updates only
``.../tried_regions/total_bytes`` files.  Writing
``clear_schemes_tried_regions`` to ``state`` file clears the DAMON-based
operating scheme action tried regions directory for each DAMON-based operation
scheme of the kdamond.  For details of the DAMON-based operation scheme action
tried regions directory, please refer to :ref:`tried_regions section
<sysfs_schemes_tried_regions>`.

If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.

@@ -359,15 +363,21 @@ number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``.  Each directory represents each filter.  The filters are evaluated
in the numeric order.

Each filter directory contains three files, namely ``type``, ``matcing``, and
``memcg_path``.  You can write one of two special keywords, ``anon`` for
anonymous pages, or ``memcg`` for specific memory cgroup filtering.  In case of
the memory cgroup filtering, you can specify the memory cgroup of the interest
by writing the path of the memory cgroup from the cgroups mount point to
``memcg_path`` file.  You can write ``Y`` or ``N`` to ``matching`` file to
filter out pages that does or does not match to the type, respectively.  Then,
the scheme's action will not be applied to the pages that specified to be
filtered out.
Each filter directory contains six files, namely ``type``, ``matcing``,
``memcg_path``, ``addr_start``, ``addr_end``, and ``target_idx``.  To ``type``
file, you can write one of four special keywords: ``anon`` for anonymous pages,
``memcg`` for specific memory cgroup, ``addr`` for specific address range (an
open-ended interval), or ``target`` for specific DAMON monitoring target
filtering.  In case of the memory cgroup filtering, you can specify the memory
cgroup of the interest by writing the path of the memory cgroup from the
cgroups mount point to ``memcg_path`` file.  In case of the address range
filtering, you can specify the start and end address of the range to
``addr_start`` and ``addr_end`` files, respectively.  For the DAMON monitoring
target filtering, you can specify the index of the target between the list of
the DAMON context's monitoring targets list to ``target_idx`` file.  You can
write ``Y`` or ``N`` to ``matching`` file to filter out pages that does or does
not match to the type, respectively.  Then, the scheme's action will not be
applied to the pages that specified to be filtered out.

For example, below restricts a DAMOS action to be applied to only non-anonymous
pages of all memory cgroups except ``/having_care_already``.::
@@ -381,8 +391,14 @@ pages of all memory cgroups except ``/having_care_already``.::
    echo /having_care_already > 1/memcg_path
    echo N > 1/matching

Note that filters are currently supported only when ``paddr``
`implementation <sysfs_contexts>` is being used.
Note that ``anon`` and ``memcg`` filters are currently supported only when
``paddr`` `implementation <sysfs_contexts>` is being used.

Also, memory regions that are filtered out by ``addr`` or ``target`` filters
are not counted as the scheme has tried to those, while regions that filtered
out by other type filters are counted as the scheme has tried to.  The
difference is applied to :ref:`stats <damos_stats>` and
:ref:`tried regions <sysfs_schemes_tried_regions>`.
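
As an illustration that is not part of this commit's diff, the ``addr`` filter
files described above could be set up with a small helper like the following.
This is a minimal sketch: the function name ``set_addr_filter`` and the example
addresses are hypothetical, and it assumes the filter directory has already
been created by writing to ``nr_filters``.

```shell
# Hypothetical helper: configure one DAMOS filter directory as an
# address-range filter.
#   $1: filter directory, e.g. .../schemes/<S>/filters/<F>
#   $2: start address of the range
#   $3: end address of the range
#   $4: Y to filter out memory matching the range, N to filter out the rest
set_addr_filter() {
    filter_dir=$1
    # Select the address-range filter type first, then fill in its range.
    echo addr > "$filter_dir/type"
    echo "$2" > "$filter_dir/addr_start"
    echo "$3" > "$filter_dir/addr_end"
    echo "$4" > "$filter_dir/matching"
}
```

For example, ``set_addr_filter 0 4096 8192 Y`` run from inside a scheme's
``filters`` directory would write the four files of filter ``0``.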

.. _sysfs_schemes_stats:

@@ -406,13 +422,21 @@ stats by writing a special keyword, ``update_schemes_stats`` to the relevant
schemes/<N>/tried_regions/
--------------------------

This directory initially has one file, ``total_bytes``.

When a special keyword, ``update_schemes_tried_regions``, is written to the
relevant ``kdamonds/<N>/state`` file, DAMON creates directories named integer
starting from ``0`` under this directory.  Each directory contains files
exposing detailed information about each of the memory region that the
corresponding scheme's ``action`` has tried to be applied under this directory,
during next :ref:`aggregation interval <sysfs_monitoring_attrs>`.  The
information includes address range, ``nr_accesses``, and ``age`` of the region.
relevant ``kdamonds/<N>/state`` file, DAMON updates the ``total_bytes`` file so
that reading it returns the total size of the scheme tried regions, and creates
directories named integer starting from ``0`` under this directory.  Each
directory contains files exposing detailed information about each of the memory
region that the corresponding scheme's ``action`` has tried to be applied under
this directory, during next :ref:`aggregation interval
<sysfs_monitoring_attrs>`.  The information includes address range,
``nr_accesses``, and ``age`` of the region.

Writing ``update_schemes_tried_bytes`` to the relevant ``kdamonds/<N>/state``
file will only update the ``total_bytes`` file, and will not create the
subdirectories.
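
As an illustration that is not part of this commit's diff, the cheaper
``update_schemes_tried_bytes`` flow could be wrapped as below.  This is a
minimal sketch: the function name ``read_tried_bytes`` is hypothetical, and it
assumes that ``total_bytes`` holds the refreshed value by the time the write to
``state`` has returned.

```shell
# Hypothetical helper: refresh and print a scheme's total tried bytes
# without creating the per-region subdirectories.
#   $1: kdamond directory, e.g. /sys/kernel/mm/damon/admin/kdamonds/<K>
#   $2: scheme directory,  e.g. $1/contexts/<C>/schemes/<S>
read_tried_bytes() {
    kdamond_dir=$1
    scheme_dir=$2
    # Ask the kdamond to update only the total_bytes files.
    echo update_schemes_tried_bytes > "$kdamond_dir/state"
    # Read back the total for the scheme of interest.
    cat "$scheme_dir/tried_regions/total_bytes"
}
```

Compared with ``update_schemes_tried_regions``, this leaves nothing to clean
up afterwards, which may matter when only the aggregate size is of interest.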

The directories will be removed when another special keyword,
``clear_schemes_tried_regions``, is written to the relevant