Commit 8404c9fb authored by Linus Torvalds

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
 "The remainder of the main mm/ queue.

  143 patches.

  Subsystems affected by this patch series (all mm): pagecache, hugetlb,
  userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap,
  kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and
  kfence"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits)
  kfence: use power-efficient work queue to run delayed work
  kfence: maximize allocation wait timeout duration
  kfence: await for allocation using wait_event
  kfence: zero guard page after out-of-bounds access
  mm/process_vm_access.c: remove duplicate include
  mm/mempool: minor coding style tweaks
  mm/highmem.c: fix coding style issue
  btrfs: use memzero_page() instead of open coded kmap pattern
  iov_iter: lift memzero_page() to highmem.h
  mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
  mm/zswap.c: switch from strlcpy to strscpy
  arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
  acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
  mm,memory_hotplug: allocate memmap from the added memory range
  mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
  mm,memory_hotplug: relax fully spanned sections check
  drivers/base/memory: introduce memory_block_{online,offline}
  mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
  ...
parents a79cdfba 36f0b35d
+25 −0
What:		/sys/kernel/mm/cma/
Date:		Feb 2021
Contact:	Minchan Kim <minchan@kernel.org>
Description:
		/sys/kernel/mm/cma/ contains a subdirectory for each CMA
		heap name (also sometimes called CMA areas).

		Each CMA heap subdirectory (that is, each
		/sys/kernel/mm/cma/<cma-heap-name> directory) contains the
		following items:

			alloc_pages_success
			alloc_pages_fail

What:		/sys/kernel/mm/cma/<cma-heap-name>/alloc_pages_success
Date:		Feb 2021
Contact:	Minchan Kim <minchan@kernel.org>
Description:
		the number of pages CMA API succeeded to allocate

What:		/sys/kernel/mm/cma/<cma-heap-name>/alloc_pages_fail
Date:		Feb 2021
Contact:	Minchan Kim <minchan@kernel.org>
Description:
		the number of pages CMA API failed to allocate
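
As an illustration of how these counters might be consumed from userspace, here is a minimal sketch in C. It assumes each attribute file holds a single decimal number followed by a newline, which is the usual sysfs convention but is not stated in the text above. The helper names `read_counter` and `read_cma_stat` are hypothetical, and the heap name has to be supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Read one unsigned decimal counter from a sysfs-style text file.
 * Returns 0 and stores the value in *out on success, -1 on any error. */
int read_counter(const char *path, unsigned long long *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%llu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Build "<base>/<heap>/<attr>" and read the counter stored there.
 * With base = "/sys/kernel/mm/cma" this follows the layout described above;
 * attr would be "alloc_pages_success" or "alloc_pages_fail". */
int read_cma_stat(const char *base, const char *heap, const char *attr,
                  unsigned long long *out)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/%s/%s", base, heap, attr);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1; /* path did not fit in the buffer */
    return read_counter(path, out);
}
```

A caller would pass the name of one of the subdirectories found under /sys/kernel/mm/cma/ as `heap`; a return value of -1 covers both a missing heap and an unreadable attribute.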
+17 −0
@@ -2804,6 +2804,23 @@
			seconds.  Use this parameter to check at some
			other rate.  0 disables periodic checking.

	memory_hotplug.memmap_on_memory
			[KNL,X86,ARM] Boolean flag to enable this feature.
			Format: {on | off (default)}
			When enabled, runtime hotplugged memory will
			allocate its internal metadata (struct pages)
			from the hotadded memory which will allow to
			hotadd a lot of memory without requiring
			additional memory to do so.
			This feature is disabled by default because it
			has some implication on large (e.g. GB)
			allocations in some configurations (e.g. small
			memory blocks).
			The state of the flag can be read in
			/sys/module/memory_hotplug/parameters/memmap_on_memory.
			Note that even when enabled, there are a few cases where
			the feature is not effective.

	memtest=	[KNL,X86,ARM,PPC] Enable memtest
			Format: <integer>
			default : 0 <disable>
+9 −0
@@ -357,6 +357,15 @@ creates ZONE_MOVABLE as following.
   Unfortunately, there is no information to show which memory block belongs
   to ZONE_MOVABLE. This is TBD.

.. note::
   Techniques that rely on long-term pinnings of memory (especially, RDMA and
   vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory
   hot remove. Pinned pages cannot reside on ZONE_MOVABLE, to guarantee that
   memory can still get hot removed - be aware that pinning can fail even if
   there is plenty of free memory in ZONE_MOVABLE. In addition, using
   ZONE_MOVABLE might make page pinning more expensive, because pages have to be
   migrated off that zone first.

.. _memory_hotplug_how_to_offline_memory:

How to offline memory
+66 −41
@@ -63,36 +63,36 @@ the generic ioctl available.

The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
defines what memory types are supported by the ``userfaultfd`` and what
events, except page fault notifications, may be generated.

If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
set if the kernel supports registering ``userfaultfd`` ranges on shared
memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
``MAP_SHARED``, ``memfd_create``, etc).

The userland application that wants to use ``userfaultfd`` with hugetlbfs
or shared memory need to set the corresponding flag in
``uffdio_api.features`` to enable those features.

If the userland desires to receive notifications for events other than
page faults, it has to verify that ``uffdio_api.features`` has appropriate
``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
detail below in `Non-cooperative userfaultfd`_ section.

Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
register a memory range in the ``userfaultfd`` by setting the
events, except page fault notifications, may be generated:

- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
  other than page faults are supported. These events are described in more
  detail below in the `Non-cooperative userfaultfd`_ section.

- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
  registrations for hugetlbfs and shared memory (covering all shmem APIs,
  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
  etc) virtual memory areas, respectively.

- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
  ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
  areas.

The userland application should set the feature flags it intends to use
when invoking the ``UFFDIO_API`` ioctl, to request that those features be
enabled if supported.
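
The feature request described above could look roughly like the following C sketch. The helper names (`prepare_api_request`, `features_granted`, `negotiate_features`) are hypothetical; only `struct uffdio_api`, `UFFD_API`, and the `UFFDIO_API` ioctl come from `<linux/userfaultfd.h>`. It assumes the caller already holds a userfaultfd file descriptor, and it treats any ioctl failure as "not available" without inspecting errno.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Fill in a UFFDIO_API request that asks for the given feature bits. */
void prepare_api_request(struct uffdio_api *api, __u64 wanted)
{
    assert(api != NULL);
    memset(api, 0, sizeof(*api));
    api->api = UFFD_API;
    api->features = wanted;
}

/* After the ioctl has returned, check that every wanted bit is present
 * in the features bitmask the kernel reported back. */
int features_granted(const struct uffdio_api *api, __u64 wanted)
{
    assert(api != NULL);
    return (api->features & wanted) == wanted;
}

/* Perform the handshake on an open userfaultfd descriptor.
 * Returns 0 if the call succeeded and all wanted bits were reported,
 * -1 otherwise (the ioctl itself may fail, e.g. for unsupported bits). */
int negotiate_features(int uffd, __u64 wanted)
{
    struct uffdio_api api;

    prepare_api_request(&api, wanted);
    if (ioctl(uffd, UFFDIO_API, &api) == -1)
        return -1;
    return features_granted(&api, wanted) ? 0 : -1;
}
```

For example, `negotiate_features(uffd, UFFD_FEATURE_MISSING_SHMEM)` would be called once, before any ``UFFDIO_REGISTER``, by an application that intends to register shared memory ranges.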

Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
bitmask) to register a memory range in the ``userfaultfd`` by setting the
uffdio_register structure accordingly. The ``uffdio_register.mode``
bitmask will specify to the kernel which kind of faults to track for
the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
pages). The ``UFFDIO_REGISTER`` ioctl will return the
the range. The ``UFFDIO_REGISTER`` ioctl will return the
``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
userfaults on the range registered. Not all ioctls will necessarily be
supported for all memory types depending on the underlying virtual
memory backend (anonymous memory vs tmpfs vs real filebacked
mappings).
supported for all memory types (e.g. anonymous memory vs. shmem vs.
hugetlbfs), or all types of intercepted faults.

Userland can use the ``uffdio_register.ioctls`` to manage the virtual
address space in the background (to add or potentially also remove
@@ -100,21 +100,46 @@ memory from the ``userfaultfd`` registered range). This means a userfault
could be triggering just before userland maps in the background the
user-faulted page.

The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
atomically copies a page into the userfault registered range and wakes
up the blocked userfaults
(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
guaranteeing that nothing can see an half copied page since it'll
keep userfaulting until the copy has finished.
Resolving Userfaults
--------------------

There are three basic ways to resolve userfaults:

- ``UFFDIO_COPY`` atomically copies some existing page contents from
  userspace.

- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.

- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.

These operations are atomic in the sense that they guarantee nothing can
see a half-populated page, since readers will keep userfaulting until the
operation has finished.

By default, these wake up userfaults blocked on the range in question.
They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
that waking will be done separately at some later time.

Which ioctl to choose depends on the kind of page fault, and what we'd
like to do to resolve it:

- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be
  resolved by either providing a new page (``UFFDIO_COPY``), or mapping
  the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map
  the zero page for a missing fault. With userfaultfd, userspace can
  decide what content to provide before the faulting thread continues.

- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in
  the page cache). Userspace has the option of modifying the page's
  contents before resolving the fault. Once the contents are correct
  (modified or not), userspace asks the kernel to map the page and let the
  faulting thread continue with ``UFFDIO_CONTINUE``.
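
For the ``UFFDIO_REGISTER_MODE_MISSING`` case, the two resolution requests might be prepared as in the sketch below. This is an illustration only: the helper names are hypothetical, the page size is passed in by the caller and assumed to be a power of two, and ``UFFDIO_CONTINUE`` is left out so that the example builds against older kernel headers. The structures `uffdio_copy` and `uffdio_zeropage` are the ones declared in `<linux/userfaultfd.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/userfaultfd.h>

/* Round a faulting address down to the start of its page. */
static __u64 page_start(__u64 addr, __u64 page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Prepare a UFFDIO_COPY request that fills the faulting page from
 * src_page. Every field is set explicitly, because the ioctl does not
 * default to the registered range. mode 0 means "wake the waiters". */
void prepare_copy(struct uffdio_copy *c, __u64 fault_addr,
                  const void *src_page, __u64 page_size)
{
    assert(c != NULL && src_page != NULL);
    memset(c, 0, sizeof(*c));
    c->dst = page_start(fault_addr, page_size);
    c->src = (__u64)(uintptr_t)src_page;
    c->len = page_size;
    c->mode = 0;
}

/* Prepare a UFFDIO_ZEROPAGE request covering the faulting page. */
void prepare_zeropage(struct uffdio_zeropage *z, __u64 fault_addr,
                      __u64 page_size)
{
    assert(z != NULL);
    memset(z, 0, sizeof(*z));
    z->range.start = page_start(fault_addr, page_size);
    z->range.len = page_size;
    z->mode = 0;
}
```

In a fault-handling thread, `fault_addr` would typically be taken from `msg.arg.pagefault.address` of the `struct uffd_msg` read from the descriptor, and the prepared structure would then be passed to `ioctl(uffd, UFFDIO_COPY, &c)` or `ioctl(uffd, UFFDIO_ZEROPAGE, &z)`.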

Notes:

- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
  you must provide some kind of page in your thread after reading from
  the uffd.  You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
  The normal behavior of the OS automatically providing a zero page on
  an anonymous mmaping is not in place.
- You can tell which kind of fault occurred by examining
  ``pagefault.flags`` within the ``uffd_msg``, checking for the
  ``UFFD_PAGEFAULT_FLAG_*`` flags.

- None of the page-delivering ioctls default to the range that you
  registered with.  You must fill in all fields for the appropriate
@@ -122,9 +147,9 @@ Notes:

- You get the address of the access that triggered the missing page
  event out of a struct uffd_msg that you read in the thread from the
  uffd.  You can supply as many pages as you want with ``UFFDIO_COPY`` or
  ``UFFDIO_ZEROPAGE``.  Keep in mind that unless you used DONTWAKE then
  the first of any of those IOCTLs wakes up the faulting thread.
  uffd.  You can supply as many pages as you want with these IOCTLs.
  Keep in mind that unless you used DONTWAKE then the first of any of
  those IOCTLs wakes up the faulting thread.

- Be sure to test for all errors including
  (``pollfd[0].revents & POLLERR``).  This can happen, e.g. when ranges
+2 −7
@@ -6,6 +6,7 @@
config ARC
	def_bool y
	select ARC_TIMERS
	select ARCH_HAS_CACHE_LINE_SIZE
	select ARCH_HAS_DEBUG_VM_PGTABLE
	select ARCH_HAS_DMA_PREP_COHERENT
	select ARCH_HAS_PTE_SPECIAL
@@ -28,6 +29,7 @@ config ARC
	select GENERIC_SMP_IDLE_THREAD
	select HAVE_ARCH_KGDB
	select HAVE_ARCH_TRACEHOOK
	select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARC_MMU_V4
	select HAVE_DEBUG_STACKOVERFLOW
	select HAVE_DEBUG_KMEMLEAK
	select HAVE_FUTEX_CMPXCHG if FUTEX
@@ -48,9 +50,6 @@ config ARC
	select HAVE_ARCH_JUMP_LABEL if ISA_ARCV2 && !CPU_ENDIAN_BE32
	select SET_FS

config ARCH_HAS_CACHE_LINE_SIZE
	def_bool y

config TRACE_IRQFLAGS_SUPPORT
	def_bool y

@@ -86,10 +85,6 @@ config STACKTRACE_SUPPORT
	def_bool y
	select STACKTRACE

config HAVE_ARCH_TRANSPARENT_HUGEPAGE
	def_bool y
	depends on ARC_MMU_V4

menu "ARC Architecture Configuration"

menu "ARC Platform/SoC/Board"