  1. Nov 07, 2021
    • mm: remove HARDENED_USERCOPY_FALLBACK · 53944f17
      Stephen Kitt authored
      This has served its purpose and is no longer used.  All usercopy
      violations appear to have been handled by now; any remaining instances
      (or new bugs) will cause copies to be rejected.
      
      This isn't a direct revert of commit 2d891fbc ("usercopy: Allow
      strict enforcement of whitelists"); since usercopy_fallback is
      effectively 0, the fallback handling is removed too.
      
      This also removes the usercopy_fallback module parameter on slab_common.
      
      Link: https://github.com/KSPP/linux/issues/153
      Link: https://lkml.kernel.org/r/20210921061149.1091163-1-steve@sk2.org
      Signed-off-by: Stephen Kitt <steve@sk2.org>
      Suggested-by: Kees Cook <keescook@chromium.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Joel Stanley <joel@jms.id.au>	[defconfig change]
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Serge E . Hallyn" <serge@hallyn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: introduce an aged idle interface · 755804d1
      Brian Geffon authored
      
      
      This change introduces an aged idle interface to the existing idle sysfs
      file for zram.
      
      When CONFIG_ZRAM_MEMORY_TRACKING is enabled the idle file now also
      accepts an integer argument.  This integer is the age (in seconds) of
      pages to mark as idle.  The idle file still supports 'all' as it always
      has.  This new approach allows for much more control over which pages
      get marked as idle.
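
As a rough illustration of the interface described above, the accepted inputs could be parsed along the following lines.  This is a userspace sketch, not the driver code: parse_idle_arg(), struct idle_request and the tracking_enabled parameter (standing in for CONFIG_ZRAM_MEMORY_TRACKING) are hypothetical names, and the input is assumed to have had its trailing newline stripped already.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: either "all pages" or "pages at least this old". */
struct idle_request {
	bool all;
	unsigned long long min_age_sec;
};

/* Returns 0 on success, -EINVAL for input the interface would not accept. */
static int parse_idle_arg(const char *arg, bool tracking_enabled,
			  struct idle_request *req)
{
	char *end;
	unsigned long long age;

	assert(req != NULL);

	/* 'all' keeps working regardless of the tracking option. */
	if (strcmp(arg, "all") == 0) {
		req->all = true;
		req->min_age_sec = 0;
		return 0;
	}

	/* An age is only meaningful when per-page access times are tracked. */
	if (!tracking_enabled)
		return -EINVAL;

	/* Require a leading digit so that "", "-5" and " 7" are rejected. */
	if (arg[0] < '0' || arg[0] > '9')
		return -EINVAL;

	errno = 0;
	age = strtoull(arg, &end, 10);
	if (errno != 0 || *end != '\0')
		return -EINVAL;

	req->all = false;
	req->min_age_sec = age;
	return 0;
}
```

With tracking enabled, an input such as "3600" would select pages that have not been accessed for at least an hour; without it, only 'all' is accepted in this sketch.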
      
      [bgeffon@google.com: use IS_ENABLED and cleanup comment]
        Link: https://lkml.kernel.org/r/20210924161128.1508015-1-bgeffon@google.com
      [bgeffon@google.com: Sergey's cleanup suggestions]
        Link: https://lkml.kernel.org/r/20210929143056.13067-1-bgeffon@google.com
      
      Link: https://lkml.kernel.org/r/20210923130115.1344361-1-bgeffon@google.com
      Signed-off-by: Brian Geffon <bgeffon@google.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Jesse Barnes <jsbarnes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: off by one in read_block_state() · a88e03cf
      Dan Carpenter authored
      snprintf() returns the number of bytes it would have printed if there
      were space.  But it does not count the NUL terminator.  So that means
      that if "count == copied" then this has already overflowed by one
      character.
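
The return-value rule can be checked in userspace with the standard C snprintf(), which follows the same convention.  The helper below is a minimal sketch with a hypothetical name (format_fits()); it is not the zram code, only an illustration of why the truncation test has to be ">=" rather than ">".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns 1 if `text` plus its NUL terminator fit in `size` bytes,
 * 0 if snprintf() had to truncate.
 */
static int format_fits(char *buf, size_t size, const char *text)
{
	int n;

	assert(size > 0);
	n = snprintf(buf, size, "%s", text);

	/* n excludes the NUL, so n == size already means truncation. */
	return n >= 0 && (size_t)n < size;
}
```

With a 4-byte buffer, "abc" fits exactly, while "abcd" makes snprintf() return 4: a check written as "n > size" would wrongly treat that output as complete.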
      
      This bug likely isn't super harmful in real life.
      
      Link: https://lkml.kernel.org/r/20210916130404.GA25094@kili
      Fixes: c0265342 ("zram: introduce zram memory tracking")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram_drv: allow reclaim on bio_alloc · 4aabdc14
      Jaewon Kim authored
      
      
      read_from_bdev_async() is not called in atomic context, so GFP_NOIO
      can be used instead of GFP_ATOMIC.  If there are pages reclaimable
      under GFP_NOIO, we can avoid allocation failure and page fault failure.
      
      Link: https://lkml.kernel.org/r/20210908005241.28062-1-jaewon31.kim@samsung.com
      Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
      Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/highmem: remove deprecated kmap_atomic · d2c20e51
      Ira Weiny authored
      
      
      kmap_atomic() is being deprecated in favor of kmap_local_page().
      
      Replace the uses of kmap_atomic() within the highmem code.
      
      On profiling clear_huge_page() using ftrace an improvement of 62% was
      observed on the below setup.
      
      Setup:-
      Below data has been collected on Qualcomm's SM7250 SoC THP enabled
      (kernel v4.19.113) with only CPU-0(Cortex-A55) and CPU-7(Cortex-A76)
      switched on and set to max frequency, also DDR set to perf governor.
      
      FTRACE Data:-
      
      Base data:-
      Number of iterations: 48
      Mean of allocation time: 349.5 us
      std deviation: 74.5 us
      
      v4 data:-
      Number of iterations: 48
      Mean of allocation time: 131 us
      std deviation: 32.7 us
      
      A simple userspace experiment that allocates 100 MB (BUF_SZ) of pages
      and writes to them gave us good insight: we observed an improvement of
      42% in allocation and write timings.
      -------------------------------------------------------------
      Test code snippet
      -------------------------------------------------------------
            clock_start();
            buf = malloc(BUF_SZ); /* Allocate 100 MB of memory */

            /* Write one int per page so that every page gets touched */
            for (i = 0; i < BUF_SZ_PAGES; i++)
                    *((int *)(buf + (i * PAGE_SIZE))) = 1;
            clock_end();
      -------------------------------------------------------------
      
      Malloc test timings for 100MB anon allocation:-
      
      Base data:-
      Number of iterations: 100
      Mean of allocation time: 31831 us
      std deviation: 4286 us
      
      v4 data:-
      Number of iterations: 100
      Mean of allocation time: 18193 us
      std deviation: 4915 us
      
      [willy@infradead.org: fix zero_user_segments()]
        Link: https://lkml.kernel.org/r/YYVhHCJcm2DM2G9u@casper.infradead.org
      
      Link: https://lkml.kernel.org/r/20210204073255.20769-2-prathu.baronia@oneplus.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Prathu Baronia <prathu.baronia@oneplus.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration() · afe8605c
      Miaohe Lin authored
      There is a possible race window between zs_pool_dec_isolated() and
      zs_unregister_migration() because wait_for_isolated_drain() checks the
      isolated count without holding class->lock and there is no ordering
      guarantee inside zs_pool_dec_isolated().  Thus the race window below
      is possible:
      
        zs_pool_dec_isolated		zs_unregister_migration
          check pool->destroying != 0
      				  pool->destroying = true;
      				  smp_mb();
      				  wait_for_isolated_drain()
      				    wait for pool->isolated_pages == 0
          atomic_long_dec(&pool->isolated_pages);
          atomic_long_read(&pool->isolated_pages) == 0
      
      Since we observe the pool->destroying (false) before atomic_long_dec()
      for pool->isolated_pages, waking pool->migration_wait up is missed.
      
      Fix this by ensuring that the check of pool->destroying happens after
      the atomic_long_dec(&pool->isolated_pages).
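
The intended ordering can be sketched in userspace with C11 atomics.  This is only an analogy under stated assumptions, not the zsmalloc code: struct pool_sketch and pool_dec_isolated() are hypothetical, the wakeups counter stands in for waking pool->migration_wait, and sequentially consistent atomics replace the kernel's atomic and barrier primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct pool_sketch {
	atomic_long isolated_pages;
	atomic_bool destroying;
	int wakeups;	/* stands in for waking up migration_wait */
};

static void pool_dec_isolated(struct pool_sketch *pool)
{
	assert(atomic_load(&pool->isolated_pages) > 0);

	/*
	 * Decrement first.  The operation is sequentially consistent, so
	 * the loads below cannot be ordered before it.
	 */
	atomic_fetch_sub(&pool->isolated_pages, 1);

	/* Only now look at destroying, i.e. after the decrement. */
	if (atomic_load(&pool->isolated_pages) == 0 &&
	    atomic_load(&pool->destroying))
		pool->wakeups++;
}
```

Because the destroying flag is read after the decrement, a destroyer that sets the flag and then waits for the count to reach zero either sees the count already at zero or is woken by the last decrementer.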
      
      Link: https://lkml.kernel.org/r/20210708115027.7557-1-linmiaohe@huawei.com
      Fixes: 701d6785 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Henry Burns <henryburns@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/rmap.c: avoid double faults migrating device private pages · 3d88705c
      Alistair Popple authored
      
      
      During migration special page table entries are installed for each page
      being migrated.  These entries store the pfn and associated permissions
      of ptes mapping the page being migrated.
      
      Device-private pages use special swap pte entries to distinguish
      read-only vs. writeable pages, which the migration code checks when
      creating migration entries.  Normally this follows a fast path in
      migrate_vma_collect_pmd() which correctly copies the permissions of
      device-private pages over to migration entries when migrating pages back
      to the CPU.
      
      However the slow-path falls back to using try_to_migrate() which
      unconditionally creates read-only migration entries for device-private
      pages.  This leads to unnecessary double faults on the CPU as the new
      pages are always mapped read-only even when they could be mapped
      writeable.  Fix this by correctly copying device-private permissions in
      try_to_migrate_one().
      
      Link: https://lkml.kernel.org/r/20211018045247.3128058-1-apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Reported-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: indicate MEMBLOCK_DRIVER_MANAGED with IORESOURCE_SYSRAM_DRIVER_MANAGED · 32befe9e
      David Hildenbrand authored
      
      
      Let's communicate driver-managed regions to memblock, to properly teach
      kexec_file with CONFIG_ARCH_KEEP_MEMBLOCK to not place images on these
      memory regions.
      
      Link: https://lkml.kernel.org/r/20211004093605.5830-6-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Jianyong Wu <Jianyong.Wu@arm.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Shahab Vahedi <shahab@synopsys.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED · f7892d8e
      David Hildenbrand authored
      
      
      Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED,
      indicating that we're dealing with a memory region that is never
      indicated in the firmware-provided memory map, but always detected and
      added by a driver.
      
      Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such
      memory regions like ordinary MEMBLOCK_NONE memory regions -- for
      example, when selecting memory regions to add to the vmcore for dumping
      in the crashkernel via for_each_mem_range().
      
      However, especially kexec_file is not supposed to select such memblocks
      via for_each_free_mem_range() / for_each_free_mem_range_reverse() to
      place kexec images, similar to how we handle
      IORESOURCE_SYSRAM_DRIVER_MANAGED without CONFIG_ARCH_KEEP_MEMBLOCK.
      
      We'll make sure that memory hotplug code sets the flag where applicable
      (IORESOURCE_SYSRAM_DRIVER_MANAGED) next.  This prepares architectures
      that need CONFIG_ARCH_KEEP_MEMBLOCK, such as arm64, for virtio-mem
      support.
      
      Note that kexec *must not* indicate this memory to the second kernel and
      *must not* place kexec-images on this memory.  Let's add a comment to
      kexec_walk_memblock(), documenting how we handle MEMBLOCK_DRIVER_MANAGED
      now just like using IORESOURCE_SYSRAM_DRIVER_MANAGED in
      locate_mem_hole_callback() for kexec_walk_resources().
      
      Also note that MEMBLOCK_HOTPLUG cannot be reused due to different
      semantics:
      	MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the
      	firmware-provided memory map and added to the system early during
      	boot; kexec *has to* indicate this memory to the second kernel and
      	can place kexec-images on this memory. After memory hotunplug,
      	kexec has to be re-armed. We mostly ignore this flag when
      	"movable_node" is not set on the kernel command line, because
      	then we're told to not care about hotunpluggability of such
      	memory regions.
      
      	MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in
      	the firmware-provided memory map; this memory is always detected
      	and added to the system by a driver; memory might not actually be
      	physically hotunpluggable. kexec *must not* indicate this memory to
      	the second kernel and *must not* place kexec-images on this memory.
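
The two sets of semantics above can be summarized as a pair of predicates.  The sketch below is illustrative only: the enum values and the function names are assumptions made for this example and are not taken from memblock.h or the kexec code, and the movable_node rule follows the MEMBLOCK_HOTPLUG documentation commit from the same series ("memblock: improve MEMBLOCK_HOTPLUG documentation").

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; the values are assumptions for this sketch. */
enum region_flags {
	REGION_NONE           = 0x0,
	REGION_HOTPLUG        = 0x1,
	REGION_DRIVER_MANAGED = 0x2,
};

/* May a kexec image be placed on a region with these flags? */
static bool kexec_may_place_image(unsigned int flags, bool movable_node)
{
	/* Driver-managed memory must never hold kexec images. */
	if (flags & REGION_DRIVER_MANAGED)
		return false;

	/* Hotpluggable memory is only avoided when "movable_node" is set. */
	if ((flags & REGION_HOTPLUG) && movable_node)
		return false;

	return true;
}

/* Must the region be indicated to the second kernel? */
static bool kexec_must_indicate(unsigned int flags)
{
	/* Everything except driver-managed memory is passed on. */
	return !(flags & REGION_DRIVER_MANAGED);
}
```

The asymmetry is the reason a separate flag is needed: a hotplug region still has to be indicated to the second kernel, whereas a driver-managed region must stay hidden from it.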
      
      Link: https://lkml.kernel.org/r/20211004093605.5830-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Jianyong Wu <Jianyong.Wu@arm.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Shahab Vahedi <shahab@synopsys.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock: allow to specify flags with memblock_add_node() · 952eea9b
      David Hildenbrand authored
      
      
      We want to specify flags when hotplugging memory.  Let's prepare to pass
      flags to memblock_add_node() by adjusting all existing users.
      
      Note that when hotplugging memory the system is already up and running
      and we might have concurrent memblock users: for example, while we're
      hotplugging memory, kexec_file code might search for suitable memory
      regions to place kexec images.  It's important to add the memory
      directly to memblock via a single call with the right flags, instead of
      adding the memory first and applying flags later: otherwise, concurrent
      memblock users might temporarily stumble over memblocks with wrong
      flags, which will be important in a follow-up patch that introduces a
      new flag to properly handle add_memory_driver_managed().
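
The "single call with the right flags" idea can be illustrated with a toy region table.  This is a simplified sketch, not memblock: struct region_table and table_add_node() are hypothetical, the real code also sorts and merges regions, and real concurrent readers would additionally need locking or barriers, which are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_REGIONS 4	/* arbitrary small capacity for the sketch */

struct region {
	unsigned long long base;
	unsigned long long size;
	int nid;
	unsigned int flags;
};

struct region_table {
	struct region regions[MAX_REGIONS];
	size_t cnt;
};

/* Adds a region with its final flags in one step; -ENOMEM if the table is full. */
static int table_add_node(struct region_table *t, unsigned long long base,
			  unsigned long long size, int nid, unsigned int flags)
{
	assert(t != NULL);

	if (t->cnt == MAX_REGIONS)
		return -ENOMEM;

	/* Fill in the whole entry, flags included, before counting it. */
	t->regions[t->cnt] = (struct region){ base, size, nid, flags };
	t->cnt++;
	return 0;
}
```

A two-step variant (add with no flags, then set the flags) would leave a window in which the entry is already counted but still looks like ordinary memory, which is the situation the commit message warns about.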
      
      Link: https://lkml.kernel.org/r/20211004093605.5830-4-david@redhat.com
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Shahab Vahedi <shahab@synopsys.com>	[arch/arc]
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Jianyong Wu <Jianyong.Wu@arm.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock: improve MEMBLOCK_HOTPLUG documentation · e14b4155
      David Hildenbrand authored
      
      
      The description of MEMBLOCK_HOTPLUG is currently short and consequently
      misleading: we're actually dealing with a memory region that might get
      hotunplugged later (i.e., the platform+firmware supports it), yet it is
      indicated in the firmware-provided memory map as system ram that will
      just get used by the system for any purpose when not taking special
      care.  The firmware marked this memory region as hot(un)plugged (e.g.,
      hotplugged before reboot), implying that it might get hotunplugged again
      later.
      
      Whether we consider this information depends on the "movable_node"
      kernel commandline parameter: only with "movable_node" set, we'll try
      keeping this memory hotunpluggable, for example, by not serving early
      allocations from this memory region and by letting the buddy manage it
      using the ZONE_MOVABLE.
      
      Let's make this clearer by extending the documentation.
      
      Note: kexec *has to* indicate this memory to the second kernel.  With
      "movable_node" set, we don't want to place kexec-images on this memory.
      Without "movable_node" set, we don't care and can place kexec-images on
      this memory.  In both cases, after successful memory hotunplug, kexec
      has to be re-armed to update the memory map for the second kernel and to
      place the kexec-images somewhere else.
      
      Link: https://lkml.kernel.org/r/20211004093605.5830-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Jianyong Wu <Jianyong.Wu@arm.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Shahab Vahedi <shahab@synopsys.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: handle memblock_add_node() failures in add_memory_resource() · 53d38316
      David Hildenbrand authored
      Patch series "mm/memory_hotplug: full support for add_memory_driver_managed() with CONFIG_ARCH_KEEP_MEMBLOCK", v2.
      
      Architectures that require CONFIG_ARCH_KEEP_MEMBLOCK=y, such as arm64,
      don't cleanly support add_memory_driver_managed() yet.  Most
      prominently, kexec_file can still end up placing kexec images on such
      driver-managed memory, resulting in undesired behavior, for example,
      having kexec images located on memory not part of the firmware-provided
      memory map.
      
      Teaching kexec to not place images on driver-managed memory is
      especially relevant for virtio-mem.  Details can be found in commit
      7b7b2721 ("mm/memory_hotplug: introduce
      add_memory_driver_managed()").
      
      Extend memblock with a new flag and set it from memory hotplug code when
      applicable.  This is required to fully support virtio-mem on arm64,
      making also kexec_file behave like on x86-64.
      
      This patch (of 2):
      
      If memblock_add_node() fails, we're most probably running out of memory.
      While this is unlikely to happen, it can happen and having memory added
      without a memblock can be problematic for architectures that use
      memblock to detect valid memory.  Let's fail in a nice way instead of
      silently ignoring the error.
      
      Link: https://lkml.kernel.org/r/20211004093605.5830-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20211004093605.5830-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jianyong Wu <Jianyong.Wu@arm.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Shahab Vahedi <shahab@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: remove memory hotplug support on X86_32 · 5c11f00b
      David Hildenbrand authored
      
      
      CONFIG_MEMORY_HOTPLUG was marked BROKEN for over a year and we just
      restricted it to 64 bit.  Let's remove the unused x86 32bit
      implementation and simplify the Kconfig.
      
      Link: https://lkml.kernel.org/r/20210929143600.49379-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: remove stale function declarations · 43e3aa2a
      David Hildenbrand authored
      
      
      These functions no longer exist.
      
      Link: https://lkml.kernel.org/r/20210929143600.49379-6-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: remove HIGHMEM leftovers · 6b740c6c
      David Hildenbrand authored
      
      
      We don't support CONFIG_MEMORY_HOTPLUG on 32 bit and consequently not
      HIGHMEM.  Let's remove any leftover code -- including the unused
      "status_change_nid_high" field part of the memory notifier.
      
      Link: https://lkml.kernel.org/r/20210929143600.49379-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: restrict CONFIG_MEMORY_HOTPLUG to 64 bit · 7ec58a2b
      David Hildenbrand authored
      32 bit support is broken in various ways: for example, we can online
      memory that should actually go to ZONE_HIGHMEM to ZONE_MOVABLE or in
      some cases even to one of the other kernel zones.
      
      We marked it BROKEN in commit b59d02ed ("mm/memory_hotplug: disable
      the functionality for 32b") almost one year ago.  According to that
      commit it might be broken at least since 2017.  Further, there is hardly
      a sane use case nowadays.
      
      Let's just depend completely on 64bit, dropping the "BROKEN" dependency
      to make clear that we are not going to support it again.  Next, we'll
      remove some HIGHMEM leftovers from memory hotplug code to clean up.
      
      Link: https://lkml.kernel.org/r/20210929143600.49379-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE · 50f9481e
      David Hildenbrand authored
      
      
      CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
      CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
      CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.
      
      Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Shuah Khan <skhan@linuxfoundation.org>	[kselftest]
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Oscar Salvador <osalvador@suse.de>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: remove CONFIG_X86_64_ACPI_NUMA dependency from CONFIG_MEMORY_HOTPLUG · 71b6f2dd
      David Hildenbrand authored
      
      
      Patch series "mm/memory_hotplug: Kconfig and 32 bit cleanups".
      
      Some cleanups around CONFIG_MEMORY_HOTPLUG, including removing 32 bit
      leftovers of memory hotplug support.
      
      This patch (of 6):
      
      SPARSEMEM is the only possible memory model for x86-64, FLATMEM is not
      possible:
      
      	config ARCH_FLATMEM_ENABLE
      		def_bool y
      		depends on X86_32 && !NUMA
      
      And X86_64_ACPI_NUMA (obviously) only supports x86-64:
      
      	config X86_64_ACPI_NUMA
      		def_bool y
      		depends on X86_64 && NUMA && ACPI && PCI
      
      Let's just remove the CONFIG_X86_64_ACPI_NUMA dependency, as it no
      longer makes sense.
      
      Link: https://lkml.kernel.org/r/20210929143600.49379-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      71b6f2dd
    • David Hildenbrand's avatar
      memory-hotplug.rst: document the "auto-movable" online policy · 9e122cc1
      David Hildenbrand authored
      Commit e83a437f ("mm/memory_hotplug: introduce "auto-movable" online
      policy") introduced a new memory online policy to automatically select a
      zone for memory blocks to be onlined.  It added a way to set the active
      online policy and tunables for the auto-movable online policy.
      
      Follow-up commits tweaked the "auto-movable" policy to also consider
      memory device details when selecting zones for memory blocks to be
      onlined.
      
      Let's document the new toggles and how the two online policies we have
      work.
      
      [david@redhat.com: updates]
        Link: https://lkml.kernel.org/r/20211011082058.6076-4-david@redhat.com
      
      Link: https://lkml.kernel.org/r/20210930144117.23641-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9e122cc1
    • David Hildenbrand's avatar
      memory-hotplug.rst: fix wrong /sys/module/memory_hotplug/parameters/ path · a8db400f
      David Hildenbrand authored
      We accidentally added a superfluous "s".
      
      Link: https://lkml.kernel.org/r/20210930144117.23641-3-david@redhat.com
      Fixes: ac3332c4 ("memory-hotplug.rst: complete admin-guide overhaul")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a8db400f
    • David Hildenbrand's avatar
      memory-hotplug.rst: fix two instances of "movablecore" that should be "movable_node" · d83fe3c9
      David Hildenbrand authored
      Patch series "memory-hotplug.rst: document the "auto-movable" online
      policy".
      
      Now that the memory-hotplug.rst overhaul is upstream, add proper
      documentation for the "auto-movable" online policy, covering all new
      toggles and options.  Along the way, apply two fixes to the original
      overhaul.
      
      This patch (of 3):
      
      We really want to refer to the "movable_node" kernel command line
      parameter here.
      
      Link: https://lkml.kernel.org/r/20210930144117.23641-2-david@redhat.com
      Fixes: ac3332c4 ("memory-hotplug.rst: complete admin-guide overhaul")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d83fe3c9
    • Tang Yizhou's avatar
      mm/memory_hotplug: add static qualifier for online_policy_to_str() · ac62554b
      Tang Yizhou authored
      
      
      online_policy_to_str() is only used in memory_hotplug.c and should be
      defined as static.
      
      Link: https://lkml.kernel.org/r/20210913024534.26161-1-tangyizhou@huawei.com
      Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ac62554b
    • David Hildenbrand's avatar
      selftests/vm: make MADV_POPULATE_(READ|WRITE) use in-tree headers · 39b2e5ca
      David Hildenbrand authored
      
      
      The madv_populate selftest currently builds with a warning when the
      locally installed headers (via the distribution) don't include
      MADV_POPULATE_READ and MADV_POPULATE_WRITE.  The warning is correct,
      because the test cannot locate the necessary header.
      
      The reason is that the in-tree installed headers (usr/include) have a
      "linux" instead of a "sys" subdirectory.
      
      Including "linux/mman.h" instead of "sys/mman.h" doesn't work (e.g.,
      mmap() and madvise() are not defined that way).  The only thing that
      seems to work is including "linux/mman.h" in addition to "sys/mman.h".
      
      We can get rid of our availability check and simplify.
      
      Link: https://lkml.kernel.org/r/20211015165758.41374-1-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      39b2e5ca
    • Lin Feng's avatar
      mm: vmstat.c: make extfrag_index show more pretty · a9970586
      Lin Feng authored
      
      
      fragmentation_index may return -1000, and the corresponding formatted
      value shown by seq_printf carries a minus sign, but the positive
      formatted values carry no sign character, so the output becomes
      unaligned.
      
      before:
        Node 0, zone      DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
        Node 0, zone    DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
        Node 0, zone   Normal -1.000 -1.000 -1.000 -1.000 0.931 0.966 0.983 0.992 0.996 0.998 0.999
      
      after this patch:
        Node 0, zone      DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
        Node 0, zone    DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
        Node 0, zone   Normal -1.000 -1.000 -1.000 -1.000  0.931  0.966  0.983  0.992  0.996  0.998  0.999
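
      The alignment fix can be sketched in userspace C.  This is a minimal
      illustration, not the kernel code: format_extfrag_index() is a
      hypothetical helper, snprintf() stands in for seq_printf(), and the
      "%2d" field width is an assumption about how a column is reserved for
      the sign.  It assumes the index is either -1000 or in the range 0..1000.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format a fragmentation index scaled by 1000 (for example -1000 or 931)
 * as a fixed-width decimal string.
 *
 * The "%2d" width reserves one column for a minus sign, so "-1.000" and
 * " 0.931" both come out six characters wide and the columns line up.
 *
 * Hypothetical helper for illustration only. It assumes index is -1000
 * or within 0..1000; other negative values would print a second sign.
 */
static int format_extfrag_index(char *buf, size_t len, int index)
{
	return snprintf(buf, len, "%2d.%03d", index / 1000, index % 1000);
}
```

      Calling the helper with -1000 and with 931 yields strings of equal
      width, which is what keeps the per-order columns aligned.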
      
      Link: https://lkml.kernel.org/r/20211019103241.134797-1-linf@wangsu.com
      Signed-off-by: Lin Feng <linf@wangsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9970586
    • Liu Shixin's avatar
      mm/vmstat: annotate data race for zone->free_area[order].nr_free · af1c31ac
      Liu Shixin authored
      
      
      KCSAN reports a data-race on v5.10 which also exists on mainline:
      
        BUG: KCSAN: data-race in extfrag_for_order+0x33/0x2d0
      
        race at unknown origin, with read to 0xffff9ee9bfffab48 of 8 bytes by task 34 on cpu 1:
         extfrag_for_order+0x33/0x2d0
         kcompactd+0x5f0/0xce0
         kthread+0x1f9/0x220
         ret_from_fork+0x22/0x30
      
        Reported by Kernel Concurrency Sanitizer on:
        CPU: 1 PID: 34 Comm: kcompactd0 Not tainted 5.10.0+ #2
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      
      Access to zone->free_area[order].nr_free in extfrag_for_order() and
      frag_show_print() is lockless.  That's intentional and the stats are a
      rough estimate anyway.  Annotate them with data_race().
      
      [liushixin2@huawei.com: add comments]
        Link: https://lkml.kernel.org/r/20210918084655.2696522-1-liushixin2@huawei.com
      
      Link: https://lkml.kernel.org/r/20210908015606.3999871-1-liushixin2@huawei.com
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Cc: "Paul E . McKenney" <paulmck@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      af1c31ac
    • Pedro Demarchi Gomes's avatar
      selftests: vm: add KSM huge pages merging time test · 32525489
      Pedro Demarchi Gomes authored
      
      
      Add a test case for KSM merging time using mostly huge pages.
      
      Link: https://lkml.kernel.org/r/20211013044045.360251-1-pedrodemargomes@gmail.com
      Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      32525489
    • Aneesh Kumar K.V's avatar
      selftest/vm: fix ksm selftest to run with different NUMA topologies · e3820ab2
      Aneesh Kumar K.V authored
      Platforms can have non-contiguous NUMA nodes like below
      
         #numactl  -H
        available: 2 nodes (0,8)
        .....
        node distances:
        node   0   8
          0:  10  40
          8:  40  10
      
         #numactl  -H
        available: 1 nodes (1)
        ....
        node distances:
        node   1
          1:  10
      
      Hence update the test to not assume the presence of Node 0 and 1 and
      also use numa_num_configured_nodes() instead of numa_max_node for
      finding whether to skip the test.
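
      The difference between counting configured nodes and looking at the
      highest node ID can be sketched without libnuma.  In this illustration
      a plain bitmask stands in for the set of configured nodes; node_mask_t
      and the three helpers are hypothetical names, not libnuma or selftest
      functions.

```c
#include <stdbool.h>

/* Bit n set means NUMA node n is configured (stand-in for a libnuma mask). */
typedef unsigned long node_mask_t;

/* Number of configured nodes: the quantity the skip check should use. */
static int count_configured_nodes(node_mask_t mask)
{
	int count = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		count++;
	return count;
}

/* Highest configured node ID, or -1 if the mask is empty. */
static int highest_node_id(node_mask_t mask)
{
	int id = -1;

	for (int n = 0; mask; n++, mask >>= 1)
		if (mask & 1)
			id = n;
	return id;
}

/*
 * Pick the first two configured nodes instead of assuming nodes 0 and 1.
 * Returns false when fewer than two nodes are configured.
 */
static bool pick_two_nodes(node_mask_t mask, int *first, int *second)
{
	bool have_first = false;

	for (int n = 0; mask; n++, mask >>= 1) {
		if (!(mask & 1))
			continue;
		if (have_first) {
			*second = n;
			return true;
		}
		*first = n;
		have_first = true;
	}
	return false;
}
```

      With the two topologies shown above, a mask with nodes 0 and 8 gives a
      count of 2 and the pair (0, 8), while a mask with only node 1 gives a
      count of 1 even though its highest node ID is 1, so a check based on
      the highest ID alone would not skip the test.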
      
      Link: https://lkml.kernel.org/r/20210914141414.350759-1-aneesh.kumar@linux.ibm.com
      Fixes: 82e717ad ("selftests: vm: add KSM merging across nodes test")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e3820ab2
    • Kefeng Wang's avatar
      mm: nommu: kill arch_get_unmapped_area() · 916caa12
      Kefeng Wang authored
      
      
      On nommu, arch_get_unmapped_area() is never called, so just kill
      it.
      
      Link: https://lkml.kernel.org/r/20210910061906.36299-1-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      916caa12
    • Lin Feng's avatar
      mm/readahead.c: fix incorrect comments for get_init_ra_size · fb25a77d
      Lin Feng authored
      
      
      In fact, the formatted values returned by get_init_ra_size are not that
      intuitive.  This patch makes the comments reflect the actual behavior.
      
      Link: https://lkml.kernel.org/r/20211019104812.135602-1-linf@wangsu.com
      Signed-off-by: Lin Feng <linf@wangsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fb25a77d
    • Rongwei Wang's avatar
      mm, thp: fix incorrect unmap behavior for private pages · 8468e937
      Rongwei Wang authored
      When truncating the page cache on file THP, the private pages of a
      process should not be unmapped.  This incorrect behavior on dynamic
      shared libraries causes related processes to core dump.
      
      A simple test for a DSO (Prerequisite is the DSO mapped in file THP):
      
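          #include <fcntl.h>
          #include <stdio.h>
          #include <unistd.h>

          int main(int argc, char *argv[])
          {
      	int fd;

      	if (argc < 2)
      		return 1;

      	/* Opening the DSO for writing is enough to trigger the bug. */
      	fd = open(argv[1], O_WRONLY);
      	if (fd < 0) {
      		perror("open");
      		return 1;
      	}

      	close(fd);
      	return 0;
          }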
      
      The test only opens a target DSO and does nothing else.  But this
      operation causes one or more processes to core dump.  This patch fixes
      the bug.
      
      Link: https://lkml.kernel.org/r/20211025092134.18562-3-rongwei.wang@linux.alibaba.com
      Fixes: eb6ecbed ("mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs")
      Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
      Tested-by: Xu Yu <xuyu@linux.alibaba.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Collin Fijalkovich <cfijalkovich@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8468e937
    • Rongwei Wang's avatar
      mm, thp: lock filemap when truncating page cache · 55fc0d91
      Rongwei Wang authored
      Patch series "fix two bugs for file THP".
      
      This patch (of 2):
      
      Transparent huge pages support read-only non-shmem files.  A
      file-backed THP is collapsed by khugepaged and truncated when written
      (for shared libraries).
      
      However, there is a race when multiple writers truncate the same page
      cache concurrently.
      
      In that case, subpage(s) of file THP can be revealed by find_get_entry
      in truncate_inode_pages_range, which will trigger PageTail BUG_ON in
      truncate_inode_page, as follows:
      
          page:000000009e420ff2 refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff pfn:0x50c3ff
          head:0000000075ff816d order:9 compound_mapcount:0 compound_pincount:0
          flags: 0x37fffe0000010815(locked|uptodate|lru|arch_1|head)
          raw: 37fffe0000000000 fffffe0013108001 dead000000000122 dead000000000400
          raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
          head: 37fffe0000010815 fffffe001066bd48 ffff000404183c20 0000000000000000
          head: 0000000000000600 0000000000000000 00000001ffffffff ffff000c0345a000
          page dumped because: VM_BUG_ON_PAGE(PageTail(page))
          ------------[ cut here ]------------
          kernel BUG at mm/truncate.c:213!
          Internal error: Oops - BUG: 0 [#1] SMP
          Modules linked in: xfs(E) libcrc32c(E) rfkill(E) ...
          CPU: 14 PID: 11394 Comm: check_madvise_d Kdump: ...
          Hardware name: ECS, BIOS 0.0.0 02/06/2015
          pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
          Call trace:
           truncate_inode_page+0x64/0x70
           truncate_inode_pages_range+0x550/0x7e4
           truncate_pagecache+0x58/0x80
           do_dentry_open+0x1e4/0x3c0
           vfs_open+0x38/0x44
           do_open+0x1f0/0x310
           path_openat+0x114/0x1dc
           do_filp_open+0x84/0x134
           do_sys_openat2+0xbc/0x164
           __arm64_sys_openat+0x74/0xc0
           el0_svc_common.constprop.0+0x88/0x220
           do_el0_svc+0x30/0xa0
           el0_svc+0x20/0x30
           el0_sync_handler+0x1a4/0x1b0
           el0_sync+0x180/0x1c0
          Code: aa0103e0 900061e1 910ec021 9400d300 (d4210000)
      
      This patch locks the filemap on entry to truncate_pagecache(), which
      avoids truncating the same page cache concurrently.
      
      Link: https://lkml.kernel.org/r/20211025092134.18562-1-rongwei.wang@linux.alibaba.com
      Link: https://lkml.kernel.org/r/20211025092134.18562-2-rongwei.wang@linux.alibaba.com
      Fixes: eb6ecbed ("mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs")
      Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
      Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
      Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Song Liu <song@kernel.org>
      Cc: Collin Fijalkovich <cfijalkovich@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      55fc0d91
    • George G. Davis's avatar
      selftests/vm/transhuge-stress: fix ram size thinko · 39cad887
      George G. Davis authored
      When executing transhuge-stress with an argument to specify the virtual
      memory size for testing, the ram size is reported as 0, e.g.
      
        transhuge-stress 384
        thp-mmap: allocate 192 transhuge pages, using 384 MiB virtual memory and 0 MiB of ram
        thp-mmap: 0.184 s/loop, 0.957 ms/page,   2090.265 MiB/s  192 succeed,    0 failed
      
      This appears to be due to a thinko in commit 0085d61f
      ("selftests/vm/transhuge-stress: stress test for memory compaction"),
      where, at a guess, the intent was to base "xyz MiB of ram" on `ram`
      size.
      
      Here are results after using `ram` size:
      
        thp-mmap: allocate 192 transhuge pages, using 384 MiB virtual memory and 14 MiB of ram
      
      Link: https://lkml.kernel.org/r/20210825135843.29052-1-george_davis@mentor.com
      Fixes: 0085d61f ("selftests/vm/transhuge-stress: stress test for memory compaction")
      Signed-off-by: George G. Davis <davis.george@siemens.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      39cad887
    • Yang Shi's avatar
      mm: migrate: make demotion knob depend on migration · 20f9ba4f
      Yang Shi authored
      
      
      Memory demotion needs to call migrate_pages() to do its job, and it is
      controlled by a knob.  However, the knob doesn't depend on
      CONFIG_MIGRATION, so it can be turned on even when MIGRATION is
      disabled.  This will not cause a crash, since migrate_pages() would just
      return -ENOSYS, but it is definitely not optimal to go through the
      demotion path and then retry regular swap every time.
      
      And it doesn't make too much sense to have the knob visible to the users
      when !MIGRATION.  Move the related code from mempolicy.[h|c] to
      migrate.[h|c].
      
      Link: https://lkml.kernel.org/r/20211015005559.246709-1-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Acked-by: "Huang, Ying" <ying.huang@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      20f9ba4f
    • John Hubbard's avatar
      mm/migrate: de-duplicate migrate_reason strings · 8eb42bea
      John Hubbard authored
      
      
      To remove the need to manually keep three different files in sync,
      provide a common definition of the mapping between enum migrate_reason
      and the associated string for each enum item.
      
      1. Use the tracing system's mapping of enums to strings, by redefining
         and reusing the MIGRATE_REASON and supporting macros, and using that
         to populate the string array in mm/debug.c.
      
      2. Move enum migrate_reason to migrate_mode.h. This is not strictly
         necessary for this patch, but migrate mode and migrate reason go
         together, so this will slightly clarify things.
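
      The general idea of keeping an enum and its strings in one place can be
      sketched with a generic X-macro.  This is not the tracing system's
      macro machinery that the patch reuses; MIGRATE_REASON_LIST, AS_ENUM,
      AS_STRING and migrate_reason_name() are hypothetical names, and the
      four entries are only an illustrative subset.

```c
/*
 * Single source of truth: each entry pairs an enum item with its string.
 * Adding a reason here updates both the enum and the name table.
 * The entries listed are an illustrative subset, not the full kernel list.
 */
#define MIGRATE_REASON_LIST(X)				\
	X(MR_COMPACTION,	"compaction")		\
	X(MR_MEMORY_FAILURE,	"memory_failure")	\
	X(MR_MEMORY_HOTPLUG,	"memory_hotplug")	\
	X(MR_SYSCALL,		"syscall_or_cpuset")

/* First expansion: emit the enum items, with MR_TYPES as the count. */
#define AS_ENUM(name, str) name,
enum migrate_reason { MIGRATE_REASON_LIST(AS_ENUM) MR_TYPES };

/* Second expansion: emit the string table, indexed by the enum value. */
#define AS_STRING(name, str) [name] = str,
static const char *const migrate_reason_names[MR_TYPES] = {
	MIGRATE_REASON_LIST(AS_STRING)
};

/* Look up the string for a reason; out-of-range values map to "unknown". */
static const char *migrate_reason_name(enum migrate_reason reason)
{
	if ((unsigned int)reason < (unsigned int)MR_TYPES)
		return migrate_reason_names[reason];
	return "unknown";
}
```

      Because both expansions come from the same list, the enum and the
      string array cannot drift apart the way separately maintained files
      can.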
      
      Link: https://lkml.kernel.org/r/20210922041755.141817-2-jhubbard@nvidia.com
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Weizhao Ouyang <o451686892@gmail.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8eb42bea
    • Zhenguo Yao's avatar
      hugetlbfs: extend the definition of hugepages parameter to support node allocation · b5389086
      Zhenguo Yao authored
      
      
      We can specify the number of hugepages to allocate at boot, but at
      present the hugepages are balanced across all nodes.  In some scenarios,
      we only need hugepages on one node.  For example, DPDK needs hugepages
      on the same node as the NIC.

      If DPDK needs four 1G hugepages on node1 and the system has 16 NUMA
      nodes, we must reserve 64 hugepages on the kernel cmdline, even though
      only four hugepages are used.  The others should be freed after boot.  If
      system memory is low (for example, 64G), this is an impossible task.

      So extend the hugepages parameter to support specifying hugepages on a
      specific node.  For example, add the following parameter:
      
        hugepagesz=1G hugepages=0:1,1:3
      
      It will allocate 1 hugepage in node0 and 3 hugepages in node1.
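
      How such a per-node value could be parsed is sketched below in
      userspace C.  This is an illustration of the "node:count[,node:count]"
      format only, not the kernel's parser; parse_node_hugepages() and the
      MAX_NODES limit are assumptions made for the example.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_NODES 16	/* assumed node limit for this sketch */

static int is_digit(char c)
{
	return c >= '0' && c <= '9';
}

/*
 * Parse "node:count[,node:count...]" (for example "0:1,1:3") into
 * counts[], which must have MAX_NODES entries. Nodes that are not
 * mentioned get a count of 0. Returns 0 on success, or -EINVAL for
 * malformed input or a node number outside 0..MAX_NODES-1.
 */
static int parse_node_hugepages(const char *s, unsigned long counts[MAX_NODES])
{
	for (int i = 0; i < MAX_NODES; i++)
		counts[i] = 0;

	if (!*s)
		return -EINVAL;

	while (*s) {
		char *end;
		unsigned long node, count;

		/* Node number, followed by ':' */
		if (!is_digit(*s))
			return -EINVAL;
		node = strtoul(s, &end, 10);
		if (*end != ':' || node >= MAX_NODES)
			return -EINVAL;
		s = end + 1;

		/* Page count for that node */
		if (!is_digit(*s))
			return -EINVAL;
		count = strtoul(s, &end, 10);
		counts[node] = count;

		/* Either another "node:count" pair follows, or the string ends */
		if (*end == ',') {
			s = end + 1;
			if (!*s)
				return -EINVAL;	/* trailing comma */
		} else if (*end == '\0') {
			s = end;
		} else {
			return -EINVAL;
		}
	}
	return 0;
}
```

      Parsing "0:1,1:3" with this sketch fills in one page for node 0 and
      three for node 1, so the DPDK case above would need only the pages it
      actually uses rather than 16 nodes times 4 pages.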
      
      Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
      Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b5389086
    • Sultan Alsawaf's avatar
      mm: mark the OOM reaper thread as freezable · 3723929e
      Sultan Alsawaf authored
      The OOM reaper alters user address space which might theoretically alter
      the snapshot if reaping is allowed to happen after the freezer quiescent
      state.  To this end, the reaper kthread uses wait_event_freezable()
      while waiting for any work so that it cannot run while the system
      freezes.
      
      However, the current implementation doesn't respect the freezer because
      all kernel threads are created with the PF_NOFREEZE flag, so they are
      automatically excluded from freezing operations.  This means that the
      OOM reaper can race with system snapshotting if it has work to do while
      the system is being frozen.
      
      Fix this by adding a set_freezable() call which will clear the
      PF_NOFREEZE flag and thus make the OOM reaper visible to the freezer.
      
      Please note that the OOM reaper altering the snapshot this way is mostly
      a theoretical concern and has not been observed in practice.
      
      Link: https://lkml.kernel.org/r/20210921165758.6154-1-sultan@kerneltoast.com
      Link: https://lkml.kernel.org/r/20210918233920.9174-1-sultan@kerneltoast.com
      Fixes: aac45363 ("mm, oom: introduce oom reaper")
      Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3723929e
    • Mike Rapoport's avatar
      memblock: use memblock_free for freeing virtual pointers · 4421cca0
      Mike Rapoport authored
      
      
      Rename memblock_free_ptr() to memblock_free() and use memblock_free()
      when freeing a virtual pointer so that memblock_free() will be a
      counterpart of memblock_alloc().
      
      The callers are updated with the below semantic patch and manual
      addition of (void *) casting to pointers that are represented by
      unsigned long variables.
      
          @@
          identifier vaddr;
          expression size;
          @@
          (
          - memblock_phys_free(__pa(vaddr), size);
          + memblock_free(vaddr, size);
          |
          - memblock_free_ptr(vaddr, size);
          + memblock_free(vaddr, size);
          )
      
      [sfr@canb.auug.org.au: fixup]
        Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au
      
      Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4421cca0
    • Mike Rapoport's avatar
      memblock: rename memblock_free to memblock_phys_free · 3ecc6834
      Mike Rapoport authored
      
      
      Since memblock_free() operates on a physical range, make its name
      reflect it and rename it to memblock_phys_free(), so it will be a
      logical counterpart to memblock_phys_alloc().
      
      The callers are updated with the below semantic patch:
      
          @@
          expression addr;
          expression size;
          @@
          - memblock_free(addr, size);
          + memblock_phys_free(addr, size);
      
      Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3ecc6834
    • Mike Rapoport's avatar
      memblock: stop aliasing __memblock_free_late with memblock_free_late · 621d9739
      Mike Rapoport authored
      
      
      memblock_free_late() is a NOP wrapper for __memblock_free_late(); there
      is no point in keeping this indirection.
      
      Drop the wrapper and rename __memblock_free_late() to
      memblock_free_late().
      
      Link: https://lkml.kernel.org/r/20210930185031.18648-5-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      621d9739
    • Mike Rapoport's avatar
      memblock: drop memblock_free_early_nid() and memblock_free_early() · fa277171
      Mike Rapoport authored
      
      
      memblock_free_early_nid() is unused and memblock_free_early() is an
      alias for memblock_free().
      
      Replace calls to memblock_free_early() with calls to memblock_free() and
      remove memblock_free_early() and memblock_free_early_nid().
      
      Link: https://lkml.kernel.org/r/20210930185031.18648-4-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fa277171