  1. Apr 06, 2023
    • fs/proc/kcore: convert read_kcore() to read_kcore_iter() · 46c0d6d0
      Lorenzo Stoakes authored
      
      
      For the time being we still use a bounce buffer for vread(), however in
      the next patch we will convert this to interact directly with the iterator
      and eliminate the bounce buffer altogether.
      
      Link: https://lkml.kernel.org/r/ebe12c8d70eebd71f487d80095605f3ad0d1489c.1679511146.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Liu Shixin <liushixin2@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • fs/proc/kcore: avoid bounce buffer for ktext data · 2e1c0170
      Lorenzo Stoakes authored
      Patch series "convert read_kcore(), vread() to use iterators", v8.
      
      While reviewing Baoquan's recent changes to permit vread() access to
      vm_map_ram regions of vmalloc allocations, Willy pointed out [1] that it
      would be nice to refactor vread() as a whole, since its only user is
      read_kcore() and the existing form of vread() necessitates the use of a
      bounce buffer.
      
      This patch series does exactly that, as well as adjusting how we read the
      kernel text section to avoid the use of a bounce buffer in this case as
      well.
      
      This has been tested against the test case which motivated Baoquan's
      changes in the first place [2] which continues to function correctly, as
      do the vmalloc self tests.
      
      
      This patch (of 4):
      
      Commit df04abfd ("fs/proc/kcore.c: Add bounce buffer for ktext data")
      introduced the use of a bounce buffer to retrieve kernel text data for
      /proc/kcore in order to avoid failures arising from hardened user copies
      enabled by CONFIG_HARDENED_USERCOPY in check_kernel_text_object().
      
      We can avoid doing this if instead of copy_to_user() we use
      _copy_to_user() which bypasses the hardening check.  This is more
      efficient than using a bounce buffer and simplifies the code.
      
      We do so as part of an overall effort to eliminate bounce buffer usage in
      the function with an eye to converting it to an iterator read.
      
      Link: https://lkml.kernel.org/r/cover.1679566220.git.lstoakes@gmail.com
      Link: https://lore.kernel.org/all/Y8WfDSRkc%2FOHP3oD@casper.infradead.org/ [1]
      Link: https://lore.kernel.org/all/87ilk6gos2.fsf@oracle.com/T/#u [2]
      Link: https://lkml.kernel.org/r/fd39b0bfa7edc76d360def7d034baaee71d90158.1679511146.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Liu Shixin <liushixin2@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc: make deferred page init free pages in MAX_ORDER blocks · 3f6dac0f
      Kirill A. Shutemov authored
      
      
      The normal page init path frees pages during boot in MAX_ORDER chunks,
      but the deferred page init path does it in pageblock-sized blocks.
      
      Change deferred page init path to work in MAX_ORDER blocks.
      
      For cases when MAX_ORDER is larger than pageblock, set migrate type to
      MIGRATE_MOVABLE for all pageblocks covered by the page.
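
The migrate-type step described above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: the helper name mark_block_movable(), the flat pageblock_mt array, and the enum values are hypothetical, and the orders used in the usage note are illustrative.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's migrate types. */
enum { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE };

/*
 * Mark every pageblock covered by a free block of the given order as
 * MIGRATE_MOVABLE. pageblock_mt is a hypothetical array with one entry
 * per pageblock, indexed by (pfn >> pageblock_order).
 * Returns the number of pageblocks that were marked.
 */
static unsigned int mark_block_movable(unsigned char *pageblock_mt,
				       unsigned long start_pfn,
				       unsigned int order,
				       unsigned int pageblock_order)
{
	unsigned long nr_pages = 1UL << order;
	unsigned long step = 1UL << pageblock_order;
	unsigned long pfn;
	unsigned int marked = 0;

	assert(pageblock_mt != 0);

	/* One iteration per pageblock; a block no larger than a pageblock marks one. */
	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn += step) {
		pageblock_mt[pfn >> pageblock_order] = MIGRATE_MOVABLE;
		marked++;
	}
	return marked;
}
```

With an assumed block order of 10 and pageblock order of 9, a block starting at pfn 1024 covers two pageblocks (indices 2 and 3), and both are marked.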
      
      Link: https://lkml.kernel.org/r/20230321002415.20843-1-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • drm/ttm: remove comment referencing now-removed vmf_insert_mixed_prot() · 4a06f6f3
      Lorenzo Stoakes authored
      
      
      This function no longer exists, however the prot != vma->vm_page_prot case
      discussion has been retained and moved to vmf_insert_pfn_prot() so refer
      to this instead.
      
      Link: https://lkml.kernel.org/r/db403b3622b94a87bd93528cc1d6b44ae88adcdd.1678661628.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Aaron Tomlin <atomlin@atomlin.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: "Russell King (Oracle)" <linux@armlinux.org.uk>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: remove vmf_insert_pfn_xxx_prot() for huge page-table entries · 7b806d22
      Lorenzo Stoakes authored
      This functionality's sole user, the drm ttm module, removed support for it
      in commit 0d979509 ("drm/ttm: remove ttm_bo_vm_insert_huge()") as the
      whole approach is currently unworkable without a PMD/PUD special bit and
      updates to GUP.
      
      Link: https://lkml.kernel.org/r/604c2ad79659d4b8a6e3e1611c6219d5d3233988.1678661628.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Aaron Tomlin <atomlin@atomlin.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: "Russell King (Oracle)" <linux@armlinux.org.uk>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: remove unused vmf_insert_mixed_prot() · 28d8b812
      Lorenzo Stoakes authored
      Patch series "Remove drm/ttm-specific mm changes".
      
      Functionality was added specifically for the DRM TTM driver to support
      mapping memory for VM_MIXEDMAP VMAs with customised protection flags,
      however this has now been rolled back as issues were found with this
      approach.
      
      This series removes the mm changes too, retaining some of the useful
      comments.
      
      
      This patch (of 3):
      
      The sole user of vmf_insert_mixed_prot(), the drm ttm module, stopped
      using this in commit f91142c6 ("drm/ttm: nuke VM_MIXEDMAP on BO
      mappings v3") citing use of VM_MIXEDMAP in this case being terribly
      broken.
      
      Remove this now-dead code and references to it, but retain the useful
      description of the prot != vma->vm_page_prot case, moving it to
      vmf_insert_pfn_prot() instead.
      
      Link: https://lkml.kernel.org/r/cover.1678661628.git.lstoakes@gmail.com
      Link: https://lkml.kernel.org/r/a069644388e6f1593a7020d15840e6fc9f39bcaf.1678661628.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Aaron Tomlin <atomlin@atomlin.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: "Russell King (Oracle)" <linux@armlinux.org.uk>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/memtest: add results of early memtest to /proc/meminfo · bd23024b
      Tomas Mudrunka authored
      
      
      Currently the memtest results are only presented in dmesg.
      
      When running a large fleet of devices without ECC RAM it's currently not
      easy to do bulk monitoring for memory corruption.  You have to parse
      dmesg, but that's a ring buffer so the error might disappear after some
      time.  In general I do not consider dmesg to be a great API to query RAM
      status.
      
      In several companies I've seen such errors remain undetected and cause
      issues for way too long.  So I think it makes sense to provide a
      monitoring API, so that we can safely detect and act upon them.
      
      This adds a /proc/meminfo entry which can be easily used by scripts.
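
As a rough illustration of how a monitoring tool might consume such an entry, here is a minimal sketch of a lookup over /proc/meminfo-style text. The commit message above does not name the new field, so the key "EarlyMemtestBad" used in the usage note is an assumption, and meminfo_lookup_kb() is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up a "Key:   <value> kB" line in /proc/meminfo-style text.
 * Returns the numeric value, or -1 if the key is absent or malformed.
 */
static long meminfo_lookup_kb(const char *text, const char *key)
{
	const char *line = text;
	size_t keylen;

	assert(text != NULL && key != NULL);
	keylen = strlen(key);

	while (*line) {
		const char *eol = strchr(line, '\n');

		/* Only match the key at the start of a line, followed by ':'. */
		if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
			long val;

			if (sscanf(line + keylen + 1, "%ld", &val) == 1)
				return val;
			return -1;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

A caller would read /proc/meminfo into a NUL-terminated buffer and call, for example, meminfo_lookup_kb(buf, "EarlyMemtestBad"), treating -1 as "field not present on this kernel" and any positive value as a reason to raise an alert.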
      
      Link: https://lkml.kernel.org/r/20230321103430.7130-1-tomas.mudrunka@gmail.com
      Signed-off-by: Tomas Mudrunka <tomas.mudrunka@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • MAINTAINERS: extend memblock entry to include MM initialization · c9bb5273
      Mike Rapoport (IBM) authored
      
      
      and add mm/mm_init.c to memblock entry in MAINTAINERS
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-15-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: move vmalloc_init() declaration to mm/internal.h · b6714911
      Mike Rapoport (IBM) authored
      
      
      vmalloc_init() is called only from mm_core_init(), there is no need to
      declare it in include/linux/vmalloc.h
      
      Move vmalloc_init() declaration to mm/internal.h
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-14-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: move kmem_cache_init() declaration to mm/slab.h · d5d2c02a
      Mike Rapoport (IBM) authored
      
      
      kmem_cache_init() is called only from mm_core_init(), there is no need to
      declare it in include/linux/slab.h
      
      Move kmem_cache_init() declaration to mm/slab.h
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-13-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: move mem_init_print_info() to mm_init.c · eb8589b4
      Mike Rapoport (IBM) authored
      
      
      mem_init_print_info() is only called from mm_core_init().
      
      Move it close to the caller and make it static.
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-12-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • init,mm: fold late call to page_ext_init() to page_alloc_init_late() · de57807e
      Mike Rapoport (IBM) authored
      
      
      When deferred initialization of struct pages is enabled, page_ext_init()
      must be called after all the deferred initialization is done, but there is
      no point to keep it a separate call from kernel_init_freeable() right
      after page_alloc_init_late().
      
      Fold the call to page_ext_init() into page_alloc_init_late() and localize
      deferred_struct_pages variable.
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-11-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: move init_mem_debugging_and_hardening() to mm/mm_init.c · f2fc4b44
      Mike Rapoport (IBM) authored
      
      
      init_mem_debugging_and_hardening() is only called from mm_core_init().
      
      Move it close to the caller, make it static and rename it to
      mem_debugging_and_hardening_init() for consistency with surrounding
      convention.
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-10-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: call {ptlock,pgtable}_cache_init() directly from mm_core_init() · 4cd1e9ed
      Mike Rapoport (IBM) authored
      
      
      and drop pgtable_init() as it has no real value and its name is
      misleading.
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-9-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Sergei Shtylyov <sergei.shtylyov@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • init,mm: move mm_init() to mm/mm_init.c and rename it to mm_core_init() · b7ec1bf3
      Mike Rapoport (IBM) authored
      
      
      Make mm_init() a part of mm/ codebase.  mm_core_init() better describes
      what the function does and does not clash with mm_init() in kernel/fork.c
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-8-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • init: fold build_all_zonelists() and page_alloc_init_cpuhp() to mm_init() · 9cca1839
      Mike Rapoport (IBM) authored
      
      
      Both build_all_zonelists() and page_alloc_init_cpuhp() must be called
      after SMP setup is complete but before the page allocator is set up.
      
      Still, they both are a part of memory management initialization, so move
      them to mm_init().
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-7-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc: rename page_alloc_init() to page_alloc_init_cpuhp() · c4fbed4b
      Mike Rapoport (IBM) authored
      
      
      The page_alloc_init() name is really misleading because all this function
      does is set up CPU hotplug callbacks for the page allocator.
      
      Rename it to page_alloc_init_cpuhp() so that name will reflect what the
      function does.
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-6-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: handle hashdist initialization in mm/mm_init.c · 534ef4e1
      Mike Rapoport (IBM) authored
      
      
      The hashdist variable must be initialized before the first call to
      alloc_large_system_hash() and free_area_init() looks like a better place
      for it than page_alloc_init().
      
      Move hashdist handling to mm/mm_init.c
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-5-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: move most of core MM initialization to mm/mm_init.c · 9420f89d
      Mike Rapoport (IBM) authored
      
      
      The bulk of memory management initialization code is spread all over
      mm/page_alloc.c and makes navigating through page allocator functionality
      difficult.
      
      Move most of the functions marked __init and __meminit to mm/mm_init.c to
      make it better localized and allow some more spare room before
      mm/page_alloc.c reaches 10k lines.
      
      No functional changes.
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-4-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc: add helper for checking if check_pages_enabled · fce0b421
      Mike Rapoport (IBM) authored
      
      
      Instead of duplicating long static_branch_enabled(&check_pages_enabled)
      wrap it in a helper function is_check_pages_enabled()
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-3-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mips: fix comment about pgtable_init() · 12b9ac6d
      Mike Rapoport (IBM) authored
      
      
      Patch series "mm: move core MM initialization to mm/mm_init.c", v2.
      
      This set moves most of the core MM initialization to mm/mm_init.c.
      
      This largely includes free_area_init() and its helpers, functions used at
      boot time, mm_init() from init/main.c and some of the functions it calls.
      
      Aside from gaining some more space before mm/page_alloc.c hits 10k lines,
      this makes mm/page_alloc.c mostly about the buddy allocator and moves
      the init code out of the way, which IMO improves maintainability.
      
      Besides, this allows moving a couple of declarations out of include/linux
      and making them private to mm/.
      
      And as an added bonus there is a slight decrease in vmlinux size.  For
      tinyconfig and defconfig on x86 I've got
      
      tinyconfig:
          text    data      bss      dec     hex  filename
        853206  289376  1200128  2342710  23bf36  a/vmlinux
        853198  289344  1200128  2342670  23bf0e  b/vmlinux
      
      defconfig:
            text     data      bss       dec      hex  filename
        26152959  9730634  2170884  38054477  244aa4d  a/vmlinux
        26152945  9730602  2170884  38054431  244aa1f  b/vmlinux
      
      
      This patch (of 14):
      
      The comment about fixrange_init() says that it's called from
      pgtable_init() while the actual caller is pagetable_init().
      
      Update comment to match the code.
      
      Link: https://lkml.kernel.org/r/20230321170513.2401534-1-rppt@kernel.org
      Link: https://lkml.kernel.org/r/20230321170513.2401534-2-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • MAINTAINERS: add Lorenzo as vmalloc reviewer · 307eecd5
      Lorenzo Stoakes authored
      
      
      I have recently been involved in both reviewing and submitting patches to
      the vmalloc code in mm and would be willing and happy to help out with
      review going forward if it would be helpful!
      
      Link: https://lkml.kernel.org/r/55f663af6100c84a71a0065ac0ed22463aa340de.1679421959.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: move get_page_from_free_area() to mm/page_alloc.c · 5d671eb4
      Mike Rapoport (IBM) authored
      
      
      The get_page_from_free_area() helper is only used in mm/page_alloc.c so
      move it there to reduce noise in include/linux/mmzone.h
      
      Link: https://lkml.kernel.org/r/20230319114214.2133332-1-rppt@kernel.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: prefer fault_around_pages to fault_around_bytes · 53d36a56
      Lorenzo Stoakes authored
      
      
      All use of this value is now at page granularity, so specify the variable
      as such too.  This simplifies the logic.
      
      We maintain the debugfs entry to ensure that there are no user-visible
      changes.
      
      Link: https://lkml.kernel.org/r/4995bad07fe9baa51c786fa0d81819dddfb57654.1679089214.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: refactor do_fault_around() · 9042599e
      Lorenzo Stoakes authored
      
      
      Patch series "Refactor do_fault_around()"
      
      Refactor do_fault_around() to avoid bitwise tricks and rather difficult to
      follow logic.  Additionally, prefer fault_around_pages to
      fault_around_bytes as the operations are performed at a base page
      granularity.
      
      
      This patch (of 2):
      
      The existing logic is confusing and fails to abstract a number of bitwise
      tricks.
      
      Use ALIGN_DOWN() to perform alignment, pte_index() to obtain a PTE index
      and represent the address range using PTE offsets, which naturally make it
      clear that the operation is intended to occur within only a single PTE and
      prevent spanning of more than one page table.
      
      We rely on the fact that fault_around_bytes will always be page-aligned,
      at least one page in size, a power of two and that it will not exceed
      PAGE_SIZE * PTRS_PER_PTE in size (i.e.  the address space mapped by a
      PTE).  These are all guaranteed by fault_around_bytes_set().
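
The window arithmetic described above can be sketched in userspace C. This is a simplified model under stated assumptions, not the kernel's implementation: fault_around_window() and struct pte_range are hypothetical names, PTRS_PER_PTE is assumed to be 512, and the clamping to VMA boundaries that a real implementation also needs is left out.

```c
#include <assert.h>

/* Assumed for illustration: 512 PTEs per page table. */
#define PTRS_PER_PTE 512UL

/* Round x down to a multiple of a; a must be a power of two. */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

struct pte_range {
	unsigned long from; /* first PTE index in the window, inclusive */
	unsigned long to;   /* last PTE index in the window, inclusive */
};

/*
 * Hypothetical helper: pick the window of PTE indices to map around the
 * faulting PTE index pte_off, given a fault-around size in pages.
 */
static struct pte_range fault_around_window(unsigned long pte_off,
					    unsigned long nr_pages)
{
	struct pte_range r;
	unsigned long end;

	/* Preconditions the commit message describes as guaranteed. */
	assert(nr_pages >= 1 && nr_pages <= PTRS_PER_PTE);
	assert((nr_pages & (nr_pages - 1)) == 0);
	assert(pte_off < PTRS_PER_PTE);

	r.from = ALIGN_DOWN(pte_off, nr_pages);

	/* Exclusive end, clamped so the window never leaves the page table. */
	end = r.from + nr_pages;
	if (end > PTRS_PER_PTE)
		end = PTRS_PER_PTE;

	r.to = end - 1;
	return r;
}
```

Working in PTE indices rather than byte addresses keeps both ends of the window within a single page table: with a 16-page window, a fault at PTE index 37 yields the range 32..47, and a fault at index 511 yields 496..511.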
      
      Link: https://lkml.kernel.org/r/cover.1679089214.git.lstoakes@gmail.com
      Link: https://lkml.kernel.org/r/d125db1c3665a63b80cea29d56407825482e2262.1679089214.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: compaction: fix the possible deadlock when isolating hugetlb pages · 1c06b6a5
      Baolin Wang authored
      When trying to isolate a migratable pageblock, it can contain several
      normal pages or several hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64)
      in a pageblock. That means we may hold the lru lock of a normal page to
      continue to isolate the next hugetlb page by isolate_or_dissolve_huge_page()
      in the same migratable pageblock.
      
      However in the isolate_or_dissolve_huge_page(), it may allocate a new hugetlb
      page and dissolve the old one by alloc_and_dissolve_hugetlb_folio() if the
      hugetlb's refcount is zero. That means we can still enter the direct compaction
      path to allocate a new hugetlb page under the current lru lock, which
      may cause possible deadlock.
      
      To avoid this possible deadlock, we should release the lru lock when
      trying to isolate a hugetlb page.  Moreover it does not make sense to take
      the lru lock to isolate a hugetlb, which is not in the lru list.
      
      Link: https://lkml.kernel.org/r/7ab3bffebe59fb419234a68dec1e4572a2518563.1678962352.git.baolin.wang@linux.alibaba.com
      Fixes: 369fa227 ("mm: make alloc_contig_range handle free hugetlb pages")
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: William Lam <william.lam@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Baolin Wang's avatar
      mm: compaction: consider the number of scanning compound pages in isolate fail path · 56d48d8d
      Baolin Wang authored
      Commit b717d6b9 ("mm: compaction: include compound page count for
      scanning in pageblock isolation") added compound page statistics for
      scanning in pageblock isolation, to make sure the number of scanned pages
      is always larger than the number of isolated pages when isolating
      migratable or free pageblocks.
      
      However, when page isolation fails while scanning the migratable or free
      pageblocks, the isolation failure path did not consider the scanning
      statistics of the compound pages. This results in an incorrect number of
      scanned pages being shown in tracepoints or in vmstat, which will confuse
      people about the page scanning pressure in memory compaction.
      
      Thus we should take into account the number of scanned pages when failing
      to isolate the compound pages, to make the statistics accurate.
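
      The accounting described above can be sketched in user space as
      follows. The toy_page and toy_stats types and the toy_scan_block()
      helper are hypothetical and exist only for illustration; the point is
      that a compound page of order n contributes 2^n scanned pages whether
      or not it ends up isolated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: each entry is the head of a page of the given order. */
struct toy_page {
    unsigned int order;   /* 0 for a base page, >0 for a compound page */
    bool isolatable;
};

struct toy_stats {
    unsigned long nr_scanned;
    unsigned long nr_isolated;
};

static struct toy_stats toy_scan_block(const struct toy_page *pages, size_t n)
{
    struct toy_stats st = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        unsigned long nr = 1UL << pages[i].order;

        st.nr_scanned++;                /* the head page itself */
        if (pages[i].isolatable) {
            st.nr_isolated += nr;
            st.nr_scanned += nr - 1;    /* tail pages on the success path */
        } else {
            /* Failure path: the skipped tail pages were scanned too. */
            st.nr_scanned += nr - 1;
        }
    }
    return st;
}
```

      For one base page, one order-9 page that fails to isolate, and one
      order-2 page that is isolated, this reports 517 scanned and 5 isolated
      pages.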
      
      Link: https://lkml.kernel.org/r/73d6250a90707649cc010731aedc27f946d722ed.1678962352.git.baolin.wang@linux.alibaba.com
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: William Lam <william.lam@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      56d48d8d
    • Vlastimil Babka's avatar
      mm/mremap: simplify vma expansion again · 4bfbe371
      Vlastimil Babka authored
      This effectively reverts d014cd7c ("mm, mremap: fix mremap() expanding
      for vma's with vm_ops->close()").  After the recent changes, vma_merge()
      is able to handle the expansion properly even when the vma being expanded
      has a vm_ops->close operation, so we don't need to special case it
      anymore.
      
      Link: https://lkml.kernel.org/r/20230309111258.24079-11-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4bfbe371
    • Vlastimil Babka's avatar
      mm/mmap: start distinguishing if vma can be removed in mergeability test · 714965ca
      Vlastimil Babka authored
      Since pre-git times, is_mergeable_vma() returns false for a vma with
      vm_ops->close, so that no owner assumptions are violated in case the vma
      is removed as part of the merge.
      
      This check is currently very conservative and can prevent merging even in
      situations where the vma can't be removed, such as simple expansion of the
      previous vma, as evidenced by commit d014cd7c ("mm, mremap: fix
      mremap() expanding for vma's with vm_ops->close()").
      
      In order to allow more merging when appropriate and simplify the code that
      was made more complex by commit d014cd7c, start distinguishing cases
      where the vma can be really removed, and allow merging with vm_ops->close
      otherwise.
      
      As a first step, add a may_remove_vma parameter to is_mergeable_vma(). 
      can_vma_merge_before() sets it to true, because when called from
      vma_merge(), a removal of the vma is possible.
      
      In can_vma_merge_after(), pass the parameter as false, because no
      removal can occur in each of its callers:
      - vma_merge() calls it on the 'prev' vma, which is never removed
      - mmap_region() and do_brk_flags() call it to determine if it can expand
        a vma, which is not removed
      
      As a result, vma's with vm_ops->close may now merge with compatible ranges
      in more situations than previously.  We can also revert commit d014cd7c
      as the next step to simplify mremap code again.
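
      The shape of the new parameter can be sketched as below. This is a
      simplified user-space model, not the kernel implementation: the
      merge_toy_* types and helpers are hypothetical, and the flags
      comparison stands in for the other mergeability checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the vma and its operations table. */
struct merge_toy_ops { void (*close)(void *area); };
struct merge_toy_vma {
    const struct merge_toy_ops *vm_ops;
    unsigned long vm_flags;
};

static void merge_toy_close(void *area) { (void)area; }

static bool merge_toy_is_mergeable(const struct merge_toy_vma *vma,
                                   unsigned long vm_flags,
                                   bool may_remove_vma)
{
    if (vma->vm_flags != vm_flags)
        return false;
    /* Only refuse a vma with ->close if the merge could remove it. */
    if (may_remove_vma && vma->vm_ops && vma->vm_ops->close)
        return false;
    return true;
}

/* Merging before a vma may remove it, so stay conservative. */
static bool merge_toy_can_merge_before(const struct merge_toy_vma *vma,
                                       unsigned long vm_flags)
{
    return merge_toy_is_mergeable(vma, vm_flags, true);
}

/* Merging after a vma only expands it, so ->close does not block it. */
static bool merge_toy_can_merge_after(const struct merge_toy_vma *vma,
                                      unsigned long vm_flags)
{
    return merge_toy_is_mergeable(vma, vm_flags, false);
}
```

      In this model a vma with a close operation is rejected by the "before"
      check but accepted by the "after" check, which mirrors the distinction
      the commit message draws.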
      
      [vbabka@suse.cz: adjust comment as suggested by Lorenzo]
        Link: https://lkml.kernel.org/r/74f2ea6c-f1a9-6dd7-260c-25e660f42379@suse.cz
      Link: https://lkml.kernel.org/r/20230309111258.24079-10-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      714965ca
    • Vlastimil Babka's avatar
      mm/mmap/vma_merge: convert mergeability checks to return bool · 2dbf4010
      Vlastimil Babka authored
      
      
      The comments already mention returning 'true' so make the code match them.
      
      Link: https://lkml.kernel.org/r/20230309111258.24079-9-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2dbf4010
    • Vlastimil Babka's avatar
      mm/mmap/vma_merge: rename adj_next to adj_start · 1e76454f
      Vlastimil Babka authored
      
      
      The variable 'adj_next' holds the value by which we adjust vm_start of a
      vma in variable 'adjust', that's either 'next' or 'mid', so the current
      name is inaccurate.  Rename it to 'adj_start'.
      
      Link: https://lkml.kernel.org/r/20230309111258.24079-8-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1e76454f
    • Vlastimil Babka's avatar
      mm/mmap/vma_merge: set mid to NULL if not applicable · 9e8a39d2
      Vlastimil Babka authored
      
      
      There are several places where we test if 'mid' is really the area NNNN in
      the diagram and the tests have two variants and are non-obvious to follow.
      Instead, set 'mid' to NULL up-front if it's not the NNNN area, and
      simplify the tests.
      
      Also update the description in comment accordingly.
      
      [vbabka@suse.cz: adjust/add comments as suggested by Lorenzo]
        Link: https://lkml.kernel.org/r/def43190-53f7-a607-d1b0-b657565f4288@suse.cz
      Link: https://lkml.kernel.org/r/20230309111258.24079-7-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9e8a39d2
    • Vlastimil Babka's avatar
      mm/mmap/vma_merge: initialize mid and next in natural order · 5cd70b96
      Vlastimil Babka authored
      
      
      It is more intuitive to go from prev to mid and then next.  No functional
      change.
      
      Link: https://lkml.kernel.org/r/20230309111258.24079-6-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      5cd70b96
    • Vlastimil Babka's avatar
      mm/mmap/vma_merge: use the proper vma pointer in case 4 · 183b7a60
      Vlastimil Babka authored
      
      
      Almost all cases now use the 'next' pointer for the vma following the
      merged area, and the cases diagram shows it as XXXX.  Case 4 is different
      as it uses 'mid' and NNNN, so change it for consistency.  No functional
      change.
      
      Link: https://lkml.kernel.org/r/20230309111258.24079-5-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      183b7a60
    • Vlastimil Babka's avatar
      mm/mmap/vma_merge: use the proper vma pointers in cases 1 and 6 · 5ff783f1
      Vlastimil Babka authored
      
      
      Case 1 is now shown in the comment as next vma being merged with prev, so
      use 'next' instead of 'mid'.  In case 1 they both point to the same vma.
      
      As a consequence, in case 6, dup_anon_vma() is now tried first on
      'next' and then on 'mid'; before, it was the opposite order.  This is not a
      functional change, as those two vma's cannot have a different anon_vma,
      as that would have prevented the merging in the first place.
      
      Link: https://lkml.kernel.org/r/20230309111258.24079-4-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      5ff783f1
    • Vlastimil Babka's avatar
      mm/mmap/vma_merge: use the proper vma pointer in case 3 · 097d70c6
      Vlastimil Babka authored
      
      
      In case 3 we use 'next' for everything but vma_pgoff.  So use 'next'
      for that as well, instead of 'mid', for consistency.  Then in case 8 we
      have to use 'mid' explicitly, which should also make the intent more
      obvious.
      
      Adjust the diagram for cases 1-3 in the comment to match the code - we are
      using 'next' for case 3 so mark the range with XXXX instead of NNNN.  For
      case 2 that's a no-op as the code doesn't touch 'next' or 'mid'.  For case
      1 it's now wrong but that will be fixed next.
      
      No functional change.
      
      Link: https://lkml.kernel.org/r/20230309111258.24079-3-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      097d70c6
    • Vlastimil Babka's avatar
      mm/mmap/vma_merge: use only primary pointers for preparing merge · 50dac011
      Vlastimil Babka authored
      Patch series "cleanup vma_merge() and improve mergeability tests".
      
      My initial goal here was to try making the check for vm_ops->close in
      is_mergeable_vma() only be applied for vma's that would be truly removed
      as part of the merge (see Patch 9).  This would then allow reverting the
      quick fix d014cd7c ("mm, mremap: fix mremap() expanding for vma's with
      vm_ops->close()").  This was successful enough to allow the revert (Patch
      10).  Checks using can_vma_merge_before() are still pessimistic about
      possible vma removal, and making them precise would probably complicate
      the vma_merge() code too much.
      
      Liam's 6.3-rc1 simplification of vma_merge() and removal of __vma_adjust()
      was very helpful in understanding the vma_merge() implementation,
      especially when vma removals can happen, which is now very obvious.  While
      studying the code, I found ways to make it hopefully even easier to
      follow; those are patches 1-8.  That also made me notice a bug that is
      now already fixed in 6.3-rc1.
      
      
      This patch (of 10):
      
      In the merging preparation part of vma_merge(), some vma pointer variables
      are assigned for later execution of the merge, but also read from in the
      block itself.  The code is easier to follow and check against the cases
      diagram in the comment if the code reads only from the "primary" vma
      variables prev, mid, next instead.  No functional change.
      
      Link: https://lkml.kernel.org/r/20230309111258.24079-1-vbabka@suse.cz
      Link: https://lkml.kernel.org/r/20230309111258.24079-2-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      50dac011
    • Axel Rasmussen's avatar
      mm: userfaultfd: add UFFDIO_CONTINUE_MODE_WP to install WP PTEs · 02891844
      Axel Rasmussen authored
      
      
      UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new PTE
      to resolve a missing fault, one can install a write-protected one.  This
      is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in combination.
      
      This was motivated by testing HugeTLB HGM [1], and in particular its
      interaction with userfaultfd features.  Existing userfaultfd code supports
      using WP and MINOR modes together (i.e.  you can register an area with
      both enabled), but without this CONTINUE flag the combination is in
      practice unusable.
      
      So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing as
      UFFDIO_COPY_MODE_WP, but for *minor* faults.
      
      Update the selftest to do some very basic exercising of the new flag.
      
      Update Documentation/ to describe how these flags are used (neither the
      COPY nor the new CONTINUE versions of this mode flag were described there
      before).
      
      [1]: https://patchwork.kernel.org/project/linux-mm/cover/20230218002819.1486479-1-jthoughton@google.com/
      
      Link: https://lkml.kernel.org/r/20230314221250.682452-5-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      02891844
    • Axel Rasmussen's avatar
      mm: userfaultfd: combine 'mode' and 'wp_copy' arguments · d9712937
      Axel Rasmussen authored
      
      
      Many userfaultfd ioctl functions take both a 'mode' and a 'wp_copy'
      argument.  In future commits we plan to plumb the flags through to more
      places, so we'd be proliferating the very long argument list even further.
      
      Let's take the time to simplify the argument list.  Combine the two
      arguments into one - and generalize, so when we add more flags in the
      future, it doesn't imply more function arguments.
      
      Since the modes (copy, zeropage, continue) are mutually exclusive, store
      them as an integer value (0, 1, 2) in the low bits.  Place combine-able
      flag bits in the high bits.
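
      The encoding described above can be sketched as follows. This is an
      illustrative user-space model only: the toy_* type, constants, and
      helpers are hypothetical names, and the choice of two mode bits is an
      assumption that is just wide enough for the three modes.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t toy_uffd_flags_t;

/* Mutually exclusive modes, stored as an integer in the low bits. */
enum toy_mode {
    TOY_MODE_COPY = 0,
    TOY_MODE_ZEROPAGE = 1,
    TOY_MODE_CONTINUE = 2,
};

#define TOY_MODE_BITS 2u
#define TOY_MODE_MASK ((1u << TOY_MODE_BITS) - 1u)

/* Combinable flags occupy the bits above the mode field. */
#define TOY_FLAG(n)  (1u << (TOY_MODE_BITS + (n)))
#define TOY_FLAG_WP  TOY_FLAG(0)

/* Replace the mode field while keeping any flag bits that are set. */
static toy_uffd_flags_t toy_set_mode(toy_uffd_flags_t flags, enum toy_mode mode)
{
    return (flags & ~TOY_MODE_MASK) | (toy_uffd_flags_t)mode;
}

/* Compare only the mode field, ignoring the flag bits. */
static bool toy_mode_is(toy_uffd_flags_t flags, enum toy_mode mode)
{
    return (flags & TOY_MODE_MASK) == (toy_uffd_flags_t)mode;
}
```

      A single value can then carry both pieces of information, so a function
      that previously took a mode and a separate boolean needs only one
      argument, and a future flag only needs another TOY_FLAG(n) bit.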
      
      This is quite similar to an earlier patch proposed by Nadav Amit
      ("userfaultfd: introduce uffd_flags" [1]).  The main difference is that
      the earlier patch only handled flags, whereas this patch *also* combines
      the "mode" argument into the same type to shorten the argument list.
      
      [1]: https://lore.kernel.org/all/20220619233449.181323-2-namit@vmware.com/
      
      Link: https://lkml.kernel.org/r/20230314221250.682452-4-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Acked-by: James Houghton <jthoughton@google.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d9712937
    • Axel Rasmussen's avatar
      mm: userfaultfd: don't pass around both mm and vma · 61c50040
      Axel Rasmussen authored
      
      
      Quite a few userfaultfd functions took both mm and vma pointers as
      arguments.  Since the mm is trivially accessible via vma->vm_mm, there's
      no reason to pass both; it just needlessly extends the already long
      argument list.
      
      Get rid of the mm pointer, where possible, to shorten the argument list.
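
      As a rough illustration of the signature change, the sketch below uses
      hypothetical uffd_toy_* types and helpers. Only the vm_mm back-pointer
      mirrors the field named in the text; everything else is invented for
      the example.

```c
#include <stddef.h>

/* Toy stand-ins: the vma carries a back-pointer to its owning mm. */
struct uffd_toy_mm { int id; };
struct uffd_toy_vma {
    struct uffd_toy_mm *vm_mm;
    unsigned long vm_start;
    unsigned long vm_end;
};

/* Before: the caller passes the mm even though the vma already knows it. */
static int uffd_toy_fill_old(struct uffd_toy_mm *mm, struct uffd_toy_vma *vma,
                             unsigned long addr)
{
    if (addr < vma->vm_start || addr >= vma->vm_end)
        return -1;
    return mm->id;
}

/* After: derive the mm from the vma, one argument fewer. */
static int uffd_toy_fill_new(struct uffd_toy_vma *vma, unsigned long addr)
{
    struct uffd_toy_mm *mm = vma->vm_mm;

    if (addr < vma->vm_start || addr >= vma->vm_end)
        return -1;
    return mm->id;
}
```

      Both variants return the same result for the same vma, which is why the
      extra parameter can be dropped wherever a vma is already at hand.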
      
      Link: https://lkml.kernel.org/r/20230314221250.682452-3-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      61c50040