  1. Jun 20, 2023
    • mmzone: introduce folio_is_zone_movable() · 708ff491
      Vishal Moola (Oracle) authored
      
      
      Patch series "Replace is_longterm_pinnable_page()", v2.
      
      This patchset introduces some more helper functions for the folio
      conversions, and converts all callers of is_longterm_pinnable_page() to
      use folios.
      
      
      This patch (of 5):
      
      Introduce folio_is_zone_movable() to act as a folio equivalent for
      is_zone_movable_page().  This is to assist in later folio conversions.
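
      As a rough sketch, such a helper is typically a thin wrapper around the
      existing zone test, e.g. (illustrative, not necessarily the exact
      upstream definition):

      static inline bool folio_is_zone_movable(const struct folio *folio)
      {
              return folio_zonenum(folio) == ZONE_MOVABLE;
      }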
      
      Link: https://lkml.kernel.org/r/20230614021312.34085-1-vishal.moola@gmail.com
      Link: https://lkml.kernel.org/r/20230614021312.34085-2-vishal.moola@gmail.com
      Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      708ff491
    • kasan: add support for kasan.fault=panic_on_write · 452c03fd
      Marco Elver authored
      
      
      KASAN's boot time kernel parameter 'kasan.fault=' currently supports
      'report' and 'panic', which results in either only reporting bugs or also
      panicking on reports.
      
      However, some users may wish to have more control over when KASAN reports
      result in a kernel panic: in particular, KASAN-reported invalid _writes_
      are of special interest, because they have greater potential to corrupt
      random kernel memory or be more easily exploited.
      
      To panic on invalid writes only, introduce 'kasan.fault=panic_on_write',
      which allows users to choose to continue running on invalid reads, but
      panic only on invalid writes.
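
      For example, booting with the following kernel command-line fragment
      keeps the kernel running on invalid reads but panics on invalid writes:

      kasan.fault=panic_on_write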
      
      Link: https://lkml.kernel.org/r/20230614095158.1133673-1-elver@google.com
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Aleksandr Nogikh <nogikh@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Taras Madan <tarasmadan@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      452c03fd
    • zram: further limit recompression threshold · cb0551ad
      Sergey Senozhatsky authored
      
      
      The recompression threshold should be below the huge-size-class
      watermark.  Any object larger than the huge size class is a "huge
      object" and occupies a whole physical page on the zsmalloc side; in
      other words, it is incompressible as far as zsmalloc is concerned.
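
      As a hedged illustration (assuming the zram recompression sysfs interface
      documented in zram.rst), a recompression request with an explicit
      threshold might look like:

      # recompress idle objects smaller than 3000 bytes; the threshold now has
      # to stay below the huge-size-class watermark
      echo "type=idle threshold=3000" > /sys/block/zram0/recompress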
      
      Link: https://lkml.kernel.org/r/20230614141338.3480029-1-senozhatsky@chromium.org
      Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Suggested-by: Brian Geffon <bgeffon@google.com>
      Acked-by: Brian Geffon <bgeffon@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      cb0551ad
    • mm: zswap: invalidate entry after writeback · 418fd29d
      Domenico Cerasuolo authored
      
      
      When an entry started writeback, it used to be invalidated by refcount
      logic alone, meaning that it would stay on the tree until all references
      were put.  The problem with this behavior is that as soon as writeback
      starts, ownership of the data held by the entry passes to the swapcache,
      so the data should not also be left in zswap.  Currently there are no
      known issues because of this, but this change explicitly invalidates an
      entry that has started writeback to reduce opportunities for future bugs.
      
      This patch is a follow-up to the series titled "mm: zswap: move writeback
      LRU from zpool to zswap" plus commit f090b7949768 ("mm: zswap: support
      exclusive loads").
      
      Link: https://lkml.kernel.org/r/20230614143122.74471-1-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      418fd29d
    • mm: kill lock|unlock_page_memcg() · 6c77b607
      Kefeng Wang authored
      Since commit c7c3dec1 ("mm: rmap: remove lock_page_memcg()") there are no
      more users, so kill lock_page_memcg() and unlock_page_memcg().
      
      Link: https://lkml.kernel.org/r/20230614143612.62575-1-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6c77b607
    • mm/page_owner/cma: show pfn in cma/page_owner with hex format · 399fd496
      Kassey Li authored
      
      
      cma: display pfn as well as pfn_to_page(pfn)
      
      page_owner: display pfn in hex rather than decimal
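
      A hedged sketch of the kind of output change this implies (the format
      strings are illustrative, not the exact upstream ones):

      /* before: pfn printed in decimal, no struct page shown */
      pr_debug("cma: failed to allocate pfn %lu\n", pfn);

      /* after: pfn printed in hex alongside pfn_to_page(pfn) */
      pr_debug("cma: failed to allocate pfn 0x%lx page %p\n", pfn, pfn_to_page(pfn));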
      
      Link: https://lkml.kernel.org/r/20230613092533.15449-1-quic_yingangl@quicinc.com
      Signed-off-by: Kassey Li <quic_yingangl@quicinc.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      399fd496
    • buffer: convert block_truncate_page() to use a folio · 6d68f644
      Matthew Wilcox (Oracle) authored
      
      
      Support large folios in block_truncate_page() and avoid three hidden calls
      to compound_head().
      
      [willy@infradead.org: fix check of filemap_grab_folio() return value]
        Link: https://lkml.kernel.org/r/ZItZOt+XxV12HtzL@casper.infradead.org
      Link: https://lkml.kernel.org/r/20230612210141.730128-15-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6d68f644
    • buffer: use a folio in __find_get_block_slow() · eee25182
      Matthew Wilcox (Oracle) authored
      
      
      Saves a call to compound_head() and may be needed to support block size >
      PAGE_SIZE.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-14-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      eee25182
    • buffer: convert link_dev_buffers to take a folio · 08d84add
      Matthew Wilcox (Oracle) authored
      
      
      Its one caller already has a folio, so switch it to use the folio API. 
      Removes a hidden call to compound_head().
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-13-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      08d84add
    • buffer: convert init_page_buffers() to folio_init_buffers() · 6f24ce6b
      Matthew Wilcox (Oracle) authored
      
      
      Use the folio API and pass the folio from both callers.  Saves a hidden
      call to compound_head().
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-12-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6f24ce6b
    • buffer: convert grow_dev_page() to use a folio · 3c98a41c
      Matthew Wilcox (Oracle) authored
      
      
      Get a folio from the page cache instead of a page, then use the folio API
      throughout.  Removes a few calls to compound_head() and may be needed to
      support block size > PAGE_SIZE.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-11-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3c98a41c
    • buffer: convert page_zero_new_buffers() to folio_zero_new_buffers() · 4a9622f2
      Matthew Wilcox (Oracle) authored
      
      
      Most of the callers already have a folio; convert reiserfs_write_end() to
      have a folio.  Removes a couple of hidden calls to compound_head().
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-10-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4a9622f2
    • buffer: convert __block_commit_write() to take a folio · 8c6cb3e3
      Matthew Wilcox (Oracle) authored
      
      
      This removes a hidden call to compound_head() inside
      __block_commit_write() and moves it to those callers which are still page
      based.  Also make block_write_end() safe for large folios.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-9-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8c6cb3e3
    • buffer: convert block_page_mkwrite() to use a folio · fe181377
      Matthew Wilcox (Oracle) authored
      
      
      If any page in a folio is dirtied, dirty the entire folio.  Removes a
      number of hidden calls to compound_head() and references to page->mapping
      and page->index.  Fixes a pre-existing bug where we could mark a folio as
      dirty if the file is truncated to a multiple of the page size just as we
      take the page fault.  I don't believe this bug has any bad effect; it's
      just inefficient.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-8-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fe181377
    • buffer: make block_write_full_page() handle large folios correctly · bb0ea598
      Matthew Wilcox (Oracle) authored
      
      
      Keep the interface as struct page, but work entirely on the folio
      internally.  Removes several PAGE_SIZE assumptions and removes some
      references to page->index and page->mapping.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-7-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      bb0ea598
    • gfs2: support ludicrously large folios in gfs2_trans_add_databufs() · 285e0fc9
      Matthew Wilcox (Oracle) authored
      
      
      We may someday support folios larger than 4GB, so use a size_t for the
      byte count within a folio to prevent unpleasant truncations.
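
      A hedged sketch of the resulting prototype (parameter names are
      illustrative):

      void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
                                   size_t from, size_t len);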
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-6-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      285e0fc9
    • buffer: convert __block_write_full_page() to __block_write_full_folio() · 53418a18
      Matthew Wilcox (Oracle) authored
      
      
      Remove nine hidden calls to compound_head() by using a folio instead of a
      page.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-5-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      53418a18
    • gfs2: convert gfs2_write_jdata_page() to gfs2_write_jdata_folio() · c1401fd1
      Matthew Wilcox (Oracle) authored
      
      
      Add support for large folios and remove some accesses to page->mapping and
      page->index.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-4-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c1401fd1
    • gfs2: pass a folio to __gfs2_jdata_write_folio() · d0cfcaee
      Matthew Wilcox (Oracle) authored
      
      
      Remove a couple of folio->page conversions in the callers, and two calls
      to compound_head() in the function itself.  Rename it from
      __gfs2_jdata_writepage() to __gfs2_jdata_write_folio().
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-3-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d0cfcaee
    • gfs2: use a folio inside gfs2_jdata_writepage() · c0ba597d
      Matthew Wilcox (Oracle) authored
      
      
      Patch series "gfs2/buffer folio changes for 6.5", v3.
      
      This kind of started off as a gfs2 patch series, then became entwined with
      buffer heads once I realised that gfs2 was the only remaining caller of
      __block_write_full_page().  For those not in the gfs2 world, the big point
      of this series is that block_write_full_page() should now handle large
      folios correctly.
      
      
      This patch (of 14):
      
      Replace a few implicit calls to compound_head() with one explicit one.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-1-willy@infradead.org
      Link: https://lkml.kernel.org/r/20230612210141.730128-2-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c0ba597d
    • mm/khugepaged: use DEFINE_READ_MOSTLY_HASHTABLE macro · e1ad3e66
      Nick Desaulniers authored
      
      
      These are equivalent, but DEFINE_READ_MOSTLY_HASHTABLE exists to define
      a hashtable in the .data..read_mostly section.
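
      The equivalence, roughly (using khugepaged's mm_slots_hash as the
      example; treat the exact line as illustrative):

      /* before: section attribute spelled out by hand */
      static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);

      /* after: dedicated macro places the table in .data..read_mostly */
      static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);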
      
      Link: https://lkml.kernel.org/r/20230609-khugepage-v1-1-dad4e8382298@google.com
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e1ad3e66
    • percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing · 3a6358c0
      Yu Ma authored
      
      
      When running the UnixBench/Execl throughput case, false sharing is
      observed due to frequent reads of base_addr and writes to free_bytes and
      chunk_md.
      
      UnixBench/Execl represents a class of workload where bash scripts are
      spawned frequently to do some short jobs.  It issues the execl system
      call frequently, and execl calls mm_init to initialize the mm_struct of
      the process.  mm_init calls __percpu_counter_init to initialize the
      percpu_counters.  Then pcpu_alloc is called to read the base_addr of
      pcpu_chunk for memory allocation.  Inside pcpu_alloc, pcpu_alloc_area is
      called to allocate memory from a specified chunk.  This function updates
      "free_bytes" and "chunk_md" to record the remaining free bytes and other
      metadata for the chunk.  Correspondingly, pcpu_free_area also updates
      these two members when freeing memory.
      
      Call trace from perf is as below:
      +   57.15%  0.01%  execl   [kernel.kallsyms] [k] __percpu_counter_init
      +   57.13%  0.91%  execl   [kernel.kallsyms] [k] pcpu_alloc
      -   55.27% 54.51%  execl   [kernel.kallsyms] [k] osq_lock
         - 53.54% 0x654278696e552f34
              main
              __execve
              entry_SYSCALL_64_after_hwframe
              do_syscall_64
              __x64_sys_execve
              do_execveat_common.isra.47
              alloc_bprm
              mm_init
              __percpu_counter_init
              pcpu_alloc
            - __mutex_lock.isra.17
      
      In the current pcpu_chunk layout, `base_addr' is in the same cache line
      as `free_bytes' and `chunk_md', occupying the last 8 bytes of that line.
      This patch moves `bound_map' up next to `base_addr', so that `base_addr'
      lands in a new cacheline.
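
      A simplified illustration of the idea (field subset only, not the full
      upstream pcpu_chunk definition):

      struct pcpu_chunk {
              struct list_head        list;
              int                     free_bytes;     /* hot: written on alloc/free */
              struct pcpu_block_md    chunk_md;       /* hot: written on alloc/free */
              unsigned long           *bound_map;     /* moved up, pushes base_addr out */
              void                    *base_addr;     /* read-mostly, now starts a new cacheline */
              unsigned long           *alloc_map;
              /* ... remaining members unchanged ... */
      };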
      
      With this change, on an Intel Sapphire Rapids 112C/224T platform, based
      on v6.4-rc4, the score with 160 parallel copies improves by 24%.
      
      The pcpu_chunk struct is a backing data structure per chunk, so the
      additional memory should not be dramatic.  A chunk covers ballpark
      between 64kB and 512kB of memory depending on some config and boot time
      stuff, so I believe the additional memory used here is nominal at best.
      
      Working the #s on my desktop:
      Percpu:            58624 kB
      28 cores -> ~2.1MB of percpu memory.
      At say ~128KB per chunk -> 33 chunks, generously 40 chunks.
      Adding alignment might bump the chunk size ~64 bytes, so in total ~2KB
      of overhead?
      
      I believe we can do a little better to avoid eating that full padding,
      so likely less than that.
      
      [dennis@kernel.org: changelog details]
      Link: https://lkml.kernel.org/r/20230610030730.110074-1-yu.ma@intel.com
      Signed-off-by: Yu Ma <yu.ma@intel.com>
      Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3a6358c0
    • memory tier: remove unneeded !IS_ENABLED(CONFIG_MIGRATION) check · 33ee4f18
      Miaohe Lin authored
      
      
      establish_demotion_targets() is only defined when CONFIG_MIGRATION is
      enabled, so there's no need to check it again.
      
      Link: https://lkml.kernel.org/r/20230610034114.981861-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      33ee4f18
    • mm: compaction: mark kcompactd_run() and kcompactd_stop() __meminit · 833dfc00
      Miaohe Lin authored
      
      
      Add __meminit to kcompactd_run() and kcompactd_stop() to ensure they
      default to __init when memory hotplug is not enabled.
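
      A hedged sketch of what the annotation amounts to (the exact signatures
      may differ from upstream):

      void __meminit kcompactd_run(int nid);
      void __meminit kcompactd_stop(int nid);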
      
      Link: https://lkml.kernel.org/r/20230610034615.997813-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      833dfc00
    • mm: remove unused vma_init_lock() · e4d86756
      YueHaibing authored
      Commit c7f8f31c ("mm: separate vma->lock from vm_area_struct") left this
      behind.
      
      Link: https://lkml.kernel.org/r/20230610101956.20592-1-yuehaibing@huawei.com
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e4d86756
    • kernel: pid_namespace: remove unused set_memfd_noexec_scope() · 3efd33b7
      YueHaibing authored
      
      
      This inline function is unused; remove it.
      
      Link: https://lkml.kernel.org/r/20230610102858.31488-1-yuehaibing@huawei.com
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3efd33b7
    • userfaultfd: fix regression in userfaultfd_unmap_prep() · 65ac1320
      Liam R. Howlett authored
      Android reported a performance regression in the userfaultfd unmap path.
      A closer inspection of the userfaultfd_unmap_prep() change showed that a
      second tree walk would be necessary in the reworked code.
      
      Fix the regression by passing each VMA that will be unmapped through to
      the userfaultfd_unmap_prep() function as it is added to the unmap list,
      instead of re-walking the tree for each VMA.
      
      Link: https://lkml.kernel.org/r/20230601015402.2819343-1-Liam.Howlett@oracle.com
      Fixes: 69dbe6da ("userfaultfd: use maple tree iterator to iterate VMAs")
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Reported-by: Suren Baghdasaryan <surenb@google.com>
      Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      65ac1320
    • mm/folio: replace set_compound_order with folio_set_order · 1e3be485
      Tarun Sahu authored
      
      
      The patch ("mm/folio: Avoid special handling for order value 0 in
      folio_set_order") [1] removed the need for special handling of order = 0
      in folio_set_order.  Now, folio_set_order and set_compound_order becomes
      similar function.  This patch removes the set_compound_order and uses
      folio_set_order instead.
      
      [1] https://lore.kernel.org/all/20230609183032.13E08C433D2@smtp.kernel.org/
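
      Callers change roughly along these lines (illustrative, not an exact hunk
      from the patch):

      -       set_compound_order(&folio->page, order);
      +       folio_set_order(folio, order);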
      
      Link: https://lkml.kernel.org/r/20230612093514.689846-1-tsahu@linux.ibm.com
      Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com>
      Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1e3be485
    • mm: zswap: remove zswap_header · 0bb48849
      Domenico Cerasuolo authored
      
      
      Previously, zswap_header served the purpose of storing the swpentry within
      zpool pages.  This allowed zpool implementations to pass relevant
      information to the writeback function.  However, with the current
      implementation, writeback is directly handled within zswap.  Consequently,
      there is no longer a necessity for zswap_header, as the swp_entry_t can be
      stored directly in zswap_entry.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-8-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Tested-by: Yosry Ahmed <yosryahmed@google.com>
      Suggested-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0bb48849
    • mm: zswap: simplify writeback function · ff9d5ba2
      Domenico Cerasuolo authored
      
      
      zswap_writeback_entry() used to be a callback for the backends, which
      don't know about struct zswap_entry.
      
      Now that the only user is the generic zswap LRU reclaimer, it can be
      simplified: pass the pinned zswap_entry directly, and consolidate the
      refcount management in the shrink function.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-7-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Tested-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ff9d5ba2
    • mm: zswap: remove shrink from zpool interface · 35499e2b
      Domenico Cerasuolo authored
      
      
      Now that all three zswap backends have removed their shrink code, it is
      no longer necessary for the zpool interface to include shrink/writeback
      endpoints.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-6-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Nhat Pham <nphamcs@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      35499e2b
    • mm: zswap: remove page reclaim logic from zsmalloc · b3067742
      Domenico Cerasuolo authored
      
      
      Switch zsmalloc to the new generic zswap LRU and remove its custom
      implementation.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-5-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Nhat Pham <nphamcs@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Tested-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b3067742
    • mm: zswap: remove page reclaim logic from z3fold · e774a7bc
      Domenico Cerasuolo authored
      
      
      Switch z3fold to the new generic zswap LRU and remove its custom
      implementation.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-4-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e774a7bc
    • mm: zswap: remove page reclaim logic from zbud · 1be537c6
      Domenico Cerasuolo authored
      
      
      Switch zbud to the new generic zswap LRU and remove its custom
      implementation.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-3-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1be537c6
    • mm: zswap: add pool shrinking mechanism · f999f38b
      Domenico Cerasuolo authored
      
      
      Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
      
      This series aims to improve the zswap reclaim mechanism by reorganizing
      the LRU management. In the current implementation, the LRU is maintained
      within each zpool driver, resulting in duplicated code across the three
      drivers. The proposed change consists of moving the LRU management from
      the individual implementations up to the zswap layer.
      
      The primary objective of this refactoring effort is to simplify the
      codebase. By unifying the reclaim loop and consolidating LRU handling
      within zswap, we can eliminate redundant code and improve
      maintainability. Additionally, this change enables the reclamation of
      stored pages in their actual LRU order. Presently, the zpool drivers
      link backing pages in an LRU, causing compressed pages with different
      LRU positions to be written back simultaneously.
      
      The series consists of several patches. The first patch implements the
      LRU and the reclaim loop in zswap, but they are not used yet because all
      three driver implementations are still marked as zpool_evictable.
      The following three commits modify each zpool driver so that it is no
      longer zpool_evictable, allowing the use of the reclaim loop in zswap.
      As the drivers removed their shrink functions, the zpool interface is
      then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
      Finally, the code in zswap is further cleaned up by simplifying the
      writeback function and removing the now unnecessary zswap_header.
      
      
      This patch (of 7):
      
      Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
      function, which is called from zpool_shrink.  However, with this commit, a
      unified shrink function is added to zswap.  The ultimate goal is to
      eliminate the need for zpool_shrink once all zpool implementations have
      dropped their shrink code.
      
      To ensure the functionality of each commit, this change focuses solely on
      adding the mechanism itself.  No modifications are made to the backends,
      meaning that functionally, there are no immediate changes.  The zswap
      mechanism will only come into effect once the backends have removed their
      shrink code.  The subsequent commits will address the modifications needed
      in the backends.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
      Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Nhat Pham <nphamcs@gmail.com>
      Tested-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f999f38b
    • selftests: mm: remove duplicate unneeded defines · 0183d777
      Muhammad Usama Anjum authored
      
      
      Remove all defines which aren't needed after correctly including the
      kernel header files.
      
      Link: https://lkml.kernel.org/r/20230612095347.996335-2-usama.anjum@collabora.com
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0183d777
    • selftests: mm: remove wrong kernel header inclusion · 1e6d1e36
      Muhammad Usama Anjum authored
      It is wrong to include unprocessed user header files directly.  They are
      processed to "<source_tree>/usr/include" by running "make headers" and
      they are included in selftests by the kselftest makefiles automatically
      with the help of the KHDR_INCLUDES variable.  These headers should always
      be built before building kselftests.
      
      Link: https://lkml.kernel.org/r/20230612095347.996335-1-usama.anjum@collabora.com
      Fixes: 07115fcc ("selftests/mm: add new selftests for KSM")
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1e6d1e36
    • mm: ptep_get() conversion · c33c7948
      Ryan Roberts authored
      
      
      Convert all instances of direct pte_t* dereferencing to instead use
      ptep_get() helper.  This means that by default, the accesses change from a
      C dereference to a READ_ONCE().  This is technically the correct thing to
      do since where pgtables are modified by HW (for access/dirty) they are
      volatile and therefore we should always ensure READ_ONCE() semantics.
      
      But more importantly, by always using the helper, it can be overridden by
      the architecture to fully encapsulate the contents of the pte.  Arch code
      is deliberately not converted, as the arch code knows best.  It is
      intended that arch code (arm64) will override the default with its own
      implementation that can (e.g.) hide certain bits from the core code, or
      determine young/dirty status by mixing in state from another source.
      
      Conversion was done using Coccinelle:
      
      ----
      
      // $ make coccicheck \
      //          COCCI=ptepget.cocci \
      //          SPFLAGS="--include-headers" \
      //          MODE=patch
      
      virtual patch
      
      @ depends on patch @
      pte_t *v;
      @@
      
      - *v
      + ptep_get(v)
      
      ----
      
      Then reviewed and hand-edited to avoid multiple unnecessary calls to
      ptep_get(), instead opting to store the result of a single call in a
      variable, where it is correct to do so.  This aims to negate any cost of
      READ_ONCE() and will benefit arch-overrides that may be more complex.
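
      For instance, instead of dereferencing the pointer at each use, converted
      sites take a single snapshot and operate on it (illustrative):

      pte_t pte = ptep_get(ptep);     /* one READ_ONCE() of the entry */

      if (pte_present(pte) && pte_young(pte))
              handle_young_pte(pte);  /* hypothetical consumer of the snapshot */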
      
      Included is a fix for an issue in an earlier version of this patch that
      was pointed out by kernel test robot.  The issue arose because config
      MMU=n elides definition of the ptep helper functions, including
      ptep_get().  HUGETLB_PAGE=n configs still define a simple
      huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
      So when both configs are disabled, this caused a build error because
      ptep_get() is not defined.  Fix by continuing to do a direct dereference
      when MMU=n.  This is safe because for this config the arch code cannot be
      trying to virtualize the ptes because none of the ptep helpers are
      defined.
      
      Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c33c7948
    • mm: move ptep_get() and pmdp_get() helpers · 6c1d2a07
      Ryan Roberts authored
      
      
      There are many call sites that directly dereference a pte_t pointer.  This
      makes it very difficult to properly encapsulate a page table in the arch
      code without having to allocate shadow page tables.
      
      We will shortly solve this by replacing all the call sites with ptep_get()
      calls.  But there are call sites above the function definition in the
      header file, so let's move ptep_get() to an earlier location to solve that
      problem.  And move pmdp_get() at the same time to keep it close to
      ptep_get().
      
      Link: https://lkml.kernel.org/r/20230612151545.3317766-3-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6c1d2a07
    • mm: ptdump should use ptep_get_lockless() · 426931e7
      Ryan Roberts authored
      
      
      Patch series "Encapsulate PTE contents from non-arch code", v3.
      
      A series to improve the encapsulation of pte entries by disallowing
      non-arch code from directly dereferencing pte_t pointers.
      
      This means that by default, the accesses change from a C dereference to a
      READ_ONCE().  This is technically the correct thing to do since where
      pgtables are modified by HW (for access/dirty) they are volatile and
      therefore we should always ensure READ_ONCE() semantics.
      
      But more importantly, by always using the helper, it can be overridden by
      the architecture to fully encapsulate the contents of the pte.  Arch code
      is deliberately not converted, as the arch code knows best.  It is
      intended that arch code (arm64) will override the default with its own
      implementation that can (e.g.) hide certain bits from the core code, or
      determine young/dirty status by mixing in state from another source.
      
      
      This patch (of 3):
      
      The page table dumper uses walk_page_range_novma() to walk the page
      tables, which does not lock the PTL before calling the pte_entry()
      callback.  Therefore, the page table dumper's callback must use
      ptep_get_lockless() rather than ptep_get() to ensure that the pte it reads
      is not torn or otherwise corrupt when racing with writers.
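
      A hedged sketch of the affected callback (modelled on mm/ptdump.c,
      simplified):

      static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
                                  unsigned long next, struct mm_walk *walk)
      {
              struct ptdump_state *st = walk->private;
              pte_t val = ptep_get_lockless(pte);     /* was a plain READ_ONCE(*pte) */

              st->note_page(st, addr, 4, pte_val(val));
              return 0;
      }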
      
      Link: https://lkml.kernel.org/r/20230612151545.3317766-1-ryan.roberts@arm.com
      Link: https://lkml.kernel.org/r/20230612151545.3317766-2-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: kernel test robot <lkp@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      426931e7