  1. Jun 20, 2023
    • buffer: convert block_truncate_page() to use a folio · 6d68f644
      Matthew Wilcox (Oracle) authored
      
      
      Support large folios in block_truncate_page() and avoid three hidden calls
      to compound_head().
      
      [willy@infradead.org: fix check of filemap_grab_folio() return value]
        Link: https://lkml.kernel.org/r/ZItZOt+XxV12HtzL@casper.infradead.org
      Link: https://lkml.kernel.org/r/20230612210141.730128-15-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • buffer: use a folio in __find_get_block_slow() · eee25182
      Matthew Wilcox (Oracle) authored
      
      
      Saves a call to compound_head() and may be needed to support block size >
      PAGE_SIZE.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-14-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • buffer: convert link_dev_buffers to take a folio · 08d84add
      Matthew Wilcox (Oracle) authored
      
      
      Its one caller already has a folio, so switch it to use the folio API. 
      Removes a hidden call to compound_head().
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-13-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • buffer: convert init_page_buffers() to folio_init_buffers() · 6f24ce6b
      Matthew Wilcox (Oracle) authored
      
      
      Use the folio API and pass the folio from both callers.  Saves a hidden
      call to compound_head().
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-12-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • buffer: convert grow_dev_page() to use a folio · 3c98a41c
      Matthew Wilcox (Oracle) authored
      
      
      Get a folio from the page cache instead of a page, then use the folio API
      throughout.  Removes a few calls to compound_head() and may be needed to
      support block size > PAGE_SIZE.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-11-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • buffer: convert page_zero_new_buffers() to folio_zero_new_buffers() · 4a9622f2
      Matthew Wilcox (Oracle) authored
      
      
      Most of the callers already have a folio; convert reiserfs_write_end() to
      have a folio.  Removes a couple of hidden calls to compound_head().
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-10-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • buffer: convert __block_commit_write() to take a folio · 8c6cb3e3
      Matthew Wilcox (Oracle) authored
      
      
      This removes a hidden call to compound_head() inside
      __block_commit_write() and moves it to those callers which are still page
      based.  Also make block_write_end() safe for large folios.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-9-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • buffer: convert block_page_mkwrite() to use a folio · fe181377
      Matthew Wilcox (Oracle) authored
      
      
      If any page in a folio is dirtied, dirty the entire folio.  Removes a
      number of hidden calls to compound_head() and references to page->mapping
      and page->index.  Fixes a pre-existing bug where we could mark a folio as
      dirty if the file is truncated to a multiple of the page size just as we
      take the page fault.  I don't believe this bug has any bad effect, it's
      just inefficient.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-8-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • buffer: make block_write_full_page() handle large folios correctly · bb0ea598
      Matthew Wilcox (Oracle) authored
      
      
      Keep the interface as struct page, but work entirely on the folio
      internally.  Removes several PAGE_SIZE assumptions and removes some
      references to page->index and page->mapping.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-7-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • gfs2: support ludicrously large folios in gfs2_trans_add_databufs() · 285e0fc9
      Matthew Wilcox (Oracle) authored
      
      
      We may someday support folios larger than 4GB, so use a size_t for the
      byte count within a folio to prevent unpleasant truncations.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-6-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • buffer: convert __block_write_full_page() to __block_write_full_folio() · 53418a18
      Matthew Wilcox (Oracle) authored
      
      
      Remove nine hidden calls to compound_head() by using a folio instead of a
      page.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-5-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • gfs2: convert gfs2_write_jdata_page() to gfs2_write_jdata_folio() · c1401fd1
      Matthew Wilcox (Oracle) authored
      
      
      Add support for large folios and remove some accesses to page->mapping and
      page->index.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-4-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • gfs2: pass a folio to __gfs2_jdata_write_folio() · d0cfcaee
      Matthew Wilcox (Oracle) authored
      
      
      Remove a couple of folio->page conversions in the callers, and two calls
      to compound_head() in the function itself.  Rename it from
      __gfs2_jdata_writepage() to __gfs2_jdata_write_folio().
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-3-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • gfs2: use a folio inside gfs2_jdata_writepage() · c0ba597d
      Matthew Wilcox (Oracle) authored
      
      
      Patch series "gfs2/buffer folio changes for 6.5", v3.
      
      This kind of started off as a gfs2 patch series, then became entwined with
      buffer heads once I realised that gfs2 was the only remaining caller of
      __block_write_full_page().  For those not in the gfs2 world, the big point
      of this series is that block_write_full_page() should now handle large
      folios correctly.
      
      
      This patch (of 14):
      
      Replace a few implicit calls to compound_head() with one explicit one.
      
      Link: https://lkml.kernel.org/r/20230612210141.730128-1-willy@infradead.org
      Link: https://lkml.kernel.org/r/20230612210141.730128-2-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/khugepaged: use DEFINE_READ_MOSTLY_HASHTABLE macro · e1ad3e66
      Nick Desaulniers authored
      
      
      These are equivalent, but DEFINE_READ_MOSTLY_HASHTABLE exists to define
      a hashtable in the .data..read_mostly section.
      
      Link: https://lkml.kernel.org/r/20230609-khugepage-v1-1-dad4e8382298@google.com
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing · 3a6358c0
      Yu Ma authored
      
      
      When running the UnixBench/Execl throughput case, false sharing is
      observed due to frequent reads of base_addr and writes to free_bytes and
      chunk_md.
      
      UnixBench/Execl represents a class of workload where bash scripts are
      spawned frequently to do short jobs.  It issues the execl system call
      frequently, and execl calls mm_init to initialize the mm_struct of the
      process.  mm_init calls __percpu_counter_init to initialize the percpu
      counters.  Then pcpu_alloc is called, which reads the base_addr of a
      pcpu_chunk for memory allocation.  Inside pcpu_alloc, pcpu_alloc_area
      allocates memory from a specified chunk; this function updates
      "free_bytes" and "chunk_md" to record the remaining free bytes and other
      metadata for the chunk.  Correspondingly, pcpu_free_area also updates
      these two members when freeing memory.
      
      Call trace from perf is as below:
      +   57.15%  0.01%  execl   [kernel.kallsyms] [k] __percpu_counter_init
      +   57.13%  0.91%  execl   [kernel.kallsyms] [k] pcpu_alloc
      -   55.27% 54.51%  execl   [kernel.kallsyms] [k] osq_lock
         - 53.54% 0x654278696e552f34
              main
              __execve
              entry_SYSCALL_64_after_hwframe
              do_syscall_64
              __x64_sys_execve
              do_execveat_common.isra.47
              alloc_bprm
              mm_init
              __percpu_counter_init
              pcpu_alloc
            - __mutex_lock.isra.17
      
      In current pcpu_chunk layout, `base_addr' is in the same cache line with
      `free_bytes' and `chunk_md', and `base_addr' is at the last 8 bytes.  This
      patch moves `bound_map' up to `base_addr', to let `base_addr' locate in a
      new cacheline.
      
      With this change, on Intel Sapphire Rapids 112C/224T platform, based on
      v6.4-rc4, the 160 parallel score improves by 24%.
      
      The pcpu_chunk struct is a backing data structure per chunk, so the
      additional memory should not be dramatic.  A chunk covers ballpark
      between 64KB and 512KB of memory depending on some config and boot time
      stuff, so I believe the additional memory used here is nominal at best.
      
      Working the #s on my desktop:
      Percpu:            58624 kB
      28 cores -> ~2.1MB of percpu memory.
      At say ~128KB per chunk -> 33 chunks, generously 40 chunks.
      Adding alignment might bump the chunk size ~64 bytes, so in total ~2KB
      of overhead?
      
      I believe we can do a little better to avoid eating that full padding,
      so likely less than that.
      
      [dennis@kernel.org: changelog details]
      Link: https://lkml.kernel.org/r/20230610030730.110074-1-yu.ma@intel.com
      Signed-off-by: Yu Ma <yu.ma@intel.com>
      Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • memory tier: remove unneeded !IS_ENABLED(CONFIG_MIGRATION) check · 33ee4f18
      Miaohe Lin authored
      
      
      establish_demotion_targets() is only defined when CONFIG_MIGRATION is
      enabled, so there's no need to check it again.
      
      Link: https://lkml.kernel.org/r/20230610034114.981861-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: compaction: mark kcompactd_run() and kcompactd_stop() __meminit · 833dfc00
      Miaohe Lin authored
      
      
      Add __meminit to kcompactd_run() and kcompactd_stop() so that they
      default to __init when memory hotplug is not enabled.
      
      Link: https://lkml.kernel.org/r/20230610034615.997813-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: remove unused vma_init_lock() · e4d86756
      YueHaibing authored
      Commit c7f8f31c ("mm: separate vma->lock from vm_area_struct") left this
      behind.
      
      Link: https://lkml.kernel.org/r/20230610101956.20592-1-yuehaibing@huawei.com
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kernel: pid_namespace: remove unused set_memfd_noexec_scope() · 3efd33b7
      YueHaibing authored
      
      
      This inline function is unused, remove it.
      
      Link: https://lkml.kernel.org/r/20230610102858.31488-1-yuehaibing@huawei.com
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • userfaultfd: fix regression in userfaultfd_unmap_prep() · 65ac1320
      Liam R. Howlett authored
      Android reported a performance regression in the userfaultfd unmap path.
      A closer inspection of the userfaultfd_unmap_prep() change showed that a
      second tree walk would be necessary in the reworked code.
      
      Fix the regression by passing each VMA that will be unmapped through to
      the userfaultfd_unmap_prep() function as they are added to the unmap list,
      instead of re-walking the tree for the VMA.
      
      Link: https://lkml.kernel.org/r/20230601015402.2819343-1-Liam.Howlett@oracle.com
      Fixes: 69dbe6da ("userfaultfd: use maple tree iterator to iterate VMAs")
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Reported-by: Suren Baghdasaryan <surenb@google.com>
      Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/folio: replace set_compound_order with folio_set_order · 1e3be485
      Tarun Sahu authored
      
      
      The patch ("mm/folio: Avoid special handling for order value 0 in
      folio_set_order") [1] removed the need for special handling of order = 0
      in folio_set_order.  Now folio_set_order and set_compound_order are
      effectively the same function.  This patch removes set_compound_order
      and uses folio_set_order instead.
      
      [1] https://lore.kernel.org/all/20230609183032.13E08C433D2@smtp.kernel.org/
      
      Link: https://lkml.kernel.org/r/20230612093514.689846-1-tsahu@linux.ibm.com
      Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com>
      Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: remove zswap_header · 0bb48849
      Domenico Cerasuolo authored
      
      
      Previously, zswap_header served the purpose of storing the swpentry within
      zpool pages.  This allowed zpool implementations to pass relevant
      information to the writeback function.  However, with the current
      implementation, writeback is directly handled within zswap.  Consequently,
      there is no longer a necessity for zswap_header, as the swp_entry_t can be
      stored directly in zswap_entry.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-8-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Tested-by: Yosry Ahmed <yosryahmed@google.com>
      Suggested-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: simplify writeback function · ff9d5ba2
      Domenico Cerasuolo authored
      
      
      zswap_writeback_entry() used to be a callback for the backends, which
      don't know about struct zswap_entry.
      
      Now that the only user is the generic zswap LRU reclaimer, it can be
      simplified: pass the pinned zswap_entry directly, and consolidate the
      refcount management in the shrink function.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-7-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Tested-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: remove shrink from zpool interface · 35499e2b
      Domenico Cerasuolo authored
      
      
      Now that all three zswap backends have removed their shrink code, it is
      no longer necessary for the zpool interface to include shrink/writeback
      endpoints.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-6-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Nhat Pham <nphamcs@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: remove page reclaim logic from zsmalloc · b3067742
      Domenico Cerasuolo authored
      
      
      Switch zsmalloc to the new generic zswap LRU and remove its custom
      implementation.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-5-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Nhat Pham <nphamcs@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Tested-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: remove page reclaim logic from z3fold · e774a7bc
      Domenico Cerasuolo authored
      
      
      Switch z3fold to the new generic zswap LRU and remove its custom
      implementation.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-4-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: remove page reclaim logic from zbud · 1be537c6
      Domenico Cerasuolo authored
      
      
      Switch zbud to the new generic zswap LRU and remove its custom
      implementation.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-3-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: zswap: add pool shrinking mechanism · f999f38b
      Domenico Cerasuolo authored
      
      
      Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
      
      This series aims to improve the zswap reclaim mechanism by reorganizing
      the LRU management. In the current implementation, the LRU is maintained
      within each zpool driver, resulting in duplicated code across the three
      drivers. The proposed change consists of moving the LRU management from
      the individual implementations up to the zswap layer.
      
      The primary objective of this refactoring effort is to simplify the
      codebase. By unifying the reclaim loop and consolidating LRU handling
      within zswap, we can eliminate redundant code and improve
      maintainability. Additionally, this change enables the reclamation of
      stored pages in their actual LRU order. Presently, the zpool drivers
      link backing pages in an LRU, causing compressed pages with different
      LRU positions to be written back simultaneously.
      
      The series consists of several patches. The first patch implements the
      LRU and the reclaim loop in zswap, but it is not used yet because all
      three driver implementations are marked as zpool_evictable.
      The following three commits modify each zpool driver to be not
      zpool_evictable, allowing the use of the reclaim loop in zswap.
      As the drivers removed their shrink functions, the zpool interface is
      then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
      Finally, the code in zswap is further cleaned up by simplifying the
      writeback function and removing the now unnecessary zswap_header.
      
      
      This patch (of 7):
      
      Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
      function, which is called from zpool_shrink.  However, with this commit, a
      unified shrink function is added to zswap.  The ultimate goal is to
      eliminate the need for zpool_shrink once all zpool implementations have
      dropped their shrink code.
      
      To ensure the functionality of each commit, this change focuses solely on
      adding the mechanism itself.  No modifications are made to the backends,
      meaning that functionally, there are no immediate changes.  The zswap
      mechanism will only come into effect once the backends have removed their
      shrink code.  The subsequent commits will address the modifications needed
      in the backends.
      
      Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com
      Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com
      Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Acked-by: Nhat Pham <nphamcs@gmail.com>
      Tested-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: mm: remove duplicate unneeded defines · 0183d777
      Muhammad Usama Anjum authored
      
      
      Remove all defines which aren't needed after correctly including the
      kernel header files.
      
      Link: https://lkml.kernel.org/r/20230612095347.996335-2-usama.anjum@collabora.com
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: mm: remove wrong kernel header inclusion · 1e6d1e36
      Muhammad Usama Anjum authored
      It is wrong to include unprocessed user header files directly.  They are
      processed into "<source_tree>/usr/include" by running "make headers",
      and they are included in selftests by the kselftest makefiles
      automatically with the help of the KHDR_INCLUDES variable.  These
      headers should always be built before building kselftests.
      
      Link: https://lkml.kernel.org/r/20230612095347.996335-1-usama.anjum@collabora.com
      Fixes: 07115fcc ("selftests/mm: add new selftests for KSM")
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1e6d1e36
    • mm: ptep_get() conversion · c33c7948
      Ryan Roberts authored
      
      
      Convert all instances of direct pte_t* dereferencing to instead use
      ptep_get() helper.  This means that by default, the accesses change from a
      C dereference to a READ_ONCE().  This is technically the correct thing to
      do since where pgtables are modified by HW (for access/dirty) they are
      volatile and therefore we should always ensure READ_ONCE() semantics.
      
      But more importantly, by always using the helper, it can be overridden by
      the architecture to fully encapsulate the contents of the pte.  Arch code
      is deliberately not converted, as the arch code knows best.  It is
      intended that arch code (arm64) will override the default with its own
      implementation that can (e.g.) hide certain bits from the core code, or
      determine young/dirty status by mixing in state from another source.
      
      Conversion was done using Coccinelle:
      
      ----
      
      // $ make coccicheck \
      //          COCCI=ptepget.cocci \
      //          SPFLAGS="--include-headers" \
      //          MODE=patch
      
      virtual patch
      
      @ depends on patch @
      pte_t *v;
      @@
      
      - *v
      + ptep_get(v)
      
      ----
      
      Then reviewed and hand-edited to avoid multiple unnecessary calls to
      ptep_get(), instead opting to store the result of a single call in a
      variable, where it is correct to do so.  This aims to negate any cost of
      READ_ONCE() and will benefit arch-overrides that may be more complex.
      
      Included is a fix for an issue in an earlier version of this patch that
      was pointed out by kernel test robot.  The issue arose because config
      MMU=n elides definition of the ptep helper functions, including
      ptep_get().  HUGETLB_PAGE=n configs still define a simple
      huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
      So when both configs are disabled, this caused a build error because
      ptep_get() is not defined.  Fix by continuing to do a direct dereference
      when MMU=n.  This is safe because for this config the arch code cannot be
      trying to virtualize the ptes because none of the ptep helpers are
      defined.
      
      Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c33c7948
    • mm: move ptep_get() and pmdp_get() helpers · 6c1d2a07
      Ryan Roberts authored
      
      
      There are many call sites that directly dereference a pte_t pointer.  This
      makes it very difficult to properly encapsulate a page table in the arch
      code without having to allocate shadow page tables.
      
      We will shortly solve this by replacing all the call sites with ptep_get()
      calls.  But there are call sites above the function definition in the
      header file, so let's move ptep_get() to an earlier location to solve that
      problem.  And move pmdp_get() at the same time to keep it close to
      ptep_get().
      
      Link: https://lkml.kernel.org/r/20230612151545.3317766-3-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6c1d2a07
    • mm: ptdump should use ptep_get_lockless() · 426931e7
      Ryan Roberts authored
      
      
      Patch series "Encapsulate PTE contents from non-arch code", v3.
      
      A series to improve the encapsulation of pte entries by disallowing
      non-arch code from directly dereferencing pte_t pointers.
      
      This means that by default, the accesses change from a C dereference to a
      READ_ONCE().  This is technically the correct thing to do since where
      pgtables are modified by HW (for access/dirty) they are volatile and
      therefore we should always ensure READ_ONCE() semantics.
      
      But more importantly, by always using the helper, it can be overridden by
      the architecture to fully encapsulate the contents of the pte.  Arch code
      is deliberately not converted, as the arch code knows best.  It is
      intended that arch code (arm64) will override the default with its own
      implementation that can (e.g.) hide certain bits from the core code, or
      determine young/dirty status by mixing in state from another source.
      
      
      This patch (of 3):
      
      The page table dumper uses walk_page_range_novma() to walk the page
      tables, which does not lock the PTL before calling the pte_entry()
      callback.  Therefore, the page table dumper's callback must use
      ptep_get_lockless() rather than ptep_get() to ensure that the pte it reads
      is not torn or otherwise corrupt when racing with writers.
      
      Link: https://lkml.kernel.org/r/20230612151545.3317766-1-ryan.roberts@arm.com
      Link: https://lkml.kernel.org/r/20230612151545.3317766-2-ryan.roberts@arm.com
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: kernel test robot <lkp@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      426931e7
    • sh: move the ARCH_DMA_MINALIGN definition to asm/cache.h · e6926a4d
      Catalin Marinas authored
      
      
      The sh architecture defines ARCH_DMA_MINALIGN in asm/page.h.  Move it to
      asm/cache.h to allow a generic ARCH_DMA_MINALIGN definition in
      linux/cache.h without redefine errors/warnings.
      
      Link: https://lkml.kernel.org/r/20230613155245.1228274-4-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e6926a4d
    • microblaze: move the ARCH_{DMA,SLAB}_MINALIGN definitions to asm/cache.h · 4ea57ce4
      Catalin Marinas authored
      
      
      The microblaze architecture defines ARCH_DMA_MINALIGN in asm/page.h.  Move
      it to asm/cache.h to allow a generic ARCH_DMA_MINALIGN definition in
      linux/cache.h without redefine errors/warnings.
      
      While at it, also move ARCH_SLAB_MINALIGN to asm/cache.h for
      consistency.
      
      Link: https://lkml.kernel.org/r/20230613155245.1228274-3-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4ea57ce4
    • powerpc: move the ARCH_DMA_MINALIGN definition to asm/cache.h · 78615c4d
      Catalin Marinas authored
      
      
      Patch series "Move the ARCH_DMA_MINALIGN definition to asm/cache.h".
      
      The ARCH_KMALLOC_MINALIGN reduction series defines a generic
      ARCH_DMA_MINALIGN in linux/cache.h:
      
      https://lore.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.com/
      
      Unfortunately, this causes a duplicate definition warning for
      microblaze, powerpc (32-bit only) and sh as these architectures define
      ARCH_DMA_MINALIGN in a different file than asm/cache.h. Move the macro
      to asm/cache.h to avoid this issue and also bring them in line with the
      other architectures.
      
      
      This patch (of 3):
      
      The powerpc architecture defines ARCH_DMA_MINALIGN in asm/page_32.h and
      only if CONFIG_NOT_COHERENT_CACHE is enabled (32-bit platforms only). 
      Move this macro to asm/cache.h to allow a generic ARCH_DMA_MINALIGN
      definition in linux/cache.h without redefine errors/warnings.
      
      Link: https://lkml.kernel.org/r/20230613155245.1228274-1-catalin.marinas@arm.com
      Link: https://lkml.kernel.org/r/20230613155245.1228274-2-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202306131053.1ybvRRhO-lkp@intel.com/
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      78615c4d
    • arm64: enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 · 1c1a429e
      Catalin Marinas authored
      
      
      With the DMA bouncing of unaligned kmalloc() buffers now in place, enable
      it for arm64 to allow the kmalloc-{8,16,32,48,96} caches.  In addition,
      always create the swiotlb buffer even when the end of RAM is within the
      32-bit physical address range (the swiotlb buffer can still be disabled on
      the kernel command line).
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-18-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1c1a429e
    • mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible · b035f5a6
      Catalin Marinas authored
      
      
      If an architecture opted in to DMA bouncing of unaligned kmalloc() buffers
      (ARCH_WANT_KMALLOC_DMA_BOUNCE), reduce the minimum kmalloc() cache
      alignment below cache-line size to ARCH_KMALLOC_MINALIGN.
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-17-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b035f5a6
    • iommu/dma: force bouncing if the size is not cacheline-aligned · 861370f4
      Catalin Marinas authored
      
      
      Similarly to the direct DMA, bounce small allocations as they may have
      originated from a kmalloc() cache not safe for DMA. Unlike the direct
      DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
      non-coherent devices as this would break some cases where the iova is
      expected to be contiguous (dmabuf). Instead, scan the scatterlist for
      any small sizes and only go the swiotlb path if any element of the list
      needs bouncing (note that iommu_dma_map_page() would still only bounce
      those buffers which are not DMA-aligned).
      
      To avoid scanning the scatterlist on the 'sync' operations, introduce an
      SG_DMA_SWIOTLB flag set by iommu_dma_map_sg_swiotlb(). The
      dev_use_swiotlb() function together with the newly added
      dev_use_sg_swiotlb() now check for both untrusted devices and unaligned
      kmalloc() buffers (suggested by Robin Murphy).
      
      Link: https://lkml.kernel.org/r/20230612153201.554742-16-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      861370f4