  1. Mar 29, 2023
    • dma-buf: system_heap: avoid reclaim for order 4 · 3ccefdea
      Jaewon Kim authored
      
      
      Using order 4 pages would be helpful for IOMMU mappings, but trying to
      get order 4 pages can take quite a lot of time in the page allocator.
      From the perspective of responsiveness, deterministic memory allocation
      speed is, I think, quite important.
      
      An order 4 allocation with __GFP_RECLAIM may spend a long time in the
      reclaim and compaction logic, and __GFP_NORETRY may also have an effect.
      These cause unpredictable delays.
      
      To get a reasonable allocation speed from the dma-buf system heap, use
      HIGH_ORDER_GFP for order 4 to avoid reclaim, and remove the meaningless
      __GFP_COMP for order 0.
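
      For reference, the resulting per-order gfp masks look roughly like the
      sketch below (based on the heap's existing HIGH_ORDER_GFP/LOW_ORDER_GFP
      definitions in drivers/dma-buf/heaps/system_heap.c; not the exact hunk):

          #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
                                  | __GFP_NORETRY) & ~__GFP_RECLAIM) \
                                  | __GFP_COMP)
          #define LOW_ORDER_GFP   (GFP_HIGHUSER | __GFP_ZERO)

          /* order 8 and order 4 avoid direct reclaim; order 0 may reclaim */
          static gfp_t order_flags[] = {HIGH_ORDER_GFP, HIGH_ORDER_GFP, LOW_ORDER_GFP};
          static const unsigned int orders[] = {8, 4, 0};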
      
      According to my tests, order 4 with MID_ORDER_GFP gets more order 4
      pages, but the elapsed times can be very long.
      
               time	order 8	order 4	order 0
           584 usec	0	160	0
        28,428 usec	0	160	0
       100,701 usec	0	160	0
        76,645 usec	0	160	0
        25,522 usec	0	160	0
        38,798 usec	0	160	0
        89,012 usec	0	160	0
        23,015 usec	0	160	0
        73,360 usec	0	160	0
        76,953 usec	0	160	0
        31,492 usec	0	160	0
        75,889 usec	0	160	0
        84,551 usec	0	160	0
        84,352 usec	0	160	0
        57,103 usec	0	160	0
        93,452 usec	0	160	0
      
      If HIGH_ORDER_GFP is used for order 4, fewer order 4 pages may be
      obtained, but the elapsed times are quite stable and fast enough.
      
               time	order 8	order 4	order 0
         1,356 usec	0	155	80
         1,901 usec	0	11	2384
         1,912 usec	0	0	2560
         1,911 usec	0	0	2560
         1,884 usec	0	0	2560
         1,577 usec	0	0	2560
         1,366 usec	0	0	2560
         1,711 usec	0	0	2560
         1,635 usec	0	28	2112
           544 usec	10	0	0
           633 usec	2	128	0
           848 usec	0	160	0
           729 usec	0	160	0
         1,000 usec	0	160	0
         1,358 usec	0	160	0
         2,638 usec	0	31	2064
      
      Link: https://lkml.kernel.org/r/20230303050332.10138-1-jaewon31.kim@samsung.com
      Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
      Reviewed-by: John Stultz <jstultz@google.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: T.J. Mercier <tjmercier@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kmsan: add memsetXX tests · 78c74aee
      Alexander Potapenko authored
      
      
      Add tests ensuring that memset16()/memset32()/memset64() are instrumented
      by KMSAN and correctly initialize the memory.
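
      For illustration only (not the actual kmsan_test.c cases), the intent
      can be sketched as: fill part of a buffer with memset16() and let KMSAN
      check which bytes it considers initialized.

          #include <linux/kmsan-checks.h>
          #include <linux/string.h>
          #include <linux/types.h>

          static void memset16_kmsan_demo(void)
          {
                  u16 buf[32];

                  memset16(buf, 0x1234, 16);                 /* first 16 elements written */
                  kmsan_check_memory(buf, 16 * sizeof(u16)); /* no report expected */
                  kmsan_check_memory(buf, sizeof(buf));      /* tail uninit: report expected */
          }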
      
      Link: https://lkml.kernel.org/r/20230303141433.3422671-4-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86: kmsan: use C versions of memset16/memset32/memset64 · 27f644dc
      Alexander Potapenko authored
      
      
      KMSAN must see as many memory accesses as possible to prevent false
      positive reports.  Fall back to versions of
      memset16()/memset32()/memset64() implemented in lib/string.c instead of
      those written in assembly.
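
      For reference, the generic C fallback in lib/string.c has roughly this
      shape; KMSAN can instrument every store in it, unlike the x86 assembly
      version:

          void *memset16(uint16_t *s, uint16_t v, size_t count)
          {
                  uint16_t *xs = s;

                  while (count--)
                          *xs++ = v;
                  return s;
          }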
      
      Link: https://lkml.kernel.org/r/20230303141433.3422671-3-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kmsan: another take at fixing memcpy tests · d3402925
      Alexander Potapenko authored
      Commit 5478afc5 ("kmsan: fix memcpy tests") uses OPTIMIZER_HIDE_VAR() to
      hide the uninitialized var from the compiler optimizations.
      
      However, OPTIMIZER_HIDE_VAR(uninit) enforces an immediate check of
      @uninit, so the memcpy tests did not actually check the behavior of
      memcpy(), because they always produced a KMSAN report.
      
      Replace OPTIMIZER_HIDE_VAR() with a file-local macro that just clobbers
      the memory with a barrier(), and add a test case for memcpy() that does
      not expect an error report.
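
      A minimal sketch of such a file-local clobber (not necessarily the exact
      macro added to kmsan_test.c): the asm only takes the variable's address,
      so the compiler must assume the memory may have changed, but the value
      itself is never read and therefore never checked by KMSAN.

          #define CLOBBER_VAR(var) \
                  __asm__ __volatile__("" : : "r"(&(var)) : "memory")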
      
      Also reflow kmsan_test.c with clang-format.
      
      Link: https://lkml.kernel.org/r/20230303141433.3422671-2-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86: kmsan: don't rename memintrinsics in uninstrumented files · 6dc4bd4e
      Alexander Potapenko authored
      
      
      clang -fsanitize=kernel-memory already replaces calls to
      memset/memcpy/memmove and their __builtin_ versions with
      __msan_memset/__msan_memcpy/__msan_memmove in instrumented files, so
      there is no need to override them.
      
      In non-instrumented files we are now required to leave memset() and
      friends intact, so we cannot replace them with __msan_XXX() functions.
      
      Link: https://lkml.kernel.org/r/20230303141433.3422671-1-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Suggested-by: Marco Elver <elver@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/khugepaged: cleanup memcg uncharge for failure path · 7cb1d7ef
      Peter Xu authored
      
      
      Explicit memcg uncharging is not needed when the memcg accounting has
      the same lifespan as the page/folio.  That is now the case for
      khugepaged after Yang & Zach's recent rework, since the hpage is
      allocated for each collapse rather than being cached.

      Clean up the explicit memcg uncharge in the khugepaged failure path and
      leave that to put_page().
      
      Link: https://lkml.kernel.org/r/20230303151218.311015-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Suggested-by: Zach O'Keefe <zokeefe@google.com>
      Reviewed-by: Zach O'Keefe <zokeefe@google.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: David Stevens <stevensd@chromium.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/debug_vm_pgtable: replace pte_mkhuge() with arch_make_huge_pte() · 9dabf6e1
      Anshuman Khandual authored
      Since commit 16785bd7 ("mm: merge pte_mkhuge() call into
      arch_make_huge_pte()"), arch_make_huge_pte() should be used directly in
      the generic memory subsystem as a platform-provided page table helper,
      instead of pte_mkhuge().  Change hugetlb_basic_tests() to call
      arch_make_huge_pte() directly, and update its relevant documentation
      entry as required.
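
      For reference, the generic fallback in <linux/hugetlb.h> looks roughly
      like this; architectures such as arm64 override it, which is why callers
      should go through it rather than calling pte_mkhuge() directly:

          #ifndef arch_make_huge_pte
          static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift,
                                                 vm_flags_t flags)
          {
                  return pte_mkhuge(entry);
          }
          #endif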
      
      Link: https://lkml.kernel.org/r/20230302114845.421674-1-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
        Link: https://lore.kernel.org/all/1ea45095-0926-a56a-a273-816709e9075e@csgroup.eu/
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/migrate: drop pte_mkhuge() in remove_migration_pte() · 1da28f1b
      Anshuman Khandual authored
      Since commit 16785bd7 ("mm: merge pte_mkhuge() call into
      arch_make_huge_pte()"), arch_make_huge_pte() should be used directly in
      the generic memory subsystem as a platform-provided page table helper,
      instead of pte_mkhuge().  This just drops pte_mkhuge() from
      remove_migration_pte(), which has now become redundant.
      
      Link: https://lkml.kernel.org/r/20230302025349.358341-1-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
        Link: https://lore.kernel.org/all/1ea45095-0926-a56a-a273-816709e9075e@csgroup.eu/
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: swap: remove unneeded cgroup_throttle_swaprate() · 3e4fb13a
      Kefeng Wang authored
      
      
      All the callers of cgroup_throttle_swaprate() are converted to
      folio_throttle_swaprate(), so make __cgroup_throttle_swaprate() take a
      folio and rename it to __folio_throttle_swaprate().  Also rename
      gfp_mask to gfp and drop the redundant extern keyword.  Finally, drop
      the unused cgroup_throttle_swaprate().
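
      The resulting interface is roughly the following sketch (simplified; the
      real header also carries the CONFIG_MEMCG/CONFIG_BLK_CGROUP stubs):

          void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp);

          static inline void folio_throttle_swaprate(struct folio *folio, gfp_t gfp)
          {
                  __folio_throttle_swaprate(folio, gfp);
          }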
      
      Link: https://lkml.kernel.org/r/20230302115835.105364-8-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: memory: use folio_throttle_swaprate() in do_cow_fault() · 68fa572b
      Kefeng Wang authored
      
      
      Directly use folio_throttle_swaprate() instead of
      cgroup_throttle_swaprate().
      
      Link: https://lkml.kernel.org/r/20230302115835.105364-7-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: memory: use folio_throttle_swaprate() in do_anonymous_page() · e2bf3e2c
      Kefeng Wang authored
      
      
      Directly use folio_throttle_swaprate() instead of
      cgroup_throttle_swaprate().
      
      Link: https://lkml.kernel.org/r/20230302115835.105364-6-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: memory: use folio_throttle_swaprate() in wp_page_copy() · 4d4f75bf
      Kefeng Wang authored
      
      
      Directly use folio_throttle_swaprate() instead of
      cgroup_throttle_swaprate().
      
      Link: https://lkml.kernel.org/r/20230302115835.105364-5-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: memory: use folio_throttle_swaprate() in page_copy_prealloc() · e601ded4
      Kefeng Wang authored
      
      
      Directly use folio_throttle_swaprate() instead of
      cgroup_throttle_swaprate().
      
      Link: https://lkml.kernel.org/r/20230302115835.105364-4-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: memory: use folio_throttle_swaprate() in do_swap_page() · 4231f842
      Kefeng Wang authored
      
      
      Directly use folio_throttle_swaprate() instead of
      cgroup_throttle_swaprate().
      
      Link: https://lkml.kernel.org/r/20230302115835.105364-3-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: huge_memory: convert __do_huge_pmd_anonymous_page() to use a folio · cfe3236d
      Kefeng Wang authored
      
      
      Patch series "mm: remove cgroup_throttle_swaprate() completely", v2.
      
      Convert all the caller functions of cgroup_throttle_swaprate() to use
      folios, and use folio_throttle_swaprate(), which allows us to remove
      cgroup_throttle_swaprate() completely.
      
      
      This patch (of 7):
      
      Convert from page to folio within __do_huge_pmd_anonymous_page().  As we
      need the precise page which is to be stored at this PTE in the folio,
      the function still keeps a page as its parameter.
      
      Link: https://lkml.kernel.org/r/20230302115835.105364-1-wangkefeng.wang@huawei.com
      Link: https://lkml.kernel.org/r/20230302115835.105364-2-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kasan: call clear_page with a match-all tag instead of changing page tag · 16d91faf
      Peter Collingbourne authored
      
      
      Instead of changing the page's tag solely in order to obtain a pointer
      with a match-all tag and then changing it back again, just convert the
      pointer that we get from kmap_atomic() into one with a match-all tag
      before passing it to clear_page().
      
      On a certain microarchitecture, this has been observed to cause a
      measurable improvement in microbenchmark performance, presumably as a
      result of being able to avoid the atomic operations on the page tag.
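
      The idea, simplified from the highmem helper this patch touches (names
      here are illustrative):

          static void clear_highpage_tagged_demo(struct page *page)
          {
                  void *kaddr = kmap_atomic(page);

                  /* 0xff match-all pointer tag; the page's own tag is untouched */
                  clear_page(kasan_reset_tag(kaddr));
                  kunmap_atomic(kaddr);
          }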
      
      Link: https://lkml.kernel.org/r/20230216195924.3287772-1-pcc@google.com
      Signed-off-by: Peter Collingbourne <pcc@google.com>
      Link: https://linux-review.googlesource.com/id/I0249822cc29097ca7a04ad48e8eb14871f80e711
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: cgroup: add 'malloc' failures checks in test_memcontrol · af7df1c9
      Ivan Orlov authored
      
      
      There are several 'malloc' calls in test_memcontrol which can be
      unsuccessful.  Add checks for 'malloc' failures to give more details
      about the test's failure reasons and to avoid possible undefined
      behavior from a later NULL dereference (like the one in the
      alloc_anon_50M_check_swap function).
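
      The added checks follow the usual pattern, roughly (illustrative, not
      the exact test_memcontrol.c hunks):

          char *buf = malloc(size);

          if (buf == NULL)
                  return -1;   /* fail the helper instead of dereferencing NULL later */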
      
      Link: https://lkml.kernel.org/r/20230226131634.34366-1-ivan.orlov0322@gmail.com
      Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Zefan Li <lizefan.x@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/rmap: use atomic_try_cmpxchg in set_tlb_ubc_flush_pending · bdeb9188
      Uros Bizjak authored
      
      
      Use atomic_try_cmpxchg() instead of the atomic_cmpxchg(*ptr, old, new)
      == old pattern in set_tlb_ubc_flush_pending().  The x86 CMPXCHG
      instruction returns success in the ZF flag, so this change saves a
      compare after cmpxchg (and the related move instruction in front of
      cmpxchg).
      
      Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
      fails.
      
      No functional change intended.
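
      The generic transformation looks like this (illustrative, not the exact
      mm/rmap.c hunk; v, new and success() are placeholders):

          int old = atomic_read(&v);

          /* before: compare the returned old value manually */
          if (atomic_cmpxchg(&v, old, new) == old)
                  success();

          /* after: the bool result maps onto CMPXCHG's ZF, and @old is
           * updated automatically when the exchange fails
           */
          if (atomic_try_cmpxchg(&v, &old, new))
                  success();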
      
      Link: https://lkml.kernel.org/r/20230227214228.3533299-1-ubizjak@gmail.com
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/debug: use %pGt to display page_type in dump_page() · f2421a16
      Hyeonggon Yoo authored
      
      
      Some page flags are stored in the page_type field rather than in
      ->flags.  Use the newly introduced page_type format %pGt in dump_page().
      
      Below are some examples:
      
      page:00000000da7184dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101cb3
      flags: 0x2ffff0000000000(node=0|zone=2|lastcpupid=0xffff)
      page_type: 0xffffffff()
      raw: 02ffff0000000000 0000000000000000 dead000000000122 0000000000000000
      raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      page dumped because: newly allocated page
      
      page:00000000da7184dd refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x101cb3
      flags: 0x2ffff0000000000(node=0|zone=2|lastcpupid=0xffff)
      page_type: 0xffffff7f(buddy)
      raw: 02ffff0000000000 ffff88813fff8e80 ffff88813fff8e80 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
      page dumped because: freed page
      
      page:0000000042202316 refcount:3 mapcount:2 mapping:0000000000000000 index:0x7f634722a pfn:0x11994e
      memcg:ffff888100135000
      anon flags: 0x2ffff0000080024(uptodate|active|swapbacked|node=0|zone=2|lastcpupid=0xffff)
      page_type: 0x1()
      raw: 02ffff0000080024 0000000000000000 dead000000000122 ffff8881193398f1
      raw: 00000007f634722a 0000000000000000 0000000300000001 ffff888100135000
      page dumped because: user-mapped page
      
      Link: https://lkml.kernel.org/r/20230130042514.2418-4-42.hyeyoo@gmail.com
      Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm, printk: introduce new format %pGt for page_type · 4c85c0be
      Hyeonggon Yoo authored
      
      
      The %pGp format is used to display the 'flags' field of a struct page.
      However, some page flags (e.g. PG_buddy, see page-flags.h for more
      details) are stored in the page_type field.  To display a
      human-readable output of page_type, introduce the %pGt format.
      
      It is important to note that the meaning of bits is different in
      page_type: if page_type is 0xffffffff, no flags are set.  Setting the
      PG_buddy (0x00000080) flag results in a page_type of 0xffffff7f.
      Clearing a bit actually means setting a flag, so bits in page_type are
      inverted when displaying type names.

      Only values for which page_type_has_type() returns true are considered
      as page_type, to avoid confusion with mapcount values.  If it returns
      false, only the raw value is displayed and not the page type names.
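
      A caller uses the new specifier roughly like this (sketch; %pGt takes a
      pointer to the page_type field):

          if (page_has_type(page))
                  pr_warn("page_type: %pGt\n", &page->page_type);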
      
      Link: https://lkml.kernel.org/r/20230130042514.2418-3-42.hyeyoo@gmail.com
      Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Reviewed-by: Petr Mladek <pmladek@suse.com>	[vsprintf part]
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mmflags.h: use less error prone method to define pageflag_names · e26fcc02
      Hyeonggon Yoo authored
      
      
      Patch series "mm, printk: introduce new format for page_type", v4.
      
      This series moves PG_slab page flag to page_type, freeing one bit in
      page->flags and introduces %pGt format that prints human-readable
      page_type like %pGp for printing page flags.
      
      See changelog of patch 2 for more implementation details.
      
      Thanks everyone that gave valuable comments.
      
      
      This patch (of 3):
      
      Use a helper macro to decrease the chance of typos when defining
      pageflag_names.
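
      The helper-macro approach looks roughly like this (illustrative; see
      include/trace/events/mmflags.h for the full list):

          #define DEF_PAGEFLAG_NAME(_name) {1UL << PG_##_name, __stringify(_name)}

          #define __def_pageflag_names            \
                  DEF_PAGEFLAG_NAME(locked),      \
                  DEF_PAGEFLAG_NAME(waiters),     \
                  DEF_PAGEFLAG_NAME(dirty)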
      
      Link: https://lkml.kernel.org/r/20230130042514.2418-1-42.hyeyoo@gmail.com
      Link: https://lore.kernel.org/lkml/Y6AycLbpjVzXM5I9@smile.fi.intel.com
      Link: https://lkml.kernel.org/r/20230130042514.2418-2-42.hyeyoo@gmail.com
      Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: John Ogness <john.ogness@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: add tracepoints to ksm · 739100c8
      Stefan Roesch authored
      
      
      This adds the following tracepoints to ksm:
      - start / stop scan
      - ksm enter / exit
      - merge a page
      - merge a page with ksm
      - remove a page
      - remove an rmap item
      
      This patch has been split off from the RFC patch series "mm:
      process/cgroup ksm support".
      
      Link: https://lkml.kernel.org/r/20230210214645.2720847-1-shr@devkernel.io
      Signed-off-by: Stefan Roesch <shr@devkernel.io>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN · 77f68ebe
      Nicholas Piggin authored
      
      
      On a 16-socket 192-core POWER8 system, with the context_switch1_threads
      benchmark from will-it-scale (see the earlier changelog), upstream can
      only achieve a rate of about 1 million context switches per second, due
      to contention on the mm refcount.
      
      64s meets the prerequisites for CONFIG_MMU_LAZY_TLB_SHOOTDOWN, so enable
      the option.  This increases the above benchmark to 118 million context
      switches per second.
      
      This generates 314 additional IPI interrupts on a 144 CPU system doing a
      kernel compile, which is in the noise in terms of kernel cycles.
      
      Link: https://lkml.kernel.org/r/20230203071837.1136453-6-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme · 2655421a
      Nicholas Piggin authored
      
      
      On big systems, the mm refcount can become highly contended when doing a
      lot of context switching with threaded applications.  The user<->idle
      switch is one of the important cases.  Abandoning lazy tlb entirely slows this
      switching down quite a bit in the common uncontended case, so that is not
      viable.
      
      Implement a scheme where lazy tlb mm references do not contribute to the
      refcount, instead they get explicitly removed when the refcount reaches
      zero.
      
      The final mmdrop() sends IPIs to all CPUs in the mm_cpumask and they
      switch away from this mm to init_mm if it was being used as the lazy tlb
      mm.  Enabling the shoot lazies option therefore requires that the arch
      ensures that mm_cpumask contains all CPUs that could possibly be using mm.
      A DEBUG_VM option IPIs every CPU in the system after this to ensure there
      are no references remaining before the mm is freed.
      
      The cost of shootdown IPIs could be an issue, but they have not been observed to
      be a serious problem with this scheme, because short-lived processes tend
      not to migrate CPUs much, therefore they don't get much chance to leave
      lazy tlb mm references on remote CPUs.  There are a lot of options to
      reduce them if necessary, described in comments.
      
      The near-worst-case can be benchmarked with will-it-scale:
      
        context_switch1_threads -t $(($(nproc) / 2))
      
      This will create nproc threads (nproc / 2 switching pairs), all sharing
      the same mm, spread over all CPUs so that each CPU does
      thread->idle->thread switching.
      
      [ Rik came up with basically the same idea a few years ago, so credit
        to him for that. ]
      
      Link: https://lore.kernel.org/linux-mm/20230118080011.2258375-1-npiggin@gmail.com/
      Link: https://lore.kernel.org/all/20180728215357.3249-11-riel@surriel.com/
      Link: https://lkml.kernel.org/r/20230203071837.1136453-5-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lazy tlb: allow lazy tlb mm refcounting to be configurable · 88e3009b
      Nicholas Piggin authored
      
      
      Add CONFIG_MMU_LAZY_TLB_REFCOUNT, which enables refcounting of the lazy tlb mm
      when it is context switched.  This can be disabled by architectures that
      don't require this refcounting if they clean up lazy tlb mms when the last
      refcount is dropped.  Currently this is always enabled, so the patch
      introduces no functional change.
      
      Link: https://lkml.kernel.org/r/20230203071837.1136453-4-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lazy tlb: introduce lazy tlb mm refcount helper functions · aa464ba9
      Nicholas Piggin authored
      
      
      Add explicit _lazy_tlb annotated functions for lazy tlb mm refcounting. 
      This makes the lazy tlb mm references more obvious, and allows the
      refcounting scheme to be modified in later changes.  There is no
      functional change with this patch.
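
      At this point the helpers are plain wrappers, roughly (sketch; later
      patches in the series make the refcounting behind them configurable):

          static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
          {
                  mmgrab(mm);
          }

          static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
          {
                  mmdrop(mm);
          }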
      
      Link: https://lkml.kernel.org/r/20230203071837.1136453-3-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kthread: simplify kthread_use_mm refcounting · 6cad87b0
      Nicholas Piggin authored
      
      
      Patch series "shoot lazy tlbs (lazy tlb refcount scalability
      improvement)", v7.
      
      This series improves scalability of context switching between user and
      kernel threads on large systems with a threaded process spread across a
      lot of CPUs.
      
      Discussion of v6 here:
      https://lore.kernel.org/linux-mm/20230118080011.2258375-1-npiggin@gmail.com/
      
      
      This patch (of 5):
      
      Remove the special case avoiding refcounting when the mm to be used is the
      same as the kernel thread's active (lazy tlb) mm.  kthread_use_mm() should
      not be such a performance critical path that this matters much.  This
      simplifies a later change to lazy tlb mm refcounting.
      
      Link: https://lkml.kernel.org/r/20230203071837.1136453-1-npiggin@gmail.com
      Link: https://lkml.kernel.org/r/20230203071837.1136453-2-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/zswap: try to avoid worst-case scenario on same element pages · 62bf1258
      Taejoon Song authored
      The worst-case scenario for detecting same-element pages is that almost
      all elements look the same at first glance, but the last few elements
      are different.

      Since same elements tend to be grouped at the beginning of a page, if we
      compare the first element with the last element before looping through
      all elements, we have a good chance of quickly detecting
      non-same-element pages.
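
      Sketched, the check amounts to the following (simplified; the helper
      name is illustrative):

          static bool page_same_filled_sketch(void *ptr, unsigned long *value)
          {
                  unsigned long *page = ptr;
                  unsigned long val = page[0];
                  unsigned int pos, last = PAGE_SIZE / sizeof(*page) - 1;

                  /* cheap early exit: a same-filled page must have first == last */
                  if (val != page[last])
                          return false;

                  for (pos = 1; pos < last; pos++)
                          if (val != page[pos])
                                  return false;

                  *value = val;
                  return true;
          }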
      
      1. Test is done under LG webOS TV (64-bit arch)
      2. Dump the swap-out pages (~819200 pages)
      3. Analyze the pages with simple test script which counts the iteration
         number and measures the speed at off-line
      
      Under a 64-bit arch, the worst-case iteration count is PAGE_SIZE / 8
      bytes = 512.  The speed is based on the time spent in the
      page_same_filled() function only.  The results, on average, are listed
      below:
      
                                         Num of Iter    Speed(MB/s)
      Looping-Forward (Orig)                 38            99265
      Looping-Backward                       36           102725
      Last-element-check (This Patch)        33           125072
      
      The result shows that the average iteration count decreases by 13% and the
      speed increases by 25% with this patch.  This patch does not increase the
      overall time complexity, though.
      
      I also ran a simpler version which uses a backward loop.  Just looping
      backward also yields some improvement, but less than this patch.
      
      A similar change has already been made to zram in commit 90f82cbf
      ("zram: try to avoid worst-case scenario on same element pages").
      
      Link: https://lkml.kernel.org/r/20230205190036.1730134-1-taejoon.song@lge.com
      Signed-off-by: Taejoon Song <taejoon.song@lge.com>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Taejoon Song <taejoon.song@lge.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <yjay.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: multi-gen LRU: improve design doc · 32d32ef1
      T.J. Alumbaugh authored
      
      
      This patch improves the design doc. Specifically,
        1. add a section for the per-memcg mm_struct list, and
        2. add a section for the PID controller.
      
      Link: https://lkml.kernel.org/r/20230214035445.1250139-2-talumbau@google.com
      Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: multi-gen LRU: clean up sysfs code · 9a52b2f3
      T.J. Alumbaugh authored
      
      
      This patch cleans up the sysfs code. Specifically,
        1. use sysfs_emit(),
        2. use __ATTR_RW(), and
        3. constify multi-gen LRU struct attribute_group.
      
      Link: https://lkml.kernel.org/r/20230214035445.1250139-1-talumbau@google.com
      Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86/mm/pat: clear VM_PAT if copy_p4d_range failed · d155df53
      Ma Wupeng authored
      
      
      Syzbot reports a warning in untrack_pfn().  Digging into the root cause,
      we found that this is due to a memory allocation failure in
      pmd_alloc_one(), produced by failslab.

      In copy_page_range(), the memory allocation for the pmd failed.  During
      the error handling in copy_page_range(), mmput() is called to remove all
      vmas.  When untrack_pfn() runs on this empty pfn, the warning happens.
      
      Here's a simplified flow:
      
      dup_mm
        dup_mmap
          copy_page_range
            copy_p4d_range
              copy_pud_range
                copy_pmd_range
                  pmd_alloc
                    __pmd_alloc
                      pmd_alloc_one
                        page = alloc_pages(gfp, 0);
                          if (!page)
                            return NULL;
          mmput
              exit_mmap
                unmap_vmas
                  unmap_single_vma
                    untrack_pfn
                      follow_phys
                        WARN_ON_ONCE(1);
      
      Since this vma was not generated successfully, we can clear the VM_PAT
      flag.  In this case, untrack_pfn() will not be called while cleaning up
      this vma.
      
      Function untrack_pfn_moved() has also been renamed to fit the new logic.
      
      Link: https://lkml.kernel.org/r/20230217025615.1595558-1-mawupeng1@huawei.com
      Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
      Reported-by: <syzbot+5f488e922d047d8f00cc@syzkaller.appspotmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/userfaultfd: support WP on multiple VMAs · a1b92a3f
      Muhammad Usama Anjum authored
      
      
      mwriteprotect_range() errors out if [start, end) doesn't fall in one VMA. 
      We are facing a use case where multiple VMAs are present in one range of
      interest.  For example, the following steps (sketched in code below)
      reproduce the error which we are trying to fix:
      
      - Allocate memory of size 16 pages with PROT_NONE with mmap
      - Register userfaultfd
      - Change protection of the first half (1 to 8 pages) of memory to
        PROT_READ | PROT_WRITE. This breaks the memory area in two VMAs.
      - Now UFFDIO_WRITEPROTECT_MODE_WP on the whole memory of 16 pages errors
        out.
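
      For illustration, the reproducer can be written as the following minimal
      program (error handling trimmed; constants from <linux/userfaultfd.h>):

          #include <fcntl.h>
          #include <linux/userfaultfd.h>
          #include <stdio.h>
          #include <sys/ioctl.h>
          #include <sys/mman.h>
          #include <sys/syscall.h>
          #include <unistd.h>

          int main(void)
          {
                  size_t page = (size_t)sysconf(_SC_PAGESIZE);
                  size_t len = 16 * page;
                  char *mem = mmap(NULL, len, PROT_NONE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                  int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
                  struct uffdio_api api = {
                          .api = UFFD_API,
                          .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
                  };
                  struct uffdio_register reg = {
                          .range = { .start = (unsigned long)mem, .len = len },
                          .mode = UFFDIO_REGISTER_MODE_WP,
                  };
                  struct uffdio_writeprotect wp = {
                          .range = { .start = (unsigned long)mem, .len = len },
                          .mode = UFFDIO_WRITEPROTECT_MODE_WP,
                  };

                  ioctl(uffd, UFFDIO_API, &api);
                  ioctl(uffd, UFFDIO_REGISTER, &reg);

                  /* Splits the 16-page area into two VMAs. */
                  mprotect(mem, len / 2, PROT_READ | PROT_WRITE);

                  /* Before this patch the ioctl failed because the range
                   * spans two VMAs. */
                  if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp))
                          perror("UFFDIO_WRITEPROTECT");
                  return 0;
          }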
      
      This is a simple use case where user may or may not know if the memory
      area has been divided into multiple VMAs.
      
      We need an implementation which doesn't disrupt the existing users.  So,
      keeping things simple, stop going over all the VMAs if any one of the
      VMAs hasn't been registered in WP mode.  While at it, remove the
      unneeded error check as well.
      
      [akpm@linux-foundation.org: s/VM_WARN_ON_ONCE/VM_WARN_ONCE/ to fix build]
      Link: https://lkml.kernel.org/r/20230217105558.832710-1-usama.anjum@collabora.com
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reported-by: Paul Gofman <pgofman@codeweavers.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm, page_alloc: reduce page alloc/free sanity checks · 700d2e9a
      Vlastimil Babka authored
      Historically, we have performed sanity checks on all struct pages being
      allocated or freed, making sure they have no unexpected page flags or
      certain field values.  This can detect insufficient cleanup and some cases
      of use-after-free, although on its own it can't always identify the
      culprit.  The result is a warning and the "bad page" being leaked.
      
      The checks do need some cpu cycles, so in 4.7 with commits 479f854a
      ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
      and 4db7548c ("mm, page_alloc: defer debugging checks of freed pages
      until a PCP drain") they were no longer performed in the hot paths when
      allocating and freeing from pcplists, but only when pcplists are bypassed,
      refilled or drained.  For debugging purposes, with CONFIG_DEBUG_VM enabled
      the checks were instead still done in the hot paths and not when refilling
      or draining pcplists.
      
      With commit 4462b32c ("mm, page_alloc: more extensive free page checking
      with debug_pagealloc"), enabling debug_pagealloc also moved the sanity
      checks back to the hot paths.  When both debug_pagealloc and
      CONFIG_DEBUG_VM are enabled, the checks are done both in the hot paths
      and on pcplist refill/drain.
      
      Even though the non-debug default today might seem to be a sensible
      tradeoff between overhead and ability to detect bad pages, on closer look
      it's arguably not.  As most allocations go through the pcplists, catching
      any bad pages when refilling or draining pcplists has only a small chance,
      insufficient for debugging or serious hardening purposes.  On the other
      hand the cost of the checks is concentrated in the already expensive
      drain/refill batching operations, and those are done under the often
      contended zone lock.  That was recently identified as an issue for page
      allocation, and the zone lock contention was reduced by moving the
      checks outside of the locked section with the patch "mm: reduce lock contention of
      pcp buffer refill", but the cost of the checks is still visible compared
      to their removal [1].  In the pcplist draining path free_pcppages_bulk()
      the checks are still done under zone->lock.
      
      Thus, remove the checks from the pcplist refill and drain paths
      completely.  Introduce a static key check_pages_enabled to control
      checks during page allocation and freeing (whether the pcplist is used
      or bypassed).  The static key is enabled if any of the following is
      true:
      
      - kernel is built with CONFIG_DEBUG_VM=y (debugging)
      - debug_pagealloc or page poisoning is boot-time enabled (debugging)
      - init_on_alloc or init_on_free is boot-time enabled (hardening)
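
      A minimal sketch of the gate (check_pages_enabled is the name used
      above; the init helper and call site are illustrative):

          static DEFINE_STATIC_KEY_FALSE(check_pages_enabled);

          static void __init maybe_enable_page_checks(void)
          {
                  if (IS_ENABLED(CONFIG_DEBUG_VM) ||
                      debug_pagealloc_enabled() || page_poisoning_enabled() ||
                      want_init_on_alloc(GFP_KERNEL) || want_init_on_free())
                          static_branch_enable(&check_pages_enabled);
          }

          /* the alloc/free hot paths then branch on the key: */
          static inline bool want_page_checks(void)
          {
                  return static_branch_unlikely(&check_pages_enabled);
          }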
      
      The resulting user visible changes:
      - no checks when draining/refilling pcplists - less overhead, with
        likely no practical reduction of ability to catch bad pages
      - no checks when bypassing pcplists in default config (no
        debugging/hardening) - less overhead etc. as above
      - on typical hardened kernels [2], checks are now performed on each page
        allocation/free (previously only when bypassing/draining/refilling
        pcplists) - having init_on_alloc/init_on_free enabled should be a
        sufficient indication for preferring more costly alloc/free operations
        for hardening purposes, so we shouldn't need to introduce another toggle
      - code (various wrappers) removal and simplification
      
      [1] https://lore.kernel.org/all/68ba44d8-6899-c018-dcb3-36f3a96e6bea@sra.uni-hannover.de/
      [2] https://lore.kernel.org/all/63ebc499.a70a0220.9ac51.29ea@mx.google.com/
      
      [akpm@linux-foundation.org: coding-style cleanups]
      [akpm@linux-foundation.org: make check_pages_enabled static]
      Link: https://lkml.kernel.org/r/20230216095131.17336-1-vbabka@suse.cz
      Reported-by: Alexander Halbuer <halbuer@sra.uni-hannover.de>
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: reduce lock contention of pcp buffer refill · 2ede3c13
      Alexander Halbuer authored
      
      
      rmqueue_bulk() batches the allocation of multiple elements to refill the
      per-CPU buffers into a single hold of the zone lock.  Each element is
      allocated and checked using check_pcp_refill().  The check touches every
      related struct page which is especially expensive for higher order
      allocations (huge pages).
      
      This patch reduces the time holding the lock by moving the check out of
      the critical section similar to rmqueue_buddy() which allocates a single
      element.
      
      Measurements of parallel allocation-heavy workloads show a reduction of
      the average huge page allocation latency of 50 percent for two cores and
      nearly 90 percent for 24 cores.
      
      Link: https://lkml.kernel.org/r/20230201162549.68384-1-halbuer@sra.uni-hannover.de
      Signed-off-by: Alexander Halbuer <halbuer@sra.uni-hannover.de>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: cma: make kobj_type structure constant · a4a4659d
      Thomas Weißschuh authored
      Since commit ee6d3dd4 ("driver core: make kobj_type constant."), the
      driver core allows the usage of const struct kobj_type.
      
      Take advantage of this to constify the structure definition to prevent
      modification at runtime.
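
      The change itself is just adding const to the definition, e.g. (member
      initializers shown here are illustrative):

          static const struct kobj_type cma_kobj_type = {
                  .release        = cma_kobj_release,
                  .sysfs_ops      = &kobj_sysfs_ops,
                  .default_groups = cma_groups,
          };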
      
      Link: https://lkml.kernel.org/r/20230220-kobj_type-mm-cma-v1-1-45996cff1a81@weissschuh.net
      Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/khugepaged: alloc_charge_hpage() take care of mem charge errors · 94c02ad7
      Peter Xu authored
      
      
      If the memory charge fails, instead of returning the hpage along with an
      error, allow the function to clean up the folio properly, which is
      normally what a function should do in this case: either return
      successfully, or return an error with no side effects from partial runs.

      This will also avoid the caller calling mem_cgroup_uncharge()
      unnecessarily in either the anon or shmem path (even if it's safe to do
      so).
      
      Link: https://lkml.kernel.org/r/20230222195247.791227-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Stevens <stevensd@chromium.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: hugetlb_vmemmap: simplify hugetlb_vmemmap_init() a bit · 12318566
      Muchun Song authored
      
      
      The check of IS_ENABLED(CONFIG_PROC_SYSCTL) is unnecessary since
      register_sysctl_init() will be empty in this case.  So there are no
      warnings after removing the check.
      
      Link: https://lkml.kernel.org/r/20230223065947.64134-1-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mailmap: add an entry for Leonard Crestez · bdd034de
      Florian Fainelli authored
      
      
      Link: https://lkml.kernel.org/r/20230324130737.3360169-1-f.fainelli@gmail.com
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
      Cc: Leonard Crestez <cdleonard@gmail.com>
      Cc: Qais Yousef <qyousef@layalina.io>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Vasily Averin <vasily.averin@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: kfence: fix handling discontiguous page · 1f2803b2
      Muchun Song authored
      The struct pages could be discontiguous when the kfence pool is allocated
      via alloc_contig_pages() with CONFIG_SPARSEMEM and
      !CONFIG_SPARSEMEM_VMEMMAP.
      
      This may result in setting PG_slab and memcg_data at an arbitrary
      address (which may not be used as a struct page), which in the worst
      case might corrupt the kernel.
      
      So the iteration should use nth_page().
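
      The safe iteration looks roughly like this (illustrative):

          struct page *first = virt_to_page(__kfence_pool);
          unsigned long i;

          for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
                  struct page *page = nth_page(first, i);

                  /* operate on page; correct even when struct pages are
                   * not virtually contiguous */
          }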
      
      Link: https://lkml.kernel.org/r/20230323025003.94447-1-songmuchun@bytedance.com
      Fixes: 0ce20dd8 ("mm: add Kernel Electric-Fence infrastructure")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: kfence: fix PG_slab and memcg_data clearing · 3ee2d747
      Muchun Song authored
      PG_slab and memcg_data are not reset when KFENCE fails to initialize the
      kfence pool at runtime, which results in a "Bad page state" report when
      the pool is freed back to the buddy allocator.  The check of whether it
      is a compound head page seems unnecessary since we already guarantee
      this when allocating the kfence pool.  Remove the check to simplify the
      code.
      
      Link: https://lkml.kernel.org/r/20230320030059.20189-1-songmuchun@bytedance.com
      Fixes: 0ce20dd8 ("mm: add Kernel Electric-Fence infrastructure")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>