  1. Mar 29, 2023
    • mm/debug: use %pGt to display page_type in dump_page() · f2421a16
      Hyeonggon Yoo authored
      
      
      Some page flags are stored in page_type rather than in the ->flags field.
      Use the newly introduced %pGt format to display page_type in dump_page().
      
      Below are some examples:
      
      page:00000000da7184dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101cb3
      flags: 0x2ffff0000000000(node=0|zone=2|lastcpupid=0xffff)
      page_type: 0xffffffff()
      raw: 02ffff0000000000 0000000000000000 dead000000000122 0000000000000000
      raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      page dumped because: newly allocated page
      
      page:00000000da7184dd refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x101cb3
      flags: 0x2ffff0000000000(node=0|zone=2|lastcpupid=0xffff)
      page_type: 0xffffff7f(buddy)
      raw: 02ffff0000000000 ffff88813fff8e80 ffff88813fff8e80 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
      page dumped because: freed page
      
      page:0000000042202316 refcount:3 mapcount:2 mapping:0000000000000000 index:0x7f634722a pfn:0x11994e
      memcg:ffff888100135000
      anon flags: 0x2ffff0000080024(uptodate|active|swapbacked|node=0|zone=2|lastcpupid=0xffff)
      page_type: 0x1()
      raw: 02ffff0000080024 0000000000000000 dead000000000122 ffff8881193398f1
      raw: 00000007f634722a 0000000000000000 0000000300000001 ffff888100135000
      page dumped because: user-mapped page
      
      Link: https://lkml.kernel.org/r/20230130042514.2418-4-42.hyeyoo@gmail.com
      Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm, printk: introduce new format %pGt for page_type · 4c85c0be
      Hyeonggon Yoo authored
      
      
      The %pGp format is used to display the 'flags' field of a struct page.
      However, some page flags (e.g. PG_buddy; see page-flags.h for more
      details) are stored in the page_type field.  To display human-readable
      output of page_type, introduce the %pGt format.
      
      It is important to note that the meaning of the bits is different in
      page_type.  If page_type is 0xffffffff, no flags are set.  Setting
      PG_buddy (0x00000080) results in a page_type of 0xffffff7f: clearing a
      bit actually means setting a flag.  The bits in page_type are therefore
      inverted before type names are displayed.
      
      Only values for which page_type_has_type() returns true are treated as
      page_type, to avoid confusion with mapcount values.  If it returns false,
      only the raw value is displayed, without page type names.
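
      To make the inverted encoding concrete, here is a minimal userspace sketch of how a %pGt-style formatter might decode page_type. The PG_* values, PAGE_MAPCOUNT_RESERVE and every helper name are assumptions modelled loosely on include/linux/page-flags.h and lib/vsprintf.c around v6.3. This is not the kernel's code. With these values it reproduces the three page_type lines from the dump_page() examples of f2421a16.

```c
#include <stdio.h>
#include <string.h>

/* Assumed values, modelled on include/linux/page-flags.h (~v6.3). */
#define PAGE_MAPCOUNT_RESERVE (-128)
#define PG_buddy   0x00000080u
#define PG_offline 0x00000100u
#define PG_table   0x00000200u
#define PG_guard   0x00000400u

static const struct {
	unsigned int mask;
	const char *name;
} pagetype_names[] = {
	{ PG_buddy, "buddy" },
	{ PG_offline, "offline" },
	{ PG_table, "table" },
	{ PG_guard, "guard" },
};

#define NR_PAGETYPE_NAMES (sizeof(pagetype_names) / sizeof(pagetype_names[0]))

/* Small values are mapcounts; only values below the reserve are types. */
static int page_type_has_type(unsigned int page_type)
{
	return (int)page_type < PAGE_MAPCOUNT_RESERVE;
}

/* Appends s to buf without overflowing a buffer of len bytes. */
static void append(char *buf, size_t len, const char *s)
{
	strncat(buf, s, len - strlen(buf) - 1);
}

/* Formats page_type like "0xffffff7f(buddy)" into buf (len > 0). */
static void format_page_type(char *buf, size_t len, unsigned int page_type)
{
	snprintf(buf, len, "%#x(", page_type);

	if (page_type_has_type(page_type)) {
		unsigned int set = ~page_type;	/* a cleared bit means "type set" */
		const char *sep = "";

		for (size_t i = 0; i < NR_PAGETYPE_NAMES; i++) {
			if (set & pagetype_names[i].mask) {
				append(buf, len, sep);
				append(buf, len, pagetype_names[i].name);
				sep = "|";
			}
		}
	}
	append(buf, len, ")");
}
```

      The value 0x1 is a mapcount, not a type, so in this sketch it prints as "0x1()" because page_type_has_type() rejects it.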
      
      Link: https://lkml.kernel.org/r/20230130042514.2418-3-42.hyeyoo@gmail.com
      Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Reviewed-by: Petr Mladek <pmladek@suse.com>	[vsprintf part]
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mmflags.h: use less error prone method to define pageflag_names · e26fcc02
      Hyeonggon Yoo authored
      
      
      Patch series "mm, printk: introduce new format for page_type", v4.
      
      This series moves PG_slab page flag to page_type, freeing one bit in
      page->flags and introduces %pGt format that prints human-readable
      page_type like %pGp for printing page flags.
      
      See changelog of patch 2 for more implementation details.
      
      Thanks to everyone who gave valuable comments.
      
      
      This patch (of 3):
      
      Use a helper macro to reduce the chance of typos when defining pageflag_names.
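
      As a rough illustration of why such a helper reduces typos, the sketch below derives both the bit mask and the printed name from a single token. A name can then no longer be paired with the wrong bit, whereas a hand-written {1UL << PG_dirty, "uptodate"} compiles silently. The flag subset, the struct and this DEF_PAGEFLAG_NAME definition are illustrative assumptions and may differ from include/trace/events/mmflags.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of enum pageflags; real bit numbers differ. */
enum pageflags { PG_locked, PG_referenced, PG_uptodate, PG_dirty };

struct flag_name {
	unsigned long mask;
	const char *name;
};

/* Stringify after expansion, in the spirit of the kernel's __stringify(). */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* One argument yields both the mask and its printed name. */
#define DEF_PAGEFLAG_NAME(_name) { 1UL << PG_##_name, STRINGIFY(_name) }

static const struct flag_name pageflag_names[] = {
	DEF_PAGEFLAG_NAME(locked),
	DEF_PAGEFLAG_NAME(referenced),
	DEF_PAGEFLAG_NAME(uptodate),
	DEF_PAGEFLAG_NAME(dirty),
};

/* Returns the printed name for a single-bit mask, or NULL if unknown. */
static const char *pageflag_name(unsigned long mask)
{
	for (size_t i = 0; i < sizeof(pageflag_names) / sizeof(pageflag_names[0]); i++)
		if (pageflag_names[i].mask == mask)
			return pageflag_names[i].name;
	return NULL;
}
```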
      
      Link: https://lkml.kernel.org/r/20230130042514.2418-1-42.hyeyoo@gmail.com
      Link: https://lore.kernel.org/lkml/Y6AycLbpjVzXM5I9@smile.fi.intel.com
      Link: https://lkml.kernel.org/r/20230130042514.2418-2-42.hyeyoo@gmail.com
      Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: John Ogness <john.ogness@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: add tracepoints to ksm · 739100c8
      Stefan Roesch authored
      
      
      This adds the following tracepoints to ksm:
      - start / stop scan
      - ksm enter / exit
      - merge a page
      - merge a page with ksm
      - remove a page
      - remove a rmap item
      
      This patch has been split off from the RFC patch series "mm:
      process/cgroup ksm support".
      
      Link: https://lkml.kernel.org/r/20230210214645.2720847-1-shr@devkernel.io
      Signed-off-by: Stefan Roesch <shr@devkernel.io>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN · 77f68ebe
      Nicholas Piggin authored
      
      
      On a 16-socket 192-core POWER8 system, the context_switch1_threads
      benchmark from will-it-scale (see earlier changelog) achieves only about
      1 million context switches per second upstream, due to contention on the
      mm refcount.
      
      64s meets the prerequisites for CONFIG_MMU_LAZY_TLB_SHOOTDOWN, so enable
      the option.  This increases the above benchmark to 118 million context
      switches per second.
      
      This generates 314 additional IPI interrupts on a 144 CPU system doing a
      kernel compile, which is in the noise in terms of kernel cycles.
      
      Link: https://lkml.kernel.org/r/20230203071837.1136453-6-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme · 2655421a
      Nicholas Piggin authored
      
      
      On big systems, the mm refcount can become highly contended when doing a
      lot of context switching with threaded applications.  The user<->idle
      switch is one of the important cases.  Abandoning lazy tlb entirely slows
      this switching down quite a bit in the common uncontended case, so that
      is not viable.
      
      Implement a scheme where lazy tlb mm references do not contribute to the
      refcount; instead, they are explicitly removed when the refcount reaches
      zero.
      
      The final mmdrop() sends IPIs to all CPUs in the mm_cpumask and they
      switch away from this mm to init_mm if it was being used as the lazy tlb
      mm.  Enabling the shoot lazies option therefore requires that the arch
      ensures that mm_cpumask contains all CPUs that could possibly be using mm.
      A DEBUG_VM option IPIs every CPU in the system after this to ensure there
      are no references remaining before the mm is freed.
      
      The cost of shootdown IPIs could be an issue, but it has not been
      observed to be a serious problem with this scheme, because short-lived
      processes tend not to migrate CPUs much and therefore get little chance
      to leave lazy tlb mm references on remote CPUs.  There are many options
      to reduce the IPIs if necessary; they are described in comments.
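
      The scheme above can be modelled in a few lines. In this hypothetical single-threaded sketch, a loop over the mm's CPU mask stands in for the shootdown IPIs, which in the kernel are asynchronous and cross-CPU. The toy_* names are invented for illustration. The sketch shows the key invariant: lazy users are never counted, so the final drop must find them through the CPU mask. That is why the arch has to keep the mask complete.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Toy mm: the refcount counts only real users, never lazy tlb users. */
struct toy_mm {
	int refcount;
	bool cpumask[NR_CPUS];	/* every CPU that might be using this mm */
	bool freed;
};

static struct toy_mm init_mm = { .refcount = 1 };
static struct toy_mm *active_mm[NR_CPUS];	/* per-CPU (possibly lazy) mm */

/* What the shootdown IPI does on each target CPU. */
static void toy_shoot_lazy(int cpu, struct toy_mm *mm)
{
	if (active_mm[cpu] == mm)
		active_mm[cpu] = &init_mm;	/* switch the lazy user to init_mm */
}

/* Final drop: shoot lazy users on every CPU in the mask, then "free". */
static void toy_mmdrop(struct toy_mm *mm)
{
	if (--mm->refcount > 0)
		return;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mm->cpumask[cpu])
			toy_shoot_lazy(cpu, mm);
	mm->freed = true;
}

/* DEBUG_VM-style check: no CPU may still reference a freed mm. */
static bool toy_no_lazy_refs(const struct toy_mm *mm)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (active_mm[cpu] == mm)
			return false;
	return true;
}
```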
      
      The near-worst-case can be benchmarked with will-it-scale:
      
        context_switch1_threads -t $(($(nproc) / 2))
      
      This will create nproc threads (nproc / 2 switching pairs) all sharing the
      same mm that spread over all CPUs so each CPU does thread->idle->thread
      switching.
      
      [ Rik came up with basically the same idea a few years ago, so credit
        to him for that. ]
      
      Link: https://lore.kernel.org/linux-mm/20230118080011.2258375-1-npiggin@gmail.com/
      Link: https://lore.kernel.org/all/20180728215357.3249-11-riel@surriel.com/
      Link: https://lkml.kernel.org/r/20230203071837.1136453-5-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lazy tlb: allow lazy tlb mm refcounting to be configurable · 88e3009b
      Nicholas Piggin authored
      
      
      Add CONFIG_MMU_LAZY_TLB_REFCOUNT, which enables refcounting of the lazy
      tlb mm when it is context switched.  Architectures that don't need this
      refcounting can disable it if they clean up lazy tlb mms when the last
      refcount is dropped.  Currently it is always enabled, so the patch
      introduces no functional change.
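
      Here is a minimal sketch of how lazy-tlb helpers can compile the refcount away. It assumes a build-time switch named LAZY_TLB_REFCOUNT as a stand-in for IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT). The toy_* functions are hypothetical and model only the reference accounting, not the kernel's helpers.

```c
/*
 * Toy stand-in for IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT); build with
 * -DLAZY_TLB_REFCOUNT=0 to model an arch that shoots lazies instead.
 */
#ifndef LAZY_TLB_REFCOUNT
#define LAZY_TLB_REFCOUNT 1
#endif

struct toy_mm {
	int mm_count;
};

static void toy_mmgrab(struct toy_mm *mm) { mm->mm_count++; }
static void toy_mmdrop(struct toy_mm *mm) { mm->mm_count--; }

/* Lazy tlb references touch the count only when refcounting is configured in. */
static void toy_mmgrab_lazy_tlb(struct toy_mm *mm)
{
	if (LAZY_TLB_REFCOUNT)
		toy_mmgrab(mm);
}

static void toy_mmdrop_lazy_tlb(struct toy_mm *mm)
{
	if (LAZY_TLB_REFCOUNT)
		toy_mmdrop(mm);
	/* Otherwise the arch must remove lazy users when the last real ref drops. */
}
```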
      
      Link: https://lkml.kernel.org/r/20230203071837.1136453-4-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lazy tlb: introduce lazy tlb mm refcount helper functions · aa464ba9
      Nicholas Piggin authored
      
      
      Add explicit _lazy_tlb annotated functions for lazy tlb mm refcounting. 
      This makes the lazy tlb mm references more obvious, and allows the
      refcounting scheme to be modified in later changes.  There is no
      functional change with this patch.
      
      Link: https://lkml.kernel.org/r/20230203071837.1136453-3-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kthread: simplify kthread_use_mm refcounting · 6cad87b0
      Nicholas Piggin authored
      
      
      Patch series "shoot lazy tlbs (lazy tlb refcount scalability
      improvement)", v7.
      
      This series improves scalability of context switching between user and
      kernel threads on large systems with a threaded process spread across a
      lot of CPUs.
      
      Discussion of v6 here:
      https://lore.kernel.org/linux-mm/20230118080011.2258375-1-npiggin@gmail.com/
      
      
      This patch (of 5):
      
      Remove the special case avoiding refcounting when the mm to be used is the
      same as the kernel thread's active (lazy tlb) mm.  kthread_use_mm() should
      not be such a performance critical path that this matters much.  This
      simplifies a later change to lazy tlb mm refcounting.
      
      Link: https://lkml.kernel.org/r/20230203071837.1136453-1-npiggin@gmail.com
      Link: https://lkml.kernel.org/r/20230203071837.1136453-2-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/zswap: try to avoid worst-case scenario on same element pages · 62bf1258
      Taejoon Song authored
      The worst-case scenario when looking for same-element pages is that
      almost all elements appear to be the same at first glance, but only the
      last few elements differ.
      
      Since identical elements tend to be grouped from the beginning of the
      page, comparing the first element with the last element before looping
      through all elements gives a chance to detect non-same-element pages
      quickly.
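
      Below is a sketch of the check described above, written as a same-filled test over a page of unsigned long words. It assumes 4 KiB pages (TOY_PAGE_SIZE is a hypothetical name) and is not copied from mm/zswap.c.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/*
 * Returns 1 and stores the repeated word in *value if every unsigned long
 * in the page is identical; returns 0 otherwise. Comparing the first and
 * last words up front rejects the "same until the very end" worst case
 * without walking the whole page.
 */
static int page_same_filled(const void *ptr, unsigned long *value)
{
	const unsigned long *page = ptr;
	size_t last = TOY_PAGE_SIZE / sizeof(*page) - 1;
	unsigned long val = page[0];

	if (val != page[last])
		return 0;

	for (size_t pos = 1; pos < last; pos++)
		if (page[pos] != val)
			return 0;

	*value = val;
	return 1;
}
```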
      
      1. Test is done under LG webOS TV (64-bit arch)
      2. Dump the swap-out pages (~819200 pages)
      3. Analyze the pages offline with a simple test script that counts the
         iteration number and measures the speed
      
      On a 64-bit arch with 4 KiB pages, the worst-case iteration count is
      PAGE_SIZE / 8 bytes = 512.  The speed is based only on the time spent in
      the page_same_filled() function.  The average results are listed below:
      
                                         Num of Iter    Speed(MB/s)
      Looping-Forward (Orig)                 38            99265
      Looping-Backward                       36           102725
      Last-element-check (This Patch)        33           125072
      
      The result shows that the average iteration count decreases by 13% and the
      speed increases by 25% with this patch.  This patch does not increase the
      overall time complexity, though.
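
      Assuming 4 KiB pages, the headline numbers follow directly from the table above:

```latex
\begin{aligned}
\text{worst-case iterations} &= \frac{4096\,\text{B}}{8\,\text{B}} = 512 \\
\text{iteration reduction} &= \frac{38 - 33}{38} \approx 0.132 \quad (\approx 13\%) \\
\text{speed ratio} &= \frac{125072}{99265} \approx 1.26
\end{aligned}
```

      So the patched loop does about 13% fewer iterations and runs roughly a quarter faster than the original forward loop.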
      
      I also ran a simpler version that uses a backward loop.  Looping backward
      alone also gives some improvement, but less than this patch.
      
      A similar change has already been made to zram in 90f82cbf ("zram: try
      to avoid worst-case scenario on same element pages").
      
      Link: https://lkml.kernel.org/r/20230205190036.1730134-1-taejoon.song@lge.com
      Signed-off-by: Taejoon Song <taejoon.song@lge.com>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Taejoon Song <taejoon.song@lge.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <yjay.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: multi-gen LRU: improve design doc · 32d32ef1
      T.J. Alumbaugh authored
      
      
      This patch improves the design doc. Specifically,
        1. add a section for the per-memcg mm_struct list, and
        2. add a section for the PID controller.
      
      Link: https://lkml.kernel.org/r/20230214035445.1250139-2-talumbau@google.com
      Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: multi-gen LRU: clean up sysfs code · 9a52b2f3
      T.J. Alumbaugh authored
      
      
      This patch cleans up the sysfs code. Specifically,
        1. use sysfs_emit(),
        2. use __ATTR_RW(), and
        3. constify multi-gen LRU struct attribute_group.
      
      Link: https://lkml.kernel.org/r/20230214035445.1250139-1-talumbau@google.com
      Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86/mm/pat: clear VM_PAT if copy_p4d_range failed · d155df53
      Ma Wupeng authored
      
      
      Syzbot reports a warning in untrack_pfn().  Digging into the root cause,
      we found that it is due to a memory allocation failure in
      pmd_alloc_one(), and this failure is produced by failslab.
      
      In copy_page_range(), the memory allocation for a pmd failed.  During the
      subsequent error handling, mmput() is called to remove all vmas.  When
      untrack_pfn() is then called on this empty pfn range, the warning fires.
      
      Here's a simplified flow:
      
      dup_mm
        dup_mmap
          copy_page_range
            copy_p4d_range
              copy_pud_range
                copy_pmd_range
                  pmd_alloc
                    __pmd_alloc
                      pmd_alloc_one
                        page = alloc_pages(gfp, 0);
                          if (!page)
                            return NULL;
          mmput
              exit_mmap
                unmap_vmas
                  unmap_single_vma
                    untrack_pfn
                      follow_phys
                        WARN_ON_ONCE(1);
      
      Since this vma was not created successfully, we can clear the VM_PAT
      flag.  In this case, untrack_pfn() will not be called while cleaning up
      this vma.
      
      Function untrack_pfn_moved() has also been renamed to fit the new logic.
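
      The effect of clearing the flag can be shown with a toy model. Everything here is hypothetical: TOY_VM_PAT is not the kernel's bit value, and the toy_* functions only mimic the roles of copy_page_range(), untrack_pfn() and the unmap path in the flow above. The model shows that once the failed child vma no longer carries the flag, teardown never reaches the warning.

```c
#include <stdbool.h>

#define TOY_VM_PAT 0x1UL	/* hypothetical bit; not the kernel's value */

struct toy_vma {
	unsigned long vm_flags;
	bool pfn_tracked;	/* did PAT tracking actually get set up? */
};

static int warnings;	/* counts WARN_ON_ONCE()-style events */

/* Stand-in for untrack_pfn(): warns when there is nothing to untrack. */
static void toy_untrack_pfn(struct toy_vma *vma)
{
	if (!vma->pfn_tracked)
		warnings++;
	vma->pfn_tracked = false;
}

/* Copy that may fail part-way; on failure VM_PAT is cleared on the child. */
static int toy_copy_page_range(struct toy_vma *dst, bool alloc_fails)
{
	if (alloc_fails) {
		dst->vm_flags &= ~TOY_VM_PAT;	/* the fix: nothing to untrack later */
		return -1;			/* -ENOMEM in the kernel */
	}
	dst->pfn_tracked = true;
	return 0;
}

/* Teardown path (exit_mmap -> unmap_single_vma in the kernel). */
static void toy_unmap_vma(struct toy_vma *vma)
{
	if (vma->vm_flags & TOY_VM_PAT)
		toy_untrack_pfn(vma);
}
```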
      
      Link: https://lkml.kernel.org/r/20230217025615.1595558-1-mawupeng1@huawei.com
      Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
      Reported-by: syzbot+5f488e922d047d8f00cc@syzkaller.appspotmail.com
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/userfaultfd: support WP on multiple VMAs · a1b92a3f
      Muhammad Usama Anjum authored
      
      
      mwriteprotect_range() errors out if [start, end) doesn't fall in one VMA. 
      We are facing a use case where multiple VMAs are present in one range of
      interest.  For example, the following pseudocode reproduces the error
      which we are trying to fix:
      
      - Allocate memory of size 16 pages with PROT_NONE with mmap
      - Register userfaultfd
      - Change the protection of the first half (pages 1 to 8) of the memory to
        PROT_READ | PROT_WRITE.  This splits the memory area into two VMAs.
      - Now UFFDIO_WRITEPROTECT_MODE_WP on the whole memory of 16 pages errors
        out.
      
      This is a simple use case where the user may or may not know whether the
      memory area has been divided into multiple VMAs.
      
      We need an implementation which doesn't disrupt existing users.  To keep
      things simple, stop walking the VMAs as soon as any one of them has not
      been registered in WP mode.  While at it, remove an unneeded error check
      as well.
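
      Below is a sketch of the policy described above. It uses a hypothetical array of VMAs instead of the kernel's VMA iterator. The 16-page example maps to two toy VMAs, [0, 8) and [8, 16), in 0-based page units. The error value and the struct layout are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy VMA: [start, end) in pages; registered_wp models uffd WP registration. */
struct toy_vma {
	unsigned long start, end;
	bool registered_wp;
	bool write_protected;
};

/*
 * Write-protect [start, end) across every VMA it overlaps (vmas sorted and
 * non-overlapping). Stop at the first overlapping VMA not registered in WP
 * mode; in this sketch, VMAs already processed stay write-protected.
 */
static int toy_mwriteprotect_range(struct toy_vma *vmas, size_t n,
				   unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_vma *v = &vmas[i];

		if (v->end <= start || v->start >= end)
			continue;		/* outside the requested range */
		if (!v->registered_wp)
			return -1;		/* the kernel returns an errno here */
		v->write_protected = true;
	}
	return 0;
}
```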
      
      [akpm@linux-foundation.org: s/VM_WARN_ON_ONCE/VM_WARN_ONCE/ to fix build]
      Link: https://lkml.kernel.org/r/20230217105558.832710-1-usama.anjum@collabora.com
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reported-by: Paul Gofman <pgofman@codeweavers.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm, page_alloc: reduce page alloc/free sanity checks · 700d2e9a
      Vlastimil Babka authored
      Historically, we have performed sanity checks on all struct pages being
      allocated or freed, making sure they have no unexpected page flags or
      certain field values.  This can detect insufficient cleanup and some cases
      of use-after-free, although on its own it can't always identify the
      culprit.  The result is a warning and the "bad page" being leaked.
      
      The checks do need some cpu cycles, so in 4.7 with commits 479f854a
      ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
      and 4db7548c ("mm, page_alloc: defer debugging checks of freed pages
      until a PCP drain") they were no longer performed in the hot paths when
      allocating and freeing from pcplists, but only when pcplists are bypassed,
      refilled or drained.  For debugging purposes, with CONFIG_DEBUG_VM enabled
      the checks were instead still done in the hot paths and not when refilling
      or draining pcplists.
      
      With 4462b32c ("mm, page_alloc: more extensive free page checking with
      debug_pagealloc"), enabling debug_pagealloc also moved the sanity checks
      back to hot paths.  When both debug_pagealloc and CONFIG_DEBUG_VM are
      enabled, the checks are done both in hot paths and on pcplist
      refill/drain.
      
      Even though the non-debug default today might seem to be a sensible
      tradeoff between overhead and ability to detect bad pages, on closer look
      it's arguably not.  As most allocations go through the pcplists, catching
      any bad pages when refilling or draining pcplists has only a small chance,
      insufficient for debugging or serious hardening purposes.  On the other
      hand the cost of the checks is concentrated in the already expensive
      drain/refill batching operations, and those are done under the often
      contended zone lock.  That was recently identified as an issue for page
      allocation and the zone lock contention reduced by moving the checks
      outside of the locked section with a patch "mm: reduce lock contention of
      pcp buffer refill", but the cost of the checks is still visible compared
      to their removal [1].  In the pcplist draining path free_pcppages_bulk()
      the checks are still done under zone->lock.
      
      Thus, remove the checks from pcplist refill and drain paths completely.
      Introduce a static key check_pages_enabled to control checks during page
      allocation and freeing (whether the pcplist is used or bypassed).  The
      static key is enabled if any of the following is true:
      
      - kernel is built with CONFIG_DEBUG_VM=y (debugging)
      - debug_pagealloc or page poisoning is boot-time enabled (debugging)
      - init_on_alloc or init_on_free is boot-time enabled (hardening)
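
      The rule above reduces to a single OR evaluated once at boot. In this sketch a plain bool plays the role of the check_pages_enabled static key; a real static key patches the branch instead of loading a variable. The struct and the toy_* names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the build/boot options listed above. */
struct toy_cfg {
	bool debug_vm;		/* CONFIG_DEBUG_VM=y */
	bool debug_pagealloc;	/* debug_pagealloc enabled at boot */
	bool page_poisoning;	/* page poisoning enabled at boot */
	bool init_on_alloc;
	bool init_on_free;
};

static bool check_pages;	/* plays the role of the static key */
static int pages_checked;

/* Any debugging or hardening option turns the checks on. */
static void toy_init_check_pages(const struct toy_cfg *c)
{
	check_pages = c->debug_vm || c->debug_pagealloc || c->page_poisoning ||
		      c->init_on_alloc || c->init_on_free;
}

/* Each free pays for the check only when enabled; refill/drain never do. */
static void toy_free_page(void)
{
	if (check_pages)	/* a patched static branch in the kernel */
		pages_checked++;
	/* ... hand the page to the pcplist or the buddy allocator ... */
}
```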
      
      The resulting user visible changes:
      - no checks when draining/refilling pcplists - less overhead, with
        likely no practical reduction of ability to catch bad pages
      - no checks when bypassing pcplists in default config (no
        debugging/hardening) - less overhead etc. as above
      - on typical hardened kernels [2], checks are now performed on each page
        allocation/free (previously only when bypassing/draining/refilling
        pcplists); having init_on_alloc/init_on_free enabled should be a
        sufficient indication of preferring more costly alloc/free operations
        for hardening purposes, so we shouldn't need to introduce another toggle
      - code (various wrappers) removal and simplification
      
      [1] https://lore.kernel.org/all/68ba44d8-6899-c018-dcb3-36f3a96e6bea@sra.uni-hannover.de/
      [2] https://lore.kernel.org/all/63ebc499.a70a0220.9ac51.29ea@mx.google.com/
      
      [akpm@linux-foundation.org: coding-style cleanups]
      [akpm@linux-foundation.org: make check_pages_enabled static]
      Link: https://lkml.kernel.org/r/20230216095131.17336-1-vbabka@suse.cz
      Reported-by: Alexander Halbuer <halbuer@sra.uni-hannover.de>
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: reduce lock contention of pcp buffer refill · 2ede3c13
      Alexander Halbuer authored
      
      
      rmqueue_bulk() batches the allocation of multiple elements to refill the
      per-CPU buffers into a single hold of the zone lock.  Each element is
      allocated and checked using check_pcp_refill().  The check touches every
      related struct page which is especially expensive for higher order
      allocations (huge pages).
      
      This patch reduces the time holding the lock by moving the check out of
      the critical section similar to rmqueue_buddy() which allocates a single
      element.
      
      Measurements of parallel allocation-heavy workloads show a reduction of
      the average huge page allocation latency of 50 percent for two cores and
      nearly 90 percent for 24 cores.
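
      Here is a single-threaded sketch of the reordering. The pages are taken from the free pool while the lock is "held", and the costly per-page check runs only after the lock is released, with failing pages dropped from the batch. A plain bool stands in for zone->lock, so the sketch models ordering rather than real contention. toy_rmqueue_bulk() and the other names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define BATCH 8

struct toy_page {
	bool bad;
};

static bool zone_locked;	/* toy stand-in for zone->lock */
static int checks_under_lock;

/* The expensive part: in the kernel it touches every struct page involved. */
static bool toy_check_page(const struct toy_page *p)
{
	if (zone_locked)
		checks_under_lock++;
	return !p->bad;
}

/*
 * Refill: grab up to BATCH pages while "holding" the lock, then check them
 * after dropping it. Pages that fail the check are left out of *out.
 */
static size_t toy_rmqueue_bulk(struct toy_page *pool, size_t n,
			       struct toy_page **out)
{
	struct toy_page *batch[BATCH];
	size_t taken = 0, good = 0;

	zone_locked = true;
	for (size_t i = 0; i < n && taken < BATCH; i++)
		batch[taken++] = &pool[i];
	zone_locked = false;

	for (size_t i = 0; i < taken; i++)
		if (toy_check_page(batch[i]))
			out[good++] = batch[i];
	return good;
}
```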
      
      Link: https://lkml.kernel.org/r/20230201162549.68384-1-halbuer@sra.uni-hannover.de
      Signed-off-by: Alexander Halbuer <halbuer@sra.uni-hannover.de>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: cma: make kobj_type structure constant · a4a4659d
      Thomas Weißschuh authored
      Since commit ee6d3dd4 ("driver core: make kobj_type constant.") the
      driver core allows the usage of const struct kobj_type.
      
      Take advantage of this to constify the structure definition to prevent
      modification at runtime.
      
      Link: https://lkml.kernel.org/r/20230220-kobj_type-mm-cma-v1-1-45996cff1a81@weissschuh.net
      Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/khugepaged: alloc_charge_hpage() take care of mem charge errors · 94c02ad7
      Peter Xu authored
      
      
      If the memory charge fails, instead of returning the hpage along with an
      error, let the function clean up the folio properly, which is normally
      what a function should do in this case: either return successfully, or
      return an error with no side effects left over from a partial run.
      
      This also avoids the caller calling mem_cgroup_uncharge() unnecessarily
      on either the anon or the shmem path (even though it is safe to do so).
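
      The contract "succeed completely or fail with no side effects" can be sketched in plain C. Here malloc() stands in for folio allocation and a flag stands in for the memcg charge result. All names and return codes are hypothetical.

```c
#include <stdlib.h>

enum { TOY_OK = 0, TOY_ENOMEM = -1, TOY_ECHARGE = -2 };	/* hypothetical codes */

static int live_folios;	/* outstanding allocations, to detect leaks */

static void *toy_folio_alloc(void)
{
	void *p = malloc(64);

	if (p)
		live_folios++;
	return p;
}

static void toy_folio_put(void *p)
{
	if (p) {
		live_folios--;
		free(p);
	}
}

/*
 * Either succeed with *foliop set, or fail with *foliop == NULL and no
 * allocation left behind, so callers never undo a partial result.
 */
static int toy_alloc_charge_hpage(void **foliop, int charge_ok)
{
	void *folio = toy_folio_alloc();

	*foliop = NULL;
	if (!folio)
		return TOY_ENOMEM;
	if (!charge_ok) {
		toy_folio_put(folio);	/* clean up here, not in every caller */
		return TOY_ECHARGE;
	}
	*foliop = folio;
	return TOY_OK;
}
```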
      
      Link: https://lkml.kernel.org/r/20230222195247.791227-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Stevens <stevensd@chromium.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: hugetlb_vmemmap: simplify hugetlb_vmemmap_init() a bit · 12318566
      Muchun Song authored
      
      
      The check of IS_ENABLED(CONFIG_PROC_SYSCTL) is unnecessary since
      register_sysctl_init() is empty in this case.  So there are no warnings
      after removing the check.
      
      Link: https://lkml.kernel.org/r/20230223065947.64134-1-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mailmap: add an entry for Leonard Crestez · bdd034de
      Florian Fainelli authored
      
      
      Link: https://lkml.kernel.org/r/20230324130737.3360169-1-f.fainelli@gmail.com
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
      Cc: Leonard Crestez <cdleonard@gmail.com>
      Cc: Qais Yousef <qyousef@layalina.io>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Vasily Averin <vasily.averin@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: kfence: fix handling discontiguous page · 1f2803b2
      Muchun Song authored
      The struct pages could be discontiguous when the kfence pool is allocated
      via alloc_contig_pages() with CONFIG_SPARSEMEM and
      !CONFIG_SPARSEMEM_VMEMMAP.
      
      This may result in setting PG_slab and memcg_data at an arbitrary
      address (which may not be a struct page at all), which in the worst case
      might corrupt the kernel.
      
      So the iteration should use nth_page().
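
      The problem with pointer arithmetic here can be shown with a toy memmap split into two separately allocated sections, as with CONFIG_SPARSEMEM && !CONFIG_SPARSEMEM_VMEMMAP. The section size and all toy_* helpers are assumptions. toy_nth_page() mimics the idea of nth_page() by going through the pfn, which stays correct across a section boundary where first + i would not.

```c
#include <stddef.h>

#define PAGES_PER_SECTION 4UL	/* tiny, for illustration only */
#define NR_SECTIONS 2UL

struct toy_page {
	unsigned long flags;
};

/* Each section's memmap is a separate array, so they need not be adjacent. */
static struct toy_page section0[PAGES_PER_SECTION];
static struct toy_page section1[PAGES_PER_SECTION];
static struct toy_page *const toy_memmap[NR_SECTIONS] = { section0, section1 };

static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
	return &toy_memmap[pfn / PAGES_PER_SECTION][pfn % PAGES_PER_SECTION];
}

static unsigned long toy_page_to_pfn(const struct toy_page *page)
{
	for (unsigned long s = 0; s < NR_SECTIONS; s++)
		for (unsigned long i = 0; i < PAGES_PER_SECTION; i++)
			if (&toy_memmap[s][i] == page)
				return s * PAGES_PER_SECTION + i;
	return (unsigned long)-1;	/* not a valid struct page */
}

/* Like nth_page(): advance by pfn, never by raw pointer arithmetic. */
static struct toy_page *toy_nth_page(struct toy_page *page, unsigned long n)
{
	return toy_pfn_to_page(toy_page_to_pfn(page) + n);
}

/* Set a flag (think PG_slab) on nr pages; "first + i" would run off section0. */
static void toy_mark_pages(struct toy_page *first, unsigned long nr,
			   unsigned long flag)
{
	for (unsigned long i = 0; i < nr; i++)
		toy_nth_page(first, i)->flags |= flag;
}
```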
      
      Link: https://lkml.kernel.org/r/20230323025003.94447-1-songmuchun@bytedance.com
      Fixes: 0ce20dd8 ("mm: add Kernel Electric-Fence infrastructure")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Muchun Song
      mm: kfence: fix PG_slab and memcg_data clearing · 3ee2d747
      Muchun Song authored
      PG_slab and memcg_data are not reset when KFENCE fails to initialize
      the kfence pool at runtime, which leads to a "Bad page state" message
      when the kfence pool is freed to the buddy allocator.  The check of
      whether it is a compound head page seems unnecessary since we already
      guarantee this when allocating the kfence pool.  Remove the check to
      simplify the code.
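      A hedged sketch of the cleanup this describes, using a hypothetical
      userspace struct page with only a flags word and memcg_data: on the
      failure path every pool page has its slab flag and memcg_data cleared
      before being handed back, so a free-time sanity check (modelled here by
      page_state_ok(), not the kernel's actual check) no longer complains.
      DEMO_PG_SLAB stands in for PG_slab.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PG_SLAB (1UL << 0) /* hypothetical stand-in for PG_slab */

struct page {
	unsigned long flags;
	unsigned long memcg_data;
};

/* Hypothetical model of the free-path check behind "Bad page state":
 * a page handed back must carry no leftover slab state. */
static bool page_state_ok(const struct page *p)
{
	return !(p->flags & DEMO_PG_SLAB) && p->memcg_data == 0;
}

/* On init failure, undo what pool setup did to every page before freeing.
 * Returns how many pages would still trip the check (should be 0). */
static size_t demo_pool_reset(struct page *pages, size_t nr)
{
	size_t bad = 0;

	for (size_t i = 0; i < nr; i++) {
		pages[i].flags &= ~DEMO_PG_SLAB;
		pages[i].memcg_data = 0;
		if (!page_state_ok(&pages[i]))
			bad++;
	}
	return bad;
}
```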
      
      Link: https://lkml.kernel.org/r/20230320030059.20189-1-songmuchun@bytedance.com
      Fixes: 0ce20dd8 ("mm: add Kernel Electric-Fence infrastructure")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Shiyang Ruan
      fsdax: dedupe should compare the min of two iters' length · e900ba10
      Shiyang Ruan authored
      In a dedupe comparison iter loop, the length of each iomap_iter
      decreases because it represents the remaining length after each
      iteration.

      The dedupe command fails with -EIO if the range is larger than one
      page and not aligned to the page size.  It also reports a warning in
      dmesg:
      
      [ 4338.498374] ------------[ cut here ]------------
      [ 4338.498689] WARNING: CPU: 3 PID: 1415645 at fs/iomap/iter.c:16 
      ...
      
      The compare function should use the min length of the current iters,
      not the total length.
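      A simplified model of the corrected comparison, assuming each file is
      described by a list of extent lengths standing in for the two
      iomap_iter mappings (demo_dedupe_same() and its parameters are
      hypothetical, not the fsdax code).  Each step compares only the minimum
      of the two remaining lengths:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first 'total' bytes of a and b match, else 0.
 * a_ext and b_ext list each file's extent lengths in order (hypothetical
 * stand-ins for the two iomap_iter mappings); they are assumed to be
 * non-zero and to cover at least 'total' bytes. */
static int demo_dedupe_same(const unsigned char *a, const size_t *a_ext,
			    const unsigned char *b, const size_t *b_ext,
			    size_t total)
{
	size_t off = 0;
	size_t a_left = *a_ext; /* bytes left in a's current extent */
	size_t b_left = *b_ext; /* bytes left in b's current extent */

	while (off < total) {
		size_t n;

		/* Move to the next extent once the current one is used up. */
		if (a_left == 0)
			a_left = *++a_ext;
		if (b_left == 0)
			b_left = *++b_ext;

		/* Only compare what BOTH current extents cover. */
		n = a_left < b_left ? a_left : b_left;
		if (n > total - off)
			n = total - off;

		if (memcmp(a + off, b + off, n) != 0)
			return 0;

		off += n;
		a_left -= n;
		b_left -= n;
	}
	return 1;
}
```

      With extents {3, 5} on one side and {5, 3} on the other, the loop
      compares 3, then 2, then 3 bytes, never reading past either current
      mapping.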
      
      Link: https://lkml.kernel.org/r/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu.com
      Fixes: 0e79e373 ("fsdax: dedupe: iter two files at the same time")
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Shiyang Ruan
      fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN · 13dd4e04
      Shiyang Ruan authored
      unshare copies data from source to destination.  But if the source is
      a HOLE or UNWRITTEN extent, we should zero the destination; otherwise
      the HOLE or UNWRITTEN part will expose stale data from the newly
      allocated extent to users.
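      A minimal sketch of the rule, assuming a hypothetical demo_extent_type
      enum that mirrors iomap's IOMAP_HOLE, IOMAP_UNWRITTEN and IOMAP_MAPPED
      source types; the real fsdax unshare path works on DAX mappings rather
      than plain buffers:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of source extent types, mirroring the iomap idea. */
enum demo_extent_type { DEMO_HOLE, DEMO_UNWRITTEN, DEMO_MAPPED };

/* Fill a newly allocated destination extent during unshare. */
static void demo_unshare_copy(unsigned char *dst, const unsigned char *src,
			      size_t len, enum demo_extent_type src_type)
{
	if (src_type == DEMO_HOLE || src_type == DEMO_UNWRITTEN) {
		/* No valid source data: zero so stale block contents never leak. */
		memset(dst, 0, len);
		return;
	}
	/* Mapped source: copy the real data across. */
	memcpy(dst, src, len);
}
```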
      
      Found by running generic/649 while mounting with -o dax=always on pmem.
      
      Link: https://lkml.kernel.org/r/1679483469-2-1-git-send-email-ruansy.fnst@fujitsu.com
      Fixes: d984648e ("fsdax,xfs: port unshare to fsdax")
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Darrick J. Wong <djwong@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Tiezhu Yang
      lib/Kconfig.debug: correct help info of LOCKDEP_STACK_TRACE_HASH_BITS · f478b998
      Tiezhu Yang authored
      We can see the following definition in kernel/locking/lockdep_internals.h:
      
        #define STACK_TRACE_HASH_SIZE	(1 << CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS)
      
      CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS determines STACK_TRACE_HASH_SIZE,
      not MAX_STACK_TRACE_ENTRIES, so fix the help text accordingly.
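      Because the option is a shift count, each increment doubles the number
      of hash buckets.  A small sketch with an assumed value of 14 bits and
      hypothetical DEMO_ names (not lockdep's own code) shows the resulting
      size and the usual power-of-two bucket mask:

```c
#include <stdint.h>

/* Assumed example value; the real option is set through Kconfig. */
#define DEMO_STACK_TRACE_HASH_BITS 14
#define DEMO_STACK_TRACE_HASH_SIZE (1UL << DEMO_STACK_TRACE_HASH_BITS)

/* With a power-of-two table, a hash maps to a bucket with a simple mask. */
static unsigned long demo_bucket(uint32_t hash)
{
	return hash & (DEMO_STACK_TRACE_HASH_SIZE - 1);
}
```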
      
      Link: https://lkml.kernel.org/r/1679380508-20830-1-git-send-email-yangtiezhu@loongson.cn
      Fixes: 5dc33592 ("lockdep: Allow tuning tracing capacity constants.")
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • ye xingchen
      Kconfig.debug: fix SCHED_DEBUG dependency · 35260cf5
      ye xingchen authored
      
      
      The SCHED_DEBUG files live under /sys/kernel/debug/sched, so SCHED_DEBUG
      should depend on DEBUG_FS, not PROC_FS.
      
      Link: https://lkml.kernel.org/r/202301291110098787982@zte.com.cn
      Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Josh Poimboeuf <jpoimboe@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Leonard Göhrs
      .mailmap: add entry for Leonard Göhrs · 1a4b52ce
      Leonard Göhrs authored
      My very first kernel commit:
      
        e4e1d47c ("ALSA: ppc: remove redundant checks in PS3 driver probe")
      
      was sent with the umlaut in my last name transcribed (Göhrs -> Goehrs).
      
      Add a mailmap entry so all my commits use the same name.
      
      Link: https://lkml.kernel.org/r/20230321145525.1317230-1-l.goehrs@pengutronix.de
      Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
      Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. Mar 27, 2023
    • Linus Torvalds
      Linux 6.3-rc4 · 197b6b60
      Linus Torvalds authored
      v6.3-rc4
    • Linus Torvalds
      Merge tag 'usb-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 0ec57cfa
      Linus Torvalds authored
      Pull USB / Thunderbolt driver fixes from Greg KH:
       "Here are a small set of USB and Thunderbolt driver fixes for reported
        problems and a documentation update, for 6.3-rc4.
      
        Included in here are:
      
         - documentation update for uvc gadget driver
      
         - small thunderbolt driver fixes
      
         - cdns3 driver fixes
      
         - dwc3 driver fixes
      
         - dwc2 driver fixes
      
         - chipidea driver fixes
      
         - typec driver fixes
      
         - onboard_usb_hub device id updates
      
         - quirk updates
      
        All of these have been in linux-next with no reported problems"
      
      * tag 'usb-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (30 commits)
        usb: dwc2: fix a race, don't power off/on phy for dual-role mode
        usb: dwc2: fix a devres leak in hw_enable upon suspend resume
        usb: chipidea: core: fix possible concurrent when switch role
        usb: chipdea: core: fix return -EINVAL if request role is the same with current role
        thunderbolt: Rename shadowed variables bit to interrupt_bit and auto_clear_bit
        thunderbolt: Disable interrupt auto clear for rings
        thunderbolt: Use const qualifier for `ring_interrupt_index`
        usb: gadget: Use correct endianness of the wLength field for WebUSB
        uas: Add US_FL_NO_REPORT_OPCODES for JMicron JMS583Gen 2
        usb: cdnsp: changes PCI Device ID to fix conflict with CNDS3 driver
        usb: cdns3: Fix issue with using incorrect PCI device function
        usb: cdnsp: Fixes issue with redundant Status Stage
        MAINTAINERS: make me a reviewer of USB/IP
        thunderbolt: Use scale field when allocating USB3 bandwidth
        thunderbolt: Limit USB3 bandwidth of certain Intel USB4 host routers
        thunderbolt: Call tb_check_quirks() after initializing adapters
        thunderbolt: Add missing UNSET_INBOUND_SBTX for retimer access
        thunderbolt: Fix memory leak in margining
        usb: dwc2: drd: fix inconsistent mode if role-switch-default-mode="host"
        docs: usb: Add documentation for the UVC Gadget
        ...
    • Linus Torvalds
      Merge tag 'sched_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 18940c88
      Linus Torvalds authored
      Pull scheduler fix from Borislav Petkov:
      
       - Fix a corner case where vruntime of a task is not being sanitized
      
      * tag 'sched_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Sanitize vruntime of entity being migrated
    • Linus Torvalds
      Merge tag 'perf_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 974fc943
      Linus Torvalds authored
      Pull perf fix from Borislav Petkov:
      
       - Properly clear perf event status tracking in the AMD perf event
         overflow handler
      
      * tag 'perf_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/amd/core: Always clear status for idx
    • Linus Torvalds
      Merge tag 'core_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f6cdaeb0
      Linus Torvalds authored
      Pull core fixes from Borislav Petkov:
      
       - Do the delayed RCU wakeup for kthreads in the proper order so that
         the former doesn't get ignored
      
       - A noinstr warning fix
      
      * tag 'core_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up
        entry: Fix noinstr warning in __enter_from_user_mode()
    • Linus Torvalds
      Merge tag 'x86_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 986c6374
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - Add an AMX ptrace selftest
      
       - Prevent a false-positive warning when retrieving the (invalid)
         address of dynamic FPU features in their init state which are not
         saved in init_fpstate at all
      
       - Randomize per-CPU entry areas only when KASLR is enabled
      
      * tag 'x86_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        selftests/x86/amx: Add a ptrace test
        x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf()
        x86/mm: Do not shuffle CPU entry areas without KASLR
  3. Mar 26, 2023
    • Linus Torvalds
      Merge tag 'smb3-client-fixes-6.3-rc3' of git://git.samba.org/sfrench/cifs-2.6 · 6485ac65
      Linus Torvalds authored
      Pull cifs client fixes from Steve French:
       "Twelve cifs/smb3 client fixes (most also for stable)
      
         - forced umount fix
      
         - fix for two perf regressions
      
         - reconnect fixes
      
         - small debugging improvements
      
         - multichannel fixes"
      
      * tag 'smb3-client-fixes-6.3-rc3' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: fix unusable share after force unmount failure
        cifs: fix dentry lookups in directory handle cache
        smb3: lower default deferred close timeout to address perf regression
        cifs: fix missing unload_nls() in smb2_reconnect()
        cifs: avoid race conditions with parallel reconnects
        cifs: append path to open_enter trace event
        cifs: print session id while listing open files
        cifs: dump pending mids for all channels in DebugData
        cifs: empty interface list when server doesn't support query interfaces
        cifs: do not poll server interfaces too regularly
        cifs: lock chan_lock outside match_session
        cifs: check only tcon status on tcon related functions
    • Linus Torvalds
      Merge tag 'nfsd-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · da8e7da1
      Linus Torvalds authored
      Pull nfsd fix from Chuck Lever:
      
       - Fix a crash when using NFS with krb5p
      
      * tag 'nfsd-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        SUNRPC: Fix a crash in gss_krb5_checksum()
    • Linus Torvalds
      Merge tag 'xfs-6.3-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 5b9ff397
      Linus Torvalds authored
      Pull yet more xfs bug fixes from Darrick Wong:
       "The first bugfix addresses a longstanding problem where we use the
        wrong file mapping cursors when trying to compute the speculative
        preallocation quantity. This has been causing sporadic crashes when
        alwayscow mode is engaged.
      
        The other two fixes correct minor problems in more recent changes.
      
         - Fix the new allocator tracepoints because git am mismerged the
           changes such that the trace_XXX got rebased to be in function YYY
           instead of XXX
      
         - Ensure that the perag AGFL_RESET state is consistent with whatever
           we've just read off the disk
      
         - Fix a bug where we used the wrong iext cursor during a write begin"
      
      * tag 'xfs-6.3-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: fix mismerged tracepoints
        xfs: clear incore AGFL_RESET state if it's not needed
        xfs: pass the correct cursor to xfs_iomap_prealloc_size
    • Linus Torvalds
      Merge tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f768b35a
      Linus Torvalds authored
      Pull xfs percpu counter fixes from Darrick Wong:
       "We discovered a filesystem summary counter corruption problem that was
        traced to cpu hot-remove racing with the call to percpu_counter_sum
        that sets the free block count in the superblock when writing it to
        disk. The root cause is that percpu_counter_sum doesn't cull from
        dying cpus and hence misses those counter values if the cpu shutdown
        hooks have not yet run to merge the values.
      
        I'm hoping this is a fairly painless fix to the problem, since the
        dying cpu mask should generally be empty. It's been in for-next for a
        week without any complaints from the bots.
      
         - Fix a race in the percpu counters summation code where the
           summation failed to add in the values for any CPUs that were dying
           but not yet dead. This fixes some minor discrepancies and incorrect
           assertions when running generic/650"
      
      * tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        pcpcntr: remove percpu_counter_sum_all()
        fork: remove use of percpu_counter_sum_all
        pcpcntrs: fix dying cpu summation race
        cpumask: introduce for_each_cpu_or
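      A userspace model of the summation fix described above, assuming eight
      CPUs represented as bit masks; struct demo_pcpu_counter and
      demo_counter_sum() are hypothetical.  The idea is to walk the union of
      the online and dying masks (what for_each_cpu_or() iterates over) so
      that a dying CPU's not-yet-folded delta is still counted:

```c
#include <stdint.h>

#define DEMO_NR_CPUS 8

/* Hypothetical per-CPU counter: a folded total plus per-CPU deltas. */
struct demo_pcpu_counter {
	long count;                   /* already-folded total */
	long counters[DEMO_NR_CPUS];  /* per-CPU deltas not yet folded */
};

/* Sum over every CPU in (online | dying). A dying CPU whose hotplug
 * callback has not yet folded its delta into 'count' is still included. */
static long demo_counter_sum(const struct demo_pcpu_counter *c,
			     uint32_t online_mask, uint32_t dying_mask)
{
	uint32_t mask = online_mask | dying_mask; /* like for_each_cpu_or() */
	long ret = c->count;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			ret += c->counters[cpu];
	return ret;
}
```

      With CPU 3 dying and its delta of 4 not yet folded into count, summing
      only the online mask would report 103 instead of 107.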
    • Linus Torvalds
      Merge tag 'xfs-6.3-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · d7044263
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "This batch started with some debugging enhancements to the new
        allocator refactoring that we put in 6.3-rc1 to assist developers in
        rebasing their dev branches.
      
        As for more serious code changes -- there's a bug fix to make the
        lockless allocator scan the whole filesystem before resorting to the
        locking allocator. We're also adding a selftest for the venerable
        directory/xattr hash function to make sure that it produces consistent
        results so that we can address any fallout as soon as possible.
      
         - Add a few debugging assertions so that people (me) trying to port
           code to the new allocator functions don't mess up the caller
           requirements
      
         - Relax some overly cautious lock ordering enforcement in the new
           allocator code, which means that file allocations will locklessly
           scan for the best space they can get before backing off to the
           traditional lock-and-really-get-it behavior
      
         - Add tracepoints to make it easier to trace the xfs allocator
           behavior
      
         - Actually test the dir/xattr hash algorithm to make sure it produces
           consistent results across all the platforms XFS supports"
      
      * tag 'xfs-6.3-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: test dir/attr hash when loading module
        xfs: add tracepoints for each of the externally visible allocators
        xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags
        xfs: try to idiot-proof the allocators
    • Linus Torvalds
      Merge tag 'hwmon-for-v6.3-rc4' of... · 4bdec23f
      Linus Torvalds authored
      Merge tag 'hwmon-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
      
       - it87: Fix voltage scaling for chips with 10.9mV ADCs
      
       - xgene: Fix ioremap and memremap leak
      
       - peci/cputemp: Fix miscalculated DTS temperature for SKX
      
       - hwmon core: fix potential sensor registration failure with thermal
         subsystem if of_node is missing
      
      * tag 'hwmon-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon (it87): Fix voltage scaling for chips with 10.9mV  ADCs
        hwmon: (xgene) Fix ioremap and memremap leak
        hwmon: fix potential sensor registration fail if of_node is missing
        hwmon: (peci/cputemp) Fix miscalculated DTS for SKX
  4. Mar 25, 2023
    • Linus Torvalds
      Merge tag 'mm-hotfixes-stable-2023-03-24-17-09' of... · 65aca32e
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "21 hotfixes, 8 of which are cc:stable. 11 are for MM, the remainder
        are for other subsystems"
      
      * tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
        mm: mmap: remove newline at the end of the trace
        mailmap: add entries for Richard Leitner
        kcsan: avoid passing -g for test
        kfence: avoid passing -g for test
        mm: kfence: fix using kfence_metadata without initialization in show_object()
        lib: dhry: fix unstable smp_processor_id(_) usage
        mailmap: add entry for Enric Balletbo i Serra
        mailmap: map Sai Prakash Ranjan's old address to his current one
        mailmap: map Rajendra Nayak's old address to his current one
        Revert "kasan: drop skip_kasan_poison variable in free_pages_prepare"
        mailmap: add entry for Tobias Klauser
        kasan, powerpc: don't rename memintrinsics if compiler adds prefixes
        mm/ksm: fix race with VMA iteration and mm_struct teardown
        kselftest: vm: fix unused variable warning
        mm: fix error handling for map_deny_write_exec
        mm: deduplicate error handling for map_deny_write_exec
        checksyscalls: ignore fstat to silence build warning on LoongArch
        nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()
        test_maple_tree: add more testing for mas_empty_area()
        maple_tree: fix mas_skip_node() end slot detection
        ...