  1. Sep 09, 2021
    • nds32/setup: remove unused memblock_region variable in setup_memory() · ddb13122
      Mike Rapoport authored
      
      
      kernel test robot reports unused variable warning:
      
         arch/nds32/kernel/setup.c:247:26: warning: Unused variable: region
         [unusedVariable]
          struct memblock_region *region;
                                  ^
      
      Remove the unused variable.
      
      Link: https://lkml.kernel.org/r/20210712125218.28951-1-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task · 276aeee1
      yanghui authored
      
      
      Servers hit the panic below:
      
        Kernel version:5.4.56
        BUG: unable to handle page fault for address: 0000000000002c48
        RIP: 0010:__next_zones_zonelist+0x1d/0x40
        Call Trace:
          __alloc_pages_nodemask+0x277/0x310
          alloc_page_interleave+0x13/0x70
          handle_mm_fault+0xf99/0x1390
          __do_page_fault+0x288/0x500
          do_page_fault+0x30/0x110
          page_fault+0x3e/0x50
      
      The panic occurs because MAX_NUMNODES is passed as the third parameter
      (preferred_nid) of __alloc_pages_nodemask().  The subsequent access to
      zonelist->zoneref->zone_idx in __next_zones_zonelist() then faults.
      
      In offset_il_node(), first_node() returns nid from pol->v.nodes; after
      this, other threads may change pol->v.nodes before next_node() runs.
      This race condition lets next_node() return MAX_NUMNODES.  So put
      pol->nodes in a local variable.
      
      The race condition is between offset_il_node and cpuset_change_task_nodemask:
      
        CPU0:                                     CPU1:
        alloc_pages_vma()
          interleave_nid(pol,)
            offset_il_node(pol,)
              first_node(pol->v.nodes)            cpuset_change_task_nodemask
                              //nodes==0xc          mpol_rebind_task
                                                      mpol_rebind_policy
                                                        mpol_rebind_nodemask(pol,nodes)
                              //nodes==0x3
              next_node(nid, pol->v.nodes)//return MAX_NUMNODES
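
      The race above can be modelled in user space. The following is a
      minimal sketch, not the kernel code: the demo_* helpers, the
      single-word nodemask and DEMO_MAX_NODES are assumptions made for
      illustration. It shows why mixing two different views of the mask
      yields the "no node" sentinel, and how reading the shared mask once
      into a local copy avoids that.

```c
#include <assert.h>

/* Toy stand-ins for the kernel's nodemask helpers (hypothetical names). */
#define DEMO_MAX_NODES 64u

typedef unsigned long long demo_nodemask_t;  /* bit n set => node n allowed */

/* Index of the first set bit, or DEMO_MAX_NODES if the mask is empty. */
static unsigned int demo_first_node(demo_nodemask_t mask)
{
    for (unsigned int n = 0; n < DEMO_MAX_NODES; n++)
        if (mask & (1ULL << n))
            return n;
    return DEMO_MAX_NODES;
}

/* Index of the next set bit after nid, or DEMO_MAX_NODES if none. */
static unsigned int demo_next_node(unsigned int nid, demo_nodemask_t mask)
{
    for (unsigned int n = nid + 1; n < DEMO_MAX_NODES; n++)
        if (mask & (1ULL << n))
            return n;
    return DEMO_MAX_NODES;
}

/* Number of set bits in the mask. */
static unsigned int demo_weight(demo_nodemask_t mask)
{
    unsigned int w = 0;
    for (; mask; mask &= mask - 1)
        w++;
    return w;
}

/*
 * Pick the (offset mod weight)-th allowed node.  The shared mask is read
 * exactly once into a local copy, so every helper below sees the same set
 * of nodes even if another thread rewrites *shared in the meantime.
 */
static unsigned int demo_pick_interleave_node(const volatile demo_nodemask_t *shared,
                                              unsigned long offset)
{
    assert(shared);
    demo_nodemask_t nodes = *shared;           /* single snapshot */
    unsigned int nnodes = demo_weight(nodes);

    if (nnodes == 0)
        return DEMO_MAX_NODES;

    unsigned int target = (unsigned int)(offset % nnodes);
    unsigned int nid = demo_first_node(nodes);

    for (unsigned int i = 0; i < target; i++)
        nid = demo_next_node(nid, nodes);
    return nid;
}
```

      With the masks from the diagram, demo_first_node(0xc) is 2, and
      asking demo_next_node() for the node after 2 in the rebound mask 0x3
      returns DEMO_MAX_NODES, which corresponds to the bogus preferred_nid
      in the panic. The sketch ignores that a real nodemask can span
      several words, so copying it is not a single atomic read.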
      
      Link: https://lkml.kernel.org/r/20210906034658.48721-1-yanghui.def@bytedance.com
      Signed-off-by: yanghui <yanghui.def@bytedance.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp · 79d37050
      Naohiro Aota authored
      In a memory pressure situation, I'm seeing the lockdep WARNING below.
      Actually, this is similar to a known false positive which is already
      addressed by commit 6dcde60e ("xfs: more lockdep whackamole with
      kmem_alloc*").
      
      This warning still persists because it's not from kmalloc() itself but
      from an allocation for a kmemleak object.  While kmalloc() itself
      suppresses the warning with __GFP_NOLOCKDEP, gfp_kmemleak_mask() drops
      the flag for kmemleak's allocation.
      
      Allow __GFP_NOLOCKDEP to be passed to kmemleak's allocation, so that the
      warning for it is also suppressed.
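
      The effect of letting one flag through a mask filter can be sketched
      as below. This log does not show which flags gfp_kmemleak_mask()
      keeps or forces, so the DEMO_* flag names, their bit values and both
      helper functions are assumptions for illustration only.

```c
#include <assert.h>

/* Illustrative flag bits; the numeric values are made up for this sketch. */
typedef unsigned int demo_gfp_t;

#define DEMO_GFP_KERNEL    0x01u
#define DEMO_GFP_ATOMIC    0x02u
#define DEMO_GFP_NOLOCKDEP 0x04u
#define DEMO_GFP_HIGHMEM   0x08u  /* example of a caller flag that is dropped */
#define DEMO_GFP_NOWARN    0x10u  /* example of a flag that is always forced on */

/* Before: the "no lockdep" request from the caller is filtered out. */
static demo_gfp_t demo_mask_before(demo_gfp_t gfp)
{
    return (gfp & (DEMO_GFP_KERNEL | DEMO_GFP_ATOMIC)) | DEMO_GFP_NOWARN;
}

/* After: the flag survives if, and only if, the caller passed it. */
static demo_gfp_t demo_mask_after(demo_gfp_t gfp)
{
    return (gfp & (DEMO_GFP_KERNEL | DEMO_GFP_ATOMIC | DEMO_GFP_NOLOCKDEP)) |
           DEMO_GFP_NOWARN;
}
```

      In this model a caller that passes DEMO_GFP_NOLOCKDEP keeps it for
      the nested metadata allocation, while a caller that does not pass it
      gets the same mask as before.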
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.14.0-rc7-BTRFS-ZNS+ #37 Not tainted
        ------------------------------------------------------
        kswapd0/288 is trying to acquire lock:
        ffff88825ab45df0 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x8a/0x250
      
        but task is already holding lock:
        ffffffff848cc1e0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (fs_reclaim){+.+.}-{0:0}:
               fs_reclaim_acquire+0x112/0x160
               kmem_cache_alloc+0x48/0x400
               create_object.isra.0+0x42/0xb10
               kmemleak_alloc+0x48/0x80
               __kmalloc+0x228/0x440
               kmem_alloc+0xd3/0x2b0
               kmem_alloc_large+0x5a/0x1c0
               xfs_attr_copy_value+0x112/0x190
               xfs_attr_shortform_getvalue+0x1fc/0x300
               xfs_attr_get_ilocked+0x125/0x170
               xfs_attr_get+0x329/0x450
               xfs_get_acl+0x18d/0x430
               get_acl.part.0+0xb6/0x1e0
               posix_acl_xattr_get+0x13a/0x230
               vfs_getxattr+0x21d/0x270
               getxattr+0x126/0x310
               __x64_sys_fgetxattr+0x1a6/0x2a0
               do_syscall_64+0x3b/0x90
               entry_SYSCALL_64_after_hwframe+0x44/0xae
      
        -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}:
               __lock_acquire+0x2c0f/0x5a00
               lock_acquire+0x1a1/0x4b0
               down_read_nested+0x50/0x90
               xfs_ilock+0x8a/0x250
               xfs_can_free_eofblocks+0x34f/0x570
               xfs_inactive+0x411/0x520
               xfs_fs_destroy_inode+0x2c8/0x710
               destroy_inode+0xc5/0x1a0
               evict+0x444/0x620
               dispose_list+0xfe/0x1c0
               prune_icache_sb+0xdc/0x160
               super_cache_scan+0x31e/0x510
               do_shrink_slab+0x337/0x8e0
               shrink_slab+0x362/0x5c0
               shrink_node+0x7a7/0x1a40
               balance_pgdat+0x64e/0xfe0
               kswapd+0x590/0xa80
               kthread+0x38c/0x460
               ret_from_fork+0x22/0x30
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
               CPU0                    CPU1
               ----                    ----
          lock(fs_reclaim);
                                       lock(&xfs_nondir_ilock_class);
                                       lock(fs_reclaim);
          lock(&xfs_nondir_ilock_class);
      
         *** DEADLOCK ***
        3 locks held by kswapd0/288:
         #0: ffffffff848cc1e0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
         #1: ffffffff848a08d8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x269/0x5c0
         #2: ffff8881a7a820e8 (&type->s_umount_key#60){++++}-{3:3}, at: super_cache_scan+0x5a/0x510
      
      Link: https://lkml.kernel.org/r/20210907055659.3182992-1-naohiro.aota@wdc.com
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "Darrick J . Wong" <djwong@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mmap_lock: change trace and locking order · 10994316
      Liam Howlett authored
      
      
      Print to the trace log before releasing the lock to avoid racing with
      other trace log printers of the same lock type.
      
      Link: https://lkml.kernel.org/r/20210903022041.1843024-1-Liam.Howlett@oracle.com
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michel Lespinasse <walken.cr@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype · 053cfda1
      Miaohe Lin authored
      If preparing to free an unref page fails, the pcp page migratetype is
      left unset.  get_pcppage_migratetype() then returns garbage, and we
      might list_del(&page->lru) a page that has already been removed from
      the list, which triggers a data corruption complaint.
      
      Link: https://lkml.kernel.org/r/20210902115447.57050-1-linmiaohe@huawei.com
      Fixes: df1acc85 ("mm/page_alloc: avoid conflating IRQs disabled with zone->lock")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm,vmscan: fix divide by zero in get_scan_count · 32d4f4b7
      Rik van Riel authored
      Commit f56ce412 ("mm: memcontrol: fix occasional OOMs due to
      proportional memory.low reclaim") introduced a divide by zero corner
      case when oomd is being used in combination with cgroup memory.low
      protection.
      
      When oomd decides to kill a cgroup, it will force the cgroup memory to
      be reclaimed after killing the tasks, by writing to the memory.max file
      for that cgroup, forcing the remaining page cache and reclaimable slab
      to be reclaimed down to zero.
      
      Previously, on cgroups with some memory.low protection that would result
      in the memory being reclaimed down to the memory.low limit, or likely
      not at all, having the page cache reclaimed asynchronously later.
      
      With f56ce412 the oomd write to memory.max tries to reclaim all the
      way down to zero, which may race with another reclaimer, to the point of
      ending up with the divide by zero below.
      
      This patch implements the obvious fix.
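
      The change itself is not shown in this log. Purely as an
      illustration of how a size that has raced down to zero can be kept
      out of a denominator, here is a hedged sketch. The formula, the
      clamp and the name demo_scan_target() are assumptions about the
      general shape of such a calculation, not a quote of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale the scan target by the unprotected share of the cgroup.
 * All parameter names are hypothetical.  Multiplication overflow is
 * ignored to keep the sketch short.
 */
static uint64_t demo_scan_target(uint64_t lruvec_size, uint64_t protection,
                                 uint64_t cgroup_size)
{
    /* Never treat the cgroup as smaller than its own protection. */
    if (cgroup_size < protection)
        cgroup_size = protection;

    /*
     * Adding 1 keeps the denominator nonzero even when another reclaimer
     * has already emptied the cgroup (cgroup_size == 0, protection == 0).
     */
    uint64_t protected_part = lruvec_size * protection / (cgroup_size + 1);

    assert(protected_part <= lruvec_size);
    return lruvec_size - protected_part;
}
```

      With both protection and cgroup_size at zero, the sketch returns the
      full lruvec_size instead of dividing by zero.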
      
      Link: https://lkml.kernel.org/r/20210826220149.058089c6@imladris.surriel.com
      Fixes: f56ce412 ("mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim")
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Chris Down <chris@chrisdown.name>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: initialize hugetlb_usage in mm_init · 13db8c50
      Liu Zixian authored
      After fork, the child process will get incorrect (2x) hugetlb_usage.  If
      a process uses 5 2MB hugetlb pages in an anonymous mapping,
      
      	HugetlbPages:	   10240 kB
      
      and then forks, the child will show,
      
      	HugetlbPages:	   20480 kB
      
      The amount is doubled because hugetlb_usage is copied from the parent
      and then increased again when we copy page tables from parent to
      child.  The child ends up with 2x its actual usage.
      
      Fix this by adding hugetlb_count_init in mm_init.
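
      The double counting can be reproduced with a small user-space model.
      This is a sketch under stated assumptions: struct demo_mm,
      demo_copy_page_tables() and demo_fork() are hypothetical stand-ins,
      and the reset_counter switch plays the role that the message assigns
      to hugetlb_count_init() in mm_init.

```c
#include <assert.h>

/* Hypothetical, stripped-down stand-in for the per-process mm state. */
struct demo_mm {
    long hugetlb_usage;  /* number of huge pages accounted to this mm */
};

/* Copying the page tables accounts every inherited huge page to the child. */
static void demo_copy_page_tables(struct demo_mm *child,
                                  const struct demo_mm *parent)
{
    assert(child && parent);
    child->hugetlb_usage += parent->hugetlb_usage;
}

/*
 * Model of fork: the child starts as a byte copy of the parent, which
 * duplicates the counter.  If reset_counter is nonzero the counter is
 * zeroed before the page tables are copied.
 */
static struct demo_mm demo_fork(const struct demo_mm *parent, int reset_counter)
{
    struct demo_mm child = *parent;      /* counter is inherited here */

    if (reset_counter)
        child.hugetlb_usage = 0;         /* start the child from zero */

    demo_copy_page_tables(&child, parent);
    return child;
}
```

      For a parent with 5 huge pages, the model gives the child 10 pages
      without the reset and 5 with it, matching the 20480 kB versus
      10240 kB figures quoted above for 2MB pages.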
      
      Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com
      Fixes: 5d317b2b ("mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status")
      Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled · 4b42fb21
      Li Zhijian authored
      Previously, we noticed that one rpma example had been failing [1]
      since commit 36f30e48 ("IB/core: Improve ODP to use
      hmm_range_fault()"), where it uses the ODP feature to do RDMA WRITE
      between fsdax files.
      
      After digging into the code, we found that hmm_vma_handle_pte() still
      returns EFAULT even though all of its requested flags have been
      fulfilled.  That's because a DAX page will be marked as (_PAGE_SPECIAL |
      PAGE_DEVMAP) by pte_mkdevmap().
      
      Link: https://github.com/pmem/rpma/issues/1142 [1]
      Link: https://lkml.kernel.org/r/20210830094232.203029-1-lizhijian@cn.fujitsu.com
      Fixes: 40550627 ("mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling")
      Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'akpm' (patches from Andrew) · 2d338201
      Linus Torvalds authored
      Merge more updates from Andrew Morton:
       "147 patches, based on 7d2a07b7.
      
        Subsystems affected by this patch series: mm (memory-hotplug, rmap,
        ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
        alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
        checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
        selftests, ipc, and scripts"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
        scripts: check_extable: fix typo in user error message
        mm/workingset: correct kernel-doc notations
        ipc: replace costly bailout check in sysvipc_find_ipc()
        selftests/memfd: remove unused variable
        Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
        configs: remove the obsolete CONFIG_INPUT_POLLDEV
        prctl: allow to setup brk for et_dyn executables
        pid: cleanup the stale comment mentioning pidmap_init().
        kernel/fork.c: unexport get_{mm,task}_exe_file
        coredump: fix memleak in dump_vma_snapshot()
        fs/coredump.c: log if a core dump is aborted due to changed file permissions
        nilfs2: use refcount_dec_and_lock() to fix potential UAF
        nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
        nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
        nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
        nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
        nilfs2: fix NULL pointer in nilfs_##name##_attr_release
        nilfs2: fix memory leak in nilfs_sysfs_create_device_group
        trap: cleanup trap_init()
        init: move usermodehelper_enable() to populate_rootfs()
        ...
    • Merge tag 'mm-slub-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux · cc09ee80
      Linus Torvalds authored
      Pull SLUB updates from Vlastimil Babka:
       "SLUB: reduce irq disabled scope and make it RT compatible
      
        This series was initially inspired by Mel's pcplist local_lock
        rewrite, and also interest to better understand SLUB's locking and the
        new primitives and RT variants and implications. It makes SLUB
        compatible with PREEMPT_RT and generally more preemption-friendly,
        apparently without significant regressions, as the fast paths are not
        affected.
      
        The main changes to SLUB by this series:
      
         - irq disabling is now only done for minimum amount of time needed to
           protect the strict kmem_cache_cpu fields, and as part of spin lock,
           local lock and bit lock operations to make them irq-safe
      
         - SLUB is fully PREEMPT_RT compatible
      
        The series should now be sufficiently tested in both RT and !RT
        configs, mainly thanks to Mike.
      
        The RFC/v1 version also got basic performance screening by Mel that
        didn't show major regressions. Mike's testing with hackbench of v2 on
        !RT reported negligible differences [6]:
      
          virgin(ish) tip
          5.13.0.g60ab3ed-tip
                    7,320.67 msec task-clock                #    7.792 CPUs utilized            ( +-  0.31% )
                     221,215      context-switches          #    0.030 M/sec                    ( +-  3.97% )
                      16,234      cpu-migrations            #    0.002 M/sec                    ( +-  4.07% )
                      13,233      page-faults               #    0.002 M/sec                    ( +-  0.91% )
              27,592,205,252      cycles                    #    3.769 GHz                      ( +-  0.32% )
               8,309,495,040      instructions              #    0.30  insn per cycle           ( +-  0.37% )
               1,555,210,607      branches                  #  212.441 M/sec                    ( +-  0.42% )
                   5,484,209      branch-misses             #    0.35% of all branches          ( +-  2.13% )
      
                     0.93949 +- 0.00423 seconds time elapsed  ( +-  0.45% )
                     0.94608 +- 0.00384 seconds time elapsed  ( +-  0.41% ) (repeat)
                     0.94422 +- 0.00410 seconds time elapsed  ( +-  0.43% )
      
          5.13.0.g60ab3ed-tip +slub-local-lock-v2r3
                    7,343.57 msec task-clock                #    7.776 CPUs utilized            ( +-  0.44% )
                     223,044      context-switches          #    0.030 M/sec                    ( +-  3.02% )
                      16,057      cpu-migrations            #    0.002 M/sec                    ( +-  4.03% )
                      13,164      page-faults               #    0.002 M/sec                    ( +-  0.97% )
              27,684,906,017      cycles                    #    3.770 GHz                      ( +-  0.45% )
               8,323,273,871      instructions              #    0.30  insn per cycle           ( +-  0.28% )
               1,556,106,680      branches                  #  211.901 M/sec                    ( +-  0.31% )
                   5,463,468      branch-misses             #    0.35% of all branches          ( +-  1.33% )
      
                     0.94440 +- 0.00352 seconds time elapsed  ( +-  0.37% )
                     0.94830 +- 0.00228 seconds time elapsed  ( +-  0.24% ) (repeat)
                     0.93813 +- 0.00440 seconds time elapsed  ( +-  0.47% ) (repeat)
      
        RT configs showed some throughput regressions, but that's an expected
        tradeoff for the preemption improvements through the RT mutex. It
        didn't prevent v2 from being incorporated into the 5.13 RT tree [7],
        leading to testing exposure and bugfixes.
      
        Before the series, SLUB is lockless in both allocation and free fast
        paths, but elsewhere, it's disabling irqs for considerable periods of
        time - especially in allocation slowpath and the bulk allocation,
        where IRQs are re-enabled only when a new page from the page allocator
        is needed, and the context allows blocking. The irq disabled sections
        can then include deactivate_slab() which walks a full freelist and
        frees the slab back to page allocator or unfreeze_partials() going
        through a list of percpu partial slabs. The RT tree currently has some
        patches mitigating these, but we can do much better in mainline too.
      
        Patches 1-6 are straightforward improvements or cleanups that could
        exist outside of this series too, but are prerequisites.
      
        Patches 7-9 are also preparatory code changes without functional
        changes, but not so useful without the rest of the series.
      
        Patch 10 simplifies the fast paths on systems with preemption, based
        on (hopefully correct) observation that the current loops to verify
        tid are unnecessary.
      
        Patches 11-20 focus on reducing irq disabled scope in the allocation
        slowpath:
      
         - patch 11 moves disabling of irqs into ___slab_alloc() from its
           callers, which are the allocation slowpath, and bulk allocation.
           Instead these callers only disable preemption to stabilize the cpu.
      
         - The following patches then gradually reduce the scope of disabled
           irqs in ___slab_alloc() and the functions called from there. As of
           patch 14, the re-enabling of irqs based on gfp flags before calling
           the page allocator is removed from allocate_slab(). As of patch 17,
           it's possible to reach the page allocator (in case of existing
           slabs depleted) without disabling and re-enabling irqs a single
           time.
      
        Patches 21-26 reduce the scope of disabled irqs in functions related
        to unfreezing percpu partial slab.
      
        Patch 27 is preparatory. Patch 28 is adopted from the RT tree and
        converts the flushing of percpu slabs on all cpus from using IPI to
        workqueue, so that the processing isn't happening with irqs disabled
        in the IPI handler. The flushing is not performance critical so it
        should be acceptable.
      
        Patch 29 also comes from RT tree and makes object_map_lock RT
        compatible.
      
        Patch 30 makes slab_lock irq-safe on RT where we cannot rely on having
        irq disabled from the list_lock spin lock usage.
      
        Patch 31 changes kmem_cache_cpu->partial handling in put_cpu_partial()
        from cmpxchg loop to a short irq disabled section, which is used by
        all other code modifying the field. This addresses a theoretical race
        scenario pointed out by Jann, and makes the critical section safe wrt
        RT local_lock semantics after the conversion in patch 35.
      
        Patch 32 changes preempt disable to migrate disable, so that the
        nested list_lock spinlock is safe to take on RT. Because
        migrate_disable() is a function call even on !RT, a small set of
        private wrappers is introduced to keep using the cheaper
        preempt_disable() on !PREEMPT_RT configurations. As of this patch,
        SLUB should be already compatible with RT's lock semantics.
      
        Finally, patch 33 changes irq disabled sections that protect
        kmem_cache_cpu fields in the slow paths, with a local lock. However on
        PREEMPT_RT it means the lockless fast paths can now preempt slow paths
        which don't expect that, so the local lock has to be taken also in the
        fast paths and they are no longer lockless. RT folks seem to not mind
        this tradeoff. The patch also updates the locking documentation in the
        file's comment"
      
      Mike Galbraith and Mel Gorman verified that their earlier testing
      observations still hold for the final series:
      
      Link: https://lore.kernel.org/lkml/89ba4f783114520c167cc915ba949ad2c04d6790.camel@gmx.de/
      Link: https://lore.kernel.org/lkml/20210907082010.GB3959@techsingularity.net/
      
      * tag 'mm-slub-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux: (33 commits)
        mm, slub: convert kmem_cpu_slab protection to local_lock
        mm, slub: use migrate_disable() on PREEMPT_RT
        mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
        mm, slub: make slab_lock() disable irqs with PREEMPT_RT
        mm: slub: make object_map_lock a raw_spinlock_t
        mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
        mm, slab: split out the cpu offline variant of flush_slab()
        mm, slub: don't disable irqs in slub_cpu_dead()
        mm, slub: only disable irq with spin_lock in __unfreeze_partials()
        mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing
        mm, slub: detach whole partial list at once in unfreeze_partials()
        mm, slub: discard slabs in unfreeze_partials() without irqs disabled
        mm, slub: move irq control into unfreeze_partials()
        mm, slub: call deactivate_slab() without disabling irqs
        mm, slub: make locking in deactivate_slab() irq-safe
        mm, slub: move reset of c->page and freelist out of deactivate_slab()
        mm, slub: stop disabling irqs around get_partial()
        mm, slub: check new pages with restored irqs
        mm, slub: validate slab from partial list or page allocator before making it cpu slab
        mm, slub: restore irqs around calling new_slab()
        ...
    • scripts: check_extable: fix typo in user error message · b285437d
      Randy Dunlap authored
      
      
      Fix typo ("and" should be "an") in an error message.
      
      Link: https://lkml.kernel.org/r/20210727002943.29774-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/workingset: correct kernel-doc notations · 560a8705
      Randy Dunlap authored
      
      
      Use the documented kernel-doc format to prevent kernel-doc warnings.
      
      mm/workingset.c:256: warning: No description found for return value of 'workingset_eviction'
      mm/workingset.c:285: warning: Function parameter or member 'folio' not described in 'workingset_refault'
      mm/workingset.c:285: warning: Excess function parameter 'page' description in 'workingset_refault'
      
      Link: https://lkml.kernel.org/r/20210808203153.10678-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc: replace costly bailout check in sysvipc_find_ipc() · 20401d10
      Rafael Aquini authored
      sysvipc_find_ipc() was left with a costly way to check if the offset
      position fed to it is bigger than the total number of IPC IDs in use.  So
      much so that the time it takes to iterate over /proc/sysvipc/* files grows
      much faster than linearly (roughly quadratically at the larger sizes
      below) for a custom benchmark that creates "N" SYSV shm segments and
      then times the read of /proc/sysvipc/shm (milliseconds):
      
          12 msecs to read   1024 segs from /proc/sysvipc/shm
          18 msecs to read   2048 segs from /proc/sysvipc/shm
          65 msecs to read   4096 segs from /proc/sysvipc/shm
         325 msecs to read   8192 segs from /proc/sysvipc/shm
        1303 msecs to read  16384 segs from /proc/sysvipc/shm
        5182 msecs to read  32768 segs from /proc/sysvipc/shm
      
      The root problem lies with the loop that computes the total amount of ids
      in use to check if the "pos" fed to sysvipc_find_ipc() grew bigger than
      "ids->in_use".  That is quite an inefficient way to get to the maximum
      index in the id lookup table, especially when that value is already
      provided by struct ipc_ids.max_idx.
      
      This patch follows up on the optimization introduced via commit
      15df03c8 ("sysvipc: make get_maxid O(1) again") and gets rid of the
      aforementioned costly loop, replacing it with a simpler check based on
      the value returned by ipc_get_maxidx(), which allows for a smooth linear
      increase in time complexity for the same custom benchmark:
      
           2 msecs to read   1024 segs from /proc/sysvipc/shm
           2 msecs to read   2048 segs from /proc/sysvipc/shm
           4 msecs to read   4096 segs from /proc/sysvipc/shm
           9 msecs to read   8192 segs from /proc/sysvipc/shm
          19 msecs to read  16384 segs from /proc/sysvipc/shm
          39 msecs to read  32768 segs from /proc/sysvipc/shm
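
      The difference between the two bailout checks can be sketched with a
      toy lookup table. This is a model under stated assumptions, not the
      ipc code: struct demo_ids, the demo_* functions and the probe
      counter are hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_SLOTS 4096

/* Hypothetical stand-in for the id lookup table. */
struct demo_ids {
    bool used[DEMO_SLOTS];  /* used[i] => an object lives at index i */
    int in_use;             /* number of live objects */
    int max_idx;            /* highest live index, -1 if the table is empty */
};

static unsigned long demo_probes;  /* slot lookups, to compare both variants */

/* Populate indexes 0..n-1 and keep the cached counters consistent. */
static void demo_fill(struct demo_ids *ids, int n)
{
    assert(n >= 0 && n <= DEMO_SLOTS);
    for (int i = 0; i < DEMO_SLOTS; i++)
        ids->used[i] = i < n;
    ids->in_use = n;
    ids->max_idx = n - 1;
}

/* Costly variant: recount the live entries below pos on every call. */
static int demo_find_counting(const struct demo_ids *ids, int pos)
{
    int total = 0;

    for (int id = 0; id < pos && total < ids->in_use; id++) {
        demo_probes++;
        if (ids->used[id])
            total++;
    }
    if (total >= ids->in_use)
        return -1;              /* pos is past the last live entry */

    for (; pos < DEMO_SLOTS; pos++) {
        demo_probes++;
        if (ids->used[pos])
            return pos;
    }
    return -1;
}

/* Cheap variant: a single comparison against the cached maximum index. */
static int demo_find_bounded(const struct demo_ids *ids, int pos)
{
    if (ids->max_idx < 0 || pos > ids->max_idx)
        return -1;

    for (; pos <= ids->max_idx; pos++) {
        demo_probes++;
        if (ids->used[pos])
            return pos;
    }
    return -1;
}

/* Walk the whole table the way a sequential reader would; return probes. */
static unsigned long demo_walk(const struct demo_ids *ids,
                               int (*find)(const struct demo_ids *, int))
{
    demo_probes = 0;
    for (int pos = 0; (pos = find(ids, pos)) >= 0; pos++)
        ;
    return demo_probes;
}
```

      For a densely filled table of n entries, a full walk with the
      counting variant performs n(n+1)/2 + n probes in this model, while
      the bounded variant performs n.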
      
      Link: https://lkml.kernel.org/r/20210809203554.1562989-1-aquini@redhat.com
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Waiman Long <llong@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • selftests/memfd: remove unused variable · d42990f4
      Greg Thelen authored
      Commit 54402986 ("selftests/memfd: add tests for F_SEAL_FUTURE_WRITE
      seal") added an unused variable to mfd_assert_reopen_fd().
      
      Delete the unused variable.
      
      Link: https://lkml.kernel.org/r/20210702045509.1517643-1-gthelen@google.com
      Fixes: 54402986 ("selftests/memfd: add tests for F_SEAL_FUTURE_WRITE seal")
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH · 6fe26259
      Lukas Bulwahn authored
      Commit 05a4a952 ("kernel/watchdog: split up config options") adds a
      new config HARDLOCKUP_DETECTOR, which selects the non-existing config
      HARDLOCKUP_DETECTOR_ARCH.
      
      Hence, ./scripts/checkkconfigsymbols.py warns:
      
      HARDLOCKUP_DETECTOR_ARCH Referencing files: lib/Kconfig.debug
      
      Simply drop selecting the non-existing HARDLOCKUP_DETECTOR_ARCH.
      
      Link: https://lkml.kernel.org/r/20210806115618.22088-1-lukas.bulwahn@gmail.com
      Fixes: 05a4a952 ("kernel/watchdog: split up config options")
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Babu Moger <babu.moger@oracle.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • configs: remove the obsolete CONFIG_INPUT_POLLDEV · 4cb398fe
      Zenghui Yu authored
      This CONFIG option was removed in commit 278b13ce ("Input: remove
      input_polled_dev implementation"), so there's no point in keeping it in
      defconfigs any longer.
      
      Get rid of the leftover for all arches.
      
      Link: https://lkml.kernel.org/r/20210726074741.1062-1-yuzenghui@huawei.com
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4cb398fe
    • Cyrill Gorcunov's avatar
      prctl: allow to setup brk for et_dyn executables · e1fbbd07
      Cyrill Gorcunov authored
      Keno Fischer reported that when a binary is loaded via ld-linux-x,
      prctl(PR_SET_MM_MAP) doesn't allow setting the brk value because it lies
      before mm:end_data.
      
      For example a test program shows
      
       | # ~/t
       |
       | start_code      401000
       | end_code        401a15
       | start_stack     7ffce4577dd0
       | start_data      403e10
       | end_data        40408c
       | start_brk       b5b000
       | sbrk(0)         b5b000
      
      and when executed via ld-linux
      
       | # /lib64/ld-linux-x86-64.so.2 ~/t
       |
       | start_code      7fc25b0a4000
       | end_code        7fc25b0c4524
       | start_stack     7fffcc6b2400
       | start_data      7fc25b0ce4c0
       | end_data        7fc25b0cff98
       | start_brk       55555710c000
       | sbrk(0)         55555710c000
      
      This of course prevents criu from restoring such programs.  Looking into
      how the kernel operates with brk/start_brk inside the brk() syscall, I
      don't see any problem if we allow setting brk/start_brk without checking
      for end_data.  Even if someone passes some weird address here on purpose,
      the worst possible result will be an unexpected unmapping of an existing
      vma (the caller's own vma, since prctl works with the caller's memory).
      The test for RLIMIT_DATA is still valid, so a user won't be able to gain
      more memory by expanding VMAs via new values shipped with the prctl call.
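
      The effect of dropping the end_data comparison can be illustrated with
      a small user-space model.  This is only a sketch: the function names
      and the RLIMIT_DATA value are hypothetical, the limit test is modeled
      on the heap-plus-data size comparison, and only the addresses are taken
      from the example output above.

```python
# Simplified model of the brk validation discussed above (not kernel code).

def old_check(start_brk, brk, end_data):
    """Pre-patch rule (assumed form): brk and start_brk must lie above end_data."""
    return start_brk > end_data and brk > end_data

def data_rlimit_ok(rlim, brk, start_brk, end_data, start_data):
    """RLIMIT_DATA test that stays in place: heap size plus data size."""
    return (brk - start_brk) + (end_data - start_data) <= rlim

# Values printed for "~/t" run directly.
direct = dict(start_data=0x403e10, end_data=0x40408c,
              start_brk=0xb5b000, brk=0xb5b000)
# Values printed for "/lib64/ld-linux-x86-64.so.2 ~/t".
via_loader = dict(start_data=0x7fc25b0ce4c0, end_data=0x7fc25b0cff98,
                  start_brk=0x55555710c000, brk=0x55555710c000)

RLIM = 1 << 30  # hypothetical 1 GiB RLIMIT_DATA

for name, m in (("direct", direct), ("via_loader", via_loader)):
    print(name,
          "old_check:", old_check(m["start_brk"], m["brk"], m["end_data"]),
          "rlimit_ok:", data_rlimit_ok(RLIM, m["brk"], m["start_brk"],
                                       m["end_data"], m["start_data"]))
```

      In this model the directly executed binary passes both tests, while
      the binary started through the loader fails only the end_data
      comparison, which is the case the patch is meant to allow.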
      
      Link: https://lkml.kernel.org/r/20210121221207.GB2174@grain
      Fixes: bbdc6076 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Reported-by: Keno Fischer <keno@juliacomputing.com>
      Acked-by: Andrey Vagin <avagin@gmail.com>
      Tested-by: Andrey Vagin <avagin@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e1fbbd07
    • Takahiro Itazuri's avatar
      pid: cleanup the stale comment mentioning pidmap_init(). · 5b91a75b
      Takahiro Itazuri authored
      pidmap_init() has already been replaced with pid_idr_init() in commit
      95846ecf ("pid: replace pid bitmap implementation with IDR API").
      Clean up the stale comment which still mentions it.
      
      Link: https://lkml.kernel.org/r/20210714120713.19825-1-itazur@amazon.com
      Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b91a75b
    • Christoph Hellwig's avatar
      kernel/fork.c: unexport get_{mm,task}_exe_file · 05da8113
      Christoph Hellwig authored
      
      
      Only used by core code and tomoyo, which can't be a module either.
      
      Link: https://lkml.kernel.org/r/20210820095430.445242-1-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      05da8113
    • QiuXi's avatar
      coredump: fix memleak in dump_vma_snapshot() · 6fcac87e
      QiuXi authored
      dump_vma_snapshot() allocates memory for *vma_meta.  When
      dump_vma_snapshot() returns -EFAULT, that memory is leaked, so free it
      on that error path.
      
      Link: https://lkml.kernel.org/r/20210810020441.62806-1-qiuxi1@huawei.com
      Fixes: a07279c9 ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot")
      Signed-off-by: QiuXi <qiuxi1@huawei.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jann Horn <jannh@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6fcac87e
    • David Oberhollenzer's avatar
      fs/coredump.c: log if a core dump is aborted due to changed file permissions · dbd9d6f8
      David Oberhollenzer authored
      
      
      For obvious security reasons, a core dump is aborted if the filesystem
      cannot preserve ownership or permissions of the dump file.
      
      This affects filesystems like e.g.  vfat, but also something like a 9pfs
      share in a Qemu test setup, running as a regular user, depending on the
      security model used.  In those cases, the result is an empty core file and
      a confused user.
      
      To hopefully save other people a lot of time figuring out the cause, this
      patch adds a simple log message for those specific cases.
      
      [akpm@linux-foundation.org: s/|%s/%s/ in printk text]
      
      Link: https://lkml.kernel.org/r/20210701233151.102720-1-david.oberhollenzer@sigma-star.at
      Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dbd9d6f8
    • Zhen Lei's avatar
      nilfs2: use refcount_dec_and_lock() to fix potential UAF · 98e2e409
      Zhen Lei authored
      When the refcount is decreased to 0, the resource reclamation branch is
      entered.  Before CPU0 reaches the race point (1), CPU1 may obtain the
      spinlock and traverse the rbtree to find 'root', see
      nilfs_lookup_root().
      
      Although CPU1 will call refcount_inc() to increase the refcount, it is
      obviously too late.  CPU0 will release 'root' directly, CPU1 then
      accesses 'root' and triggers UAF.
      
      Using refcount_dec_and_lock() ensures that both the decrement of the
      refcount to 0 and the link deletion are protected by the lock, which
      eliminates this risk.
      
      	     CPU0                      CPU1
      	nilfs_put_root():
      		    <-------- (1)
      				spin_lock(&nilfs->ns_cptree_lock);
      				rb_erase(&root->rb_node, &nilfs->ns_cptree);
      				spin_unlock(&nilfs->ns_cptree_lock);
      
      	kfree(root);
      		    <-------- use-after-free
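
      The dec-and-lock idea can be sketched in user space as follows.  This
      is a simplified model, not the nilfs2 code: Registry, Root and their
      fields are hypothetical stand-ins for the checkpoint tree, its lock and
      the root object, and the model always takes the lock instead of using a
      lock-free fast path.

```python
import threading

class Registry:
    """Stands in for the checkpoint tree and its lock (hypothetical names)."""
    def __init__(self):
        self.lock = threading.Lock()
        self.roots = {}

    def lookup(self, key):
        # Only objects still linked in the registry can gain a reference,
        # and the increment happens under the same lock as the unlink.
        with self.lock:
            root = self.roots.get(key)
            if root is not None:
                root.count += 1
            return root

class Root:
    def __init__(self, registry, key):
        self.registry, self.key = registry, key
        self.count = 1
        self.freed = False
        registry.roots[key] = self

    def put(self):
        # The decrement to zero and the unlink form one critical section,
        # so lookup() can never return an object whose count reached zero.
        reg = self.registry
        with reg.lock:
            self.count -= 1
            if self.count > 0:
                return False
            del reg.roots[self.key]
        self.freed = True          # stands in for kfree(root)
        return True

reg = Registry()
root = Root(reg, 7)
again = reg.lookup(7)              # second reference, count is now 2
print(root.put(), again.put(), reg.lookup(7))
```

      Once the last put() has run, a later lookup() finds nothing, so no
      caller can take a reference to an object that is about to be freed.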
      
        refcount_t: underflow; use-after-free.
        WARNING: CPU: 2 PID: 9476 at lib/refcount.c:28 \
        refcount_warn_saturate+0x1cf/0x210 lib/refcount.c:28
        Modules linked in:
        CPU: 2 PID: 9476 Comm: syz-executor.0 Not tainted 5.10.45-rc1+ #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
        RIP: 0010:refcount_warn_saturate+0x1cf/0x210 lib/refcount.c:28
        ... ...
        Call Trace:
           __refcount_sub_and_test include/linux/refcount.h:283 [inline]
           __refcount_dec_and_test include/linux/refcount.h:315 [inline]
           refcount_dec_and_test include/linux/refcount.h:333 [inline]
           nilfs_put_root+0xc1/0xd0 fs/nilfs2/the_nilfs.c:795
           nilfs_segctor_destroy fs/nilfs2/segment.c:2749 [inline]
           nilfs_detach_log_writer+0x3fa/0x570 fs/nilfs2/segment.c:2812
           nilfs_put_super+0x2f/0xf0 fs/nilfs2/super.c:467
           generic_shutdown_super+0xcd/0x1f0 fs/super.c:464
           kill_block_super+0x4a/0x90 fs/super.c:1446
           deactivate_locked_super+0x6a/0xb0 fs/super.c:335
           deactivate_super+0x85/0x90 fs/super.c:366
           cleanup_mnt+0x277/0x2e0 fs/namespace.c:1118
           __cleanup_mnt+0x15/0x20 fs/namespace.c:1125
           task_work_run+0x8e/0x110 kernel/task_work.c:151
           tracehook_notify_resume include/linux/tracehook.h:188 [inline]
           exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
           exit_to_user_mode_prepare+0x13c/0x170 kernel/entry/common.c:191
           syscall_exit_to_user_mode+0x16/0x30 kernel/entry/common.c:266
           do_syscall_64+0x45/0x80 arch/x86/entry/common.c:56
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      There is no reproduction program, and the above is only theoretical
      analysis.
      
      Link: https://lkml.kernel.org/r/1629859428-5906-1-git-send-email-konishi.ryusuke@gmail.com
      Fixes: ba65ae47 ("nilfs2: add checkpoint tree to nilfs object")
      Link: https://lkml.kernel.org/r/20210723012317.4146-1-thunder.leizhen@huawei.com
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      98e2e409
    • Nanyong Sun's avatar
      nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group · 17243e1c
      Nanyong Sun authored
      
      
      kobject_put() should be used to cleanup the memory associated with the
      kobject instead of kobject_del().  See the section "Kobject removal" of
      "Documentation/core-api/kobject.rst".
      
      Link: https://lkml.kernel.org/r/20210629022556.3985106-7-sunnanyong@huawei.com
      Link: https://lkml.kernel.org/r/1625651306-10829-7-git-send-email-konishi.ryusuke@gmail.com
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      17243e1c
    • Nanyong Sun's avatar
      nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group · b2fe39c2
      Nanyong Sun authored
      
      
      If kobject_init_and_add returns with error, kobject_put() is needed here
      to avoid memory leak, because kobject_init_and_add may return error
      without freeing the memory associated with the kobject it allocated.
      
      Link: https://lkml.kernel.org/r/20210629022556.3985106-6-sunnanyong@huawei.com
      Link: https://lkml.kernel.org/r/1625651306-10829-6-git-send-email-konishi.ryusuke@gmail.com
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b2fe39c2
    • Nanyong Sun's avatar
      nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group · a3e18125
      Nanyong Sun authored
      
      
      The kobject_put() should be used to cleanup the memory associated with the
      kobject instead of kobject_del.  See the section "Kobject removal" of
      "Documentation/core-api/kobject.rst".
      
      Link: https://lkml.kernel.org/r/20210629022556.3985106-5-sunnanyong@huawei.com
      Link: https://lkml.kernel.org/r/1625651306-10829-5-git-send-email-konishi.ryusuke@gmail.com
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a3e18125
    • Nanyong Sun's avatar
      nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group · 24f8cb1e
      Nanyong Sun authored
      
      
      If kobject_init_and_add returns with error, kobject_put() is needed here to
      avoid memory leak, because kobject_init_and_add may return error without
      freeing the memory associated with the kobject it allocated.
      
      Link: https://lkml.kernel.org/r/20210629022556.3985106-4-sunnanyong@huawei.com
      Link: https://lkml.kernel.org/r/1625651306-10829-4-git-send-email-konishi.ryusuke@gmail.com
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      24f8cb1e
    • Nanyong Sun's avatar
      nilfs2: fix NULL pointer in nilfs_##name##_attr_release · dbc6e7d4
      Nanyong Sun authored
      
      
      In nilfs_##name##_attr_release, kobj->parent should not be referenced
      because it is a NULL pointer.  The release() method of a kobject is always
      called from kobject_put(kobj), and the implementation of kobject_put()
      sets kobj->parent to NULL before calling the release() method.  So just
      use kobj to get the subgroups, which is more efficient and fixes the NULL
      pointer dereference.
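
      The ordering described above can be modeled in a few lines.  This is
      a toy sketch under the assumption stated in this message (parent is
      cleared before release() runs); ToyKobject, Subgroups and
      buggy_release are hypothetical names, and the bound method plays the
      role of recovering the container from kobj itself.

```python
class ToyKobject:
    """Toy model; the fields only loosely mirror a real kobject."""
    def __init__(self, name, parent=None, release=None):
        self.name, self.parent, self.release = name, parent, release
        self.refcount = 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.parent = None          # parent is cleared first ...
            if self.release:
                self.release(self)      # ... then release() runs

def buggy_release(kobj):
    # Pattern to avoid: dereferencing the parent inside release().
    return kobj.parent.name

class Subgroups:
    """Hypothetical container that embeds the kobject."""
    def __init__(self, parent):
        self.completed = False
        self.kobj = ToyKobject("subgroup", parent=parent,
                               release=self._release)

    def _release(self, kobj):
        # Uses only the kobject's own container, never kobj.parent.
        self.completed = True

sg = Subgroups(ToyKobject("dev"))
sg.kobj.put()
print(sg.completed)
```

      Calling put() on a kobject whose release is buggy_release raises an
      AttributeError in this model, which is the Python counterpart of the
      NULL pointer dereference.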
      
      Link: https://lkml.kernel.org/r/20210629022556.3985106-3-sunnanyong@huawei.com
      Link: https://lkml.kernel.org/r/1625651306-10829-3-git-send-email-konishi.ryusuke@gmail.com
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dbc6e7d4
    • Nanyong Sun's avatar
      nilfs2: fix memory leak in nilfs_sysfs_create_device_group · 5f5dec07
      Nanyong Sun authored
      
      
      Patch series "nilfs2: fix incorrect usage of kobject".
      
      This patchset from Nanyong Sun fixes memory leak issues and a NULL
      pointer dereference issue caused by incorrect usage of kobject in the
      nilfs2 sysfs implementation.
      
      This patch (of 6):
      
      Reported by syzkaller:
      
        BUG: memory leak
        unreferenced object 0xffff888100ca8988 (size 8):
        comm "syz-executor.1", pid 1930, jiffies 4294745569 (age 18.052s)
        hex dump (first 8 bytes):
        6c 6f 6f 70 31 00 ff ff loop1...
        backtrace:
          kstrdup+0x36/0x70 mm/util.c:60
          kstrdup_const+0x35/0x60 mm/util.c:83
          kvasprintf_const+0xf1/0x180 lib/kasprintf.c:48
          kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289
          kobject_add_varg lib/kobject.c:384 [inline]
          kobject_init_and_add+0xc9/0x150 lib/kobject.c:473
          nilfs_sysfs_create_device_group+0x150/0x7d0 fs/nilfs2/sysfs.c:986
          init_nilfs+0xa21/0xea0 fs/nilfs2/the_nilfs.c:637
          nilfs_fill_super fs/nilfs2/super.c:1046 [inline]
          nilfs_mount+0x7b4/0xe80 fs/nilfs2/super.c:1316
          legacy_get_tree+0x105/0x210 fs/fs_context.c:592
          vfs_get_tree+0x8e/0x2d0 fs/super.c:1498
          do_new_mount fs/namespace.c:2905 [inline]
          path_mount+0xf9b/0x1990 fs/namespace.c:3235
          do_mount+0xea/0x100 fs/namespace.c:3248
          __do_sys_mount fs/namespace.c:3456 [inline]
          __se_sys_mount fs/namespace.c:3433 [inline]
          __x64_sys_mount+0x14b/0x1f0 fs/namespace.c:3433
          do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
          entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      If kobject_init_and_add returns with an error, the kobject must be cleaned
      up, because memory may have been allocated in kobject_init_and_add
      without being freed.

      The cleanup_dev_kobject path should also use kobject_put to free the
      memory associated with the kobject.  As the section "Kobject removal" of
      "Documentation/core-api/kobject.rst" says, kobject_del() just makes the
      kobject "invisible", but it is not cleaned up.  No further cleanup is
      done after cleanup_dev_kobject, so kobject_put is needed here.
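
      The difference between hiding a kobject and releasing it can be
      sketched with a toy model.  The class and method names below are
      hypothetical; delete() and put() only imitate the behavior this
      message attributes to kobject_del() and kobject_put().

```python
class LeakDemoKobject:
    def __init__(self):
        self.refcount = 0
        self.name = None           # models the kstrdup()'d name
        self.in_sysfs = False
        self.released = False

    def init_and_add(self, name, fail=False):
        self.refcount = 1
        self.name = name           # allocated before the add step can fail
        if fail:
            return -1              # error return: the name is still allocated
        self.in_sysfs = True
        return 0

    def delete(self):              # like kobject_del(): hide only
        self.in_sysfs = False

    def put(self):                 # like kobject_put(): drop ref, free at zero
        self.refcount -= 1
        if self.refcount == 0:
            self.in_sysfs = False
            self.name = None
            self.released = True

leaky = LeakDemoKobject()
leaky.init_and_add("loop1", fail=True)
leaky.delete()                     # old cleanup: name is never freed
fixed = LeakDemoKobject()
fixed.init_and_add("loop1", fail=True)
fixed.put()                        # new cleanup: last reference dropped
print(leaky.name, fixed.name)
```

      In the model the object cleaned up with delete() still holds its name,
      matching the 8-byte "loop1" allocation in the leak report, while the
      one cleaned up with put() does not.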
      
      Link: https://lkml.kernel.org/r/1625651306-10829-1-git-send-email-konishi.ryusuke@gmail.com
      Link: https://lkml.kernel.org/r/1625651306-10829-2-git-send-email-konishi.ryusuke@gmail.com
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Link: https://lkml.kernel.org/r/20210629022556.3985106-2-sunnanyong@huawei.com
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5f5dec07
    • Kefeng Wang's avatar
      trap: cleanup trap_init() · 8b097881
      Kefeng Wang authored
      
      
      There are some empty trap_init() definitions in different ARCHs.  Introduce
      a new weak trap_init() function to clean them up.
      
      Link: https://lkml.kernel.org/r/20210812123602.76356-1-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>	[arm32]
      Acked-by: Vineet Gupta						[arc]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>			[powerpc]
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <palmerdabbelt@google.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8b097881
    • Rasmus Villemoes's avatar
      init: move usermodehelper_enable() to populate_rootfs() · b234ed6d
      Rasmus Villemoes authored
      Currently, usermodehelper is enabled right before PID1 starts going
      through the initcalls. However, any call of a usermodehelper from a
      pure_, core_, postcore_, arch_, subsys_ or fs_ initcall is futile, as
      there is no filesystem contents yet.
      
      Up until commit e7cb072e ("init/initramfs.c: do unpacking
      asynchronously"), such calls, whether via some request_module(), a
      legacy uevent "/sbin/hotplug" notification or something else, would
      just fail silently with (presumably) -ENOENT from
      kernel_execve(). However, that commit introduced the
      wait_for_initramfs() synchronization hook which must be called from
      the usermodehelper exec path right before the kernel_execve, in order
      that request_module() et al done from *after* rootfs_initcall()
      time (i.e. device_ and late_ initcalls) would continue to find a
      populated initramfs as they used to.
      
      Any call of wait_for_initramfs() done before the unpacking has been
      scheduled (i.e. before rootfs_initcall time) must just return
      immediately [and let the caller find an empty file system] in order
      not to deadlock the machine. I mistakenly thought, and my limited
      testing confirmed, that there were no such calls, so I added a
      pr_warn_once() in wait_for_initramfs(). It turns out that one can
      indeed hit request_module() as well as kobject_uevent_env() during
      those early init calls, leading to a user-visible warning in the
      kernel log emitted consistently for certain configurations.
      
      We could just remove the pr_warn_once(), but I think it's better to
      postpone enabling the usermodehelper framework until there is at least
      some chance of finding the executable. That is also a little more
      efficient in that a lot of work done in umh.c will be elided. However,
      it does change the error seen by those early callers from -ENOENT to
      -EBUSY, so there is a risk of a regression if any caller cares about
      the exact error value.
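
      The change in the error seen by early callers can be illustrated with
      a small model.  This is a sketch only: UserModeHelper and its methods
      are hypothetical, and it encodes just the two outcomes named above
      (-ENOENT when the helper runs against an empty filesystem, -EBUSY when
      the framework is not enabled yet).

```python
import errno

class UserModeHelper:
    def __init__(self, files):
        self.enabled = False
        self.files = files         # paths present in the root filesystem

    def enable(self):
        self.enabled = True

    def call(self, path):
        if not self.enabled:
            return -errno.EBUSY    # framework not enabled yet
        if path not in self.files:
            return -errno.ENOENT   # exec fails: nothing unpacked yet
        return 0

# Before: enabled ahead of the initcalls, filesystem still empty.
before = UserModeHelper(files=set())
before.enable()
# After: enabling is postponed until the rootfs is being populated.
after = UserModeHelper(files=set())

print(before.call("/sbin/hotplug"), after.call("/sbin/hotplug"))
```

      A caller that only tests for a nonzero result behaves the same in
      both cases; one that matches on -ENOENT specifically would not.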
      
      Link: https://lkml.kernel.org/r/20210728134638.329060-1-linux@rasmusvillemoes.dk
      Fixes: e7cb072e ("init/initramfs.c: do unpacking asynchronously")
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
      Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b234ed6d
    • Nicholas Piggin's avatar
      fs/epoll: use a per-cpu counter for user's watches count · 1e1c1583
      Nicholas Piggin authored
      
      
      This counter tracks the number of watches a user has, to compare against
      the 'max_user_watches' limit. This causes a scalability bottleneck on
      SPECjbb2015 on large systems as there is only one user. Changing to a
      per-cpu counter increases throughput of the benchmark by about 30% on a
      16-socket, > 1000 thread system.
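
      The idea behind a per-cpu counter can be sketched as follows.  This
      is a simplified user-space model with hypothetical names; a real
      per-cpu counter also batches updates into a shared approximate total,
      which is omitted here.

```python
class PerCpuCounter:
    def __init__(self, ncpus):
        self.slots = [0] * ncpus   # one slot per CPU, no single hot word

    def add(self, cpu, delta):
        self.slots[cpu] += delta   # update touches only the local slot

    def total(self):
        return sum(self.slots)     # slower path: fold all slots together

def try_add_watch(counter, cpu, max_user_watches):
    """Add one watch on behalf of `cpu` unless the user limit is reached."""
    if counter.total() >= max_user_watches:
        return False
    counter.add(cpu, 1)
    return True

watches = PerCpuCounter(ncpus=4)
results = [try_add_watch(watches, cpu, max_user_watches=2)
           for cpu in (0, 3, 1)]
print(results, watches.total())
```

      Updates from different CPUs land in different slots, so they do not
      contend on one shared counter; only the limit comparison needs the
      combined value.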
      
      [rdunlap@infradead.org: fix build errors in kernel/user.c when CONFIG_EPOLL=n]
      [npiggin@gmail.com: move ifdefs into wrapper functions, slightly improve panic message]
        Link: https://lkml.kernel.org/r/1628051945.fens3r99ox.astroid@bobo.none
      [akpm@linux-foundation.org: tweak user_epoll_alloc(), per Guenter]
        Link: https://lkml.kernel.org/r/20210804191421.GA1900577@roeck-us.net
      
      Link: https://lkml.kernel.org/r/20210802032013.2751916-1-npiggin@gmail.com
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reported-by: Anton Blanchard <anton@ozlabs.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-fou...>
      1e1c1583
    • Joe Perches's avatar
      checkpatch: improve GIT_COMMIT_ID test · 4ce9f970
      Joe Perches authored
      
      
      The preferred git commit id reference has the form
      
      	commit <SHA-1> ("Title line")
      
      where SHA-1 is the commit hex hash with a minimum length of 12 and ("Title
      line") is the complete title line of the commit with a (" prefix and ")
      suffix.
      
      The current tests fail when the "Title line" has one or more embedded
      double quotes.
      
      Improve the test that finds the commit SHA-1 hex hash then ("Title line")
      by using $balanced_parens for a maximum of 3 consecutive lines.
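
      The balanced-parentheses approach can be sketched outside of
      checkpatch like this.  The code is an illustration only, not the Perl
      used by checkpatch: the function names are hypothetical, and the SHA-1
      and title in the usage line are made up.

```python
import re

# "commit", a run of hex digits, then the opening parenthesis of the title.
SHA_RE = re.compile(r'\bcommit\s+([0-9a-f]{5,40})\s*\(', re.IGNORECASE)

def balanced_end(text, start):
    """Index just past the ')' matching the '(' at text[start], or -1."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == '(':
            depth += 1
        elif text[i] == ')':
            depth -= 1
            if depth == 0:
                return i + 1
    return -1

def check_commit_ref(text):
    """Return (sha, title) for a well-formed reference, else None."""
    m = SHA_RE.search(text)
    if not m:
        return None
    sha = m.group(1)
    end = balanced_end(text, m.end() - 1)
    if end < 0 or len(sha) < 12:
        return None
    inner = text[m.end():end - 1]  # text between the outer parentheses
    if len(inner) < 2 or not (inner.startswith('"') and inner.endswith('"')):
        return None
    # Collapse whitespace so a title wrapped over several lines is rejoined.
    return sha, " ".join(inner[1:-1].split())

print(check_commit_ref('commit 0123456789ab ("Revert "mm: fix foo()"")'))
```

      Because the title is delimited by the matching outer parentheses
      rather than by the first closing quote, embedded double quotes no
      longer cut it short.  A title with unbalanced parentheses would still
      defeat this sketch.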
      
      [akpm@linux-foundation.org: add missing &&]
      
      Link: https://lkml.kernel.org/r/976c6cdd680db4b55ae31b5fc2d1779da5c0dc66.camel@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
      Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Denis Efremov <efremov@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4ce9f970
    • Mimi Zohar's avatar
      checkpatch: make email address check case insensitive · 046fc741
      Mimi Zohar authored
      Instead of checkpatch requiring the patch author to exactly match the
      signed-off-by tag, commit 48ca2d8a ("checkpatch: add new warnings to
      author signoff checks.") safely relaxed this requirement.
      
      Although the local-part of an email address (local-part@domain) may be
      case sensitive, exploiting the case sensitivity of mailbox local-parts
      impedes interoperability and is discouraged.  Mailbox domains follow
      normal DNS rules and are hence not case sensitive.  (Refer to
      https://datatracker.ietf.org/doc/html/rfc5321#section-2.4.)
      
      Further relax the patch author and signed-off-by tag comparison by making
      the email address check case insensitive.
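
      A case-insensitive comparison of the two addresses can be sketched
      as below.  This is not the checkpatch implementation; same_address is
      a hypothetical helper, and the mixed-case variant of the address in
      the usage line is made up for illustration.

```python
from email.utils import parseaddr

def same_address(author, signoff):
    """Compare the email addresses of two 'Name <addr>' strings, ignoring case."""
    _, a = parseaddr(author)
    _, s = parseaddr(signoff)
    return bool(a) and a.lower() == s.lower()

print(same_address("Mimi Zohar <Zohar@Linux.IBM.com>",
                   "Mimi Zohar <zohar@linux.ibm.com>"))
```

      Lowercasing both sides treats the local part as case-insensitive
      too, which matches the relaxation described above.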
      
      Link: https://lkml.kernel.org/r/20210816112725.173206-1-zohar@linux.ibm.com
      Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
      Acked-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      046fc741
    • Joe Perches's avatar
      checkpatch: support wide strings · d2af5aa6
      Joe Perches authored
      
      
      Allow prefixing typical strings with L for wide strings and u for unicode
      strings.
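
      A pattern that accepts an optional L or u prefix in front of a string
      literal might look like the following.  It is a rough sketch, not the
      expression used in checkpatch; STRING_RE is a hypothetical name.

```python
import re

# Optional L/u prefix at a word boundary, then a double-quoted literal
# whose body may contain backslash escapes.
STRING_RE = re.compile(r'(?:\b[Lu])?"(?:[^"\\\n]|\\.)*"')

line = 'wprintf(L"hi"); f(u"x"); g("plain");'
print(STRING_RE.findall(line))
```

      The word boundary keeps the prefix from being taken out of the tail
      of a longer identifier that happens to end in L or u.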
      
      Link: https://lkml.kernel.org/r/20210801170733.1.I3f9784fd3c1007d08ec2e70b151d137687575495@changeid
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Simon Glass <sjg@chromium.org>
      Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
      Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2af5aa6
    • Andy Shevchenko's avatar
      tools: rename bitmap_alloc() to bitmap_zalloc() · 7fc5b571
      Andy Shevchenko authored
      
      
      Rename bitmap_alloc() to bitmap_zalloc() in tools to follow the bitmap API
      in the kernel.
      
      No functional changes intended.
      
      Link: https://lkml.kernel.org/r/20210814211713.180533-14-yury.norov@gmail.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Suggested-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Yury Norov <yury.norov@gmail.com>
      Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Lobakin <alobakin@pm.me>
      Cc: Alexey Klimov <aklimov@redhat.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7fc5b571
    • Randy Dunlap's avatar
      lib/iov_iter.c: fix kernel-doc warnings · 44e55997
      Randy Dunlap authored
      
      
      Fix all kernel-doc warnings in lib/iov_iter.c:
      
      lib/iov_iter.c:695: warning: Function parameter or member 'i' not described in '_copy_mc_to_iter'
      lib/iov_iter.c:695: warning: Excess function parameter 'iter' description in '_copy_mc_to_iter'
      lib/iov_iter.c:695: warning: No description found for return value of '_copy_mc_to_iter'
      lib/iov_iter.c:758: warning: Function parameter or member 'i' not described in '_copy_from_iter_flushcache'
      lib/iov_iter.c:758: warning: Excess function parameter 'iter' description in '_copy_from_iter_flushcache'
      lib/iov_iter.c:758: warning: No description found for return value of '_copy_from_iter_flushcache'
      
      Link: https://lkml.kernel.org/r/20210809051053.6531-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44e55997
    • Randy Dunlap's avatar
      lib/dump_stack: correct kernel-doc notation · 83a29beb
      Randy Dunlap authored
      
      
      Fix kernel-doc warnings in dump_stack.c:
      
      lib/dump_stack.c:97: warning: Function parameter or member 'log_lvl' not described in 'dump_stack_lvl'
      lib/dump_stack.c:97: warning: expecting prototype for dump_stack(). Prototype was for dump_stack_lvl() instead
      
      Link: https://lkml.kernel.org/r/20210809051643.17567-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83a29beb
    • Daniel Latypov's avatar
      lib/test: convert test_sort.c to use KUnit · 36f33b56
      Daniel Latypov authored
      This follows up commit ebd09577 ("lib/test: convert
      lib/test_list_sort.c to use KUnit").
      
      Converting this test to KUnit makes the test a bit shorter, standardizes
      how it reports pass/fail, and adds an easier way to run the test [1].
      
      Like ebd09577, this leaves the file and Kconfig option name the same,
      but slightly changes their dependencies (needs CONFIG_KUNIT).
      
      [1] Can be run via
      $ ./tools/testing/kunit/kunit.py run --kunitconfig /dev/stdin <<EOF
      CONFIG_KUNIT=y
      CONFIG_TEST_SORT=y
      EOF
      
      [11:30:27] Starting KUnit Kernel ...
      [11:30:30] ============================================================
      [11:30:30] ======== [PASSED] lib_sort ========
      [11:30:30] [PASSED] test_sort
      [11:30:30] ============================================================
      [11:30:30] Testing complete. 1 tests run. 0 failed. 0 crashed. 0 skipped.
      [11:30:30] Elapsed time: 37.032s total, 0.001s configuring, 34.090s building, 0.000s running
      
      Note: this is the time it took after a `make mrproper`.
      
      With an incremental rebuild, this looks more like:
      [11:38:58] Elapsed time: 6.444s total, 0.001s configuring, 3.416s building, 0.000s running
      
      Since the test has no dependencies, it can also be run (with some other
      tests) with just:
      $ ./tools/testing/kunit/kunit.py run
      
      Link: https://lkml.kernel.org/r/20210715232441.1380885-1-dlatypov@google.com
      Signed-off-by: Daniel Latypov <dlatypov@google.com>
      Cc: Pravin Shedge <pravin.shedge4linux@gmail.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: David Gow <davidgow@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      36f33b56
    • Geert Uytterhoeven's avatar
      math: RATIONAL_KUNIT_TEST should depend on RATIONAL instead of selecting it · 8ba739ed
      Geert Uytterhoeven authored
      RATIONAL_KUNIT_TEST selects RATIONAL, thus enabling an optional feature
      the user may not want to have enabled.  Fix this by making the test depend
      on RATIONAL instead.
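      
      As a rough sketch of the difference (not the actual diff: the prompt string and the KUNIT dependency line are assumptions, only the two symbol names come from the message above), the Kconfig entry changes along these lines:

      ```
      # Before: enabling the test silently turns on RATIONAL.
      config RATIONAL_KUNIT_TEST
      	# prompt text is assumed for illustration
      	tristate "KUnit test for rational.c"
      	depends on KUNIT
      	select RATIONAL

      # After: the test is only offered when RATIONAL is already enabled.
      config RATIONAL_KUNIT_TEST
      	# prompt text is assumed for illustration
      	tristate "KUnit test for rational.c"
      	depends on KUNIT && RATIONAL
      ```

      With "select", the test pulls RATIONAL in regardless of whether anything else needs it. With "depends on", the test simply stays hidden until some real user has enabled RATIONAL.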
      
      Link: https://lkml.kernel.org/r/20210706100945.3803694-3-geert@linux-m68k.org
      Fixes: b6c75c4a ("lib/math/rational: add Kunit test cases")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Trent Piepho <tpiepho@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ba739ed
    • Geert Uytterhoeven's avatar
      math: make RATIONAL tristate · bcda5fd3
      Geert Uytterhoeven authored
      
      
      Patch series "math: RATIONAL and RATIONAL_KUNIT_TEST improvements".
      
      This series makes the RATIONAL symbol tristate, so it is not forced
      builtin if all users are modular, and makes the RATIONAL_KUNIT_TEST depend
      on RATIONAL, to avoid enabling RATIONAL if there are no real users.
      
      This patch (of 2):
      
      All but one of the symbols that select RATIONAL are tristate, but RATIONAL
      itself is bool.  Change it to tristate, so the rational fractions support
      code can be modular if no builtin code relies on it.
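      
      To illustrate why the type matters, here is a minimal sketch with a hypothetical driver symbol (FOO_DRIVER is invented for this example; the real users of RATIONAL are not listed in this message):

      ```
      config RATIONAL
      	tristate

      # Hypothetical user, built as a module (FOO_DRIVER=m).
      config FOO_DRIVER
      	tristate "Example driver"
      	select RATIONAL
      ```

      With RATIONAL declared bool, a select from FOO_DRIVER=m forces RATIONAL=y. With tristate, the same select only raises RATIONAL to m, so the code can be built as a module unless a builtin user also selects it.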
      
      Link: https://lkml.kernel.org/r/20210706100945.3803694-1-geert@linux-m68k.org
      Link: https://lkml.kernel.org/r/20210706100945.3803694-2-geert@linux-m68k.org
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Trent Piepho <tpiepho@gmail.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bcda5fd3