  1. Jun 30, 2021
    • writeback, cgroup: release dying cgwbs by switching attached inodes · c22d70a1
      Roman Gushchin authored
      
      
      Asynchronously try to release dying cgwbs by switching attached inodes to
      the nearest living ancestor wb.  It helps to get rid of per-cgroup
      writeback structures themselves and of pinned memory and block cgroups,
      which are significantly larger structures (mostly due to large per-cpu
      statistics data).  This prevents memory waste and helps to avoid different
      scalability problems caused by large piles of dying cgroups.
      
      Reuse the existing mechanism of inode switching used for foreign inode
      detection.  To speed things up, batch up to 115 inode switches into a
      single operation (the maximum number is selected so that the resulting
      struct inode_switch_wbs_context fits into 1024 bytes).  Because every
      switch consists of two steps separated by an RCU grace period, it would
      be too slow without batching.  Please note that the whole batch counts
      as a single operation (when increasing/decreasing isw_nr_in_flight).
      This keeps umounting working (the switching queue is flushed), while
      preventing cleanups from consuming the whole switching quota and
      effectively blocking the frn switching.
      
      A cgwb cleanup operation can fail for various reasons (e.g.  not enough
      memory, the cgwb has in-flight/pending io, an attached inode is in the
      wrong state, etc).  In this case the next scheduled cleanup will make a
      new attempt.  An attempt is made each time a new cgwb is offlined (in
      other words, a memcg and/or a blkcg is deleted by a user).  In the
      future an additional attempt scheduled by a timer can be implemented.
      
      [guro@fb.com: replace open-coded "115" with arithmetic]
        Link: https://lkml.kernel.org/r/YMEcSBcq/VXMiPPO@carbon.dhcp.thefacebook.com
      [guro@fb.com: add smp_mb() to inode_prepare_wbs_switch()]
        Link: https://lkml.kernel.org/r/YMFa+guFw7OFjf3X@carbon.dhcp.thefacebook.com
      [willy@infradead.org: fix documentation]
        Link: https://lkml.kernel.org/r/20210615200242.1716568-2-willy@infradead.org
      
      Link: https://lkml.kernel.org/r/20210608230225.2078447-9-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback, cgroup: support switching multiple inodes at once · f5fbe6b7
      Roman Gushchin authored
      
      
      Currently only a single inode can be switched to another writeback
      structure at once.  That means to switch an inode a separate
      inode_switch_wbs_context structure must be allocated, and a separate rcu
      callback and work must be scheduled.
      
      It's fine for the existing ad-hoc switching, which doesn't happen that
      often, but sub-optimal for the massive switching required to release a
      writeback structure.  To prepare for that, add support for switching
      multiple inodes at once.
      
      Instead of containing a single inode pointer, inode_switch_wbs_context
      will contain a NULL-terminated array of inode pointers.
      inode_do_switch_wbs() will be called for each inode.
      
      To optimize the locking, bdi->wb_switch_rwsem and the old_wb's and
      new_wb's list_locks will be acquired and released only once for the
      whole batch of inodes.  wb_wakeup() will also be called only once.
      Instead of calling wb_put(old_wb) after each successful switch,
      wb_put_many() is introduced and used.
      
      Link: https://lkml.kernel.org/r/20210608230225.2078447-8-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback, cgroup: split out the functional part of inode_switch_wbs_work_fn() · 72d4512e
      Roman Gushchin authored
      
      
      Split out the functional part of the inode_switch_wbs_work_fn()
      function as inode_do_switch_wbs() to reuse it later for switching
      inodes attached to dying cgwbs.
      
      This commit doesn't bring any functional changes.
      
      Link: https://lkml.kernel.org/r/20210608230225.2078447-7-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback, cgroup: keep list of inodes attached to bdi_writeback · f3b6a6df
      Roman Gushchin authored
      
      
      Currently there is no way to iterate over inodes attached to a specific
      cgwb structure.  It limits the ability to efficiently reclaim the
      writeback structure itself and associated memory and block cgroup
      structures without scanning all inodes belonging to a sb, which can be
      prohibitively expensive.
      
      While an inode is dirty or in active writeback, it belongs to one of
      the bdi_writeback's io lists: b_dirty, b_io, b_more_io and
      b_dirty_time.  Once cleaned up, it's removed from all io lists, so
      inode->i_io_list can be reused to maintain the list of inodes attached
      to a bdi_writeback structure.
      
      This patch introduces a new wb->b_attached list, which contains all
      inodes which were dirty at least once and are attached to the given
      cgwb.  Inodes attached to the root bdi_writeback structures are never
      placed on such a list.  The following patch will use this list to try
      to release cgwb structures more efficiently.
      
      Link: https://lkml.kernel.org/r/20210608230225.2078447-6-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback, cgroup: switch to rcu_work API in inode_switch_wbs() · 29264d92
      Roman Gushchin authored
      
      
      Inode's wb switching requires two steps divided by an RCU grace period.
      It's currently implemented as an RCU callback inode_switch_wbs_rcu_fn(),
      which schedules inode_switch_wbs_work_fn() as a work.
      
      Switching to the rcu_work API allows doing the same in a cleaner and
      slightly shorter form.
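      As a sketch (these are real kernel APIs, but the isw->work field name
      is an assumption here and the fragment is not standalone-compilable),
      the call_rcu()-then-schedule-work dance collapses into a single
      rcu_work:

```c
/* Before: call_rcu() ran inode_switch_wbs_rcu_fn(), whose only job was to
 * re-queue inode_switch_wbs_work_fn() as a regular work item.
 * After: queue one rcu_work; the workqueue core waits out the RCU grace
 * period itself and then runs the work function. */
INIT_RCU_WORK(&isw->work, inode_switch_wbs_work_fn);
queue_rcu_work(isw_wq, &isw->work);
```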
      
      Link: https://lkml.kernel.org/r/20210608230225.2078447-5-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback, cgroup: increment isw_nr_in_flight before grabbing an inode · 8826ee4f
      Roman Gushchin authored
      
      
      isw_nr_in_flight is used to determine whether the inode switch queue
      should be flushed from the umount path.  Currently it's incremented
      only after grabbing an inode and even after scheduling the switch work.
      This means the umount path can walk past cleanup_offline_cgwb() with
      active inode references, which can result in a "Busy inodes after
      unmount." message and use-after-free issues (with inode->i_sb, which
      gets freed).
      
      Fix it by incrementing isw_nr_in_flight before doing anything with the
      inode and decrementing in the case when switching wasn't scheduled.
      
      The problem hasn't been seen in real life yet; it was discovered by
      Jan Kara by looking into the code.
      
      Link: https://lkml.kernel.org/r/20210608230225.2078447-4-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Suggested-by: Jan Kara <jack@suse.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback, cgroup: add smp_mb() to cgroup_writeback_umount() · 592fa002
      Roman Gushchin authored
      
      
      A full memory barrier is required between clearing SB_ACTIVE flag in
      generic_shutdown_super() and checking isw_nr_in_flight in
      cgroup_writeback_umount(), otherwise a new switch operation might be
      scheduled after atomic_read(&isw_nr_in_flight) returned 0.  This would
      result in a non-flushed isw_wq, and a potential crash.
      
      The problem hasn't been seen in real life yet; it was discovered by
      Jan Kara by looking into the code.
      
      Link: https://lkml.kernel.org/r/20210608230225.2078447-3-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Jan Kara <jack@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback, cgroup: do not switch inodes with I_WILL_FREE flag · 4ade5867
      Roman Gushchin authored
      
      
      Patch series "cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups", v9.
      
      When an inode is getting dirty for the first time it's associated with a
      wb structure (see __inode_attach_wb()).  It can later be switched to
      another wb (if e.g.  some other cgroup is writing a lot of data to the
      same inode), but otherwise stays attached to the original wb until being
      reclaimed.
      
      The problem is that the wb structure holds a reference to the original
      memory and blkcg cgroups.  So if an inode has been dirty once and later is
      actively used in read-only mode, it has a good chance to pin down the
      original memory and blkcg cgroups forever.  This is often the case with
      services bringing data for other services, e.g.  updating some rpm
      packages.
      
      In real life it becomes a problem due to the large size of the memcg
      structure, which can easily be 1000x larger than an inode.  Also a
      really large number of dying cgroups can raise various scalability
      issues, e.g.  making memory reclaim costly and less effective.
      
      To solve the problem, inodes should eventually be detached from the
      corresponding writeback structure.  It's inefficient to do it after
      every writeback completion.  Instead it can be done whenever the
      original memory cgroup is offlined and the writeback structure is
      getting killed.  Scanning over a (potentially long) list of inodes and
      detaching them from the writeback structure can take quite some time.
      To avoid scanning all inodes, attached inodes are kept on a new list
      (b_attached).  To make it less noticeable to a user, the scanning and
      switching is performed from a work context.
      
      Big thanks to Jan Kara, Dennis Zhou, Hillf Danton and Tejun Heo for their
      ideas and contribution to this patchset.
      
      This patch (of 8):
      
      If an inode's state has I_WILL_FREE flag set, the inode will be freed
      soon, so there is no point in trying to switch the inode to a different
      cgwb.
      
      I_WILL_FREE has been ignored since the introduction of inode switching,
      so it doesn't appear to lead to any noticeable issues for a user.  This
      is why the patch is not intended for a stable backport.
      
      Link: https://lkml.kernel.org/r/20210608230225.2078447-1-guro@fb.com
      Link: https://lkml.kernel.org/r/20210608230225.2078447-2-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Dennis Zhou <dennis@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page-writeback: use __this_cpu_inc() in account_page_dirtied() · 87e37897
      Chi Wu authored
      
      
      account_page_dirtied() is always protected by xa_lock_irqsave(), so
      using __this_cpu_inc() is better.
      
      Link: https://lkml.kernel.org/r/20210512144742.4764-1-wuchi.zero@gmail.com
      Signed-off-by: Chi Wu <wuchi.zero@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Howard Cochran <hcochran@kernelspring.com>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page-writeback: update the comment of Dirty position control · 03231554
      Chi Wu authored
      
      
      The value of pos_ratio_polynom() is clamped between 0 and 2LL <<
      RATELIMIT_CALC_SHIFT, so the global control line should be made
      consistent with it.
      
      Link: https://lkml.kernel.org/r/20210511103606.3732-1-wuchi.zero@gmail.com
      Signed-off-by: Chi Wu <wuchi.zero@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Howard Cochran <hcochran@kernelspring.com>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page-writeback: Fix performance when BDI's share of ratio is 0. · ab19939a
      Chi Wu authored
      
      The issue is similar to commit 74d36944 ("writeback: Fix performance
      regression in wb_over_bg_thresh()").
      
      balance_dirty_pages() and the writeback worker can also disagree on
      whether to write back when a BDI uses BDI_CAP_STRICTLIMIT and the
      BDI's share of the thresh ratio is zero.

      For example, a thread on cpu0 writes 32 pages and then calls
      balance_dirty_pages(); it wakes up background writeback and pauses
      because wb_dirty > wb->wb_thresh = 0 (the share of the thresh ratio is
      zero).  The thread tends to run on cpu0 again because the scheduler
      prefers the previous cpu.  Meanwhile the writeback worker may run on
      other cpus (1, 2, ...), where the value of wb_stat(wb, WB_RECLAIMABLE)
      in wb_over_bg_thresh() is 0, so it does no writeback and returns.

      Thus, balance_dirty_pages() keeps looping, sleeping and then waking up
      the worker, who will do nothing.  It remains stuck in this state until
      the writeback worker hits the right dirty cpu or the dirty pages
      expire.

      The fix is to read the accurate per-cpu sum (wb_stat_sum()) when the
      threshold is low.
      
      Link: https://lkml.kernel.org/r/20210428225046.16301-1-wuchi.zero@gmail.com
      Signed-off-by: Chi Wu <wuchi.zero@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: page-writeback: kill get_writeback_state() comments · 5defd497
      Kefeng Wang authored
      
      
      get_writeback_state() has been gone since 2006; kill the related
      comments.
      
      Link: https://lkml.kernel.org/r/20210508125026.56600-1-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • virtio_balloon: specify page reporting order if needed · f8af4d08
      Gavin Shan authored
      
      
      Page reporting isn't triggered unless the freeing page can come up
      with a free area whose size is equal to or bigger than the threshold
      (page reporting order).  The default page reporting order, equal to
      @pageblock_order, is too large on some architectures to ever trigger
      page reporting.  One example is ARM64 when a 64KB base page size is
      used.
      
            PAGE_SIZE:          64KB
            pageblock_order:    13       (512MB)
            MAX_ORDER:          14
      
      This sets the page reporting order to 5 (2MB) for this specific case
      so that page reporting can be triggered.
      
      Link: https://lkml.kernel.org/r/20210625014710.42954-5-gshan@redhat.com
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_reporting: allow driver to specify reporting order · 9f849c6f
      Gavin Shan authored
      
      
      The page reporting order (threshold) defaults to @pageblock_order.  On
      some configurations page reporting can then never be triggered,
      because freed pages can't come up with a free area that huge.  The
      situation becomes worse when the system memory is heavily fragmented.
      
      For example, the following configuration is used on ARM64 when a 64KB
      base page size is enabled.  In this specific case, page reporting
      won't be triggered until a freed page comes up with a 512MB free area.
      That's hard to meet, especially when the system memory becomes heavily
      fragmented.
      
         PAGE_SIZE:          64KB
         HPAGE_SIZE:         512MB
         pageblock_order:    13       (512MB)
         MAX_ORDER:          14
      
      This allows the drivers to specify the page reporting order when the page
      reporting device is registered.  It falls back to @pageblock_order if it's
      not specified by the driver.  The existing users (hv_balloon and
      virtio_balloon) don't specify it and @pageblock_order is still taken as
      their page reporting order.  So this shouldn't introduce any functional
      changes.
      
      Link: https://lkml.kernel.org/r/20210625014710.42954-4-gshan@redhat.com
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_reporting: export reporting order as module parameter · f58780a8
      Gavin Shan authored
      
      
      The macro PAGE_REPORTING_MIN_ORDER is defined as the page reporting
      threshold.  It can't be adjusted at runtime.
      
      This introduces a variable (@page_reporting_order) to replace the
      macro (PAGE_REPORTING_MIN_ORDER).  MAX_ORDER is assigned to it
      initially, meaning page reporting is disabled.  It will be set by the
      driver if a valid value is provided; otherwise it falls back to
      @pageblock_order.  It's also exported as a module parameter so that
      the page reporting order can be adjusted at runtime.
      
      Link: https://lkml.kernel.org/r/20210625014710.42954-3-gshan@redhat.com
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Suggested-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_reporting: fix code style in __page_reporting_request() · 5631de54
      Gavin Shan authored
      
      
      Patch series "mm/page_reporting: Make page reporting work on arm64 with 64KB page size", v4.
      
      The page reporting threshold is currently equal to @pageblock_order,
      which is 13 (512MB) on arm64 with a 64KB base page size selected.
      Page reporting won't be triggered if the freeing page can't come up
      with a free area that huge.  The condition is hard to meet, especially
      when the system memory becomes fragmented.
      
      This series intends to solve the issue by having page reporting threshold
      as 5 (2MB) on arm64 with 64KB base page size.  The patches are organized
      as:
      
         PATCH[1/4] Fix some coding style in __page_reporting_request().
         PATCH[2/4] Represents page reporting order with variable so that it can
                    be exported as module parameter.
         PATCH[3/4] Allows the device driver (e.g. virtio_balloon) to specify
                    the page reporting order when the device info is registered.
         PATCH[4/4] Specifies the page reporting order to 5, corresponding to
                    2MB in size on ARM64 when 64KB base page size is used.
      
      This patch (of 4):
      
      Comment lines should start with one space, not two.  This corrects the
      style.
      
      Link: https://lkml.kernel.org/r/20210625014710.42954-1-gshan@redhat.com
      Link: https://lkml.kernel.org/r/20210625014710.42954-2-gshan@redhat.com
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: mmap_lock: use local locks instead of disabling preemption · 832b5072
      Nicolas Saenz Julienne authored
      mmap_lock will explicitly disable/enable preemption upon manipulating its
      local CPU variables.  This is to be expected, but in this case, it doesn't
      play well with PREEMPT_RT.  The preemption disabled code section also
      takes a spin-lock.  Spin-locks in RT systems will try to schedule, which
      is exactly what we're trying to avoid.
      
      To mitigate this, convert the explicit preemption handling to
      local_locks, which are RT-aware and disable migration instead of
      preemption when PREEMPT_RT=y.
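      A minimal sketch of the conversion (the kernel's local_lock API; the
      lock name below is assumed for illustration and the fragment is not
      standalone-compilable):

```c
/* Replace preempt_disable()/preempt_enable() around the per-CPU buffer
 * with a local_lock: on !RT this still disables preemption, but on
 * PREEMPT_RT it only disables migration and may sleep, so taking a
 * spinlock inside the section is legal. */
static DEFINE_PER_CPU(local_lock_t, memcg_path_lock) =
        INIT_LOCAL_LOCK(memcg_path_lock);

local_lock(&memcg_path_lock);   /* was: preempt_disable() */
/* ... manipulate this CPU's buffer; cgroup_path() etc. are safe here ... */
local_unlock(&memcg_path_lock); /* was: preempt_enable() */
```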
      
      The faulty call trace looks like the following:
          __mmap_lock_do_trace_*()
            preempt_disable()
            get_mm_memcg_path()
              cgroup_path()
                kernfs_path_from_node()
                  spin_lock_irqsave() /* Scheduling while atomic! */
      
      Link: https://lkml.kernel.org/r/20210604163506.2103900-1-nsaenzju@redhat.com
      Fixes: 2b5067a8 ("mm: mmap_lock: add tracepoints around lock acquisition")
      Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Tested-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage() · 65ac1a60
      Anshuman Khandual authored
      On certain platforms, THP support cannot be validated just via the
      build option CONFIG_TRANSPARENT_HUGEPAGE.  Instead
      has_transparent_hugepage() also needs to be called to verify THP
      runtime support.  Otherwise the debug test will just run into unusable
      THP helpers, as in the case of a 4K hash config on the powerpc
      platform [1].  This moves all pfn_pmd() and pfn_pud() calls after the
      THP runtime validation with has_transparent_hugepage(), which prevents
      the mentioned problem.
      
      [1] https://bugzilla.kernel.org/show_bug.cgi?id=213069
      
      Link: https://lkml.kernel.org/r/1621397588-19211-1-git-send-email-anshuman.khandual@arm.com
      Fixes: 787d563b ("mm/debug_vm_pgtable: fix kernel crash by checking for THP support")
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tools/vm/page_owner_sort.c: check malloc() return · 85f29cd6
      Tang Bin authored
      
      
      Link: https://lkml.kernel.org/r/20210506131402.10416-1-tangbin@cmss.chinamobile.com
      Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
      Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: fix ENOMEM handling in grab_mapping_entry() · 1a14e377
      Jan Kara authored
      grab_mapping_entry() has a bug in its handling of the ENOMEM
      condition.  Suppose we have a PMD entry at index i which we are
      downgrading to a PTE entry.  grab_mapping_entry() will set
      pmd_downgrade to true, lock the entry, clear the entry in the xarray,
      and decrement mapping->nrpages.  Then it will call:
      
      	entry = dax_make_entry(pfn_to_pfn_t(0), flags);
      	dax_lock_entry(xas, entry);
      
      which inserts new PTE entry into xarray.  However this may fail allocating
      the new node.  We handle this by:
      
      	if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM))
      		goto retry;
      
      however pmd_downgrade stays set to true even though 'entry' returned
      from get_unlocked_entry() will now be NULL.  And we will go through
      the downgrade branch again.  This is mostly harmless except that
      mapping->nrpages is decremented again and we temporarily have an
      invalid entry stored in the xarray.  Fix the problem by setting
      pmd_downgrade to false each time we look up the entry we work with, so
      that it matches the entry we found.
      
      Link: https://lkml.kernel.org/r/20210622160015.18004-1-jack@suse.cz
      Fixes: b15cd800 ("dax: Convert page fault handlers to XArray")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/kmemleak: fix possible wrong memory scanning period · 54dd200c
      Yanfei Xu authored
      
      
      This commit contains 3 modifications:
      
      1. Convert the type of jiffies_scan_wait to "unsigned long".
      
      2. Use READ/WRITE_ONCE() for accessing "jiffies_scan_wait".
      
3. Fix the possible wrong memory scanning period.  If you set a large
   memory scanning period like below, then the "secs" variable will be
   non-zero, but the value of "jiffies_scan_wait" will be zero.
      
          echo "scan=0x10000000" > /sys/kernel/debug/kmemleak
      
This is because the type of the msecs_to_jiffies() parameter is "unsigned
int", and "secs * 1000" is larger than its maximum value.  This in turn
leads to an unexpected jiffies_scan_wait, possibly zero.  Correct it by
replacing kstrtoul() with kstrtouint(), and check that the msecs value does
not exceed UINT_MAX.
      
      Link: https://lkml.kernel.org/r/20210613174022.23044-1-yanfei.xu@windriver.com
Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54dd200c
• Georgi Djakov
      mm/slub: add taint after the errors are printed · 65ebdeef
      Georgi Djakov authored
      
      
      When running the kernel with panic_on_taint, the usual slub debug error
      messages are not being printed when object corruption happens.  That's
      because we panic in add_taint(), which is called before printing the
      additional information.  This is a bit unfortunate as the error messages
      are actually very useful, especially before a panic.  Let's fix this by
      moving add_taint() after the errors are printed on the console.
      
      Link: https://lkml.kernel.org/r/1623860738-146761-1-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      65ebdeef
• Faiyaz Mohammed
      mm: slub: move sysfs slab alloc/free interfaces to debugfs · 64dd6849
      Faiyaz Mohammed authored
      
      
The alloc_calls and free_calls implementation in sysfs has two issues: one
is the PAGE_SIZE limitation of sysfs, and the other is that it does not
adhere to the "one value per file" rule.

To overcome these issues, move the alloc_calls and free_calls
implementation to debugfs.

The debugfs cache will be created if the SLAB_STORE_USER flag is set.

Rename alloc_calls/free_calls to alloc_traces/free_traces, to be in line
with what they do.
      
      [faiyazm@codeaurora.org: fix the leak of alloc/free traces debugfs interface]
        Link: https://lkml.kernel.org/r/1624248060-30286-1-git-send-email-faiyazm@codeaurora.org
      
      Link: https://lkml.kernel.org/r/1623438200-19361-1-git-send-email-faiyazm@codeaurora.org
Signed-off-by: Faiyaz Mohammed <faiyazm@codeaurora.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64dd6849
• Stephen Boyd
      slub: force on no_hash_pointers when slub_debug is enabled · 79270291
      Stephen Boyd authored
      
      
      Obscuring the pointers that slub shows when debugging makes for some
      confusing slub debug messages:
      
       Padding overwritten. 0x0000000079f0674a-0x000000000d4dce17
      
      Those addresses are hashed for kernel security reasons.  If we're trying
      to be secure with slub_debug on the commandline we have some big problems
      given that we dump whole chunks of kernel memory to the kernel logs.
      Let's force on the no_hash_pointers commandline flag when slub_debug is on
      the commandline.  This makes slub debug messages more meaningful and if by
      chance a kernel address is in some slub debug object dump we will have a
      better chance of figuring out what went wrong.
      
      Note that we don't use %px in the slub code because we want to reduce the
      number of places that %px is used in the kernel.  This also nicely prints
      a big fat warning at kernel boot if slub_debug is on the commandline so
      that we know that this kernel shouldn't be used on production systems.
      
      [akpm@linux-foundation.org: fix build with CONFIG_SLUB_DEBUG=n]
      
      Link: https://lkml.kernel.org/r/20210601182202.3011020-5-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Petr Mladek <pmladek@suse.com>
Cc: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79270291
• Joe Perches
      slub: indicate slab_fix() uses printf formats · 582d1212
      Joe Perches authored
      
      
Ideally, slab_fix() would be marked with __printf, and the format here
would not use \n, as that's emitted by slab_fix() itself.  Make these changes.
      
      Link: https://lkml.kernel.org/r/20210601182202.3011020-4-swboyd@chromium.org
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      582d1212
• Stephen Boyd
      slub: actually use 'message' in restore_bytes() · 1a88ef87
      Stephen Boyd authored
      
      
      The message argument isn't used here.  Let's pass the string to the printk
      message so that the developer can figure out what's happening, instead of
      guessing that a redzone is being restored, etc.
      
      Link: https://lkml.kernel.org/r/20210601182202.3011020-3-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joe Perches <joe@perches.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1a88ef87
• Stephen Boyd
      slub: restore slub_debug=- behavior · 02ac47d0
      Stephen Boyd authored
Patch series "slub: Print non-hashed pointers in slub debugging", v3.
      
      I was doing some debugging recently and noticed that my pointers were
      being hashed while slub_debug was on the kernel commandline.  Let's force
      on the no hash pointer option when slub_debug is on the kernel commandline
      so that the prints are more meaningful.
      
      The first two patches are something else I noticed while looking at the
      code.  The message argument is never used so the debugging messages are
      not as clear as they could be and the slub_debug=- behavior seems to be
      busted.  Then there's a printf fixup from Joe and the final patch is the
      one that force disables pointer hashing.
      
      This patch (of 4):
      
      Passing slub_debug=- on the kernel commandline is supposed to disable slub
      debugging.  This is especially useful with CONFIG_SLUB_DEBUG_ON where the
      default is to have slub debugging enabled in the build.  Due to some code
      reorganization this behavior was dropped, but the code to make it work
      mostly stuck around.  Restore the previous behavior by disabling the
      static key when we parse the commandline and see that we're trying to
      disable slub debugging.
      
      Link: https://lkml.kernel.org/r/20210601182202.3011020-1-swboyd@chromium.org
      Link: https://lkml.kernel.org/r/20210601182202.3011020-2-swboyd@chromium.org
Fixes: ca0cab65 ("mm, slub: introduce static key for slub_debug()")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      02ac47d0
• Hyeonggon Yoo
      mm, slub: change run-time assertion in kmalloc_index() to compile-time · 588c7fa0
      Hyeonggon Yoo authored
      
      
Currently, when a size is not supported by kmalloc_index(), the compiler
will generate a run-time BUG(), while a compile-time error is also
possible, and better.  So change BUG() to BUILD_BUG_ON_MSG() to make a
compile-time check possible.

Also remove code that allocates more than 32MB, because the current
implementation supports only up to 32MB.
      
      [42.hyeyoo@gmail.com: fix support for clang 10]
        Link: https://lkml.kernel.org/r/20210518181247.GA10062@hyeyoo
      [vbabka@suse.cz: fix false-positive assert in kernel/bpf/local_storage.c]
  Link: https://lkml.kernel.org/r/bea97388-01df-8eac-091b-a3c89b4a4a09@suse.cz
Link: https://lkml.kernel.org/r/20210511173448.GA54466@hyeyoo
      [elver@google.com: kfence fix]
        Link: https://lkml.kernel.org/r/20210512195227.245000695c9014242e9a00e5@linux-foundation.org
      
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Marco Elver <elver@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      588c7fa0
• Oliver Glitta
      slub: remove resiliency_test() function · 3d8e374c
      Oliver Glitta authored
      
      
      Function resiliency_test() is hidden behind #ifdef SLUB_RESILIENCY_TEST
      that is not part of Kconfig, so nobody runs it.
      
This function is replaced with a KUnit test for SLUB added by the previous
patch, "selftests: add a KUnit test for SLUB debugging functionality".
      
      Link: https://lkml.kernel.org/r/20210511150734.3492-3-glittao@gmail.com
Signed-off-by: Oliver Glitta <glittao@gmail.com>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3d8e374c
• Oliver Glitta
      mm/slub, kunit: add a KUnit test for SLUB debugging functionality · 1f9f78b1
      Oliver Glitta authored
      
      
      SLUB has resiliency_test() function which is hidden behind #ifdef
      SLUB_RESILIENCY_TEST that is not part of Kconfig, so nobody runs it.
      KUnit should be a proper replacement for it.
      
Try changing a byte in the redzone after allocation, and changing the
pointer to the next free node, the first byte, the 50th byte, and a redzone
byte.  Check whether validation finds the errors.

There are several differences from the original resiliency test: the tests
create their own caches with a known state instead of corrupting shared
kmalloc caches.

The corruption of the freepointer uses the correct offset; the original
resiliency test was broken by freepointer changes.

Scrap the test that changed a random byte, because it is not meaningful in
this form, where we need deterministic results.

Add a new option, CONFIG_SLUB_KUNIT_TEST, in Kconfig.  The next_pointer,
first_word and clobber_50th_byte tests do not run with the KASAN option on,
because they deliberately modify non-allocated objects.

Use kunit_resource to count errors in the cache and silence bug reports.
Count an error whenever slab_bug() or slab_fix() is called, or when the
count of pages is wrong.
      
      [glittao@gmail.com: remove unused function test_exit(), from SLUB KUnit test]
        Link: https://lkml.kernel.org/r/20210512140656.12083-1-glittao@gmail.com
      [akpm@linux-foundation.org: export kasan_enable/disable_current to modules]
      
      Link: https://lkml.kernel.org/r/20210511150734.3492-2-glittao@gmail.com
Signed-off-by: Oliver Glitta <glittao@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Daniel Latypov <dlatypov@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f9f78b1
• Vlastimil Babka
      kunit: make test->lock irq safe · 26c6cb7c
      Vlastimil Babka authored
      
      
      The upcoming SLUB kunit test will be calling kunit_find_named_resource()
      from a context with disabled interrupts.  That means kunit's test->lock
      needs to be IRQ safe to avoid potential deadlocks and lockdep splats.
      
      This patch therefore changes the test->lock usage to spin_lock_irqsave()
      and spin_unlock_irqrestore().
      
      Link: https://lkml.kernel.org/r/20210511150734.3492-1-glittao@gmail.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Oliver Glitta <glittao@gmail.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Marco Elver <elver@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26c6cb7c
• gumingtao
      slab: use __func__ to trace function name · 4acaa7d5
      gumingtao authored
      
      
It is better to use __func__ to trace the function name.
      
      Link: https://lkml.kernel.org/r/31fdbad5c45cd1e26be9ff37be321b8586b80fee.1624355507.git.gumingtao@xiaomi.com
Signed-off-by: gumingtao <gumingtao@xiaomi.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4acaa7d5
• Wang Qing
      doc: watchdog: modify the doc related to "watchdog/%u" · 256f7a67
      Wang Qing authored
      
      
"watchdog/%u" threads have been replaced by cpu_stop_work.  The current
description is extremely misleading.
      
      Link: https://lkml.kernel.org/r/1619687073-24686-5-git-send-email-wangqing@vivo.com
Signed-off-by: Wang Qing <wangqing@vivo.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Santosh Sivaraj <santosh@fossix.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      256f7a67
• Wang Qing
      doc: watchdog: modify the explanation related to watchdog thread · e55fda8c
      Wang Qing authored
      
      
"watchdog/%u" threads have been replaced by cpu_stop_work.  The current
description is extremely misleading.
      
      Link: https://lkml.kernel.org/r/1619687073-24686-4-git-send-email-wangqing@vivo.com
Signed-off-by: Wang Qing <wangqing@vivo.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Santosh Sivaraj <santosh@fossix.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e55fda8c
• Wang Qing
      kernel: watchdog: modify the explanation related to watchdog thread · b124ac45
      Wang Qing authored
      
      
The watchdog thread has been replaced by cpu_stop_work; modify the
related explanation.
      
      Link: https://lkml.kernel.org/r/1619687073-24686-2-git-send-email-wangqing@vivo.com
Signed-off-by: Wang Qing <wangqing@vivo.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Santosh Sivaraj <santosh@fossix.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b124ac45
• Colin Ian King
      ocfs2: remove redundant initialization of variable ret · 7ed6d4e4
      Colin Ian King authored
      
      
The variable ret is being initialized with a value that is never read;
the assignment is redundant and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Link: https://lkml.kernel.org/r/20210613135148.74658-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7ed6d4e4
• Chen Huang
      ocfs2: replace simple_strtoull() with kstrtoull() · f0f798db
      Chen Huang authored
      
      
simple_strtoull() is deprecated in some situations, since it does not check
for range overflow; use kstrtoull() instead.
      
      Link: https://lkml.kernel.org/r/20210526092020.554341-3-chenhuang5@huawei.com
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f0f798db
• Wan Jiabing
      ocfs2: remove repeated uptodate check for buffer · 01f01399
      Wan Jiabing authored
In commit 60f91826 ("buffer: Avoid setting buffer bits that are already
set"), a test_bit() check was added to set_buffer_##name(), which performs
the same check as buffer_##name().  The !buffer_uptodate(bh) here is
therefore a repeated check.  Remove it.
      
      Link: https://lkml.kernel.org/r/20210425025702.13628-1-wanjiabing@vivo.com
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01f01399
• Colin Ian King
      ocfs2: remove redundant assignment to pointer queue · ca49b6d8
      Colin Ian King authored
      
      
The pointer queue is being initialized with a value that is never read,
and it is updated later with a new value.  The initialization is redundant
and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Link: https://lkml.kernel.org/r/20210513113957.57539-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ca49b6d8
• Dan Carpenter
      ocfs2: fix snprintf() checking · 54e948c6
      Dan Carpenter authored
      The snprintf() function returns the number of bytes which would have been
      printed if the buffer was large enough.  In other words it can return ">=
      remain" but this code assumes it returns "== remain".
      
      The run time impact of this bug is not very severe.  The next iteration
      through the loop would trigger a WARN() when we pass a negative limit to
      snprintf().  We would then return success instead of -E2BIG.
      
      The kernel implementation of snprintf() will never return negatives so
      there is no need to check and I have deleted that dead code.
      
      Link: https://lkml.kernel.org/r/20210511135350.GV1955@kadam
Fixes: a860f6eb ("ocfs2: sysfile interfaces for online file check")
Fixes: 74ae4e10 ("ocfs2: Create stack glue sysfs files.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54e948c6