  1. Feb 25, 2021
    • hugetlbfs: remove special hugetlbfs_set_page_dirty() · a4fa34cd
      Mike Kravetz authored
      
      
      Matthew Wilcox noticed that hugetlbfs_set_page_dirty always returns 0.
      Instead, it should return 1 or 0 depending on the previous state of the
      dirty bit.  In addition, the call to compound_head is redundant as it is
      also performed in the calling routine set_page_dirty.
      
      Replace the hugetlbfs specific routine hugetlbfs_set_page_dirty with
      __set_page_dirty_no_writeback as it addresses both of these issues.
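
      As a reference for the intended semantics, a set_page_dirty() method that
      does no writeback bookkeeping only reports that the page was newly dirtied
      (a minimal sketch, not necessarily the exact kernel implementation):

          /* Returns 1 only when the dirty bit transitions from clear to set. */
          static int sketch_set_page_dirty_no_writeback(struct page *page)
          {
                  if (!PageDirty(page))
                          return !TestSetPageDirty(page);
                  return 0;       /* was already dirty */
          }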
      
      Link: https://lkml.kernel.org/r/20201221192542.15732-2-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: change hugetlb_reserve_pages() to type bool · 33b8f84a
      Mike Kravetz authored
      
      
      While reviewing a bug in hugetlb_reserve_pages, it was noticed that all
      callers ignore the return value.  Any failure is considered an ENOMEM
      error by the callers.
      
      Change the function to be of type bool.  The function will return true if
      the reservation was successful, false otherwise.  Callers currently assume
      a zero return code indicates success.  Change the callers to look for true
      to indicate success.  No functional change, only code cleanup.
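
      A minimal before/after sketch of the caller-side change (hypothetical call
      site; the argument names are illustrative):

          /* Before: a zero return code meant success */
          if (hugetlb_reserve_pages(inode, from, to, vma, vm_flags))
                  goto out_err;

          /* After: the function returns bool, and true means success */
          if (!hugetlb_reserve_pages(inode, from, to, vma, vm_flags))
                  goto out_err;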
      
      Link: https://lkml.kernel.org/r/20201221192542.15732-1-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, oom: fix a comment in dump_task() · f8159c13
      Tang Yizhou authored
      
      
      If p is a kthread, it will be checked in oom_unkillable_task() so
      we can delete the corresponding comment.
      
      Link: https://lkml.kernel.org/r/20210125133006.7242-1-tangyizhou@huawei.com
      Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk() · ce33135c
      Miaohe Lin authored
      The helper range_in_vma() was introduced via commit 017b1660 ("mm:
      migration: fix migration of huge PMD shared pages"), but we forgot to
      use it in queue_pages_test_walk().
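
      For reference, the helper is a thin wrapper around the open-coded bounds
      check (sketch of its shape as found in include/linux/mm.h):

          static inline bool range_in_vma(struct vm_area_struct *vma,
                                          unsigned long start, unsigned long end)
          {
                  return (vma && vma->vm_start <= start && end <= vma->vm_end);
          }

          /* so "vma->vm_start <= start && end <= vma->vm_end" at a call site
           * simply becomes "range_in_vma(vma, start, end)". */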
      
      Link: https://lkml.kernel.org/r/20210130091352.20220-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • numa balancing: migrate on fault among multiple bound nodes · bda420b9
      Huang Ying authored
      
      
      Now, NUMA balancing can only optimize the page placement among the NUMA
      nodes if the default memory policy is used, because a memory policy
      specified explicitly should take precedence.  But this seems too strict in
      some situations.  For example, on a system with 4 NUMA nodes, if the
      memory of an application is bound to nodes 0 and 1, NUMA balancing can
      potentially migrate the pages between nodes 0 and 1 to reduce
      cross-node accesses without breaking the explicit memory binding policy.

      So in this patch, we add the MPOL_F_NUMA_BALANCING mode flag to
      set_mempolicy() when the mode is MPOL_BIND.  With the flag specified, NUMA
      balancing will be enabled within the thread to optimize the page placement
      within the constraints of the specified memory binding policy.  With the
      newly added flag, the NUMA balancing control mechanism becomes,
      
       - sysctl knob numa_balancing can enable/disable the NUMA balancing
         globally.
      
       - even if sysctl numa_balancing is enabled, the NUMA balancing will be
         disabled for the memory areas or applications with the explicit
         memory policy by default.
      
       - MPOL_F_NUMA_BALANCING can be used to enable the NUMA balancing for
         the applications when specifying the explicit memory policy
         (MPOL_BIND).
      
      Various page placement optimizations based on NUMA balancing can be
      done with these flags.  As the first step, in this patch, if the memory of
      the application is bound to multiple nodes (MPOL_BIND), and in the hint
      page fault handler the accessing node is in the policy nodemask, the page
      will be migrated to the accessing node, if possible, to reduce
      cross-node accesses.
      
      If the newly added MPOL_F_NUMA_BALANCING flag is specified by an
      application on an old kernel version without support for it, set_mempolicy()
      will return -1 and errno will be set to EINVAL.  The application can use
      this behavior to run on both old and new kernel versions.

      And if the MPOL_F_NUMA_BALANCING flag is specified for a mode other than
      MPOL_BIND, set_mempolicy() will return -1 and errno will be set to EINVAL
      as before, because we don't support optimization based on NUMA balancing
      for those modes.
      
      In the previous version of the patch, we tried to reuse MPOL_MF_LAZY for
      mbind().  But that flag is tied to MPOL_MF_MOVE.*, so it does not seem to be
      a good API/ABI for the purpose of this patch.

      And because it's not clear whether it's necessary to enable NUMA balancing
      for a specific memory area inside an application, we only add the flag
      at the thread level (set_mempolicy()) instead of the memory area level
      (mbind()).  We can do that when it becomes necessary.
      
      To test the patch, we run a test case as follows on a 4-node machine with
      192 GB memory (48 GB per node).
      
      1. Change pmbench memory accessing benchmark to call set_mempolicy()
         to bind its memory to node 1 and 3 and enable NUMA balancing.  Some
         related code snippets are as follows,
      
            #include <numaif.h>
            #include <numa.h>
            #include <errno.h>
            #include <stdio.h>
            #include <stdlib.h>

            struct bitmask *bmp;
            int ret;

            /* Bind to nodes 1 and 3 and opt in to NUMA balancing */
            bmp = numa_parse_nodestring("1,3");
            ret = set_mempolicy(MPOL_BIND | MPOL_F_NUMA_BALANCING,
                                bmp->maskp, bmp->size + 1);
            /* If MPOL_F_NUMA_BALANCING isn't supported, fall back to MPOL_BIND */
            if (ret < 0 && errno == EINVAL)
                    ret = set_mempolicy(MPOL_BIND, bmp->maskp, bmp->size + 1);
            if (ret < 0) {
                    perror("Failed to call set_mempolicy");
                    exit(-1);
            }
      
      2. Run a memory eater on node 3 to use 40 GB memory before running pmbench.
      
      3. Run pmbench with 64 processes; the working-set size of each process
         is 640 MB, so the total working-set size is 64 * 640 MB = 40 GB.  The
         CPU and the memory (as in step 1.) of all pmbench processes are bound
         to nodes 1 and 3. So, after CPU usage is balanced, some pmbench
         processes running on the CPUs of node 3 will access the memory of
         node 1.
      
      4. After the pmbench processes run for 100 seconds, kill the memory
         eater.  Now it's possible for some pmbench processes to migrate
         their pages from node 1 to node 3 to reduce cross-node accessing.
      
      Test results show that, with the patch, the pages can be migrated from
      node 1 to node 3 after killing the memory eater, and the pmbench score
      can increase by about 17.5%.
      
      Link: https://lkml.kernel.org/r/20210120061235.148637-2-ying.huang@intel.com
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, compaction: make fast_isolate_freepages() stay within zone · 6e2b7044
      Vlastimil Babka authored
      Compaction always operates on pages from a single given zone when
      isolating both pages to migrate and freepages.  Pageblock boundaries are
      intersected with zone boundaries to be safe in case a zone starts or ends in
      the middle of a pageblock.  The use of pageblock_pfn_to_page() protects
      against non-contiguous pageblocks.

      The functions fast_isolate_freepages() and fast_isolate_around() don't
      currently protect the fast freepage isolation thoroughly enough against
      these corner cases, and can result in freepage isolation operating outside
      of zone boundaries:
      
       - in fast_isolate_freepages() if we get a pfn from the first pageblock
         of a zone that starts in the middle of that pageblock, 'highest' can
         be a pfn outside of the zone.
      
         If we fail to isolate anything in this function, we may then call
         fast_isolate_around() on a pfn outside of the zone and there
         effectively do a set_pageblock_skip(page_to_pfn(highest)) which may
         currently hit a VM_BUG_ON() in some configurations
      
       - fast_isolate_around() checks only the zone end boundary and not
         beginning, nor that the pageblock is contiguous (with
         pageblock_pfn_to_page()) so it's possible that we end up calling
         isolate_freepages_block() on a range of pfn's from two different
         zones and end up e.g. isolating freepages under the wrong zone's
         lock.
      
      This patch should fix the above issues.
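
      A rough sketch of the kind of clamping this implies (illustrative only, not
      the literal diff; pageblock_*_pfn(), zone_end_pfn() and
      pageblock_pfn_to_page() are existing helpers):

          /* Clamp a candidate pageblock range to the current zone. */
          unsigned long start_pfn = pageblock_start_pfn(pfn);
          unsigned long end_pfn = pageblock_end_pfn(pfn);

          start_pfn = max(start_pfn, zone->zone_start_pfn);
          end_pfn = min(end_pfn, zone_end_pfn(zone));

          /* Bail out if the pageblock is not contiguous within this zone. */
          if (!pageblock_pfn_to_page(start_pfn, end_pfn, zone))
                  return;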
      
      Link: https://lkml.kernel.org/r/20210217173300.6394-1-vbabka@suse.cz
      Fixes: 5a811889 ("mm, compaction: use free lists to quickly locate a migration target")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/compaction: fix misbehaviors of fast_find_migrateblock() · 15d28d0d
      Wonhyuk Yang authored
      In fast_find_migrateblock(), we iterate over the freelist to find a
      proper pageblock.  But there are some misbehaviors.

      First, if the pfn we found is equal to cc->migrate_pfn, it is considered
      that we didn't find a suitable pageblock.  Secondly, if the loop was
      terminated because order is less than PAGE_ALLOC_COSTLY_ORDER, it could be
      considered that we found a suitable one.  Thirdly, if the skip bit is set
      on the page block and we goto continue, it doesn't check nr_scanned.
      Fourthly, if the page block's skip bit is set, it checks whether the page
      block is the last of the list, which is unnecessary.
      
      Link: https://lkml.kernel.org/r/20210128130411.6125-1-vvghjk1234@gmail.com
      Fixes: 70b44595 ("mm, compaction: use free lists to quickly locate a migration source")
      Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/compaction: correct deferral logic for proactive compaction · 40d7e203
      Charan Teja Reddy authored
      
      
      should_proactive_compact_node() returns true when sum of the weighted
      fragmentation score of all the zones in the node is greater than the
      wmark_high of compaction, which then triggers the proactive compaction
      that operates on the individual zones of the node.  But proactive
      compaction runs on the zone only when its weighted fragmentation score
      is greater than wmark_low(=wmark_high - 10).
      
      This means that the sum of the weighted fragmentation scores of all the
      zones can exceed wmark_high while the individual weighted fragmentation
      scores of the zones are still less than wmark_low, which triggers
      proactive compaction unnecessarily, only for it to return having done nothing.
      
      The issue with proactive compaction returning without even trying is its
      deferral.  It is simply deferred for 1 << COMPACT_MAX_DEFER_SHIFT if the
      scores across the proactive compaction are the same, on the assumption that
      compaction didn't make any progress, when in reality it didn't even try.
      With a delay of 500msec between successive retries of proactive compaction,
      this can result in a deferral of ~30sec without even trying proactive
      compaction.
      
      The test scenario is: compaction_proactiveness=50, thus wmark_low = 50
      and wmark_high = 60.  The system has 2 zones (Normal and Movable) with sizes
      5GB and 6GB respectively.  After opening some apps on Android, the
      weighted fragmentation scores of these zones are 47 and 49 respectively.
      Since the sum of these fragmentation scores is above wmark_high, proactive
      compaction is triggered, but since the individual zones' weighted
      fragmentation scores are below wmark_low, it returns without trying
      proactive compaction.  As a result the weighted fragmentation scores of
      the zones are still 47 and 49, which makes the existing logic defer the
      compaction, thinking that no progress was made by the compaction.
      
      Fix this by checking just the zone's fragmentation score, not the weighted
      one, in __compact_finished(), and using the zone's weighted fragmentation
      score in fragmentation_score_node().  In the test case above, if the
      weighted average is above wmark_high, then the unweighted score of at
      least one zone has to be above wmark_high.  Thus it avoids the
      unnecessary triggers and deferrals of proactive compaction.
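
      A minimal sketch of the weighted vs. unweighted distinction (the helpers
      below follow the proactive compaction code in mm/compaction.c, but treat
      this as an illustration rather than the literal patch):

          /* Raw per-zone score, checked against the wmark in __compact_finished() */
          static unsigned int fragmentation_score_zone(struct zone *zone)
          {
                  return extfrag_for_order(zone, COMPACTION_HPAGE_ORDER);
          }

          /* Weighted by the zone's share of the node, summed in fragmentation_score_node() */
          static unsigned int fragmentation_score_zone_weighted(struct zone *zone)
          {
                  unsigned long score = zone->present_pages *
                                        fragmentation_score_zone(zone);

                  return div64_ul(score, zone->zone_pgdat->node_present_pages + 1);
          }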
      
      Link: https://lkml.kernel.org/r/1610989938-31374-1-git-send-email-charante@codeaurora.org
      Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nitin Gupta <ngupta@nitingupta.dev>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked · e2d26aa5
      Miaohe Lin authored
      
      
      The VM_BUG_ON_PAGE(!PageLocked(page), page) check is also done in PageMovable.
      Remove this explicit one.
      
      Link: https://lkml.kernel.org/r/20210109081420.46030-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/compaction: remove rcu_read_lock during page compaction · d99fd5fe
      Alex Shi authored
      
      
      isolate_migratepages_block() used rcu_read_lock() with the intention of
      safeguarding against the mem_cgroup being destroyed concurrently; but
      its TestClearPageLRU already protects against that.  Delete the
      unnecessary rcu_read_lock() and _unlock().
      
      Hugh Dickins helped on commit log polishing, Thanks!
      
      Link: https://lkml.kernel.org/r/1608614453-10739-3-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • z3fold: simplify the zhdr initialization code in init_z3fold_page() · c457cd96
      Miaohe Lin authored
      
      
      We can simplify the zhdr initialization by memset()ing the zhdr first
      instead of setting the struct members to zero one by one.  This also makes
      the code more compact and clear.
      
      Link: https://lkml.kernel.org/r/20210120085851.16159-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • z3fold: remove unused attribute for release_z3fold_page · 70ad3196
      Miaohe Lin authored
      Since commit dcf5aedb ("z3fold: stricter locking and more careful
      reclaim"), release_z3fold_page() is used again.  So we can drop the
      unused attribute safely.
      
      Link: https://lkml.kernel.org/r/20210120084008.58432-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmscan: restore zone_reclaim_mode ABI · 51998364
      Dave Hansen authored
      
      
      I went to go add a new RECLAIM_* mode for the zone_reclaim_mode sysctl.
      Like a good kernel developer, I also went to go update the
      documentation.  I noticed that the bits in the documentation didn't
      match the bits in the #defines.
      
      The VM never explicitly checks the RECLAIM_ZONE bit.  The bit is,
      however, implicitly checked when checking 'node_reclaim_mode==0'.  The
      RECLAIM_ZONE #define was removed in a cleanup.  That, by itself, is fine.

      But when the bit was removed (bit 0), the _other_ bit locations also got
      changed.  That's not OK because the bit values are documented to mean
      one specific thing.  Users surely do not expect the meaning to change
      from kernel to kernel.
      
      The end result is that if someone had a script that did:
      
      	sysctl vm.zone_reclaim_mode=1
      
      it would have gone from enabling node reclaim for clean unmapped pages
      to writing out pages during node reclaim after the commit in question.
      That's not great.
      
      Put the bits back the way they were and add a comment so something like
      this is a bit harder to do again.  Update the documentation to make it
      clear that the first bit is ignored.
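
      For context, the documented bit layout that this restores looks roughly
      like the following (values as described in the zone_reclaim_mode sysctl
      documentation; the comments are paraphrased):

          #define RECLAIM_OFF   0
          #define RECLAIM_ZONE  (1<<0)  /* bit 0: documented but effectively ignored */
          #define RECLAIM_WRITE (1<<1)  /* writeout pages during node reclaim */
          #define RECLAIM_UNMAP (1<<2)  /* unmap/swap pages during node reclaim */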
      
      Link: https://lkml.kernel.org/r/20210219172555.FF0CDF23@viggo.jf.intel.com
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Fixes: 648b5cf3 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
      Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Daniel Wagner <dwagner@suse.de>
      Cc: "Tobin C. Harding" <tobin@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: fix uninitialized subpool pointer · ff546117
      Mike Kravetz authored
      
      
      Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
      with linux-next 5.12.0-20210222.
      Call trace:
        hugepage_subpool_put_pages.part.0+0x2c/0x138
        __free_huge_page+0xce/0x310
        alloc_pool_huge_page+0x102/0x120
        set_max_huge_pages+0x13e/0x350
        hugetlb_sysctl_handler_common+0xd8/0x110
        hugetlb_sysctl_handler+0x48/0x58
        proc_sys_call_handler+0x138/0x238
        new_sync_write+0x10e/0x198
        vfs_write.part.0+0x12c/0x238
        ksys_write+0x68/0xf8
        do_syscall+0x82/0xd0
        __do_syscall+0xb4/0xc8
        system_call+0x72/0x98
      
      This is a result of the change which moved the hugetlb page subpool
      pointer from page->private to page[1]->private.  When new pages are
      allocated from the buddy allocator, the private field of the head
      page will be cleared, but the private field of subpages is not modified.
      Therefore, old values may remain.
      
      Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
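
      A minimal sketch of the fix (hedged: the placement shown is illustrative,
      but the idea is to clear the subpool field as the page becomes a hugetlb
      page):

          /* Helper added by the page-flags series; subpool lives in page[1].private. */
          static inline void hugetlb_set_page_subpool(struct page *hpage,
                                                      struct hugepage_subpool *subpool)
          {
                  set_page_private(hpage + 1, (unsigned long)subpool);
          }

          /* In prep_new_huge_page(): wipe whatever the buddy allocator left behind. */
          hugetlb_set_page_subpool(page, NULL);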
      
      Link: https://lkml.kernel.org/r/20210223215544.313871-1-mike.kravetz@oracle.com
      Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/hugetlb.h: add synchronization information for new hugetlb specific flags · d95c0337
      Mike Kravetz authored
      
      
      Add comments, no functional change.
      
      Link: https://lkml.kernel.org/r/62a80585-2a73-10cc-4a2d-5721540d4ad2@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: convert PageHugeFreed to HPageFreed flag · 6c037149
      Mike Kravetz authored
      
      
      Use new hugetlb specific HPageFreed flag to replace the PageHugeFreed
      interfaces.
      
      Link: https://lkml.kernel.org/r/20210122195231.324857-6-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: convert PageHugeTemporary() to HPageTemporary flag · 9157c311
      Mike Kravetz authored
      
      
      Use new hugetlb specific HPageTemporary flag to replace the
      PageHugeTemporary() interfaces.  PageHugeTemporary does contain a
      PageHuge() check.  However, this interface is only used within hugetlb
      code where we know we are dealing with a hugetlb page.  Therefore, the
      check can be eliminated.
      
      Link: https://lkml.kernel.org/r/20210122195231.324857-5-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: convert page_huge_active() HPageMigratable flag · 8f251a3d
      Mike Kravetz authored
      
      
      Use the new hugetlb page specific flag HPageMigratable to replace the
      page_huge_active interfaces.  By its name, page_huge_active implied that
      a huge page was on the active list.  However, that is not really what the
      code checking the flag wanted to know.  It really wanted to determine if the
      huge page could be migrated.  This happens when the page is actually added
      to the page cache and/or task page table.  This is the reasoning behind
      the name change.
      
      The VM_BUG_ON_PAGE() calls in the *_huge_active() interfaces are not
      really necessary as we KNOW the page is a hugetlb page.  Therefore, they
      are removed.
      
      The routine page_huge_active checked for PageHeadHuge before testing the
      active bit.  This is unnecessary in the case where we hold a reference or
      lock and know it is a hugetlb head page.  page_huge_active is also called
      without holding a reference or lock (scan_movable_pages), and can race
      with code freeing the page.  The extra check in page_huge_active shortened
      the race window, but did not prevent the race.  Offline code calling
      scan_movable_pages already deals with these races, so removing the check
      is acceptable.  Add comment to racy code.
      
      [songmuchun@bytedance.com: remove set_page_huge_active() declaration from include/linux/hugetlb.h]
        Link: https://lkml.kernel.org/r/CAMZfGtUda+KoAZscU0718TN61cSFwp4zy=y2oZ=+6Z2TAZZwng@mail.gmail.com
      
      Link: https://lkml.kernel.org/r/20210122195231.324857-3-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: use page.private for hugetlb specific page flags · d6995da3
      Mike Kravetz authored
      
      
      Patch series "create hugetlb flags to consolidate state", v3.
      
      While discussing a series of hugetlb fixes in [1], it became evident that
      the hugetlb specific page state information is stored in a somewhat
      haphazard manner.  Code dealing with state information would be easier to
      read, understand and maintain if this information was stored in a
      consistent manner.
      
      This series uses page.private of the hugetlb head page for storing a set
      of hugetlb specific page flags.  Routines are provided for test, set and
      clear of the flags.
      
      [1] https://lore.kernel.org/r/20210106084739.63318-1-songmuchun@bytedance.com
      
      This patch (of 4):
      
      As hugetlbfs evolved, state information about hugetlb pages was added.
      One 'convenient' way of doing this was to use available fields in tail
      pages.  Over time, it has become difficult to know the meaning or contents
      of fields simply by looking at a small bit of code.  Sometimes, the naming
      is just confusing.  For example: The PagePrivate flag indicates a huge
      page reservation was consumed and needs to be restored if an error is
      encountered and the page is freed before it is instantiated.  The
      page.private field contains the pointer to a subpool if the page is
      associated with one.
      
      In an effort to make the code more readable, use page.private to contain
      hugetlb specific page flags.  These flags will have test, set and clear
      functions similar to those used for 'normal' page flags.  More
      importantly, an enum of flag values will be created with names that
      actually reflect their purpose.
      
      In this patch,
      - Create infrastructure for hugetlb specific page flag functions
      - Move subpool pointer to page[1].private to make way for flags
        Create routines with meaningful names to modify subpool field
      - Use new HPageRestoreReserve flag instead of PagePrivate
      
      Conversion of other state information will happen in subsequent patches.
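
      A rough sketch of the flag infrastructure described above (the enum value
      and accessor names mirror the description; the real header generates the
      accessors with macros, so treat this as illustrative):

          /* Hugetlb-specific page flags, stored in the head page's page.private. */
          enum hugetlb_page_flags {
                  HPG_restore_reserve = 0,
                  __NR_HPAGEFLAGS,
          };

          static inline bool HPageRestoreReserve(struct page *page)
          {
                  return test_bit(HPG_restore_reserve, &page->private);
          }

          static inline void SetHPageRestoreReserve(struct page *page)
          {
                  set_bit(HPG_restore_reserve, &page->private);
          }

          static inline void ClearHPageRestoreReserve(struct page *page)
          {
                  clear_bit(HPG_restore_reserve, &page->private);
          }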
      
      Link: https://lkml.kernel.org/r/20210122195231.324857-1-mike.kravetz@oracle.com
      Link: https://lkml.kernel.org/r/20210122195231.324857-2-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: workingset: clarify eviction order and distance calculation · aeddcee6
      Oscar Salvador authored
      
      
      The premise of the refault distance is that it can be seen as a deficit of
      the inactive list space, so that if the inactive list would have had (R -
      E) more slots, the page would not have been evicted but promoted to the
      active list instead.
      
      However, the way the code is ordered right now sets us up to be off by one,
      so the real number of slots would be (R - E) + 1.  I stumbled upon this when
      trying to understand the code and it puzzled me that the comments did not
      match what the code did.

      This is not an issue at all since evictions and refaults tend to happen in
      numbers large enough that being off by one does not have any impact - and
      since the compiler and CPUs are free to rearrange the execution sequence
      anyway.

      But as Johannes says, it is better to re-arrange the code in the proper
      order since otherwise it would be misleading to somebody who is actively
      reading and trying to understand the logic of the code - as it happened
      to me.
      
      Link: https://lkml.kernel.org/r/20210201060651.3781-1-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmscan.c: make lruvec_lru_size() static · 2091339d
      Yu Zhao authored
      All other references to the function were removed after commit b910718a
      ("mm: vmscan: detect file thrashing at the reclaim root").
      
      Link: https://lore.kernel.org/linux-mm/20201207220949.830352-11-yuzhao@google.com/
      Link: https://lkml.kernel.org/r/20210122220600.906146-11-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/mm_inline.h: fold __update_lru_size() into its sole caller · 289ccba1
      Yu Zhao authored
      All other references to the function were removed after commit a892cb6b
      ("mm/vmscan.c: use update_lru_size() in update_lru_sizes()").
      
      Link: https://lore.kernel.org/linux-mm/20201207220949.830352-10-yuzhao@google.com/
      Link: https://lkml.kernel.org/r/20210122220600.906146-10-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/mm_inline.h: fold page_lru_base_type() into its sole caller · c1770e34
      Yu Zhao authored
      
      
      We've removed all other references to this function.
      
      Link: https://lore.kernel.org/linux-mm/20201207220949.830352-9-yuzhao@google.com/
      Link: https://lkml.kernel.org/r/20210122220600.906146-9-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: VM_BUG_ON lru page flags · bc711271
      Yu Zhao authored
      
      
      Move scattered VM_BUG_ONs to two essential places that cover all
      lru list additions and deletions.
      
      Link: https://lore.kernel.org/linux-mm/20201207220949.830352-8-yuzhao@google.com/
      Link: https://lkml.kernel.org/r/20210122220600.906146-8-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add __clear_page_lru_flags() to replace page_off_lru() · 87560179
      Yu Zhao authored
      
      
      Similar to page_off_lru(), the new function does non-atomic clearing
      of PageLRU() in addition to PageActive() and PageUnevictable(), on a
      page that has no references left.
      
      If PageActive() and PageUnevictable() are both set, refuse to clear
      either and leave them to bad_page(). This is a behavior change that
      is meant to help debug.
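
      A minimal sketch of what such a helper looks like, following the
      description above (the in-tree version lives in include/linux/mm_inline.h
      and may differ in detail):

          static __always_inline void __clear_page_lru_flags(struct page *page)
          {
                  VM_BUG_ON_PAGE(!PageLRU(page), page);

                  __ClearPageLRU(page);

                  /* both flags set is a bug; keep them so bad_page() can report it */
                  if (PageActive(page) && PageUnevictable(page))
                          return;

                  __ClearPageActive(page);
                  __ClearPageUnevictable(page);
          }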
      
      Link: https://lore.kernel.org/linux-mm/20201207220949.830352-7-yuzhao@google.com/
      Link: https://lkml.kernel.org/r/20210122220600.906146-7-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/swap.c: don't pass "enum lru_list" to del_page_from_lru_list() · 46ae6b2c
      Yu Zhao authored
      
      
      The parameter is redundant in the sense that it can be potentially
      extracted from the "struct page" parameter by page_lru(). We need to
      make sure that existing PageActive() or PageUnevictable() remains
      until the function returns. A few places don't conform, and simple
      reordering fixes them.
      
      This patch may have left page_off_lru() seemingly odd, and we'll take
      care of it in the next patch.
      
      Link: https://lore.kernel.org/linux-mm/20201207220949.830352-6-yuzhao@google.com/
      Link: https://lkml.kernel.org/r/20210122220600.906146-6-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/swap.c: don't pass "enum lru_list" to trace_mm_lru_insertion() · 86140453
      Yu Zhao authored
      
      
      The parameter is redundant in the sense that it can be extracted
      from the "struct page" parameter by page_lru() correctly.
      
      Link: https://lore.kernel.org/linux-mm/20201207220949.830352-5-yuzhao@google.com/
      Link: https://lkml.kernel.org/r/20210122220600.906146-5-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: don't pass "enum lru_list" to lru list addition functions · 3a9c9788
      Yu Zhao authored
      
      
      The "enum lru_list" parameter to add_page_to_lru_list() and
      add_page_to_lru_list_tail() is redundant in the sense that it can
      be extracted from the "struct page" parameter by page_lru().
      
      A caveat is that we need to make sure PageActive() or
      PageUnevictable() is correctly set or cleared before calling
      these two functions. And they are indeed.
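
      As an illustration of the simplified shape (a sketch assuming the
      page_lru()-based derivation described above; the real helpers live in
      include/linux/mm_inline.h):

          static __always_inline void add_page_to_lru_list(struct page *page,
                                                           struct lruvec *lruvec)
          {
                  enum lru_list lru = page_lru(page); /* from PageActive/PageUnevictable */

                  update_lru_size(lruvec, lru, page_zonenum(page), thp_nr_pages(page));
                  list_add(&page->lru, &lruvec->lists[lru]);
          }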
      
      Link: https://lore.kernel.org/linux-mm/20201207220949.830352-4-yuzhao@google.com/
      Link: https://lkml.kernel.org/r/20210122220600.906146-4-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/mm_inline.h: shuffle lru list addition and deletion functions · f90d8191
      Yu Zhao authored
      
      
      These functions will call page_lru() in the following patches.  Move them
      below page_lru() to avoid the forward declaration.
      
      Link: https://lore.kernel.org/linux-mm/20201207220949.830352-3-yuzhao@google.com/
      Link: https://lkml.kernel.org/r/20210122220600.906146-3-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmscan.c: use add_page_to_lru_list() · 42895ea7
      Yu Zhao authored
      
      
      Patch series "mm: lru related cleanups", v2.
      
      The cleanups are intended to reduce the verbosity in lru list operations
      and make them less error-prone.  A typical example would be how the
      patches change __activate_page():
      
       static void __activate_page(struct page *page, struct lruvec *lruvec)
       {
       	if (!PageActive(page) && !PageUnevictable(page)) {
      -		int lru = page_lru_base_type(page);
       		int nr_pages = thp_nr_pages(page);
      
      -		del_page_from_lru_list(page, lruvec, lru);
      +		del_page_from_lru_list(page, lruvec);
       		SetPageActive(page);
      -		lru += LRU_ACTIVE;
      -		add_page_to_lru_list(page, lruvec, lru);
      +		add_page_to_lru_list(page, lruvec);
       		trace_mm_lru_activate(page);
      
      There are a few more places like __activate_page() and they are
      unnecessarily repetitive in terms of figuring out which list a page should
      be added onto or deleted from.  And with the duplicated code removed, they
      are easier to read, IMO.
      
      Patch 1 to 5 basically cover the above.  Patch 6 and 7 make code more
      robust by improving bug reporting.  Patch 8, 9 and 10 take care of some
      dangling helpers left in header files.
      
      This patch (of 10):
      
      There is add_page_to_lru_list(), and move_pages_to_lru() should reuse it,
      not duplicate it.
      
      Link: https://lkml.kernel.org/r/20210122220600.906146-1-yuzhao@google.com
      Link: https://lore.kernel.org/linux-mm/20201207220949.830352-2-yuzhao@google.com/
      Link: https://lkml.kernel.org/r/20210122220600.906146-2-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/workingset.c: avoid unnecessary max_nodes estimation in count_shadow_nodes() · 725cac1c
      Miaohe Lin authored
      
      
      If list_lru_shrink_count is 0, we always return SHRINK_EMPTY regardless of
      the value of max_nodes.  So we can return early if nodes == 0 to save the
      cpu cycles spent approximating a reasonable limit for the nodes.
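
      A sketch of the early return this describes, at the top of
      count_shadow_nodes() (illustrative, not the exact hunk):

          unsigned long nodes;

          nodes = list_lru_shrink_count(&shadow_nodes, sc);
          if (!nodes)
                  return SHRINK_EMPTY;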
      
      Link: https://lkml.kernel.org/r/20210123073825.46709-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmscan: __isolate_lru_page_prepare() cleanup · c2135f7c
      Alex Shi authored
      
      
      The function just returns 2 results, so using a 'switch' to deal with its
      result is unnecessary.  Also simplify it to a bool function as Vlastimil
      suggested.

      Also remove the 'goto' by reusing list_move(), and take Matthew Wilcox's
      suggestion to update the comments in the function.
      
      Link: https://lkml.kernel.org/r/728874d7-2d93-4049-68c1-dcc3b2d52ccd@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: suppress wrong warning info when alloc gigantic page · 7ecc9565
      Chen Wandun authored
      If hugetlb_cma is enabled, boot time allocation is skipped when
      allocating a gigantic page.  That doesn't mean allocation failure, so
      suppress this warning info.
      
      Link: https://lkml.kernel.org/r/20210219123909.13130-1-chenwandun@huawei.com
      Fixes: cf11e85f ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
      Signed-off-by: Chen Wandun <chenwandun@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: fix copy_huge_page_from_user contig page struct assumption · 3272cfc2
      Mike Kravetz authored
      page structs are not guaranteed to be contiguous for gigantic pages.  The
      routine copy_huge_page_from_user can encounter gigantic pages, yet it
      assumes page structs are contiguous when copying pages from user space.
      
      Since page structs for the target gigantic page are not contiguous, the
      data copied from user space could overwrite other pages not associated
      with the gigantic page and cause data corruption.
      
      Non-contiguous page structs are generally not an issue.  However, they can
      exist with a specific kernel configuration and hotplug operations.  For
      example: Configure the kernel with CONFIG_SPARSEMEM and
      !CONFIG_SPARSEMEM_VMEMMAP.  Then, hotplug add memory for the area where
      the gigantic page will be allocated.
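
      A sketch of the kind of change this requires (illustrative; nth_page() is
      the generic way to step through page structs that may not be virtually
      contiguous, and the actual fix may use a different helper):

          /* Wrong for gigantic pages when page structs are not contiguous: */
          page_kaddr = kmap_atomic(dst_page + i);

          /* Safe: translate through the pfn instead of raw pointer arithmetic. */
          page_kaddr = kmap_atomic(nth_page(dst_page, i));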
      
      Link: https://lkml.kernel.org/r/20210217184926.33567-2-mike.kravetz@oracle.com
      Fixes: 8fb5debc ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: fix update_and_free_page contig page struct assumption · dbfee5ae
      Mike Kravetz authored
      page structs are not guaranteed to be contiguous for gigantic pages.  The
      routine update_and_free_page can encounter a gigantic page, yet it assumes
      page structs are contiguous when setting page flags in subpages.
      
      If update_and_free_page encounters non-contiguous page structs, we can see
      “BUG: Bad page state in process …” errors.
      
      Non-contiguous page structs are generally not an issue.  However, they can
      exist with a specific kernel configuration and hotplug operations.  For
      example: Configure the kernel with CONFIG_SPARSEMEM and
      !CONFIG_SPARSEMEM_VMEMMAP.  Then, hotplug add memory for the area where
      the gigantic page will be allocated.  Zi Yan outlined steps to reproduce
      here [1].
      
      [1] https://lore.kernel.org/linux-mm/16F7C58B-4D79-41C5-9B64-A1A1628F4AF2@nvidia.com/
      
      Link: https://lkml.kernel.org/r/20210217184926.33567-1-mike.kravetz@oracle.com
      Fixes: 944d9fec ("hugetlb: add support for gigantic page allocation at runtime")
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: use helper huge_page_size() to get hugepage size · aca78307
      Miaohe Lin authored
      
      
      We can use helper huge_page_size() to get the hugepage size directly to
      simplify the code slightly.
      
      [linmiaohe@huawei.com: use helper huge_page_size() to get hugepage size]
        Link: https://lkml.kernel.org/r/20210209021803.49211-1-linmiaohe@huawei.com
      
      Link: https://lkml.kernel.org/r/20210208082450.15716-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: remove unnecessary VM_BUG_ON_PAGE on putback_active_hugepage() · 3f1b0162
      Miaohe Lin authored
      
      
      All callers know they are operating on a hugetlb head page.  So this
      VM_BUG_ON_PAGE can not catch anything useful.
      
      Link: https://lkml.kernel.org/r/20210209071151.44731-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: use helper function range_in_vma() in page_table_shareable() · 07e51edf
      Miaohe Lin authored
      
      
      We could use helper function range_in_vma() to check whether the vma is in
      the desired range to simplify the code.
      
      Link: https://lkml.kernel.org/r/20210204112949.43051-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb_cgroup: use helper pages_per_huge_page() in hugetlb_cgroup · 8938494c
      Miaohe Lin authored
      
      
      We could use helper function pages_per_huge_page() to get the number of
      pages in a hstate to simplify the code slightly.
      
      Link: https://lkml.kernel.org/r/20210205084513.29624-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/pmem: avoid inserting hugepage PTE entry with fsdax if hugepage support is disabled · bae84953
      Aneesh Kumar K.V authored
      
      
      Differentiate between hardware not supporting hugepages and user disabling
      THP via 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
      
      For the devdax namespace, the kernel handles the above via the
      supported_alignment attribute, failing to initialize the namespace if
      the namespace align value is not supported on the platform.

      For the fsdax namespace, the kernel will continue to initialize the
      namespace.  This can result in the kernel creating a huge pte entry even
      though the hardware doesn't support it.

      We do want hugepage support with pmem even if the end-user disabled THP
      via the sysfs file (/sys/kernel/mm/transparent_hugepage/enabled).  Hence
      differentiate between hardware/firmware lacking support vs user-controlled
      disabling of THP, and prevent a huge fault if the hardware lacks hugepage
      support.
      
      Link: https://lkml.kernel.org/r/20210205023956.417587-1-aneesh.kumar@linux.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>