  1. Jul 01, 2021
    • arm64: decouple check whether pfn is in linear map from pfn_valid() · 873ba463
      Mike Rapoport authored
      
      
      The intended semantics of pfn_valid() is to verify whether there is a
      struct page for the pfn in question and nothing else.
      
      Yet, on arm64 it is used to distinguish memory areas that are mapped in
      the linear map vs those that require ioremap() to access them.
      
      Introduce a dedicated pfn_is_map_memory() wrapper for
      memblock_is_map_memory() to perform such a check and use it where
      appropriate.
      
      Using a wrapper avoids cyclic include dependencies.
      
      While here also update style of pfn_valid() so that both pfn_valid() and
      pfn_is_map_memory() declarations will be consistent.
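
      The distinction drawn above can be sketched in userspace C.  This is a
      toy model, not the arm64 code: the region table, the struct region type
      and the model_* function names are all hypothetical, and the real
      helpers consult memblock rather than a static array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memory region: a PFN range plus a NOMAP flag. */
struct region {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	bool nomap;		/* true: not covered by the linear map */
};

static const struct region regions[] = {
	{ 0x1000, 0x2000, false },	/* ordinary RAM */
	{ 0x2000, 0x2100, true  },	/* firmware-owned, NOMAP */
};

static const struct region *find_region(unsigned long pfn)
{
	for (size_t i = 0; i < sizeof(regions) / sizeof(regions[0]); i++)
		if (pfn >= regions[i].start_pfn && pfn < regions[i].end_pfn)
			return &regions[i];
	return NULL;
}

/* "Is there a struct page for this pfn?" and nothing else. */
bool model_pfn_valid(unsigned long pfn)
{
	return find_region(pfn) != NULL;
}

/* "Is this pfn reachable through the linear map?" */
bool model_pfn_is_map_memory(unsigned long pfn)
{
	const struct region *r = find_region(pfn);

	return r && !r->nomap;
}
```

      In this model a pfn inside the NOMAP range has a memory map entry but
      is not linear-map memory, which is the case the two questions are
      meant to separate.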
      
      Link: https://lkml.kernel.org/r/20210511100550.28178-4-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock: update initialization of reserved pages · 9092d4f7
      Mike Rapoport authored
      
      
      The struct pages representing a reserved memory region are initialized
      by the reserve_bootmem_range() function.  This function is called for each
      reserved region just before the memory is freed from memblock to the buddy
      page allocator.
      
      The struct pages for MEMBLOCK_NOMAP regions are kept with the default
      values set by the memory map initialization, which makes special
      treatment of such pages necessary in pfn_valid() and
      pfn_valid_within().
      
      Split out initialization of the reserved pages to a function with a
      meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
      reserved regions and mark struct pages for the NOMAP regions as
      PageReserved.
      
      Link: https://lkml.kernel.org/r/20210511100550.28178-3-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/mmzone.h: add documentation for pfn_valid() · 51c656ae
      Mike Rapoport authored
      
      
      Patch series "arm64: drop pfn_valid_within() and simplify pfn_valid()", v4.
      
      These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
      pfn_valid_within() to 1.
      
      The idea is to mark NOMAP pages as reserved in the memory map and restore
      the intended semantics of pfn_valid() to designate availability of struct
      page for a pfn.
      
      With this the core mm will be able to cope with the fact that it cannot
      use NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER
      blocks will be treated correctly even without the need for
      pfn_valid_within.
      
      This patch (of 4):
      
      Add a comment describing the semantics of pfn_valid() that clarifies that
      pfn_valid() only checks for availability of a memory map entry (i.e.
      struct page) for a PFN rather than availability of usable memory backing
      that PFN.
      
      The most "generic" version of pfn_valid() used by the configurations with
      SPARSEMEM enabled resides in include/linux/mmzone.h so this is the most
      suitable place for documentation about semantics of pfn_valid().
      
      Link: https://lkml.kernel.org/r/20210511100550.28178-1-rppt@kernel.org
      Link: https://lkml.kernel.org/r/20210511100550.28178-2-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mempolicy: use unified 'nodes' for bind/interleave/prefer policies · 269fbe72
      Ben Widawsky authored
      
      
      The current 'mempolicy' structure uses a union to store the node info for
      bind/interleave/prefer policies.
      
      	union {
      		short 		 preferred_node; /* preferred */
      		nodemask_t	 nodes;		/* interleave/bind */
      		/* undefined for default */
      	} v;
      
      Since a preferred node can also be represented by a nodemask_t with only one
      bit set, unify these policies by using a single nodemask_t 'nodes'.  This
      removes the union, simplifies the code, and makes it easier to support node
      info for future policies.
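
      As a rough illustration of why a single mask is enough, the sketch
      below stores a preferred node as a one-bit mask and recovers it again.
      The toy_nodemask_t type and both helpers are hypothetical stand-ins
      (limited to 64 nodes); the kernel's nodemask_t and its accessors are
      not used here.

```c
#include <stdint.h>

/* Toy nodemask: one bit per NUMA node, at most 64 nodes (assumption). */
typedef uint64_t toy_nodemask_t;

/* Store a preferred node (0..63) as a mask with exactly one bit set. */
toy_nodemask_t toy_mask_of_node(int node)
{
	return (toy_nodemask_t)1 << node;
}

/* Recover the node: index of the lowest set bit, or -1 for an empty mask. */
int toy_first_node(toy_nodemask_t mask)
{
	for (int node = 0; node < 64; node++)
		if (mask & ((toy_nodemask_t)1 << node))
			return node;
	return -1;
}
```

      With this representation the 'prefer' case needs no field of its own:
      the same mask type serves bind, interleave and prefer alike.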
      
      Link: https://lore.kernel.org/r/20200630212517.308045-7-ben.widawsky@intel.com
      Link: https://lkml.kernel.org/r/1623399825-75651-1-git-send-email-feng.tang@intel.com
      Co-developed-by: Feng Tang <feng.tang@intel.com>
      Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: mempolicy: don't have to split pmd for huge zero page · e5947d23
      Yang Shi authored
      When trying to migrate pages to obey mempolicy, the huge zero page is
      split by inserting the base zero pfn into all PTEs; the page table walk
      then falls back to PTE level and just skips the zero page.  Skipping the
      zero page for mempolicy has been the kernel's behavior since v2.6.16 due to
      commit f4598c8b ("[PATCH] migration: make sure there is no attempt to
      migrate reserved pages.").  So it seems pointless to split the huge zero
      page; it could simply be skipped like the base zero page.
      
      Set ACTION_CONTINUE to prevent walk_page_range() from splitting the pmd
      in this case.
      
      Link: https://lkml.kernel.org/r/20210609172146.3594-1-shy828301@gmail.com
      Link: https://lkml.kernel.org/r/20210604203513.240709-1-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mempolicy: unify the parameter sanity check for mbind and set_mempolicy · 95837924
      Feng Tang authored
      
      
      Currently kernel_mbind() and kernel_set_mempolicy() perform almost the same
      parameter sanity checks.
      
      Add a helper function to unify the code, reduce the redundancy, and make
      it easier to change the sanity checks in the future.
      
      [thanks to David Rientjes for suggesting using helper function instead of
      macro].
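
      A shared helper of this kind might look roughly like the following
      userspace sketch.  The function name, the TOY_* constants and the mode
      count are assumptions made for illustration; they are modeled on the
      mode-flag layout of the mempolicy interface but are not copied from the
      patch.

```c
#include <errno.h>

/* Illustrative constants; treat the exact values as assumptions. */
#define TOY_MPOL_F_STATIC_NODES   (1 << 15)
#define TOY_MPOL_F_RELATIVE_NODES (1 << 14)
#define TOY_MPOL_MODE_FLAGS \
	(TOY_MPOL_F_STATIC_NODES | TOY_MPOL_F_RELATIVE_NODES)
#define TOY_MPOL_MAX 5	/* hypothetical number of policy modes */

/*
 * Split the user-supplied value into a policy mode and its optional flags,
 * then reject unknown modes and mutually exclusive flag combinations.
 * Returns 0 on success or -EINVAL.
 */
int toy_sanitize_mpol_flags(int *mode, unsigned short *flags)
{
	*flags = *mode & TOY_MPOL_MODE_FLAGS;
	*mode &= ~TOY_MPOL_MODE_FLAGS;

	if ((unsigned int)*mode >= TOY_MPOL_MAX)
		return -EINVAL;
	if ((*flags & TOY_MPOL_F_STATIC_NODES) &&
	    (*flags & TOY_MPOL_F_RELATIVE_NODES))
		return -EINVAL;
	return 0;
}
```

      Both system call paths could then call the one helper before doing
      their own work, so a future change to the checks lands in one place.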
      
      [feng.tang@intel.com: add comment]
        Link: https://lkml.kernel.org/r/1622560492-1294-4-git-send-email-feng.tang@intel.com
      
      Link: https://lkml.kernel.org/r/1622469956-82897-4-git-send-email-feng.tang@intel.com
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ben Widawsky <ben.widawsky@intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mempolicy: don't handle MPOL_LOCAL like a fake MPOL_PREFERRED policy · 7858d7bc
      Feng Tang authored
      
      
      The MPOL_LOCAL policy has been set up as a real policy, but it is still
      handled like a fake MPOL_PREFERRED policy with one internal MPOL_F_LOCAL
      flag bit set.  Many places have to tell the real 'prefer' policy from the
      'local' one, which is quite confusing.
      
      In the current code, there are four cases in which MPOL_LOCAL is used:
      
      1. user specifies 'local' policy
      
      2. user specifies 'prefer' policy, but with empty nodemask
      
      3. system 'default' policy is used
      
      4. 'prefer' policy + valid 'preferred' node with MPOL_F_STATIC_NODES
         flag set, and when it is 'rebind' to a nodemask which doesn't contain
         the 'preferred' node, it will perform as 'local' policy
      
      So make 'local' a real policy instead of a fake 'prefer' one, and kill
      the MPOL_F_LOCAL bit, which greatly reduces confusion when reading the code.
      
      For case 4, the logic of mpol_rebind_preferred() is confusing, as Michal
      Hocko pointed out:
      
      : I do believe that rebinding preferred policy is just bogus and it should
      : be dropped altogether on the ground that a preference is a mere hint from
      : userspace where to start the allocation.  Unless I am missing something
      : cpusets will be always authoritative for the final placement.  The
      : preferred node just acts as a starting point and it should be really
      : preserved when cpusets changes.  Otherwise we have a very subtle behavior
      : corner cases.
      
      So drop all the tricky transformations between 'prefer' and 'local', and
      just record the new nodemask when rebinding.
      
      [feng.tang@intel.com: fix a problem in mpol_set_nodemask(), per Michal Hocko]
        Link: https://lkml.kernel.org/r/1622560492-1294-3-git-send-email-feng.tang@intel.com
      [feng.tang@intel.com: refine code and comments of mpol_set_nodemask(), per Michal]
        Link: https://lkml.kernel.org/r/20210603081807.GE56979@shbuild999.sh.intel.com
      
      Link: https://lkml.kernel.org/r/1622469956-82897-3-git-send-email-feng.tang@intel.com
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Suggested-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Ben Widawsky <ben.widawsky@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mempolicy: cleanup nodemask intersection check for oom · b26e517a
      Feng Tang authored
      
      
      Patch series "mm/mempolicy: some fix and semantics cleanup", v4.
      
      The current memory policy code has some confusing and ambiguous parts around
      the MPOL_LOCAL policy, as it is handled as a fake MPOL_PREFERRED one, and
      there are many places having to distinguish them.  Also the nodemask
      intersection check needs cleanup to be more explicit for OOM use, and
      handle MPOL_INTERLEAVE correctly.  This patchset cleans up these and
      unifies the parameter sanity check for mbind() and set_mempolicy().
      
      This patch (of 3):
      
      mempolicy_nodemask_intersects() seems to be a general purpose mempolicy
      function.  In fact it is partially tailored for the OOM purpose
      instead.  The OOM code proper is the only existing user, so rename the
      function to make that purpose explicit.
      
      While at it, drop the MPOL_INTERLEAVE handling, as those allocations never
      have a nodemask defined (see alloc_page_interleave()).  This is dead code,
      and confusing too, because MPOL_INTERLEAVE is a hint rather than a hard
      requirement, so it shouldn't be considered during OOM.
      
      The final code can be reduced to a check for MPOL_BIND which is the
      only memory policy that is a hard requirement and thus relevant to a
      constrained OOM logic.
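
      The reduced check described above can be modeled as follows.  This is
      a minimal sketch under stated assumptions: the toy_policy structure,
      the toy_mode enum and the function name are hypothetical, and node
      sets are plain 64-bit masks instead of nodemask_t.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_mode {
	TOY_DEFAULT,
	TOY_PREFERRED,
	TOY_BIND,
	TOY_INTERLEAVE,
	TOY_LOCAL,
};

struct toy_policy {
	enum toy_mode mode;
	uint64_t nodes;		/* one bit per node */
};

/* True when a task with this policy may allocate from the OOM node set. */
bool toy_policy_in_oom_domain(const struct toy_policy *pol, uint64_t oom_nodes)
{
	if (!pol)
		return true;	/* no policy: any node is possible */
	if (pol->mode == TOY_BIND)
		return (pol->nodes & oom_nodes) != 0;	/* hard requirement */
	return true;		/* every other mode is only a hint */
}
```

      Only the bind case can exclude a task from the constrained OOM domain;
      an interleave policy with a disjoint mask still counts as eligible in
      this model because it does not forbid allocation elsewhere.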
      
      [mhocko@suse.com: changelog edits]
      
      Link: https://lkml.kernel.org/r/1622560492-1294-1-git-send-email-feng.tang@intel.com
      Link: https://lkml.kernel.org/r/1622560492-1294-2-git-send-email-feng.tang@intel.com
      Link: https://lkml.kernel.org/r/1622469956-82897-1-git-send-email-feng.tang@intel.com
      Link: https://lkml.kernel.org/r/1622469956-82897-2-git-send-email-feng.tang@intel.com
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Suggested-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Ben Widawsky <ben.widawsky@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/compaction: fix 'limit' in fast_isolate_freepages · b55ca526
      Wonhyuk Yang authored
      Because of 'min(1, ...)', fast_isolate_freepages() sets 'limit' to 0 or 1.
      This takes away opportunities to find candidate pages.  Allowing enough
      scans increases the probability of finding an appropriate free page.
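
      The arithmetic behind the problem can be shown with a small sketch.
      It assumes the scan limit is halved before clamping and that the fix
      swaps the clamp from a minimum to a maximum; the toy_* helpers are
      hypothetical names, not the kernel's min()/max() macros.

```c
/* Clamp helpers written as functions to keep the sketch free of macros. */
unsigned int toy_min(unsigned int a, unsigned int b) { return a < b ? a : b; }
unsigned int toy_max(unsigned int a, unsigned int b) { return a > b ? a : b; }

/* Clamping with min(1, x) can never yield more than one scan. */
unsigned int toy_limit_before(unsigned int scan_limit)
{
	return toy_min(1U, scan_limit >> 1);
}

/* Clamping with max(1, x) keeps half the scan limit, but at least one. */
unsigned int toy_limit_after(unsigned int scan_limit)
{
	return toy_max(1U, scan_limit >> 1);
}
```

      For a scan limit of 32, the first form yields 1 and the second 16; for
      a scan limit of 1, the first form yields 0 and the second still allows
      one scan.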
      
      Tested on thpscale; the results are as follows.
      
                                              5.12.0                 5.12.0
                                             vanilla                patched
      Amean     fault-both-1       598.15 (   0.00%)      592.56 (   0.93%)
      Amean     fault-both-3      1494.47 (   0.00%)     1514.35 (  -1.33%)
      Amean     fault-both-5      2519.48 (   0.00%)     2471.76 (   1.89%)
      Amean     fault-both-7      3173.85 (   0.00%)     3079.19 (   2.98%)
      Amean     fault-both-12     8063.83 (   0.00%)     7858.29 (   2.55%)
      Amean     fault-both-18     8781.20 (   0.00%)     7827.70 *  10.86%*
      Amean     fault-both-24    12576.44 (   0.00%)    12250.20 (   2.59%)
      Amean     fault-both-30    18503.27 (   0.00%)    17528.11 *   5.27%*
      Amean     fault-both-32    16133.69 (   0.00%)    13874.24 *  14.00%*
      
                                                 5.12.0         5.12.0
                                                vanilla        patched
      Ops Compaction migrate scanned         6547133.00     5963901.00
      Ops Compaction free scanned           32452453.00    26609101.00
      
                              5.12        5.12
                           vanilla     patched
      Duration User          27.99       28.84
      Duration System       244.08      236.76
      Duration Elapsed       78.27       78.38
      
      Link: https://lkml.kernel.org/r/20210626082443.22547-1-vvghjk1234@gmail.com
      Fixes: 5a811889 ("mm, compaction: use free lists to quickly locate a migration target")
      Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: compaction: remove duplicate !list_empty(&sublist) check · d2155fe5
      Liu Xiang authored
      
      
      list_splice_tail(&sublist, freelist) already performs the
      !list_empty(&sublist) check, so remove the duplicate check.
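
      To see why the caller's check is redundant, consider this minimal
      userspace imitation of a circular doubly linked list.  The toy_list
      type and helpers are hypothetical and only mirror the general shape of
      the kernel list API; they are not the kernel implementation.

```c
#include <stdbool.h>

struct toy_list {
	struct toy_list *next, *prev;
};

void toy_list_init(struct toy_list *h)
{
	h->next = h->prev = h;
}

bool toy_list_empty(const struct toy_list *h)
{
	return h->next == h;
}

void toy_list_add_tail(struct toy_list *n, struct toy_list *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Append all entries of 'list' to the tail of 'head'; no-op if empty. */
void toy_list_splice_tail(struct toy_list *list, struct toy_list *head)
{
	struct toy_list *first, *last;

	if (toy_list_empty(list))
		return;		/* the emptiness check lives here */

	first = list->next;
	last = list->prev;
	first->prev = head->prev;
	head->prev->next = first;
	last->next = head;
	head->prev = last;
}
```

      Because the splice helper returns early on an empty source list, a
      caller that tests for emptiness first gains nothing.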
      
      Link: https://lkml.kernel.org/r/20210609095409.19920-1-liu.xiang@zlingsmart.com
      Signed-off-by: Liu Xiang <liu.xiang@zlingsmart.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/compaction: use DEVICE_ATTR_WO macro · 17adb230
      YueHaibing authored
      
      
      Use DEVICE_ATTR_WO helper instead of plain DEVICE_ATTR, which makes the
      code a bit shorter and easier to read.
      
      Link: https://lkml.kernel.org/r/20210523064521.32912-1-yuehaibing@huawei.com
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/zbud: don't export any zbud API · 2a03085c
      Miaohe Lin authored
      zbud doesn't need to export any API; it has been meant to be used via
      the zpool API since commit 12d79d64 ("mm/zpool: update zswap to use
      zpool").  So we can remove the unneeded zbud.h and move the zpool API
      down to avoid any forward declarations.
      
      [linmiaohe@huawei.com: fix unused function warnings when CONFIG_ZPOOL is disabled]
        Link: https://lkml.kernel.org/r/20210619025508.1239386-1-linmiaohe@huawei.com
      
      Link: https://lkml.kernel.org/r/20210608114515.206992-3-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/zbud: reuse unbuddied[0] as buddied in zbud_pool · f356aeac
      Miaohe Lin authored
      Patch series "Cleanups for zbud", v2.
      
      This series contains just cleanups to save some possible memory in
      zbud_pool and avoid exporting any unneeded zbud API.  More details can be
      found in the respective changelogs.
      
      This patch (of 2):
      
      Since commit 9d8c5b52 ("mm: zbud: fix condition check on allocation
      size"), zbud_pool.unbuddied[0] is always unused.  We can reuse it as the
      buddied field to save some memory.
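
      One way to express such a reuse is an anonymous union, sketched below.
      The structure names, the list head type and the TOY_NCHUNKS value are
      hypothetical; whether the patch uses exactly this layout is an
      assumption.

```c
#include <stddef.h>

#define TOY_NCHUNKS 63	/* assumed count; the real value depends on PAGE_SIZE */

struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* Layout with a dedicated buddied list next to the unbuddied array. */
struct toy_pool_separate {
	struct toy_list_head unbuddied[TOY_NCHUNKS];
	struct toy_list_head buddied;
};

/* Layout where buddied occupies the otherwise unused slot 0. */
struct toy_pool_shared {
	union {
		struct toy_list_head buddied;	/* aliases unbuddied[0] */
		struct toy_list_head unbuddied[TOY_NCHUNKS];
	};
};
```

      In this sketch the shared layout is smaller by exactly one list head
      per pool, which is the "some memory" the changelog refers to.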
      
      Link: https://lkml.kernel.org/r/20210608114515.206992-1-linmiaohe@huawei.com
      Link: https://lkml.kernel.org/r/20210608114515.206992-2-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page · 28473d91
      Miaohe Lin authored
      We should use release_z3fold_page_locked() to release z3fold page when
      it's locked, although it looks harmless to use release_z3fold_page() now.
      
      Link: https://lkml.kernel.org/r/20210619093151.1492174-7-linmiaohe@huawei.com
      Fixes: dcf5aedb ("z3fold: stricter locking and more careful reclaim")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/z3fold: fix potential memory leak in z3fold_destroy_pool() · dac0d1cf
      Miaohe Lin authored
      There is a memory leak in z3fold_destroy_pool() as it forgets to call
      free_percpu() on pool->unbuddied.  Call free_percpu() for pool->unbuddied
      to fix this issue.
      
      Link: https://lkml.kernel.org/r/20210619093151.1492174-6-linmiaohe@huawei.com
      Fixes: d30561c5 ("z3fold: use per-cpu unbuddied lists")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/z3fold: remove unused function handle_to_z3fold_header() · 767cc6c5
      Miaohe Lin authored
      
      
      handle_to_z3fold_header() is unused now.  So we can remove it.  As a
      result, get_z3fold_header() becomes the only caller of
      __get_z3fold_header() and the argument lock is always true.  Therefore we
      could further fold the __get_z3fold_header() into get_z3fold_header() with
      lock = true.
      
      Link: https://lkml.kernel.org/r/20210619093151.1492174-5-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/z3fold: remove magic number in z3fold_create_pool() · e891f60e
      Miaohe Lin authored
      
      
      It's meaningless to pass a magic number 2 to __alloc_percpu() as there is
      a minimum alignment size of PCPU_MIN_ALLOC_SIZE (> 2) in it.  Also there
      is no special alignment requirement for unbuddied.  So we could replace
      this magic number with the natural alignment, i.e. __alignof__(struct
      list_head), to improve readability.
      
      Link: https://lkml.kernel.org/r/20210619093151.1492174-4-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/z3fold: avoid possible underflow in z3fold_alloc() · 014284a0
      Miaohe Lin authored
      
      
      It is not enough to just make sure the z3fold header is not larger than
      the page size.  When the z3fold header size is equal to PAGE_SIZE, we
      would underflow when checking the alloc size against PAGE_SIZE -
      ZHDR_SIZE_ALIGNED - CHUNK_SIZE in z3fold_alloc().  Make sure there is
      space remaining for its buddy to fix this theoretical issue.
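
      The wraparound can be reproduced with unsigned arithmetic in userspace.
      The sketch below is illustrative only: the function names are
      hypothetical, sizes are passed as parameters instead of using the
      kernel macros, and the stricter condition is one plausible reading of
      "space remaining for its buddy".

```c
#include <stddef.h>

/*
 * Upper bound for an allocation, computed as in the expression quoted
 * above.  All sizes are in bytes.  With unsigned arithmetic the result
 * wraps to a huge value when hdr_size + chunk_size exceeds page_size.
 */
size_t toy_max_alloc(size_t page_size, size_t hdr_size, size_t chunk_size)
{
	return page_size - hdr_size - chunk_size;
}

/*
 * Stricter sanity condition: the header must leave room for at least one
 * chunk.  Assumes chunk_size <= page_size.
 */
int toy_header_leaves_room(size_t page_size, size_t hdr_size, size_t chunk_size)
{
	return hdr_size <= page_size - chunk_size;
}
```

      With a 4096-byte page and 64-byte chunks, a 64-byte header gives a
      bound of 3968 bytes, while a header of 4096 bytes passes a plain
      "not larger than the page" check yet makes the bound wrap around.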
      
      Link: https://lkml.kernel.org/r/20210619093151.1492174-3-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/z3fold: define macro NCHUNKS as TOTAL_CHUNKS - ZHDR_CHUNKS · e3c0db4f
      Miaohe Lin authored
      
      
      Patch series "Cleanup and fixup for z3fold".
      
      This series contains cleanups that remove an unused function, redefine a
      macro to improve readability and so on.  It also fixes several bugs in
      z3fold, such as a memory leak in z3fold_destroy_pool().  More details can
      be found in the respective changelogs.
      
      This patch (of 6):
      
      To improve code readability, we could define macro NCHUNKS as TOTAL_CHUNKS
      - ZHDR_CHUNKS.  No functional change intended.
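
      A quick way to convince oneself that such a redefinition changes
      nothing is to evaluate both forms side by side.  In the sketch below
      every TOY_* value is an assumption (4 KiB pages, 64 chunks per page, a
      header of two chunks), and the "old" form is a guess at a byte-based
      definition rather than a quote from the source.

```c
#define TOY_PAGE_SHIFT		12	/* assumes 4 KiB pages */
#define TOY_PAGE_SIZE		(1UL << TOY_PAGE_SHIFT)
#define TOY_NCHUNKS_ORDER	6	/* assumes 64 chunks per page */
#define TOY_CHUNK_SHIFT		(TOY_PAGE_SHIFT - TOY_NCHUNKS_ORDER)
#define TOY_CHUNK_SIZE		(1UL << TOY_CHUNK_SHIFT)
#define TOY_ZHDR_SIZE_ALIGNED	(2 * TOY_CHUNK_SIZE)	/* hypothetical header */

#define TOY_TOTAL_CHUNKS	(TOY_PAGE_SIZE >> TOY_CHUNK_SHIFT)
#define TOY_ZHDR_CHUNKS		(TOY_ZHDR_SIZE_ALIGNED >> TOY_CHUNK_SHIFT)

/* Assumed earlier form: derive the count from byte sizes. */
#define TOY_NCHUNKS_OLD \
	((TOY_PAGE_SIZE - TOY_ZHDR_SIZE_ALIGNED) >> TOY_CHUNK_SHIFT)

/* Form named in the commit title: total chunks minus header chunks. */
#define TOY_NCHUNKS		(TOY_TOTAL_CHUNKS - TOY_ZHDR_CHUNKS)
```

      Both forms agree whenever the aligned header size is a multiple of the
      chunk size, which holds here by construction.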
      
      Link: https://lkml.kernel.org/r/20210619093151.1492174-1-linmiaohe@huawei.com
      Link: https://lkml.kernel.org/r/20210619093151.1492174-2-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/proc/kcore: use page_offline_(freeze|thaw) · c6d9eee2
      David Hildenbrand authored
      
      
      Let's properly synchronize with drivers that set PageOffline().
      Unfreeze/thaw every now and then, so drivers that want to set
      PageOffline() can make progress.
      
      Link: https://lkml.kernel.org/r/20210526093041.8800-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Aili Yao <yaoaili@kingsoft.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • virtio-mem: use page_offline_(start|end) when setting PageOffline() · 6cc26d77
      David Hildenbrand authored
      
      
      Let's properly use page_offline_(start|end) to synchronize setting
      PageOffline(), so we won't have valid page access to unplugged memory
      regions from /proc/kcore.
      
      Existing balloon implementations usually allow reading inflated memory;
      doing so might result in unnecessary overhead in the hypervisor, which is
      currently the case with virtio-mem.
      
      For future virtio-mem use cases, it will be different when using shmem,
      huge pages, !anonymous private mappings, ...  as backing storage for a VM.
      virtio-mem unplugged memory must no longer be accessed and access might
      result in undefined behavior.  There will be a virtio spec extension to
      document this change, including a new feature flag indicating the changed
      behavior.  We really don't want to race against PFN walkers reading random
      page content.
      
      Link: https://lkml.kernel.org/r/20210526093041.8800-6-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Aili Yao <yaoaili@kingsoft.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce page_offline_(begin|end|freeze|thaw) to synchronize setting PageOffline() · 82840451
      David Hildenbrand authored
      
      
      A driver might set a page logically offline -- PageOffline() -- and turn
      the page inaccessible in the hypervisor; after that, access to page
      content can be fatal.  One example is virtio-mem; while unplugged memory
      -- marked as PageOffline() -- can currently be read in the hypervisor, this
      will no longer be the case in the future; for example, when having a
      virtio-mem device backed by huge pages in the hypervisor.
      
      Some special PFN walkers -- i.e., /proc/kcore -- read content of random
      pages after checking PageOffline(); however, these PFN walkers can race
      with drivers that set PageOffline().
      
      Let's introduce page_offline_(begin|end|freeze|thaw) for synchronizing.
      
      page_offline_freeze()/page_offline_thaw() allows for a subsystem to
      synchronize with such drivers, achieving that a page cannot be set
      PageOffline() while frozen.
      
      page_offline_begin()/page_offline_end() is used by drivers that care about
      such races when setting a page PageOffline().
      
      For simplicity, use a rwsem for now; neither drivers nor users are
      performance sensitive.
      
      Link: https://lkml.kernel.org/r/20210526093041.8800-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Aili Yao <yaoaili@kingsoft.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      82840451
    • David Hildenbrand's avatar
      fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages · 0daa322b
      David Hildenbrand authored
      
      
      Let's avoid reading:
      
      1) Offline memory sections: the content of offline memory sections is
         stale as the memory is effectively unused by the kernel.  On s390x with
         standby memory, offline memory sections (belonging to offline storage
         increments) are not accessible.  With virtio-mem and the hyper-v
         balloon, we can have unavailable memory chunks that should not be
         accessed inside offline memory sections.  Last but not least, offline
         memory sections might contain hwpoisoned pages which we can no longer
         identify because the memmap is stale.
      
      2) PG_offline pages: logically offline pages that are documented as
         "The content of these pages is effectively stale.  Such pages should
         not be touched (read/write/dump/save) except by their owner.".
         Examples include pages inflated in a balloon or unavailable memory
         ranges inside hotplugged memory sections with virtio-mem or the hyper-v
         balloon.
      
      3) PG_hwpoison pages: Reading pages marked as hwpoisoned can be fatal.
         As documented: "Accessing is not safe since it may cause another
         machine check.  Don't touch!"
      
      Introduce is_page_hwpoison(), adding a comment that it is inherently racy
      but the best we can really do.
      
      Reading /proc/kcore now performs similar checks as when reading
      /proc/vmcore for kdump via makedumpfile: problematic pages are excluded.
      It's also similar to hibernation code; however, we don't skip hwpoisoned
      pages when processing pages in kernel/power/snapshot.c:saveable_page()
      yet.
      
      Note 1: we can race against memory offlining code, especially memory going
      offline and getting unplugged: however, we will properly tear down the
      identity mapping and handle faults gracefully when accessing this memory
      from kcore code.
      
      Note 2: we can race against drivers setting PageOffline() and turning
      memory inaccessible in the hypervisor.  We'll handle this in a follow-up
      patch.
      
      Link: https://lkml.kernel.org/r/20210526093041.8800-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Aili Yao <yaoaili@kingsoft.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0daa322b
    • David Hildenbrand's avatar
      fs/proc/kcore: pfn_is_ram check only applies to KCORE_RAM · 2711032c
      David Hildenbrand authored
      
      
      Let's restructure the code, using switch-case, and checking pfn_is_ram()
      only when we are dealing with KCORE_RAM.
      
      Link: https://lkml.kernel.org/r/20210526093041.8800-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Aili Yao <yaoaili@kingsoft.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2711032c
    • David Hildenbrand's avatar
      fs/proc/kcore: drop KCORE_REMAP and KCORE_OTHER · 3c36b419
      David Hildenbrand authored


      Patch series "fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages", v3.
      
      Looking for places where the kernel might unconditionally read
      PageOffline() pages, I stumbled over /proc/kcore; turns out /proc/kcore
      needs some more love to not touch some other pages we really don't want to
      read -- i.e., hwpoisoned ones.
      
      Examples for PageOffline() pages are pages inflated in a balloon, memory
      unplugged via virtio-mem, and partially-present sections in memory added
      by the Hyper-V balloon.
      
      When reading pages inflated in a balloon, we essentially produce
      unnecessary load in the hypervisor; holes in partially present sections in
      case of Hyper-V are not accessible and already were a problem for
      /proc/vmcore, fixed in makedumpfile by detecting PageOffline() pages.  In
      the future, virtio-mem might disallow reading unplugged memory -- marked
      as PageOffline() -- in some environments, resulting in undefined behavior
      when accessed; therefore, I'm trying to identify and rework all these
      (corner) cases.
      
      With this series, there is really only access via /dev/mem, /proc/vmcore
      and kdb left after I ripped out /dev/kmem.  kdb is an advanced corner-case
      use case -- we won't care for now if someone explicitly tries to do nasty
      things by reading from/writing to physical addresses we better not touch.
      /dev/mem is a use case we won't support for virtio-mem, at least for now,
      so we'll simply disallow mapping any virtio-mem memory via /dev/mem next.
      /proc/vmcore is really only a problem when dumping the old kernel via
      something that's not makedumpfile (read: basically never), however, we'll
      try sanitizing that as well in the second kernel in the future.
      
      Tested via kcore_dump:
      	https://github.com/schlafwandler/kcore_dump
      
      This patch (of 6):
      
      Commit db779ef6 ("proc/kcore: Remove unused kclist_add_remap()")
      removed the last user of KCORE_REMAP.
      
      Commit 595dd46e ("vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when
      dumping vsyscall user page") removed the last user of KCORE_OTHER.
      
      Let's drop both types.  While at it, also drop vaddr in "struct
      kcore_list", used by KCORE_REMAP only.
      
      Link: https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20210526093041.8800-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Aili Yao <yaoaili@kingsoft.com>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3c36b419
    • Mike Rapoport's avatar
      docs: proc.rst: meminfo: briefly describe gaps in memory accounting · 8d719afc
      Mike Rapoport authored
      
      
      Add a paragraph that explains that it may happen that the counters in
      /proc/meminfo do not add up to the overall memory usage.
      
      Link: https://lkml.kernel.org/r/20210421061127.1182723-1-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8d719afc
    • Kefeng Wang's avatar
      mm/kconfig: move HOLES_IN_ZONE into mm · 781eb2cd
      Kefeng Wang authored
      
      
      Commit a55749639dc1 ("ia64: drop marked broken DISCONTIGMEM and
      VIRTUAL_MEM_MAP") dropped VIRTUAL_MEM_MAP, so HOLES_IN_ZONE is no longer
      needed on ia64.
      
      Also move HOLES_IN_ZONE into mm/Kconfig and let architectures that need
      this feature select it.
      
      Link: https://lkml.kernel.org/r/20210417075946.181402-1-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      781eb2cd
    • Miaohe Lin's avatar
      mm: workingset: define macro WORKINGSET_SHIFT · 3ebc57f4
      Miaohe Lin authored
      
      
      The magic number 1 is used in several places in workingset.c.  Define a
      macro WORKINGSET_SHIFT for it to improve code readability.
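
      As an illustration of what the named shift buys, here is a simplified
      sketch of packing a one-bit "workingset" flag below an eviction counter.
      It is an assumption-laden model, not the code in workingset.c: the real
      shadow entry also carries other fields, and the model_* helpers are
      hypothetical.  Only the macro name and its value of 1 come from the
      message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of low bits reserved for the workingset flag. */
#define WORKINGSET_SHIFT 1

/* Pack an eviction counter and the workingset flag into one word. */
unsigned long model_pack_shadow(unsigned long eviction, bool workingset)
{
    /* The top WORKINGSET_SHIFT bits must be free so the shift loses nothing. */
    assert(eviction <= (~0UL >> WORKINGSET_SHIFT));
    return (eviction << WORKINGSET_SHIFT) | (workingset ? 1UL : 0UL);
}

/* Reverse of model_pack_shadow(): mask out the flag, shift out the counter. */
void model_unpack_shadow(unsigned long entry, unsigned long *eviction,
                         bool *workingset)
{
    *workingset = (entry & ((1UL << WORKINGSET_SHIFT) - 1)) != 0;
    *eviction = entry >> WORKINGSET_SHIFT;
}
```

      With the shift named, the pack and unpack sides visibly use the same
      width instead of two unrelated literal 1s.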
      
      Link: https://lkml.kernel.org/r/20210624122307.1759342-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3ebc57f4
    • Yu Zhao's avatar
      include/trace/events/vmscan.h: remove mm_vmscan_inactive_list_is_low · 764c04a9
      Yu Zhao authored


      mm_vmscan_inactive_list_is_low has no users after commit b91ac374
      ("mm: vmscan: enforce inactive:active ratio at the reclaim root").
      
      Remove it.
      
      Link: https://lkml.kernel.org/r/20210614194554.2683395-1-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      764c04a9
    • Yu Zhao's avatar
      mm/vmscan.c: fix potential deadlock in reclaim_pages() · 2d2b8d2b
      Yu Zhao authored
      
      
      Theoretically, without the protection of memalloc_noreclaim_save() and
      memalloc_noreclaim_restore(), reclaim_pages() can go into the block
      I/O layer recursively and deadlock.
      
      Querying 'reclaim_pages' in our kernel crash databases didn't yield
      any results. So the deadlock seems unlikely to happen. A possible
      explanation is that the only user of reclaim_pages(), i.e.,
      MADV_PAGEOUT, is usually called before memory pressure builds up,
      e.g., on Android and Chrome OS. Under such a condition, allocations in
      the block I/O layer can be fulfilled without diverting to direct
      reclaim and therefore the recursion is avoided.
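
      The recursion and its guard can be sketched as a small userspace model.
      This is an illustration under stated assumptions, not kernel code: the
      flag bit, the depth limit, and every model_* function are hypothetical
      stand-ins.  The assumed behaviour is that the save helper sets a
      per-task "no reclaim" flag and returns its previous state, and that an
      allocation skips direct reclaim while the flag is set.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PF_MEMALLOC 0x1u /* arbitrary bit chosen for this model */
#define MODEL_DEPTH_LIMIT 4    /* stops the unguarded recursion in the model */

static unsigned int task_flags;
static int depth, max_depth;

/* Set the flag and return whether it was already set. */
static unsigned int model_noreclaim_save(void)
{
    unsigned int old = task_flags & MODEL_PF_MEMALLOC;

    task_flags |= MODEL_PF_MEMALLOC;
    return old;
}

/* Put the flag back to the state returned by model_noreclaim_save(). */
static void model_noreclaim_restore(unsigned int old)
{
    assert((old & ~MODEL_PF_MEMALLOC) == 0);
    task_flags = (task_flags & ~MODEL_PF_MEMALLOC) | old;
}

static void model_reclaim_pages(bool protect);

/* An allocation made by the I/O path; under pressure it may enter reclaim. */
static void model_block_io_alloc(bool protect)
{
    if (!(task_flags & MODEL_PF_MEMALLOC) && depth < MODEL_DEPTH_LIMIT)
        model_reclaim_pages(protect);
}

static void model_reclaim_pages(bool protect)
{
    unsigned int old = 0;

    depth++;
    if (depth > max_depth)
        max_depth = depth;
    if (protect)
        old = model_noreclaim_save();
    model_block_io_alloc(protect); /* writeback needs memory */
    if (protect)
        model_noreclaim_restore(old);
    depth--;
}

/* Run one reclaim pass from a clean state and report the nesting depth. */
int model_max_depth(bool protect)
{
    task_flags = 0;
    depth = 0;
    max_depth = 0;
    model_reclaim_pages(protect);
    return max_depth;
}
```

      With the guard, the model never nests beyond one level; without it, the
      nesting runs until the artificial limit, which stands in for the
      deadlock.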
      
      Link: https://lkml.kernel.org/r/20210622074642.785473-1-yuzhao@google.com
      Link: https://lkml.kernel.org/r/20210614194727.2684053-1-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2d2b8d2b
    • Axel Rasmussen's avatar
      userfaultfd/selftests: exercise minor fault handling shmem support · 4a8f021b
      Axel Rasmussen authored
      
      
      Enable test_uffdio_minor for test_type == TEST_SHMEM, and modify the test
      slightly to pass in / check for the right feature flags.
      
      Link: https://lkml.kernel.org/r/20210503180737.2487560-11-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4a8f021b
    • Axel Rasmussen's avatar
      userfaultfd/selftests: reinitialize test context in each test · 8ba6e864
      Axel Rasmussen authored
      
      
      Currently, the context (fds, mmap-ed areas, etc.) is global.  Each test
      mutates this state in some way, in some cases really "clobbering it"
      (e.g., the events test mremap-ing area_dst over the top of area_src, or
      the minor faults tests overwriting the count_verify values in the test
      areas).  We run the tests in a particular order, each test is careful to
      make the right assumptions about its starting state, etc.
      
      But, this is fragile.  It's better for a test's success or failure to not
      depend on what some other prior test case did to the global state.
      
      To that end, clear and reinitialize the test context at the start of each
      test case, so whatever prior test cases did doesn't affect future tests.
      
      This is particularly relevant to this series because the events test's
      mremap of area_dst screws up assumptions the minor fault test was relying
      on.  This wasn't a problem for hugetlb, as we don't mremap in that case.
      
      [peterx@redhat.com: fix conflict between this patch and the uffd pagemap series]
        Link: https://lkml.kernel.org/r/YKQqKrl+/cQ1utrb@t490s
      
      Link: https://lkml.kernel.org/r/20210503180737.2487560-10-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ba6e864
    • Axel Rasmussen's avatar
      userfaultfd/selftests: create alias mappings in the shmem test · 5bb23edb
      Axel Rasmussen authored
      
      
      Previously, we just allocated two shm areas: area_src and area_dst.  With
      this commit, change this so we also allocate area_src_alias, and
      area_dst_alias.
      
      area_*_alias and area_* (respectively) point to the same underlying
      physical pages, but are different VMAs.  In a future commit in this
      series, we'll leverage this setup to exercise minor fault handling support
      for shmem, just like we do in the hugetlb_shared test.
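
      For readers unfamiliar with alias mappings, the idea can be shown in a
      few lines of userspace C.  This is a sketch, not the selftest code: the
      function name and the memfd name are made up, and it assumes a Linux
      system where memfd_create() is available.  It maps one memfd twice with
      MAP_SHARED and checks that a write through one mapping is visible
      through the other.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if two mappings of the same memfd share pages, 0 if they do
 * not, and -1 if setting up the mappings failed.
 */
int model_alias_mappings_share_pages(size_t len)
{
    char *area, *alias;
    int fd, ret = -1;

    assert(len > 0);

    fd = memfd_create("alias-demo", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0)
        goto out_close;

    /* Two separate mmap() calls give two VMAs over the same pages. */
    area = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (area == MAP_FAILED)
        goto out_close;
    alias = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (alias == MAP_FAILED)
        goto out_unmap_area;

    area[0] = 'x';                                 /* write via one mapping */
    ret = (alias != area && alias[0] == 'x') ? 1 : 0; /* read via the alias */

    munmap(alias, len);
out_unmap_area:
    munmap(area, len);
out_close:
    close(fd);
    return ret;
}
```

      Calling model_alias_mappings_share_pages(4096) exercises one page; the
      two addresses differ even though the backing page is the same.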
      
      Link: https://lkml.kernel.org/r/20210503180737.2487560-9-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5bb23edb
    • Axel Rasmussen's avatar
      userfaultfd/selftests: use memfd_create for shmem test type · fa2c2b58
      Axel Rasmussen authored
      
      
      This is a preparatory commit.  In the future, we want to be able to set up
      alias mappings for area_src and area_dst in the shmem test, like we do in
      the hugetlb_shared test.  With a VMA obtained via mmap(MAP_ANONYMOUS |
      MAP_SHARED), it isn't clear how to do this.
      
      So, mmap() with an fd, so we can create alias mappings.  Use memfd_create
      instead of actually passing in a tmpfs path like hugetlb does, since it's
      more convenient / simpler to run, and works just as well.
      
      Future commits will:
      
      1. Set up the alias mappings.
      2. Extend our tests to actually take advantage of this, to test new
         userfaultfd behavior being introduced in this series.
      
      Also, a small fix in the area we're changing: when the hugetlb setup fails
      in main(), pass in the right argv[] so we actually print out the hugetlb
      file path.
      
      Link: https://lkml.kernel.org/r/20210503180737.2487560-8-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fa2c2b58
    • Axel Rasmussen's avatar
      userfaultfd/shmem: modify shmem_mfill_atomic_pte to use install_pte() · 7d64ae3a
      Axel Rasmussen authored
      
      
      In a previous commit, we added the mfill_atomic_install_pte() helper.
      This helper does the job of setting up PTEs for an existing page, to map
      it into a given VMA.  It deals with both the anon and shmem cases, as well
      as the shared and private cases.
      
      In other words, shmem_mfill_atomic_pte() duplicates a case it already
      handles.  So, expose it, and let shmem_mfill_atomic_pte() use it directly,
      to reduce code duplication.
      
      This requires that we refactor shmem_mfill_atomic_pte() a bit:
      
      Instead of doing accounting (shmem_recalc_inode() et al) part-way through
      the PTE setup, do it afterward.  This frees up mfill_atomic_install_pte()
      from having to care about this accounting, and means we don't need to e.g.
      shmem_uncharge() in the error path.
      
      A side effect is this switches shmem_mfill_atomic_pte() to use
      lru_cache_add_inactive_or_unevictable() instead of just lru_cache_add().
      This wrapper does some extra accounting in an exceptional case, if
      appropriate, so it's actually the more correct thing to use.
      
      Link: https://lkml.kernel.org/r/20210503180737.2487560-7-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7d64ae3a
    • Axel Rasmussen's avatar
      userfaultfd/shmem: advertise shmem minor fault support · 964ab004
      Axel Rasmussen authored
      
      
      Now that the feature is fully implemented (the faulting path hooks exist
      so userspace is notified, and the ioctl to resolve such faults is
      available), advertise this as a supported feature.
      
      Link: https://lkml.kernel.org/r/20210503180737.2487560-6-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      964ab004
    • Axel Rasmussen's avatar
      userfaultfd/shmem: support UFFDIO_CONTINUE for shmem · 15313257
      Axel Rasmussen authored
      
      
      With this change, userspace can resolve a minor fault within a
      shmem-backed area with a UFFDIO_CONTINUE ioctl.  The semantics for this
      match those for hugetlbfs - we look up the existing page in the page
      cache, and install a PTE for it.
      
      This commit introduces a new helper: mfill_atomic_install_pte.
      
      Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in
      shmem.c?  The existing userfault implementation only relies on shmem.c for
      VM_SHARED VMAs.  However, minor fault handling / CONTINUE work just fine
      for !VM_SHARED VMAs as well.  We'd prefer to handle CONTINUE for shmem in
      one place, regardless of shared/private (to reduce code duplication).
      
      Why add a new mfill_atomic_install_pte helper?  A problem we have with
      continue is that shmem_mfill_atomic_pte() and mcopy_atomic_pte() are
      *close* to what we want, but not exactly.  We do want to set up the PTEs in
      a CONTINUE operation, but we don't want to e.g.  allocate a new page,
      charge it (e.g.  to the shmem inode), manipulate various flags, etc.  Also
      we have the problem stated above: shmem_mfill_atomic_pte() and
      mcopy_atomic_pte() both handle one-half of the problem (shared / private)
      continue cares about.  So, introduce mcontinue_atomic_pte(), to handle all
      of the shmem continue cases.  Introduce the helper so it doesn't duplicate
      code with mcopy_atomic_pte().
      
      In a future commit, shmem_mfill_atomic_pte() will also be modified to use
      this new helper.  However, since this is a bigger refactor, it seems most
      clear to do it as a separate change.
      
      Link: https://lkml.kernel.org/r/20210503180737.2487560-5-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      15313257
    • Axel Rasmussen's avatar
      userfaultfd/shmem: support minor fault registration for shmem · c949b097
      Axel Rasmussen authored
      
      
      This patch allows shmem-backed VMAs to be registered for minor faults.
      Minor faults are appropriately relayed to userspace in the fault path, for
      VMAs with the relevant flag.
      
      This commit doesn't hook up the UFFDIO_CONTINUE ioctl for shmem-backed
      minor faults, though, so userspace doesn't yet have a way to resolve such
      faults.
      
      Because of this, we also don't yet advertise this as a supported feature.
      That will be done in a separate commit when the feature is fully
      implemented.
      
      Link: https://lkml.kernel.org/r/20210503180737.2487560-4-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c949b097
    • userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte · 3460f6e5
      Axel Rasmussen authored
      
      
      Patch series "userfaultfd: add minor fault handling for shmem", v6.
      
      Overview
      ========
      
      See the series which added minor faults for hugetlbfs [3] for a detailed
      overview of minor fault handling in general.  This series adds the same
      support for shmem-backed areas.
      
      This series is structured as follows:
      
      - Commits 1 and 2 are cleanups.
      - Commits 3 and 4 implement the new feature (minor fault handling for shmem).
      - Commit 5 advertises that the feature is now available since at this point it's
        fully implemented.
      - Commit 6 is a final cleanup, modifying an existing code path to re-use a new
        helper we've introduced.
      - Commits 7, 8, 9, 10 update the userfaultfd selftest to exercise the feature.
      
      Use Case
      ========
      
      In some cases it is useful to have VM memory backed by tmpfs instead of
      hugetlbfs.  So, this feature will be used to support the same VM live
      migration use case described in my original series.
      
      Additionally, Android folks (Lokesh Gidra <lokeshgidra@google.com>) hope
      to optimize the Android Runtime garbage collector using this feature:
      
      "The plan is to use userfaultfd for concurrently compacting the heap.
      With this feature, the heap can be shared-mapped at another location where
      the GC-thread(s) could continue the compaction operation without the need
      to invoke userfault ioctl(UFFDIO_COPY) each time.  OTOH, if and when Java
      threads get faults on the heap, UFFDIO_CONTINUE can be used to resume
      execution.  Furthermore, this feature enables updating references in the
      'non-moving' portion of the heap efficiently.  Without this feature,
      unnecessary page copying (ioctl(UFFDIO_COPY)) would be required."
      
      [1] https://lore.kernel.org/patchwork/cover/1388144/
      [2] https://lore.kernel.org/patchwork/patch/1408161/
      [3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/T/#t
      
      This patch (of 9):
      
      Previously, we did a dance where we had one calling path in userfaultfd.c
      (mfill_atomic_pte), but then we split it into two in shmem_fs.h
      (shmem_{mcopy_atomic,mfill_zeropage}_pte), and then rejoined into a single
      shared function in shmem.c (shmem_mfill_atomic_pte).
      
      This is all a bit overly complex.  Just call the single combined shmem
      function directly, allowing us to clean up various branches, boilerplate,
      etc.
      
      While we're touching this function, two other small cleanup changes:
      - offset is equivalent to pgoff, so we can get rid of offset entirely.
      - Split two VM_BUG_ON cases into two statements. This means the line
        number reported when the BUG is hit specifies exactly which condition
        was true.
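
      The effect of splitting the check can be sketched in userspace as
      follows.  SKETCH_BUG_ON is a hypothetical stand-in for VM_BUG_ON that
      records the line number of the first failing check instead of
      crashing; the function names are illustrative only and do not appear
      in the patch.

```c
/* Line of the first check that fired, or 0 if none did. */
static int first_bug_line;

/* Hypothetical stand-in for VM_BUG_ON: record the line, do not crash. */
#define SKETCH_BUG_ON(cond)					\
	do {							\
		if ((cond) && !first_bug_line)			\
			first_bug_line = __LINE__;		\
	} while (0)

/* One combined statement: both conditions report the same line. */
int combined_check(int a, int b)
{
	first_bug_line = 0;
	SKETCH_BUG_ON(a || b);
	return first_bug_line;
}

/* Two statements: the reported line identifies the failing condition. */
int split_check(int a, int b)
{
	first_bug_line = 0;
	SKETCH_BUG_ON(a);
	SKETCH_BUG_ON(b);
	return first_bug_line;
}
```

      In the combined form the reported line is the same whichever
      condition was true; in the split form the two cases report different
      lines.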
      
      Link: https://lkml.kernel.org/r/20210503180737.2487560-1-axelrasmussen@google.com
      Link: https://lkml.kernel.org/r/20210503180737.2487560-3-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3460f6e5
    • userfaultfd/selftests: add pagemap uffd-wp test · eb3b2e00
      Peter Xu authored
      
      
      Add a test specific to anonymous memory that uses pagemap.  With pagemap
      support, we can read the uffd-wp bit directly from the page table without
      triggering any fault, which makes sanity checks in unit tests easier.

      The test also leverages the newly introduced MADV_PAGEOUT madvise()
      advice to test swap ptes with the uffd-wp bit set, including across
      fork()s.
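
      A minimal sketch of such a pagemap check is shown below.  It is not
      the selftest code: the helper names are hypothetical, and the bit
      positions (57 for uffd-wp, 62 for swapped) are taken from the kernel's
      pagemap documentation rather than from this commit message.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit positions as described in the kernel pagemap documentation. */
#define SKETCH_PM_UFFD_WP_BIT	57
#define SKETCH_PM_SWAP_BIT	62

/* Returns 1 if the entry has the uffd-wp bit set, 0 otherwise. */
int pagemap_entry_uffd_wp(uint64_t entry)
{
	return (int)((entry >> SKETCH_PM_UFFD_WP_BIT) & 1);
}

/* Returns 1 if the entry describes a swapped-out page, 0 otherwise. */
int pagemap_entry_swapped(uint64_t entry)
{
	return (int)((entry >> SKETCH_PM_SWAP_BIT) & 1);
}

/*
 * Read the 64-bit pagemap entry covering addr in the calling process.
 * Returns 0 on success, -1 on failure.  Reading does not touch the
 * page itself, so no fault is triggered.
 */
int pagemap_read_entry(const void *addr, uint64_t *entry)
{
	long page_size = sysconf(_SC_PAGESIZE);
	off_t off;
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return -1;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	/* One 8-byte entry per virtual page, indexed by page number. */
	off = (off_t)((uintptr_t)addr / (uintptr_t)page_size) *
	      (off_t)sizeof(uint64_t);
	n = pread(fd, entry, sizeof(*entry), off);
	close(fd);

	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

      A test could write-protect a page through userfaultfd, read its entry
      with pagemap_read_entry(), and expect pagemap_entry_uffd_wp() to
      return 1 both before and after the page is swapped out.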
      
      Link: https://lkml.kernel.org/r/20210428225030.9708-7-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eb3b2e00