  1. Dec 16, 2020
    • Merge branch 'akpm' (patches from Andrew) · ac73e3dc
      Linus Torvalds authored
      Merge misc updates from Andrew Morton:
      
       - a few random little subsystems
      
       - almost all of the MM patches which are staged ahead of linux-next
         material. I'll trickle the post-linux-next work in as the dependents
         get merged up.
      
      Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
      ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
      gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
      kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
      oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
      uaccess, zram, and cleanups).
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
        mm: cleanup kstrto*() usage
        mm: fix fall-through warnings for Clang
        mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
        mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
        mm:backing-dev: use sysfs_emit in macro defining functions
        mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
        mm: use sysfs_emit for struct kobject * uses
        mm: fix kernel-doc markups
        zram: break the strict dependency from lzo
        zram: add stat to gather incompressible pages since zram set up
        zram: support page writeback
        mm/process_vm_access: remove redundant initialization of iov_r
        mm/zsmalloc.c: rework the list_add code in insert_zspage()
        mm/zswap: move to use crypto_acomp API for hardware acceleration
        mm/zswap: fix passing zero to 'PTR_ERR' warning
        mm/zswap: make struct kernel_param_ops definitions const
        userfaultfd/selftests: hint the test runner on required privilege
        userfaultfd/selftests: fix retval check for userfaultfd_open()
        userfaultfd/selftests: always dump something in modes
        userfaultfd: selftests: make __{s,u}64 format specifiers portable
        ...
    • mm: cleanup kstrto*() usage · dfefd226
      Alexey Dobriyan authored
      
      
      Range checks can be folded into the proper conversion function.
      kstrto*() functions exist for all arithmetic types.
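
      As a rough userspace illustration (not the kernel code touched by this
      patch), the sketch below contrasts a wide conversion followed by an
      open-coded range check with a narrow conversion helper that rejects
      out-of-range input itself.  demo_strtoul() and demo_strtou8() are
      hypothetical stand-ins for the kstrto*() family; only the return
      convention (0 on success, negative errno on failure) is borrowed from it.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for kstrtoul(): 0 on success, -errno on failure. */
static int demo_strtoul(const char *s, unsigned int base, unsigned long *res)
{
	char *end;
	unsigned long val;

	/* Reject empty input, leading whitespace and a leading minus sign. */
	if (!isxdigit((unsigned char)*s) && *s != '+')
		return -EINVAL;
	errno = 0;
	val = strtoul(s, &end, (int)base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s || *end != '\0')
		return -EINVAL;
	*res = val;
	return 0;
}

/* Hypothetical stand-in for kstrtou8(): the range check lives in the helper. */
static int demo_strtou8(const char *s, unsigned int base, unsigned char *res)
{
	unsigned long tmp;
	int rv = demo_strtoul(s, base, &tmp);

	if (rv < 0)
		return rv;
	if (tmp > UCHAR_MAX)
		return -ERANGE;
	*res = (unsigned char)tmp;
	return 0;
}

/* Before: wide conversion plus a manual range check at the call site. */
static int demo_set_level_before(const char *s, unsigned char *level)
{
	unsigned long tmp;
	int rv = demo_strtoul(s, 10, &tmp);

	if (rv < 0)
		return rv;
	if (tmp > UCHAR_MAX)
		return -EINVAL;
	*level = (unsigned char)tmp;
	return 0;
}

/* After: the conversion function for the target type does the checking. */
static int demo_set_level_after(const char *s, unsigned char *level)
{
	return demo_strtou8(s, 10, level);
}
```

      The call site shrinks to a single call, and every caller of the narrow
      helper gets the same overflow handling.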
      
      Link: https://lkml.kernel.org/r/20201122123759.GC92364@localhost.localdomain
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix fall-through warnings for Clang · 01359eb2
      Gustavo A. R. Silva authored
      
      
      In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple of
      warnings by explicitly adding a break statement instead of just letting
      the code fall through to the next, and by adding a fallthrough
      pseudo-keyword in places where the code is intended to fall through.
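
      A minimal userspace sketch of the two kinds of fix described above.  The
      local fallthrough definition is an assumption made so the example builds
      outside the kernel tree, and demo_steps_left() is a hypothetical function
      invented for illustration.

```c
/* Local definition so the sketch builds in userspace (assumption). */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)	/* plain no-op on older compilers */
#endif

enum demo_state { DEMO_NEW, DEMO_READY, DEMO_DONE };

/* Hypothetical helper: how many steps remain from a given state. */
static int demo_steps_left(enum demo_state s)
{
	int steps = 0;

	switch (s) {
	case DEMO_NEW:
		steps++;
		fallthrough;	/* intended: NEW also needs the READY step */
	case DEMO_READY:
		steps++;
		break;		/* explicit break instead of sliding into default */
	default:
		break;
	}
	return steps;
}
```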
      
      Link: https://github.com/KSPP/linux/issues/115
      Link: https://lkml.kernel.org/r/f5756988b8842a3f10008fbc5b0a654f828920a9.1605896059.git.gustavoars@kernel.org
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at · bf16d19a
      Joe Perches authored
      
      
      Convert the unbounded uses of sprintf to sysfs_emit.
      
      A few conversions may now not end in a newline if the output buffer is
      overflowed.
      
      Link: https://lkml.kernel.org/r/0c90a90f466167f8c37de4b737553cf49c4a277f.1605376435.git.joe@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: shmem: convert shmem_enabled_show to use sysfs_emit_at · 79d4d38a
      Joe Perches authored
      
      
      Update the function to use sysfs_emit_at while neatening the uses of
      sprintf and overwriting the last space char with a newline to avoid
      possible output buffer overflow.
      
      Miscellanea:
      
       - in shmem_enabled_show, the removal of the indirected use of fmt
         allows __printf verification
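
      The pattern can be sketched in userspace as follows.  demo_emit_at() is a
      hypothetical analogue of sysfs_emit_at() built on vsnprintf(), not the
      kernel function, and DEMO_PAGE_SIZE is an assumed buffer size.  The show
      routine lists every option, brackets the active one, and then overwrites
      the trailing space with a newline so that no extra byte is appended.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEMO_PAGE_SIZE 4096	/* assumed size of the output buffer */

/*
 * Hypothetical analogue of sysfs_emit_at(): never writes past the buffer
 * and returns the number of characters actually stored.
 */
static int demo_emit_at(char *buf, int at, const char *fmt, ...)
{
	va_list args;
	int len, room;

	if (at < 0 || at >= DEMO_PAGE_SIZE)
		return 0;
	room = DEMO_PAGE_SIZE - at;
	va_start(args, fmt);
	len = vsnprintf(buf + at, (size_t)room, fmt, args);
	va_end(args);
	if (len < 0)
		return 0;
	return len >= room ? room - 1 : len;
}

/* List all names, mark the active one, end the line with a newline. */
static int demo_enabled_show(char *buf, const char *const *names, int n,
			     int active)
{
	int len = 0;
	int i;

	for (i = 0; i < n; i++)
		len += demo_emit_at(buf, len,
				    i == active ? "[%s] " : "%s ", names[i]);
	if (len > 0)
		buf[len - 1] = '\n';	/* replace the last space */
	return len;
}
```

      Because both format strings are literals at the call site, a compiler
      that checks printf-style formats can verify them against the arguments.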
      
      Link: https://lkml.kernel.org/r/b612a93825e5ea330cb68d2e8b516e9687a06cc6.1605376435.git.joe@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm:backing-dev: use sysfs_emit in macro defining functions · 5e4c0d86
      Joe Perches authored
      
      
      The cocci script used in commit bdacbb8d04f ("mm: Use sysfs_emit for
      struct kobject * uses") does not convert the name##_show macro because the
      macro uses concatenation via ##.
      
      Convert it by hand.
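
      To illustrate why a source-matching script can miss such functions, here
      is a small userspace sketch.  DEMO_SHOW() and struct demo_bdi are
      hypothetical names, and snprintf() stands in for the kernel's emit
      helper.  The function names read_ahead_kb_show and min_ratio_show never
      appear literally in the source; they only exist after the preprocessor
      pastes name and _show together.

```c
#include <stdio.h>

/* Hypothetical structure standing in for the device data. */
struct demo_bdi {
	unsigned long read_ahead_kb;
	unsigned int min_ratio;
};

/* Token pasting (##) builds the function name at preprocessing time. */
#define DEMO_SHOW(name, expr)						\
static int name##_show(const struct demo_bdi *bdi, char *buf,		\
		       size_t size)					\
{									\
	return snprintf(buf, size, "%lld\n", (long long)(expr));	\
}

DEMO_SHOW(read_ahead_kb, bdi->read_ahead_kb)
DEMO_SHOW(min_ratio, bdi->min_ratio)
```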
      
      Link: https://lkml.kernel.org/r/45ec6cfc177d743f9c0ebaf35e43969dce43af42.1605376435.git.joe@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening · bfb0ffeb
      Joe Perches authored
      
      
      Convert the only use of sprintf with struct kobject * that the cocci
      script could not convert.
      
      Miscellanea:
      
       - Neaten the uses of a constant string with sysfs_emit to use a const
         char * to reduce overall object size
      
      Link: https://lkml.kernel.org/r/7df6be66bbd68e1a0bca9d35aca1341dbf94d2a7.1605376435.git.joe@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use sysfs_emit for struct kobject * uses · ae7a927d
      Joe Perches authored
      Patch series "mm: Convert sysfs sprintf family to sysfs_emit", v2.
      
      Use the new sysfs_emit family and not the sprintf family.
      
      This patch (of 5):
      
      Use the sysfs_emit function instead of the sprintf family.
      
      Done with cocci script as in commit 3c6bff3c ("RDMA: Convert sysfs
      kobject * show functions to use sysfs_emit()")
      
      Link: https://lkml.kernel.org/r/cover.1605376435.git.joe@perches.com
      Link: https://lkml.kernel.org/r/9c249215bad6df616ba0410ad980042694970c1b.1605376435.git.joe@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix kernel-doc markups · a00cda3f
      Mauro Carvalho Chehab authored
      
      
      Kernel-doc markups should use this format:
              identifier - description
      
      Fix some issues on mm files:
      
      1) The definition for get_user_pages_locked() doesn't follow it.  Also,
         it expects a short description at the header, followed by a long one,
         after the parameters.  Fix it.

      2) Kernel-doc requires a kernel-doc markup to be placed immediately
         before the function prototype, as otherwise it will rename it.  So,
         move get_pfnblock_flags_mask() description to the right place.

      3) Make invalidate_mapping_pagevec() also follow the expected
         kernel-doc format.
      
      While here, fix a few minor English syntax issues, as suggested
      by Matthew:
      	will used -> will be used
      	similar with -> similar to
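
      For reference, a comment in the expected layout might look like the
      sketch below.  demo_clamp_pages() is a hypothetical function invented for
      illustration and is not one of the functions touched by this patch.

```c
/**
 * demo_clamp_pages - limit a page count to a maximum
 * @nr: requested number of pages
 * @max: largest value the caller may receive
 *
 * The short description sits on the first line in "identifier - description"
 * form.  The longer description comes after the parameter list, and the
 * comment is immediately followed by the function it documents.
 *
 * Return: @nr if it does not exceed @max, otherwise @max.
 */
static unsigned long demo_clamp_pages(unsigned long nr, unsigned long max)
{
	return nr > max ? max : nr;
}
```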
      
      Link: https://lkml.kernel.org/r/80e85dddc92d333bc2159ee8a2294921612e8745.1605521731.git.mchehab+huawei@kernel.org
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Suggested-by: Matthew Wilcox <willy@infradead.org>	[English fixes]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: break the strict dependency from lzo · 3d711a38
      Rui Salvaterra authored
      
      
      From the beginning, the zram block device always enabled CRYPTO_LZO,
      since lzo-rle is hardcoded as the fallback compression algorithm.  As a
      consequence, on systems where another compression algorithm is chosen
      (e.g.  CRYPTO_ZSTD), the lzo kernel module becomes unused, while still
      having to be built/loaded.
      
      This patch removes the hardcoded lzo-rle dependency and allows the user
      to select the default compression algorithm for zram at build time.  The
      previous behaviour is kept, as the default algorithm is still lzo-rle.
      
      Link: https://lkml.kernel.org/r/20201207121245.50529-1-rsalvaterra@gmail.com
      Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
      Suggested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Suggested-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: add stat to gather incompressible pages since zram set up · 194e28da
      Minchan Kim authored
      
      
      Currently, zram supports the stat via /sys/block/zram/mm_stat to represent
      how many incompressible pages are stored at the moment, but it cannot
      show how many times incompressible pages have been written since zram was
      set up.  That count is also a good indication of how effective zram is in
      the system.
      
      Link: https://lkml.kernel.org/r/20201130201907.1284910-1-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: support page writeback · 0d835962
      Minchan Kim authored
      
      
      There is demand to write back specific process pages to the backing store
      instead of all idle pages in the system, due to storage wear-out concerns
      and to the launch latency of apps which are idle most of the time but are
      critical for resume latency.

      This patch extends the writeback knob to support writeback of a specific
      page.
      
      Link: https://lkml.kernel.org/r/20201020190506.3758660-1-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/process_vm_access: remove redundant initialization of iov_r · 95c9ae14
      Colin Ian King authored
      
      
      The pointer iov_r is being initialized with a value that is never read and
      it is being updated later with a new value.  The initialization is
      redundant and can be removed.
      
      Link: https://lkml.kernel.org/r/20201102120614.694917-1-colin.king@canonical.com
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/zsmalloc.c: rework the list_add code in insert_zspage() · 110ceb82
      Miaohe Lin authored
      
      
      Rework the list_add code to make it more readable and simple.
      
      Link: https://lkml.kernel.org/r/20201015130107.65195-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/zswap: move to use crypto_acomp API for hardware acceleration · 1ec3b5fe
      Barry Song authored
      
      
      Right now, all new ZIP drivers are adapted to crypto_acomp APIs rather
      than legacy crypto_comp APIs.  Traditional ZIP drivers like lz4, lzo, etc.
      have also been wrapped into acomp via the scomp backend.  But zswap.c is
      still using the old APIs.  That means zswap won't be able to work on any
      new ZIP drivers in the kernel.
      
      This patch moves to use crypto_acomp APIs to fix the disconnected bridge
      between new ZIP drivers and zswap.  It is probably the first real user to
      use acomp but perhaps not a good example to demonstrate how multiple acomp
      requests can be executed in parallel in one acomp instance.  frontswap is
      doing page load and store page by page synchronously.  swap_writepage()
      depends on the completion of frontswap_store() to decide if it should call
      __swap_writepage() to swap to disk.
      
      However, this patch creates multiple acomp instances, so multiple threads
      running on multiple different cpus can actually do (de)compression
      in parallel, leveraging the power of multiple ZIP hardware queues.  This is
      also consistent with frontswap's page management model.
      
      The old zswap code uses atomic context and avoids the race conditions
      while shared resources like zswap_dstmem are accessed.  Here, since acomp
      can sleep, a per-cpu mutex is used instead of disabling preemption.
      
      While it is possible to make mm/page_io.c and mm/frontswap.c support async
      (de)compression in some way, the entire design requires careful thinking
      and performance evaluation.  For the first step, the base with fixed
      connection between ZIP drivers and zswap should be built.
      
      Link: https://lkml.kernel.org/r/20201107065332.26992-1-song.bao.hua@hisilicon.com
      Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
      Acked-by: Vitaly Wool <vitalywool@gmail.com>
      Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mahipal Challa <mahipalreddy2006@gmail.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Zhou Wang <wangzhou1@hisilicon.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/zswap: fix passing zero to 'PTR_ERR' warning · 42a44704
      YueHaibing authored
      Fix smatch warning:
      
        mm/zswap.c:425 zswap_cpu_comp_prepare() warn: passing zero to 'PTR_ERR'
      
      crypto_alloc_comp() never returns NULL, so use IS_ERR instead of
      IS_ERR_OR_NULL to fix this.
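
      The warning is easier to see with a small userspace re-creation of the
      error-pointer helpers.  The demo_ functions below are stand-ins modelled
      on ERR_PTR(), PTR_ERR(), IS_ERR() and IS_ERR_OR_NULL(), not the kernel
      definitions, and DEMO_MAX_ERRNO is an assumed bound.  Converting a NULL
      pointer to an error code yields 0, which a caller would read as success.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095	/* assumed size of the error-pointer window */

/* Encode a negative errno value in a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode a pointer back into an integer error code. */
static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the top DEMO_MAX_ERRNO bytes of the range. */
static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Also true for NULL, for which demo_ptr_err() returns 0. */
static inline int demo_is_err_or_null(const void *ptr)
{
	return !ptr || demo_is_err(ptr);
}
```

      When the allocator can only return a valid pointer or an encoded error,
      the plain error check is sufficient and avoids the misleading zero.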
      
      Link: https://lkml.kernel.org/r/20201031055615.28080-1-yuehaibing@huawei.com
      Fixes: f1c54846 ("zswap: dynamic pool creation")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/zswap: make struct kernel_param_ops definitions const · 83aed6cd
      Joe Perches authored
      
      
      These should be const, so make it so.
      
      Link: https://lkml.kernel.org/r/1791535ee0b00f4a5c68cc4a8adada06593ad8f1.1601770305.git.joe@perches.com
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Vitaly Wool <vitaly.wool@konsulko.com>
      Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userfaultfd/selftests: hint the test runner on required privilege · d9f411ba
      Peter Xu authored
      
      
      The userfaultfd test program now requires either root or ptrace privilege
      due to the signal/event tests.  When UFFDIO_API fails, hint the test runner
      about this fact verbosely.
      
      Link: https://lkml.kernel.org/r/20201208024709.7701-4-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userfaultfd/selftests: fix retval check for userfaultfd_open() · 1e17a24e
      Peter Xu authored
      
      
      userfaultfd_open() returns 1 for errors rather than negatives.  Fix it on
      all the callers so that the test bails out when UFFDIO_API fails.
      
      Link: https://lkml.kernel.org/r/20201208024709.7701-3-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userfaultfd/selftests: always dump something in modes · 164c50be
      Peter Xu authored
      
      
      Patch series "userfaultfd: selftests: Small fixes".
      
      Some very trivial fixes that I kept locally for the userfaultfd selftest
      program.
      
      This patch (of 3):
      
      BOUNCE_POLL is a special bit: when it is cleared, the mode is "READ"
      instead.  Dump that too, otherwise we'll see tests with empty modes.
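
      A small sketch of the idea, with assumptions stated: the DEMO_BOUNCE_*
      bit values and demo_describe_modes() are invented for illustration and
      are not taken from the selftest.  Because the poll bit selects between
      two modes, printing one word for either state guarantees the mode string
      is never empty.

```c
#include <stdio.h>
#include <string.h>

/* Bit values are assumptions made for this sketch. */
#define DEMO_BOUNCE_RANDOM	(1u << 0)
#define DEMO_BOUNCE_VERIFY	(1u << 1)
#define DEMO_BOUNCE_POLL	(1u << 2)

/* Append src to out without overflowing a buffer of the given size. */
static void demo_append(char *out, size_t size, const char *src)
{
	strncat(out, src, size - strlen(out) - 1);
}

/* Describe the active modes; size must be at least 1. */
static void demo_describe_modes(unsigned int bounces, char *out, size_t size)
{
	out[0] = '\0';
	if (bounces & DEMO_BOUNCE_RANDOM)
		demo_append(out, size, " random");
	if (bounces & DEMO_BOUNCE_VERIFY)
		demo_append(out, size, " verify");
	/* A cleared poll bit is still a mode: it means "read". */
	demo_append(out, size, (bounces & DEMO_BOUNCE_POLL) ? " poll" : " read");
}
```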
      
      Link: https://lkml.kernel.org/r/20201208024709.7701-1-peterx@redhat.com
      Link: https://lkml.kernel.org/r/20201208024709.7701-2-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userfaultfd: selftests: make __{s,u}64 format specifiers portable · 77f962e7
      Axel Rasmussen authored
      
      
      On certain platforms (powerpcle is the one on which I ran into this),
      "%Ld" and "%Lu" are unsuitable for printing __s64 and __u64, respectively,
      resulting in build warnings.  Cast to {u,}int64_t, and use the PRI{d,u}64
      macros defined in inttypes.h to print them.  This ought to be portable to
      all platforms.
      
      Splitting this off into a separate macro lets us remove some lines, and
      get rid of some (I would argue) stylistically odd cases where we joined
      printf() and exit() into a single statement with a ,.
      
      Finally, this also fixes a "missing braces around initializer" warning
      when we initialize prms in wp_range().
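
      A minimal sketch of the cast-and-macro approach described above.  The
      demo_s64 and demo_u64 typedefs are stand-ins for __s64 and __u64 so the
      example does not depend on kernel headers, and demo_format_counts() is a
      hypothetical helper.

```c
#include <inttypes.h>
#include <stdio.h>

typedef long long demo_s64;		/* stand-in for __s64 */
typedef unsigned long long demo_u64;	/* stand-in for __u64 */

/* Cast to the fixed-width types and let PRId64/PRIu64 pick the format. */
static int demo_format_counts(char *buf, size_t size, demo_s64 delta,
			      demo_u64 total)
{
	return snprintf(buf, size, "delta=%" PRId64 " total=%" PRIu64,
			(int64_t)delta, (uint64_t)total);
}
```

      The cast matters because the underlying type of a 64-bit kernel type can
      be long on one platform and long long on another, while the PRI macros
      always match int64_t and uint64_t.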
      
      [axelrasmussen@google.com: v2]
        Link: https://lkml.kernel.org/r/20201203180244.1811601-1-axelrasmussen@google.com
      
      Link: https://lkml.kernel.org/r/20201202211542.1121189-1-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Alan Gilbert <dgilbert@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob · d0d4730a
      Lokesh Gidra authored
      
      
      With this change, when the knob is set to 0, unprivileged users may still
      call userfaultfd, as when it is set to 1, but only page faults from user
      mode can be handled.  In this mode, an unprivileged user (without the
      CAP_SYS_PTRACE capability) must pass UFFD_USER_MODE_ONLY to userfaultfd
      or the API will fail with EPERM.
      
      This enables administrators to reduce the likelihood that an attacker with
      access to userfaultfd can delay faulting kernel code to widen timing
      windows for other exploits.
      
      The default value of this knob is changed to 0.  This is required for
      correct functioning of pipe mutex.  However, this will break postcopy live
      migration, which will be unnoticeable to the VM guests.  To avoid this,
      set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf.
      
      The main reason this change is desirable in the short term is that the
      Android userland will behave as with the sysctl set to zero.  So without
      this commit, any Linux binary using userfaultfd to manage its memory would
      behave differently if run within the Android userland.  For more details,
      refer to Andrea's reply [1].
      
      [1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/
      
      Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.com
      Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Daniel Colascione <dancol@dancol.org>
      Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: <calin@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Nitin Gupta <nigupta@nvidia.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Daniel Colascione <dancol@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userfaultfd: add UFFD_USER_MODE_ONLY · 37cd0575
      Lokesh Gidra authored
      
      
      Patch series "Control over userfaultfd kernel-fault handling", v6.
      
      This patch series is split from [1].  The other series enables SELinux
      support for userfaultfd file descriptors so that its creation and movement
      can be controlled.
      
      It has been demonstrated on various occasions that suspending kernel code
      execution for an arbitrary amount of time at any access to userspace
      memory (copy_from_user()/copy_to_user()/...) can be exploited to change
      the intended behavior of the kernel.  For instance, handling page faults
      in kernel-mode using userfaultfd has been exploited in [2, 3].  Likewise,
      FUSE, which is similar to userfaultfd in this respect, has been exploited
      in [4, 5] to similar effect.
      
      This small patch series adds a new flag to userfaultfd(2) that allows
      callers to give up the ability to handle kernel-mode faults with the
      resulting UFFD file object.  It then adds a 'user-mode only' option to the
      unprivileged_userfaultfd sysctl knob to require unprivileged callers to
      use this new flag.
      
      The purpose of this new interface is to decrease the chance of an
      unprivileged userfaultfd user taking advantage of userfaultfd to enhance
      security vulnerabilities by lengthening the race window in kernel code.
      
      [1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/
      [2] https://duasynt.com/blog/linux-kernel-heap-spray
      [3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
      [4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
      [5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808
      
      This patch (of 2):
      
      userfaultfd handles page faults from both user and kernel code.  Add a new
      UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting
      userfaultfd object refuse to handle faults from kernel mode, treating
      these faults as if SIGBUS were always raised, causing the kernel code to
      fail with EFAULT.
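
      From userspace, requesting such an object might look like the sketch
      below.  It is Linux-only, demo_open_user_mode_uffd() is a hypothetical
      wrapper, and the fallback value of UFFD_USER_MODE_ONLY is an assumption
      for builds against headers that predate this flag.  A kernel without
      this patch is expected to reject the unknown flag, so callers should be
      prepared for the call to fail.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1	/* assumed value for older headers */
#endif

/*
 * Ask for a userfaultfd that only handles user-mode faults.
 * Returns a file descriptor, or -1 with errno set.
 */
static int demo_open_user_mode_uffd(void)
{
#ifdef SYS_userfaultfd
	return (int)syscall(SYS_userfaultfd,
			    O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
#else
	errno = ENOSYS;	/* headers too old to know the syscall number */
	return -1;
#endif
}
```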
      
      A future patch adds a knob allowing administrators to give some processes
      the ability to create userfaultfd file objects only if they pass
      UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will
      exploit userfaultfd's ability to delay kernel page faults to open timing
      windows for future exploits.
      
      Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com
      Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com
      Signed-off-by: Daniel Colascione <dancol@google.com>
      Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <calin@google.com>
      Cc: Daniel Colascione <dancol@dancol.org>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nitin Gupta <nigupta@nvidia.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO · f289041e
      Vlastimil Babka authored
      CONFIG_PAGE_POISONING_ZERO uses the zero pattern instead of 0xAA.  It was
      introduced by commit 1414c7f4 ("mm/page_poisoning.c: allow for zero
      poisoning"), noting that using zeroes retains the benefit of sanitizing
      content of freed pages, with the added benefit of not having to zero them
      again on alloc, and the downside of making some forms of corruption (stray
      writes of NULLs) harder to detect than with the 0xAA pattern.  Together
      with CONFIG_PAGE_POISONING_NO_SANITY it made it possible to sanitize the
      contents on free without checking it back on alloc.
      
      These days we have the init_on_free option to achieve sanitization with
      zeroes and to save clearing on alloc (and without checking on alloc).
      Arguably if someone does choose to check the poison for corruption on
      alloc, the savings of not clearing the page are secondary, and it makes
      sense to always use the 0xAA poison pattern.  Thus, remove the
      CONFIG_PAGE_POISONING_ZERO option for being redundant.
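
      The trade-off between the two patterns can be illustrated with a small
      userspace sketch.  This is not the kernel code: the model_* names and
      the 4096-byte page size are assumptions made for illustration only.  It
      shows why a stray write of NULL is caught by the 0xAA pattern but goes
      unnoticed with the zero pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this sketch */
#define POISON_AA       0xaa    /* the 0xAA poison pattern */
#define POISON_ZERO     0x00    /* the pattern used by the removed option */

/* Fill a freed page with the chosen poison byte. */
void model_poison_page(unsigned char *page, unsigned char pattern)
{
    memset(page, pattern, MODEL_PAGE_SIZE);
}

/* On alloc: return true if every byte still holds the poison pattern. */
bool model_poison_intact(const unsigned char *page, unsigned char pattern)
{
    for (size_t i = 0; i < MODEL_PAGE_SIZE; i++)
        if (page[i] != pattern)
            return false;
    return true;
}

/* Simulate a use-after-free that stores a NULL pointer into a freed page. */
void model_stray_null_write(unsigned char *page, size_t offset)
{
    memset(page + offset, 0, sizeof(void *));
}
```

      With POISON_AA, model_poison_intact() returns false after
      model_stray_null_write(); with POISON_ZERO the same write leaves the
      page looking untouched.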
      
      Link: https://lkml.kernel.org/r/20201113104033.22907-6-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Laura Abbott <labbott@kernel.org>
      Cc: Mateusz Nosek <mateusznosek0@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f289041e
    • Vlastimil Babka's avatar
      mm, page_poison: remove CONFIG_PAGE_POISONING_NO_SANITY · 8f424750
      Vlastimil Babka authored
      CONFIG_PAGE_POISONING_NO_SANITY skips the check on page alloc whether the
      poison pattern was corrupted, suggesting a use-after-free.  The motivation
      to introduce it in commit 8823b1db ("mm/page_poison.c: enable
      PAGE_POISONING as a separate option") was to simply sanitize freed pages,
      optimally together with CONFIG_PAGE_POISONING_ZERO.
      
      These days we have an init_on_free=1 boot option, which makes this use
      case of page poisoning redundant.  For sanitizing, writing zeroes is
      sufficient; there is pretty much no benefit from writing the 0xAA poison
      pattern to freed pages without checking it back on alloc.  Thus, remove
      this option and suggest init_on_free instead in the main config's help.
      
      Link: https://lkml.kernel.org/r/20201113104033.22907-5-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Laura Abbott <labbott@kernel.org>
      Cc: Mateusz Nosek <mateusznosek0@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f424750
    • Vlastimil Babka's avatar
      kernel/power: allow hibernation with page_poison sanity checking · 03b6c9a3
      Vlastimil Babka authored
      Page poisoning used to be incompatible with hibernation, as the state of
      poisoned pages was lost after resume, thus enabling CONFIG_HIBERNATION
      forces CONFIG_PAGE_POISONING_NO_SANITY.  For the same reason, the
      poisoning with zeroes variant CONFIG_PAGE_POISONING_ZERO used to disable
      hibernation.  The latter restriction was removed by commit 1ad1410f
      ("PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO") and
      similarly for init_on_free by commit 18451f9f ("PM: hibernate: fix
      crashes with init_on_free=1") by making sure free pages are cleared after
      resume.
      
      We can use the same mechanism to instead poison free pages with
      PAGE_POISON after resume.  This covers both zero and 0xAA patterns.  Thus
      we can remove the Kconfig restriction that disables page poison sanity
      checking when hibernation is enabled.
      
      Link: https://lkml.kernel.org/r/20201113104033.22907-4-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[hibernation]
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Laura Abbott <labbott@kernel.org>
      Cc: Mateusz Nosek <mateusznosek0@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      03b6c9a3
    • Vlastimil Babka's avatar
      mm, page_poison: use static key more efficiently · 8db26a3d
      Vlastimil Babka authored
      Commit 11c9c7ed ("mm/page_poison.c: replace bool variable with static
      key") changed page_poisoning_enabled() to a static key check.  However,
      the function is not inlined, so each check still involves a function call
      with overhead not eliminated when page poisoning is disabled.
      
      Analogously to how debug_pagealloc is handled, this patch converts
      page_poisoning_enabled() back to a boolean check, and introduces
      page_poisoning_enabled_static() for fast paths.  Both functions are
      inlined.
      
      The function kernel_poison_pages() is also called unconditionally and does
      the static key check inside.  Remove the check from there and move it to
      the callers.  Also split it into two functions, kernel_poison_pages() and
      kernel_unpoison_pages(), instead of using the confusing bool parameter.
      
      Also optimize the check that enables page poisoning instead of
      debug_pagealloc for architectures without proper debug_pagealloc support.
      Move the check to init_mem_debugging_and_hardening() to enable a single
      static key instead of having two static branches in
      page_poisoning_enabled_static().
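
      The resulting shape (an inlined check at the call site, and two helpers
      in place of one function with a bool parameter) can be sketched in
      userspace as follows.  This is only a model: a plain bool stands in for
      the static key, the model_* names are hypothetical, and real static
      keys patch the branch at runtime, which ordinary C cannot express.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Plain bool standing in for the static key; set once during "boot". */
bool model_poisoning_key;

/* Inlined check, so a disabled feature costs no function call. */
static inline bool model_page_poisoning_enabled_static(void)
{
    return model_poisoning_key;
}

/* Two separate helpers instead of one function with a bool parameter. */
void model_kernel_poison_pages(unsigned char *pages, size_t numpages)
{
    memset(pages, 0xaa, numpages * MODEL_PAGE_SIZE);
}

/* Returns true if the poison pattern was still intact. */
bool model_kernel_unpoison_pages(const unsigned char *pages, size_t numpages)
{
    for (size_t i = 0; i < numpages * MODEL_PAGE_SIZE; i++)
        if (pages[i] != 0xaa)
            return false;
    return true;
}

/* Caller-side check: the helper is only reached when poisoning is on. */
void model_free_pages(unsigned char *pages, size_t numpages)
{
    if (model_page_poisoning_enabled_static())
        model_kernel_poison_pages(pages, numpages);
}
```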
      
      Link: https://lkml.kernel.org/r/20201113104033.22907-3-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Laura Abbott <labbott@kernel.org>
      Cc: Mateusz Nosek <mateusznosek0@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8db26a3d
    • Vlastimil Babka's avatar
      mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters · 04013513
      Vlastimil Babka authored
      
      
      Patch series "cleanup page poisoning", v3.
      
      I have identified a number of issues and opportunities for cleanup with
      CONFIG_PAGE_POISON and friends:
      
       - interaction with init_on_alloc and init_on_free parameters depends on
         the order of parameters (Patch 1)
      
       - the boot time enabling uses a static key, but inefficiently (Patch 2)
      
       - sanity checking is incompatible with hibernation (Patch 3)
      
       - CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have
         init_on_free (Patch 4)
      
       - CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we
         have init_on_free (Patch 5)
      
      This patch (of 5):
      
      Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1
      produces a warning in dmesg that page_poison takes precedence.  However,
      as these warnings are printed in early_param handlers for
      init_on_alloc/free, they are not printed if page_poison is enabled later
      on the command line (handlers are called in the order of their
      parameters), or when init_on_alloc/free is always enabled by the
      respective config option - before the page_poison early param handler is
      called, it is not considered to be enabled.  This is inconsistent.
      
      We can remove the dependency on order by making the init_on_* parameters
      only set a boolean variable, and postponing the evaluation after all early
      params have been processed.  Introduce a new
      init_mem_debugging_and_hardening() function for that, and move the related
      debug_pagealloc processing there as well.
      
      As a result init_mem_debugging_and_hardening() always knows accurately if
      init_on_* and/or page_poison options were enabled.  Thus we can also
      optimize want_init_on_alloc() and want_init_on_free().  We don't need to
      check page_poisoning_enabled() there, we can instead not enable the
      init_on_* static keys at all, if page poisoning is enabled.  This results
      in simpler and more efficient code.
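
      The two-stage approach can be modelled in userspace as below.  This is
      a hedged sketch, not the kernel implementation: the struct, the
      model_* function names and the exact-match string parsing are
      assumptions made for illustration.  Handlers only record what was
      requested; a single evaluation step afterwards decides the final
      state, so the outcome no longer depends on parameter order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct model_mem_params {
    /* Set by the per-parameter handlers, in command-line order. */
    bool want_page_poison;
    bool want_init_on_alloc;
    bool want_init_on_free;
    /* Final state, decided only after all parameters were seen. */
    bool page_poison_on;
    bool init_on_alloc_on;
    bool init_on_free_on;
    bool warned;
};

/* Handler stage: only record the request, make no decisions yet. */
void model_early_param(struct model_mem_params *p, const char *arg)
{
    if (strcmp(arg, "page_poison=1") == 0)
        p->want_page_poison = true;
    else if (strcmp(arg, "init_on_alloc=1") == 0)
        p->want_init_on_alloc = true;
    else if (strcmp(arg, "init_on_free=1") == 0)
        p->want_init_on_free = true;
}

/* Evaluation stage, modelled on init_mem_debugging_and_hardening(). */
void model_init_mem_debugging_and_hardening(struct model_mem_params *p)
{
    p->page_poison_on = p->want_page_poison;
    if (p->page_poison_on &&
        (p->want_init_on_alloc || p->want_init_on_free)) {
        p->warned = true;            /* page_poison takes precedence */
        p->init_on_alloc_on = false; /* init_on_* keys are not enabled */
        p->init_on_free_on = false;
        return;
    }
    p->init_on_alloc_on = p->want_init_on_alloc;
    p->init_on_free_on = p->want_init_on_free;
}

/* Convenience wrapper: process a whole command line, then evaluate once. */
struct model_mem_params model_boot(const char *const *argv, int argc)
{
    struct model_mem_params p = {0};

    for (int i = 0; i < argc; i++)
        model_early_param(&p, argv[i]);
    model_init_mem_debugging_and_hardening(&p);
    return p;
}
```

      In this model, "page_poison=1 init_on_free=1" and
      "init_on_free=1 page_poison=1" produce the same final state and the
      same warning flag.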
      
      Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz
      Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mateusz Nosek <mateusznosek0@gmail.com>
      Cc: Laura Abbott <labbott@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04013513
    • Charan Teja Reddy's avatar
      mm: cma: improve pr_debug log in cma_release() · b8ca396f
      Charan Teja Reddy authored
      
      
      Print the 'count' of pages, along with the pages, passed to
      cma_release() to help debug cases where mismatched count values are
      passed to cma_alloc() and cma_release() from a code path.
      
      As an example, consider the below scenario:
      
      1) CMA pool size is 4MB and
      
      2) A user erroneously allocates 2 pages but frees only 1 page in a
         loop from this CMA pool.  Step 2 eventually causes cma_alloc() to
         return NULL because of an -ENOMEM condition.
      
      The current pr_debug logs give no information about these types of
      allocation patterns because the count value is not printed in
      cma_release().
      
      The count value is already printed in the trace logs; just extend the
      same to the pr_debug logs too.
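
      The scenario above can be reproduced with a small userspace model.
      This is a sketch under stated assumptions, not the CMA implementation:
      it assumes 4KB pages (so a 4MB pool holds 1024 pages), a simple
      first-fit bitmap search, and hypothetical model_* names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_POOL_PAGES 1024   /* 4MB pool, assuming 4KB pages */

struct model_cma {
    bool used[MODEL_POOL_PAGES];
};

/* First-fit search for 'count' contiguous free pages; -1 models -ENOMEM. */
long model_cma_alloc(struct model_cma *cma, size_t count)
{
    if (count == 0 || count > MODEL_POOL_PAGES)
        return -1;
    for (size_t start = 0; start + count <= MODEL_POOL_PAGES; start++) {
        size_t run = 0;

        while (run < count && !cma->used[start + run])
            run++;
        if (run == count) {
            for (size_t i = 0; i < count; i++)
                cma->used[start + i] = true;
            return (long)start;
        }
    }
    return -1;
}

/* Release 'count' pages starting at 'start'; the caller chooses 'count'. */
void model_cma_release(struct model_cma *cma, size_t start, size_t count)
{
    for (size_t i = 0; i < count && start + i < MODEL_POOL_PAGES; i++)
        cma->used[start + i] = false;
}

/*
 * Repeat alloc(alloc_count) / release(free_count) until an allocation
 * fails or max_iters is reached; return the number of successful allocs.
 */
size_t model_allocs_until_failure(size_t alloc_count, size_t free_count,
                                  size_t max_iters)
{
    struct model_cma cma = { { false } };
    size_t ok = 0;

    while (ok < max_iters) {
        long start = model_cma_alloc(&cma, alloc_count);

        if (start < 0)
            break;
        ok++;
        model_cma_release(&cma, (size_t)start, free_count);
    }
    return ok;
}
```

      In this model every iteration leaks one page and leaves a one-page
      hole, so the pool is exhausted long before a matched alloc/release
      loop would notice anything.  Printing 'count' on both sides makes the
      mismatch visible in the logs.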
      
      [akpm@linux-foundation.org: fix printk warning]
      
      Link: https://lkml.kernel.org/r/1606318341-29521-1-git-send-email-charante@codeaurora.org
      Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
      Reviewed-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b8ca396f
    • Lecopzer Chen's avatar
      mm/cma.c: remove redundant cma_mutex lock · a4efc174
      Lecopzer Chen authored
      The cma_mutex which protects alloc_contig_range() first appeared in
      commit 7ee793a6 ("cma: Remove potential deadlock situation"); at that
      time, there was no guarantee about the behavior of concurrency inside
      alloc_contig_range().

      After commit 2c7452a0 ("mm/page_isolation.c: make
      start_isolate_page_range() fail if already isolated")
      
        > However, two subsystems (CMA and gigantic
        > huge pages for example) could attempt operations on the same range.  If
        > this happens, one thread may 'undo' the work another thread is doing.
        > This can result in pageblocks being incorrectly left marked as
        > MIGRATE_ISOLATE and therefore not available for page allocation.
      
      The concurrency inside alloc_contig_range() was clarified.
      
      Hugepage and virtio now call alloc_contig_range() without any lock,
      thus cma_mutex is "redundant" in cma_alloc().
      
      Link: https://lkml.kernel.org/r/20201020102241.3729-1-lecopzer.chen@mediatek.com
      Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: YJ Chiang <yj.chiang@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a4efc174
    • Stephen Zhang's avatar
      mm: migrate: remove unused parameter in migrate_vma_insert_page() · d85c6db4
      Stephen Zhang authored
      
      
      "dst" parameter to migrate_vma_insert_page() is not used anymore.
      
      Link: https://lkml.kernel.org/r/CANubcdUwCAMuUyamG2dkWP=cqSR9MAS=tHLDc95kQkqU-rEnAg@mail.gmail.com
      Signed-off-by: Stephen Zhang <starzhangzsd@gmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d85c6db4
    • Yang Shi's avatar
      mm: migrate: return -ENOSYS if THP migration is unsupported · d532e2e5
      Yang Shi authored
      
      
      In the current implementation unmap_and_move() returns -ENOMEM if THP
      migration is unsupported, and then the THP is split.  If the split fails,
      it just exits without trying to migrate other pages.  That does not make
      much sense, since there may be enough free memory to migrate other pages
      and there may be a lot of base pages on the list.

      Return -ENOSYS to be consistent with hugetlb.  And if the THP split
      fails, just skip it and try other pages on the list.

      Just skip the whole list and exit when free memory is really low.
      
      Link: https://lkml.kernel.org/r/20201113205359.556831-6-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d532e2e5
    • Yang Shi's avatar
      mm: migrate: clean up migrate_prep{_local} · 236c32eb
      Yang Shi authored
      
      
      migrate_prep{_local}() never fails, so it is pointless to have a return
      value and to check it.
      
      Link: https://lkml.kernel.org/r/20201113205359.556831-5-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Song Liu <songliubraving@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      236c32eb
    • Yang Shi's avatar
      mm: migrate: skip shared exec THP for NUMA balancing · c77c5cba
      Yang Shi authored
      
      
      NUMA balancing skips shared exec base pages.  Since
      CONFIG_READ_ONLY_THP_FOR_FS was introduced, there are probably shared exec
      THPs, so skip such THPs for NUMA balancing as well.

      And Willy's regular filesystem THP support patches could create shared
      exec THPs even without that config.

      In addition, page_is_file_lru() is used to tell whether the page is file
      cache or not, but it filters out shmem pages.  Putting executables in
      shmem to gain performance via shmem-THP sounds like a typical use case,
      so it sounds worth skipping migration for such a case too.
      
      Link: https://lkml.kernel.org/r/20201113205359.556831-4-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c77c5cba
    • Yang Shi's avatar
      mm: migrate: simplify the logic for handling permanent failure · dd4ae78a
      Yang Shi authored
      
      
      When unmap_and_move{_huge_page}() returns !-EAGAIN and
      !MIGRATEPAGE_SUCCESS, the page would be put back to LRU or proper list if
      it is non-LRU movable page.  But, the callers always call
      putback_movable_pages() to put the failed pages back later on, so it seems
      not very efficient to put every single page back immediately, and the code
      looks convoluted.
      
      Put the failed page on a separate list, then splice the list to migrate
      list when all pages are tried.  It is the caller's responsibility to call
      putback_movable_pages() to handle failures.  This also makes the code
      simpler and more readable.
      
      After the change the rules are:
          * Success: non hugetlb page will be freed, hugetlb page will be put
                     back
          * -EAGAIN: stay on the from list
          * -ENOMEM: stay on the from list
          * Other errno: put on ret_pages list then splice to from list
      
      The from list would be empty iff all pages are migrated successfully;
      this was not so before.  This has no impact on existing callsites.
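
      The rules can be summarised with a simplified userspace model.  It is a
      sketch only: the per-page return codes are supplied by the caller as a
      hypothetical array, the model_* names are invented for illustration,
      and the retry passes for -EAGAIN and the early exit on -ENOMEM that
      the real code performs are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MIGRATEPAGE_SUCCESS 0

struct model_migrate_result {
    size_t migrated;      /* succeeded: no longer on the from list */
    size_t left_on_from;  /* -EAGAIN / -ENOMEM: never left the from list */
    size_t ret_pages;     /* other errors: parked, then spliced to from */
    size_t from_len;      /* final length of the from list */
};

/* rc[i] is the hypothetical per-page return code of unmap_and_move(). */
struct model_migrate_result model_migrate_pages(const int *rc, size_t n)
{
    struct model_migrate_result r = { 0, 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        switch (rc[i]) {
        case MODEL_MIGRATEPAGE_SUCCESS:
            r.migrated++;
            break;
        case -EAGAIN:
        case -ENOMEM:
            r.left_on_from++;
            break;
        default:
            r.ret_pages++;   /* permanent failure */
            break;
        }
    }
    /* Splice ret_pages back so the caller can putback_movable_pages(). */
    r.from_len = r.left_on_from + r.ret_pages;
    return r;
}
```

      In this model from_len is zero exactly when every page reported
      success, which mirrors the "empty iff all pages are migrated" rule.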
      
      Link: https://lkml.kernel.org/r/20201113205359.556831-3-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Song Liu <songliubraving@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dd4ae78a
    • Yang Shi's avatar
      mm: truncate_complete_page() does not exist any more · d12b8951
      Yang Shi authored
      Patch series "mm: misc migrate cleanup and improvement", v3.
      
      This patch (of 5):
      
      Commit 9f4e41f4 ("mm: refactor truncate_complete_page()") refactored
      truncate_complete_page(), and it no longer exists.  Correct the comments
      in vmscan and migrate to avoid confusion.
      
      Link: https://lkml.kernel.org/r/20201113205359.556831-1-shy828301@gmail.com
      Link: https://lkml.kernel.org/r/20201113205359.556831-2-shy828301@gmail.com
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d12b8951
    • Matthew Wilcox (Oracle)'s avatar
      mm: support THPs in zero_user_segments · 0060ef3b
      Matthew Wilcox (Oracle) authored
      
      
      We can only kmap() one subpage of a THP at a time, so loop over all
      relevant subpages, skipping ones which don't need to be zeroed.  This is
      too large to inline when THPs are enabled and we actually need highmem, so
      put it in highmem.c.
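
      The per-subpage loop can be sketched in userspace as follows.  This is
      a model under assumptions, not the highmem.c code: the THP is a plain
      byte buffer, pointer arithmetic stands in for kmap() of one subpage,
      the page size is assumed to be 4096 bytes, and the model_* names are
      hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Zero the part of [start, end) that falls inside one subpage. */
static void model_zero_in_subpage(unsigned char *subpage, size_t base,
                                  size_t start, size_t end)
{
    size_t lo = start > base ? start : base;
    size_t hi = end < base + MODEL_PAGE_SIZE ? end : base + MODEL_PAGE_SIZE;

    if (lo < hi)
        memset(subpage + (lo - base), 0, hi - lo);
}

/*
 * Walk a compound page of 'nr' subpages one subpage at a time and zero
 * the byte ranges [start1, end1) and [start2, end2).
 */
void model_zero_user_segments(unsigned char *thp, size_t nr,
                              size_t start1, size_t end1,
                              size_t start2, size_t end2)
{
    for (size_t i = 0; i < nr; i++) {
        size_t base = i * MODEL_PAGE_SIZE;
        unsigned char *subpage = thp + base;  /* stands in for kmap() */

        /* Skip subpages that neither range touches. */
        if ((end1 <= base || start1 >= base + MODEL_PAGE_SIZE) &&
            (end2 <= base || start2 >= base + MODEL_PAGE_SIZE))
            continue;
        model_zero_in_subpage(subpage, base, start1, end1);
        model_zero_in_subpage(subpage, base, start2, end2);
    }
}
```

      Because each range is clipped against the current subpage
      independently, this model does not care which of the two ranges comes
      first.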
      
      [willy@infradead.org: start1 was allowed to be less than start2]
      
      Link: https://lkml.kernel.org/r/20201124041507.28996-1-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0060ef3b
    • Ralph Campbell's avatar
      mm/migrate.c: optimize migrate_vma_pages() mmu notifier · 5e5dda81
      Ralph Campbell authored
      
      
      When migrating a zero page or pte_none() anonymous page to device private
      memory, migrate_vma_setup() will initialize the src[] array with a NULL
      PFN.  This lets the device driver allocate device private memory and clear
      it instead of DMAing a page of zeros over the device bus.
      
      Since the source page didn't exist at the time, no struct page was locked
      nor a migration PTE inserted into the CPU page tables.  The actual PTE
      insertion happens in migrate_vma_pages() when it tries to insert the
      device private struct page PTE into the CPU page tables.
      migrate_vma_pages() has to call the mmu notifiers again since another
      device could fault on the same page before the page table locks are
      acquired.
      
      Allow device drivers to optimize the invalidation similar to
      migrate_vma_setup() by calling mmu_notifier_range_init() which sets struct
      mmu_notifier_range event type to MMU_NOTIFY_MIGRATE and the
      migrate_pgmap_owner field.
      
      Link: https://lkml.kernel.org/r/20201021191335.10916-1-rcampbell@nvidia.com
      Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5e5dda81
    • Long Li's avatar
      mm/migrate.c: fix comment spelling · ab9dd4f8
      Long Li authored
      
      
      The word in the comment is misspelled, it should be "include".
      
      Link: https://lkml.kernel.org/r/20201024114144.GA20552@lilong
      Signed-off-by: Long Li <lonuxli.64@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ab9dd4f8
    • Hui Su's avatar
      mm/oom_kill: change comment and rename is_dump_unreclaim_slabs() · 259b3633
      Hui Su authored
      
      
      Change the comment of is_dump_unreclaim_slabs(): it just checks whether
      the nr_unreclaimable slabs amount is greater than user memory.  Also
      explain why we dump unreclaimable slabs.

      Renaming it to should_dump_unreclaim_slab() may be better.
      
      Link: https://lkml.kernel.org/r/20201030182704.GA53949@rlk
      Signed-off-by: Hui Su <sh_def@163.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      259b3633