  1. Oct 27, 2018
    • mm: brk: downgrade mmap_sem to read when shrinking · 9bc8039e
      Yang Shi authored
      
      
      Like munmap(), brk() may also be used to shrink a memory mapping, so it
      can hold the write mmap_sem for a long time when shrinking a large
      mapping, as described by commit ("mm: mmap: zap pages with read
      mmap_sem in munmap").
      
      In the mapping-shrink case, brk() no longer manipulates VMAs after the
      __do_munmap() call. However, it may set mm->brk after __do_munmap(),
      which requires holding the write mmap_sem.
      
      A simple trick works around this: set mm->brk before __do_munmap(),
      and restore the original value if __do_munmap() fails. With this trick
      it is safe to downgrade to the read mmap_sem.
      
      The same optimization that downgrades mmap_sem to read for zapping
      pages is therefore feasible and reasonable for this case as well.
      
      This optimization significantly reduces the period during which the
      exclusive mmap_sem is held while shrinking a large mapping.
      
      [akpm@linux-foundation.org: tweak comment]
      [yang.shi@linux.alibaba.com: fix unsigned compare against 0 issue]
        Link: http://lkml.kernel.org/r/1538687672-17795-1-git-send-email-yang.shi@linux.alibaba.com
      Link: http://lkml.kernel.org/r/1538067582-60038-2-git-send-email-yang.shi@linux.alibaba.com
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: mremap: downgrade mmap_sem to read when shrinking · 85a06835
      Yang Shi authored
      
      
      Like munmap, mremap may also be used to shrink a memory mapping, so it
      can hold the write mmap_sem for a long time when shrinking a large
      mapping, as described by commit ("mm: mmap: zap pages with read
      mmap_sem in munmap").
      
      In the mapping-shrink case, mremap() no longer manipulates VMAs after
      the __do_munmap() call, so it is safe to downgrade to the read
      mmap_sem.
      
      The same optimization that downgrades mmap_sem to read for zapping
      pages is therefore feasible and reasonable for this case as well.
      
      This optimization significantly reduces the period during which the
      exclusive mmap_sem is held while shrinking a large mapping.
      
      It is more complicated for MREMAP_FIXED and MREMAP_MAYMOVE to adopt
      this optimization, since they need to manipulate VMAs after
      do_munmap(); downgrading mmap_sem there could open a race window.
      
      A simple mapping shrink is the low-hanging fruit, and together with
      munmap it may cover most unmap cases.
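The shrink-only condition can be sketched as a small predicate. The helper name is hypothetical and the flag values are written out for the sake of a standalone example rather than taken from the uapi headers:

```c
#include <assert.h>
#include <stdbool.h>

#define MREMAP_MAYMOVE 1  /* illustrative values for this sketch */
#define MREMAP_FIXED   2

/* Only a plain shrink (no MREMAP_FIXED/MREMAP_MAYMOVE and
 * new_len < old_len) stops touching VMAs after __do_munmap(),
 * so only then is it safe to downgrade mmap_sem to read while
 * zapping pages. */
static bool can_downgrade_for_shrink(unsigned long old_len,
                                     unsigned long new_len,
                                     unsigned long flags)
{
    if (flags & (MREMAP_FIXED | MREMAP_MAYMOVE))
        return false;          /* VMAs manipulated after munmap */
    return new_len < old_len;  /* plain shrink: nothing left to write */
}
```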
      
      [akpm@linux-foundation.org: tweak comment]
      [yang.shi@linux.alibaba.com: fix unsigned compare against 0 issue]
        Link: http://lkml.kernel.org/r/1538687672-17795-2-git-send-email-yang.shi@linux.alibaba.com
      Link: http://lkml.kernel.org/r/1538067582-60038-1-git-send-email-yang.shi@linux.alibaba.com
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/filemap.c: use vmf_error() · 3c051324
      Souptick Joarder authored
      
      
      This code can be replaced with the new inline vmf_error().
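For reference, vmf_error() maps a negative errno to a VM fault code roughly as sketched below; the vm_fault_t typedef and VM_FAULT_* values here are simplified stand-ins so the example compiles standalone, while the real definitions live in include/linux/mm.h:

```c
#include <assert.h>
#include <errno.h>

typedef unsigned int vm_fault_t;   /* mock of the kernel typedef */
#define VM_FAULT_OOM    0x1        /* illustrative bit values */
#define VM_FAULT_SIGBUS 0x2

/* -ENOMEM becomes an OOM fault; every other error is SIGBUS. */
static inline vm_fault_t vmf_error(int err)
{
    if (err == -ENOMEM)
        return VM_FAULT_OOM;
    return VM_FAULT_SIGBUS;
}
```

This replaces open-coded `if (err == -ENOMEM) ret = VM_FAULT_OOM; else ret = VM_FAULT_SIGBUS;` patterns at fault-handler call sites.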
      
      Link: http://lkml.kernel.org/r/20180927171411.GA23331@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce generic version of huge_ptep_get · 544db759
      Alexandre Ghiti authored
      
      
      ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use the same
      version of huge_ptep_get, so move this generic implementation into
      asm-generic/hugetlb.h.
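The shared implementation being moved is trivial; below is a sketch with pte_t mocked so it compiles standalone. In the kernel the function sits in asm-generic/hugetlb.h behind the __HAVE_ARCH_HUGE_PTEP_GET guard:

```c
#include <assert.h>

/* Mock pte_t; the real type is architecture-defined. */
typedef struct { unsigned long pte; } pte_t;

/* Fallback supplied by asm-generic/hugetlb.h unless an arch
 * defines __HAVE_ARCH_HUGE_PTEP_GET and provides its own. */
#ifndef __HAVE_ARCH_HUGE_PTEP_GET
static inline pte_t huge_ptep_get(pte_t *ptep)
{
    return *ptep;   /* the version shared by ia64, mips, parisc, ... */
}
#endif
```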
      
      [arnd@arndb.de: fix ARM 3level page tables]
        Link: http://lkml.kernel.org/r/20181005161722.904274-1-arnd@arndb.de
      Link: http://lkml.kernel.org/r/20180920060358.16606-12-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce generic version of huge_ptep_set_access_flags() · facf6d5b
      Alexandre Ghiti authored
      
      
      arm, ia64, sh, x86 architectures use the same version
      of huge_ptep_set_access_flags, so move this generic implementation
      into asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-11-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce generic version of huge_ptep_set_wrprotect() · 8e581d43
      Alexandre Ghiti authored
      
      
      arm, ia64, mips, powerpc, sh, x86 architectures use the same version of
      huge_ptep_set_wrprotect, so move this generic implementation into
      asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-10-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce generic version of prepare_hugepage_range · 78d6e4e8
      Alexandre Ghiti authored
      
      
      arm, arm64, powerpc, sparc, x86 architectures use the same version of
      prepare_hugepage_range, so move this generic implementation into
      asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-9-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce generic version of huge_pte_wrprotect · c4916a00
      Alexandre Ghiti authored
      
      
      arm, arm64, ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use
      the same version of huge_pte_wrprotect, so move this generic
      implementation into asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-8-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce generic version of huge_pte_none() · cae72abc
      Alexandre Ghiti authored
      
      
      arm, arm64, ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use
      the same version of huge_pte_none, so move this generic implementation
      into asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-7-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce generic version of huge_ptep_clear_flush · fe632225
      Alexandre Ghiti authored
      
      
      arm, x86 architectures use the same version of huge_ptep_clear_flush, so
      move this generic implementation into asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-6-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce generic version of huge_ptep_get_and_clear() · a4d83853
      Alexandre Ghiti authored
      
      
      arm, ia64, sh, x86 architectures use the same version of
      huge_ptep_get_and_clear, so move this generic implementation into
      asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-5-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce generic version of set_huge_pte_at() · cea685d5
      Alexandre Ghiti authored
      
      
      arm, ia64, mips, powerpc, sh, x86 architectures use the same version of
      set_huge_pte_at, so move this generic implementation into
      asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-4-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce generic version of hugetlb_free_pgd_range · 1e5f50fc
      Alexandre Ghiti authored
      
      
      arm, arm64, mips, parisc, sh, x86 architectures use the same version of
      hugetlb_free_pgd_range, so move this generic implementation into
      asm-generic/hugetlb.h.
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-3-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Tested-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Ingo Molnar <mingo@kernel.org>		[x86]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: harmonize hugetlb.h arch specific defines with pgtable.h · d018498c
      Alexandre Ghiti authored
      
      
      In order to reduce copy/paste of functions across architectures and then
      make riscv hugetlb port (and future ports) simpler and smaller, this
      patchset intends to factorize the numerous hugetlb primitives that are
      defined across all the architectures.
      
      Except for prepare_hugepage_range, this patchset moves the versions that
      are just pass-through to standard pte primitives into
      asm-generic/hugetlb.h by using the same #ifdef semantic that can be found
      in asm-generic/pgtable.h, i.e.  __HAVE_ARCH_***.
      
      The s390 architecture has not been tackled in this series since it does
      not use asm-generic/hugetlb.h at all.
      
      This patchset has been compiled on all addressed architectures with
      success (except for parisc, but the problem does not come from this
      series).
      
      This patch (of 11):
      
      asm-generic/hugetlb.h proposes generic implementations of hugetlb related
      functions: use __HAVE_ARCH_HUGE* defines in order to make arch specific
      implementations of hugetlb functions consistent with pgtable.h scheme.
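A sketch of the pgtable.h-style override scheme, with pte_t and pte_none() mocked so the example compiles standalone; huge_pte_none() serves as the example primitive. An arch providing its own version would `#define __HAVE_ARCH_HUGE_PTE_NONE` in its asm/hugetlb.h before including the generic header:

```c
#include <assert.h>

typedef struct { unsigned long val; } pte_t;           /* mock */
static inline int pte_none(pte_t pte) { return pte.val == 0; }

/* asm-generic/hugetlb.h supplies the fallback only when no arch
 * override exists, mirroring the __HAVE_ARCH_* scheme of
 * asm-generic/pgtable.h. */
#ifndef __HAVE_ARCH_HUGE_PTE_NONE
static inline int huge_pte_none(pte_t pte)
{
    return pte_none(pte);   /* pass-through to the standard primitive */
}
#endif
```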
      
      Link: http://lkml.kernel.org/r/20180920060358.16606-2-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>		[x86]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove unnecessary local variable addr in __get_user_pages_fast() · d4faa402
      Wei Yang authored
      
      
      The local variable `addr' in __get_user_pages_fast() is just a shadow
      of `start'.  Since `start' never changes after being copied to `addr',
      it is fine to drop `addr' and use `start' directly.
      
      Also, the meaning of [start, end] is more obvious than [addr, end]
      when passed to gup_pgd_range().
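A simplified sketch of the resulting shape, with gup_pgd_range() replaced by a hypothetical page-counting stand-in (the real function walks page tables and pins pages):

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* Stand-in for gup_pgd_range(): just counts pages in [start, end). */
static int fake_gup_pgd_range(unsigned long start, unsigned long end)
{
    return (int)((end - start) >> PAGE_SHIFT);
}

/* After the cleanup: no `addr' shadow variable; `start' is passed
 * directly, making the [start, end) range explicit at the call site. */
static int gup_fast_sketch(unsigned long start, unsigned long nr_pages)
{
    unsigned long len = nr_pages << PAGE_SHIFT;
    unsigned long end = start + len;

    return fake_gup_pgd_range(start, end);
}
```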
      
      Link: http://lkml.kernel.org/r/20180925021448.20265-1-richard.weiyang@gmail.com
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: defer ZONE_DEVICE page initialization to the point where we init pgmap · 966cf44f
      Alexander Duyck authored
      
      
      The ZONE_DEVICE pages were being initialized in two locations.  One was
      with the memory_hotplug lock held and another was outside of that lock.
      The problem with this is that it was nearly doubling the memory
      initialization time.  Instead of doing this twice, once while holding a
      global lock and once without, I am opting to defer the initialization to
      the one outside of the lock.  This allows us to avoid serializing the
      overhead for memory init and we can instead focus on per-node init times.
      
      One issue I encountered is that devm_memremap_pages and
      hmm_devmem_pages_create were initializing only the pgmap field in the
      same way.  One wasn't initializing hmm_data, and the other was
      initializing it to a poison value.  Since this is something that is
      exposed to the driver in the hmm case, I am opting for a third option
      and just initializing hmm_data to 0, since it is going to be exposed
      to unknown third-party drivers.
      
      [alexander.h.duyck@linux.intel.com: fix reference count for pgmap in devm_memremap_pages]
        Link: http://lkml.kernel.org/r/20181008233404.1909.37302.stgit@localhost.localdomain
      Link: http://lkml.kernel.org/r/20180925202053.3576.66039.stgit@localhost.localdomain
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: create non-atomic version of SetPageReserved for init use · d483da5b
      Alexander Duyck authored
      It doesn't make much sense to use the atomic SetPageReserved at init time
      when we are using memset to clear the memory and manipulating the page
      flags via simple "&=" and "|=" operations in __init_single_page.
      
      This patch adds a non-atomic version __SetPageReserved that can be used
      during page init and shows about a 10% improvement in initialization times
      on the systems I have available for testing.  On those systems I saw
      initialization times drop from around 35 seconds to around 32 seconds to
      initialize a 3TB block of persistent memory.  I believe the main advantage
      of this is that it allows for more compiler optimization as the __set_bit
      operation can be reordered whereas the atomic version cannot.
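The difference between the two flavors can be sketched with a mock page struct; the bit position, struct layout, and function bodies here are illustrative stand-ins, not the kernel's page-flags machinery:

```c
#include <assert.h>

enum { PG_reserved = 2 };          /* illustrative bit position */

struct page_mock { unsigned long flags; };

/* Atomic flavor: a read-modify-write the compiler cannot reorder
 * or combine with neighboring stores. */
static void SetPageReserved(struct page_mock *page)
{
    __atomic_fetch_or(&page->flags, 1UL << PG_reserved,
                      __ATOMIC_SEQ_CST);
}

/* Non-atomic flavor added by the commit: a plain OR, safe during
 * init when no other CPU can see the page, and free for the
 * compiler to reorder and optimize alongside __init_single_page's
 * other "&=" / "|=" flag manipulations. */
static void __SetPageReserved(struct page_mock *page)
{
    page->flags |= 1UL << PG_reserved;
}
```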
      
      I tried adding a bit of documentation based on f1dd2cd1 ("mm,
      memory_hotplug: do not associate hotadded memory to zones until
      online").
      
      Ideally the reserved flag should be set earlier, since there is a brief
      window where the page is initialized via __init_single_page and we have
      not yet set the PG_reserved flag.  I'm leaving that for a future patch
      set, as it will require a more significant refactor.
      
      Link: http://lkml.kernel.org/r/20180925202018.3576.11607.stgit@localhost.localdomain
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: provide kernel parameter to allow disabling page init poisoning · f682a97a
      Alexander Duyck authored
      
      
      Patch series "Address issues slowing persistent memory initialization", v5.
      
The main thing this patch set achieves is that it allows us to initialize
each node's worth of persistent memory independently.  As a result, on a
12TB persistent memory setup spread evenly over 4 nodes, we reduce page
init time by about 2 minutes: instead of taking 30 to 40 seconds per node
and going through the nodes one at a time, we process all 4 nodes in
parallel.
      
      This patch (of 3):
      
      On systems with a large amount of memory it can take a significant amount
      of time to initialize all of the page structs with the PAGE_POISON_PATTERN
      value.  I have seen it take over 2 minutes to initialize a system with
      over 12TB of RAM.
      
      In order to work around the issue I had to disable CONFIG_DEBUG_VM and
      then the boot time returned to something much more reasonable as the
      arch_add_memory call completed in milliseconds versus seconds.  However in
      doing that I had to disable all of the other VM debugging on the system.
      
To allow working around a kernel that has CONFIG_DEBUG_VM enabled on a
system with a large amount of memory, I have added a new kernel parameter
named "vm_debug" that can be set to "-" in order to disable it.
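Per the description above, the parameter is given on the kernel command line; a hypothetical boot entry might look like the following (only the `vm_debug=-` value is named by this patch — the surrounding command line is illustrative):

```
# Kernel command line fragment (illustrative): keep a CONFIG_DEBUG_VM
# kernel bootable on a huge-memory box by disabling the costly checks
vm_debug=-
```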
      
      Link: http://lkml.kernel.org/r/20180925201921.3576.84239.stgit@localhost.localdomain
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Shakeel Butt
      memcg: remove memcg_kmem_skip_account · 85cfb245
      Shakeel Butt authored
      
      
      The flag memcg_kmem_skip_account was added during the era of opt-out kmem
      accounting.  There is no need for such flag in the opt-in world as there
      aren't any __GFP_ACCOUNT allocations within memcg_create_cache_enqueue().
      
      Link: http://lkml.kernel.org/r/20180919004501.178023-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Oscar Salvador
      mm/memory_hotplug.c: clean up node_states_check_changes_offline() · 86b27bea
      Oscar Salvador authored
      
      
This patch, like the previous one, gets rid of the bogus if statements.
While at it, I realized that some of the comments are very confusing, to
say the least, and wrong.
For example:
      
      ___
      zone_last = ZONE_MOVABLE;
      
      /*
       * check whether node_states[N_HIGH_MEMORY] will be changed
       * If we try to offline the last present @nr_pages from the node,
       * we can determind we will need to clear the node from
       * node_states[N_HIGH_MEMORY].
       */
      
      for (; zt <= zone_last; zt++)
              present_pages += pgdat->node_zones[zt].present_pages;
      if (nr_pages >= present_pages)
              arg->status_change_nid = zone_to_nid(zone);
      else
              arg->status_change_nid = -1;
      ___
      
In case the node gets empty, it must be removed from N_MEMORY.  We already
check N_HIGH_MEMORY a bit above within the CONFIG_HIGHMEM ifdef code.  Not
to mention that status_change_nid is for N_MEMORY, not N_HIGH_MEMORY.
      
So I re-wrote some of the comments into what I think is a clearer form.
      
      [osalvador@suse.de: address feedback from Pavel]
        Link: http://lkml.kernel.org/r/20180921132634.10103-5-osalvador@techadventures.net
      Link: http://lkml.kernel.org/r/20180919100819.25518-6-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: <yasu.isimatu@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Oscar Salvador
      mm/memory_hotplug.c: simplify node_states_check_changes_online · 8efe33f4
      Oscar Salvador authored
      
      
      While looking at node_states_check_changes_online, I stumbled upon some
      confusing things.
      
      Right after entering the function, we find this:
      
      if (N_MEMORY == N_NORMAL_MEMORY)
              zone_last = ZONE_MOVABLE;
      
This is wrong: N_MEMORY can never be equal to N_NORMAL_MEMORY.
My guess is that this was meant to be something like:
      
      if (N_NORMAL_MEMORY == N_HIGH_MEMORY)
      
      to check if we have CONFIG_HIGHMEM.
      
      Later on, in the CONFIG_HIGHMEM block, we have:
      
      if (N_MEMORY == N_HIGH_MEMORY)
              zone_last = ZONE_MOVABLE;
      
Again, this is wrong, and will never evaluate to true.
      
      Besides removing these wrong if statements, I simplified the function a
      bit.
      
      [osalvador@suse.de: address feedback from Pavel]
        Link: http://lkml.kernel.org/r/20180921132634.10103-4-osalvador@techadventures.net
      Link: http://lkml.kernel.org/r/20180919100819.25518-5-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <yasu.isimatu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Oscar Salvador
      mm/memory_hotplug.c: tidy up node_states_clear_node() · cf01f6f5
      Oscar Salvador authored
      
      
      node_states_clear has the following if statements:
      
      if ((N_MEMORY != N_NORMAL_MEMORY) &&
          (arg->status_change_nid_high >= 0))
              ...
      
      if ((N_MEMORY != N_HIGH_MEMORY) &&
          (arg->status_change_nid >= 0))
              ...
      
N_MEMORY can never be equal to either N_NORMAL_MEMORY or
N_HIGH_MEMORY.

A similar problem was found in [1].
Since these checks are wrong, let us get rid of them.
      
      [1] https://patchwork.kernel.org/patch/10579155/
      
      Link: http://lkml.kernel.org/r/20180919100819.25518-4-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <yasu.isimatu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Oscar Salvador
      mm/memory_hotplug.c: spare unnecessary calls to node_set_state · 83d83612
      Oscar Salvador authored
      
      
      In node_states_check_changes_online, we check if the node will have to be
      set for any of the N_*_MEMORY states after the pages have been onlined.
      
      Later on, we perform the activation in node_states_set_node.  Currently,
      in node_states_set_node we set the node to N_MEMORY unconditionally.
      
      This means that we call node_set_state for N_MEMORY every time pages go
      online, but we only need to do it if the node has not yet been set for
      N_MEMORY.
      
      Fix this by checking status_change_nid.
      
      Link: http://lkml.kernel.org/r/20180919100819.25518-2-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: <yasu.isimatu@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• haiqing.shq
      mm/filemap.c: Use existing variable · 3cb7b121
      haiqing.shq authored
      
      
Use the variable write_len instead of iov_iter_count(from).
      
      Link: http://lkml.kernel.org/r/1537375855-2088-1-git-send-email-leviathan0992@gmail.com
Signed-off-by: haiqing.shq <leviathan0992@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Yang Shi
      mm: unmap VM_PFNMAP mappings with optimized path · cb492249
      Yang Shi authored
      
      
When unmapping VM_PFNMAP mappings, vm flags need to be updated.  Since
the vmas have already been detached, it sounds safe to update vm flags
with read mmap_sem.
      
      Link: http://lkml.kernel.org/r/1537376621-51150-4-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Yang Shi
      mm: unmap VM_HUGETLB mappings with optimized path · b4cefb36
      Yang Shi authored
      
      
When unmapping VM_HUGETLB mappings, vm flags need to be updated.  Since
the vmas have already been detached, it sounds safe to update vm flags
with read mmap_sem.
      
      Link: http://lkml.kernel.org/r/1537376621-51150-3-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Yang Shi
      mm: mmap: zap pages with read mmap_sem in munmap · dd2283f2
      Yang Shi authored
      
      
      Patch series "mm: zap pages with read mmap_sem in munmap for large
      mapping", v11.
      
      Background:
Recently, when we ran some vm scalability tests on machines with large
memory, we ran into a couple of mmap_sem scalability issues when unmapping
large memory space; please refer to https://lkml.org/lkml/2017/12/14/733
and https://lkml.org/lkml/2018/2/20/576.
      
      History:
Then akpm suggested unmapping large mappings section by section and
dropping mmap_sem periodically to mitigate it (see
https://lkml.org/lkml/2018/3/6/784).

The v1 patch series was submitted to the mailing list per Andrew's
suggestion (see https://lkml.org/lkml/2018/3/20/786).  Then I received a
lot of great feedback and suggestions.
      
Then this topic was discussed at the LSFMM summit 2018.  At the summit,
Michal Hocko suggested (also in the v1 patch review) trying a "two
phases" approach: zapping pages with read mmap_sem, then doing the
cleanup with write mmap_sem (for discussion details, see
https://lwn.net/Articles/753269/).
      
      Approach:
      Zapping pages is the most time consuming part, according to the suggestion from
      Michal Hocko [1], zapping pages can be done with holding read mmap_sem, like
      what MADV_DONTNEED does. Then re-acquire write mmap_sem to cleanup vmas.
      
But, we can't call MADV_DONTNEED directly, since there are two major drawbacks:
  * The unexpected state seen by a page fault if it wins the race in the
    middle of munmap.  It may return the zero page instead of the content
    or a SIGSEGV.
  * It can't handle VM_LOCKED | VM_HUGETLB | VM_PFNMAP and uprobe
    mappings, which was a showstopper for akpm.
      
      But, some part may need write mmap_sem, for example, vma splitting. So,
      the design is as follows:
              acquire write mmap_sem
              lookup vmas (find and split vmas)
              deal with special mappings
              detach vmas
              downgrade_write
      
              zap pages
              free page tables
              release mmap_sem
      
The vm events with read mmap_sem may come in during page zapping, but
since the vmas have been detached beforehand, they (i.e. page fault, gup,
etc.) will not be able to find a valid vma, and will just return SIGSEGV
or -EFAULT as expected.
      
      If the vma has VM_HUGETLB | VM_PFNMAP, they are considered as special
      mappings.  They will be handled by falling back to regular do_munmap()
      with exclusive mmap_sem held in this patch since they may update vm flags.
      
But, with the "detach vmas first" approach, the vmas have been detached
when vm flags are updated, so it sounds safe to update vm flags with read
mmap_sem for this specific case.  So, VM_HUGETLB and VM_PFNMAP will be
handled by using the optimized path in the following separate patches,
for the sake of bisectability.
      
Unmapping uprobe areas may need to update mm flags (MMF_RECALC_UPROBES).
However it is fine to have a false-positive MMF_RECALC_UPROBES according
to the uprobes developer.  So, uprobe unmap will not be handled by the
regular path.
      
      With the "detach vmas first" approach we don't have to re-acquire mmap_sem
      again to clean up vmas to avoid race window which might get the address
      space changed since downgrade_write() doesn't release the lock to lead
      regression, which simply downgrades to read lock.
      
And, since the lock acquire/release cost is kept to a minimum and is
almost the same as before, the optimization can be extended to any size
of mapping without incurring a significant penalty for small mappings.
      
For the time being, just do this in the munmap syscall path.  Other
vm_munmap() or do_munmap() call sites (i.e. mmap, mremap, etc) remain
intact due to some implementation difficulties: they acquire write
mmap_sem from the very beginning and hold it until the end, and
do_munmap() might be called in the middle.  But the optimized do_munmap
would need to be called without mmap_sem held so that the optimization
can be applied.  So, if we want to do a similar optimization for the
mmap/mremap path, I'm afraid we would have to redesign them.  mremap
might be called on a very large area depending on the use case; the
optimization for it will be considered in the future.
      
      This patch (of 3):
      
      When running some mmap/munmap scalability tests with large memory (i.e.
      > 300GB), the below hung task issue may happen occasionally.
      
      INFO: task ps:14018 blocked for more than 120 seconds.
             Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
      message.
       ps              D    0 14018      1 0x00000004
        ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
        ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
        00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
       Call Trace:
        [<ffffffff817154d0>] ? __schedule+0x250/0x730
        [<ffffffff817159e6>] schedule+0x36/0x80
        [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
        [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
        [<ffffffff81717db0>] down_read+0x20/0x40
        [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
        [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
        [<ffffffff81241d87>] __vfs_read+0x37/0x150
        [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
        [<ffffffff81242266>] vfs_read+0x96/0x130
        [<ffffffff812437b5>] SyS_read+0x55/0xc0
        [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
      
It is because munmap holds mmap_sem exclusively from the very beginning
all the way down to the end, and doesn't release it in the middle.  When
unmapping a large mapping, it may take a long time (it takes ~18 seconds
to unmap a 320GB mapping with every single page mapped on an idle
machine).
      
      Zapping pages is the most time consuming part, according to the suggestion
      from Michal Hocko [1], zapping pages can be done with holding read
      mmap_sem, like what MADV_DONTNEED does.  Then re-acquire write mmap_sem to
      cleanup vmas.
      
      But, some part may need write mmap_sem, for example, vma splitting. So,
      the design is as follows:
              acquire write mmap_sem
              lookup vmas (find and split vmas)
              deal with special mappings
              detach vmas
              downgrade_write
      
              zap pages
              free page tables
              release mmap_sem
      
The vm events with read mmap_sem may come in during page zapping, but
since the vmas have been detached beforehand, they (i.e. page fault, gup,
etc.) will not be able to find a valid vma, and will just return SIGSEGV
or -EFAULT as expected.
      
If the vma has VM_HUGETLB | VM_PFNMAP, it is considered a special
mapping.  Such mappings will be handled without downgrading mmap_sem in
this patch since they may update vm flags.
      
But, with the "detach vmas first" approach, the vmas have been detached
when vm flags are updated, so it sounds safe to update vm flags with read
mmap_sem for this specific case.  So, VM_HUGETLB and VM_PFNMAP will be
handled by using the optimized path in the following separate patches,
for the sake of bisectability.
      
Unmapping uprobe areas may need to update mm flags (MMF_RECALC_UPROBES).
However it is fine to have a false-positive MMF_RECALC_UPROBES according
to the uprobes developer.
      
      With the "detach vmas first" approach we don't have to re-acquire mmap_sem
      again to clean up vmas to avoid race window which might get the address
      space changed since downgrade_write() doesn't release the lock to lead
      regression, which simply downgrades to read lock.
      
And, since the lock acquire/release cost is kept to a minimum and is
almost the same as before, the optimization can be extended to any size
of mapping without incurring a significant penalty for small mappings.
      
For the time being, just do this in the munmap syscall path.  Other
vm_munmap() or do_munmap() call sites (i.e. mmap, mremap, etc) remain
intact due to some implementation difficulties: they acquire write
mmap_sem from the very beginning and hold it until the end, and
do_munmap() might be called in the middle.  But the optimized do_munmap
would need to be called without mmap_sem held so that the optimization
can be applied.  So, if we want to do a similar optimization for the
mmap/mremap path, I'm afraid we would have to redesign them.  mremap
might be called on a very large area depending on the use case; the
optimization for it will be considered in the future.
      
With the patches, the exclusive mmap_sem hold time when munmapping an
80GB address space on a machine with 32 cores of E5-2680 @ 2.70GHz
dropped from the second level to the microsecond level.
      
munmap_test-15002 [008]   594.380138: funcgraph_entry:              |  __vm_munmap() {
munmap_test-15002 [008]   594.380146: funcgraph_entry:  !2485684 us  |    unmap_region();
munmap_test-15002 [008]   596.865836: funcgraph_exit:   !2485692 us  |  }
      
Here the execution time of unmap_region() is used to evaluate the time
spent holding read mmap_sem; the remaining time is spent holding the
exclusive lock.
      
      [1] https://lwn.net/Articles/753269/
      
      Link: http://lkml.kernel.org/r/1537376621-51150-2-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Andrey Ryabinin
      vfree: add debug might_sleep() · a8dda165
      Andrey Ryabinin authored
      
      
      Add might_sleep() call to vfree() to catch potential sleep-in-atomic bugs
      earlier.
      
      [aryabinin@virtuozzo.com: drop might_sleep_if() from kvfree()]
        Link: http://lkml.kernel.org/r/7e19e4df-b1a6-29bd-9ae7-0266d50bef1d@virtuozzo.com
      Link: http://lkml.kernel.org/r/20180914130512.10394-3-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Andrey Ryabinin
      mm/vmalloc.c: improve vfree() kerneldoc · 3ca4ea3a
      Andrey Ryabinin authored
      
      
vfree() might sleep when not called from interrupt context.  Explain
that in the comment.
      
      Link: http://lkml.kernel.org/r/20180914130512.10394-2-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Andrey Ryabinin
      kvfree(): fix misleading comment · 52414d33
      Andrey Ryabinin authored
vfree() might sleep when not called from interrupt context, and so does
kvfree().  Fix kvfree()'s misleading comment about the allowed context.
      
      Link: http://lkml.kernel.org/r/20180914130512.10394-1-aryabinin@virtuozzo.com
Fixes: 04b8e946 ("mm/util.c: improve kvfree() kerneldoc")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• zhong jiang
      mm/mempolicy.c: use match_string() helper to simplify the code · dedf2c73
      zhong jiang authored
      
      
match_string() returns the index of the array entry matching a given
string, and can be used instead of an open-coded implementation.
      
      Link: http://lkml.kernel.org/r/1536988365-50310-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• YueHaibing
      mm/swap.c: remove duplicated include · c3df29d1
      YueHaibing authored
      
      
Remove the duplicated include of linux/memremap.h.
      
      Link: http://lkml.kernel.org/r/20180917131308.16420-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Michal Hocko
      mm, page_alloc: drop should_suppress_show_mem · 2c029a1e
      Michal Hocko authored
      should_suppress_show_mem() was introduced to reduce the overhead of
      show_mem on large NUMA systems.  Things have changed since then though.
Namely c78e9363 ("mm: do not walk all of system memory during
show_mem") has reduced the overhead considerably.
      
Moreover, warn_alloc_show_mem already clears SHOW_MEM_FILTER_NODES when
called from IRQ context, so we are not printing per-node stats.
      
Remove should_suppress_show_mem because we are losing potentially
interesting information about allocation failures.  We have seen a bug
report where the system gets unresponsive under memory pressure and
there is only
      
      kernel: [2032243.696888] qlge 0000:8b:00.1 ql1: Could not get a page chunk, i=8, clean_idx =200 .
      kernel: [2032243.710725] swapper/7: page allocation failure: order:1, mode:0x1084120(GFP_ATOMIC|__GFP_COLD|__GFP_COMP)
      
without any additional information for debugging.  It would be great to
see the state of the page allocator at that moment.
      
      Link: http://lkml.kernel.org/r/20180907114334.7088-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Johannes Weiner
      mm/memcontrol.c: fix memory.stat item ordering · e9b257ed
      Johannes Weiner authored
      
      
      The refault stats go better with the page fault stats, and are of
      higher interest than the stats on LRU operations. In fact they used to
      be grouped together; when the LRU operation stats were added later on,
      they were wedged in between.
      
      Move them back together. Documentation/admin-guide/cgroup-v2.rst
      already lists them in the right order.
      
      Link: http://lkml.kernel.org/r/20181010140239.GA2527@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• Johannes Weiner
      mm: zero-seek shrinkers · 4b85afbd
      Johannes Weiner authored
      
      
      The page cache and most shrinkable slab caches hold data that has been
      read from disk, but there are some caches that only cache CPU work, such
      as the dentry and inode caches of procfs and sysfs, as well as the subset
      of radix tree nodes that track non-resident page cache.
      
      Currently, all these are shrunk at the same rate: using DEFAULT_SEEKS for
      the shrinker's seeks setting tells the reclaim algorithm that for every
      two page cache pages scanned it should scan one slab object.
      
      This is a bogus setting.  A virtual inode that required no IO to create is
      not twice as valuable as a page cache page; shadow cache entries with
      eviction distances beyond the size of memory aren't either.
      
      In most cases, the behavior in practice is still fine.  Such virtual
      caches don't tend to grow and assert themselves aggressively, and usually
      get picked up before they cause problems.  But there are scenarios where
      that's not true.
      
      Our database workloads suffer from two of those.  For one, their file
      workingset is several times bigger than available memory, which has the
      kernel aggressively create shadow page cache entries for the non-resident
      parts of it.  The workingset code does tell the VM that most of these are
      expendable, but the VM ends up balancing them 2:1 to cache pages as per
      the seeks setting.  This is a huge waste of memory.
      
      These workloads also deal with tens of thousands of open files and use
      /proc for introspection, which ends up growing the proc_inode_cache to
      absurdly large sizes - again at the cost of valuable cache space, which
      isn't a reasonable trade-off, given that proc inodes can be re-created
      without involving the disk.
      
      This patch implements a "zero-seek" setting for shrinkers that results in
      a target ratio of 0:1 between their objects and IO-backed caches.  This
      allows such virtual caches to grow when memory is available (they do
      cache/avoid CPU work after all), but effectively disables them as soon as
      IO-backed objects are under pressure.
      
      It then switches the shrinkers for procfs and sysfs metadata, as well as
      excess page cache shadow nodes, to the new zero-seek setting.
      
      Link: http://lkml.kernel.org/r/20181009184732.762-5-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Domas Mituzas <dmituzas@fb.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b85afbd
    • mm: workingset: add vmstat counter for shadow nodes · 68d48e6a
      Johannes Weiner authored
      
      
      Make it easier to catch bugs in the shadow node shrinker by adding a
      counter for the shadow nodes in circulation.
      
      [akpm@linux-foundation.org: assert that irqs are disabled, for __inc_lruvec_page_state()]
      [akpm@linux-foundation.org: s/WARN_ON_ONCE/VM_WARN_ON_ONCE/, per Johannes]
      Link: http://lkml.kernel.org/r/20181009184732.762-4-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      68d48e6a
    • mm: workingset: use cheaper __inc_lruvec_state in irqsafe node reclaim · 505802a5
      Johannes Weiner authored
      
      
      No need to use the preemption-safe lruvec state function inside the
      reclaim region that has irqs disabled.
      
      Link: http://lkml.kernel.org/r/20181009184732.762-3-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      505802a5
    • psi: cgroup support · 2ce7135a
      Johannes Weiner authored
      
      
      On a system that executes multiple cgrouped jobs and independent
      workloads, we don't just care about the health of the overall system,
      but also about that of individual jobs, so that we can ensure their
      health, fairness between jobs, or prioritize some jobs over others.
      
      This patch implements pressure stall tracking for cgroups.  In kernels
      with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure,
      and io.pressure files that track aggregate pressure stall times for only
      the tasks inside the cgroup.
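      
      A minimal sketch of locating these interface files for a given group
      (the cgroup2 mount point /sys/fs/cgroup and the group name "jobs/db"
      are assumptions for illustration):
      
      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      /*
       * Build the path of a cgroup2 pressure file.  The patch adds
       * cpu.pressure, memory.pressure, and io.pressure per cgroup;
       * the mount point and group name here are invented examples.
       */
      static int pressure_path(char *buf, size_t len,
                               const char *cgroup, const char *resource)
      {
          return snprintf(buf, len, "/sys/fs/cgroup/%s/%s.pressure",
                          cgroup, resource);
      }

      int main(void)
      {
          char path[256];

          pressure_path(path, sizeof(path), "jobs/db", "memory");
          assert(strcmp(path, "/sys/fs/cgroup/jobs/db/memory.pressure") == 0);
          puts(path);
          return 0;
      }
      ```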
      
      Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Drake <drake@endlessm.com>
      Tested-by: Suren Baghdasaryan <surenb@google.com>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <jweiner@fb.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Enderborg <peter.enderborg@sony.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2ce7135a
    • psi: pressure stall information for CPU, memory, and IO · eb414681
      Johannes Weiner authored
      When systems are overcommitted and resources become contended, it's hard
      to tell exactly what impact this has on workload productivity, or how
      close the system is to lockups and OOM kills.  In particular, when
      machines work multiple jobs concurrently, the impact of overcommit on
      the individual job's latency and throughput can be enormous.
      
      In order to maximize hardware utilization without sacrificing individual
      job health or risk complete machine lockups, this patch implements a way
      to quantify resource pressure in the system.
      
      A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
      expose the percentage of time the system is stalled on CPU, memory, or IO,
      respectively.  Stall states are aggregate versions of the per-task delay
      accounting delays:
      
             cpu: some tasks are runnable but not executing on a CPU
             memory: tasks are reclaiming, or waiting for swapin or thrashing cache
             io: tasks are waiting for io compl...
      eb414681
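      The pressure files above use a simple one-line-per-state text format.
      A minimal user-space parser for such a line (a sketch; the field names
      follow the psi interface, the sample values are invented):
      
      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      /*
       * Parse one line of a PSI file such as /proc/pressure/memory, e.g.:
       *
       *   some avg10=0.00 avg60=1.11 avg300=0.58 total=1234567
       */
      struct psi_line {
          char kind[8];              /* "some" or "full" */
          double avg10, avg60, avg300;
          unsigned long long total;  /* cumulative stall time, usecs */
      };

      static int parse_psi_line(const char *line, struct psi_line *out)
      {
          int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
                         out->kind, &out->avg10, &out->avg60, &out->avg300,
                         &out->total);
          return n == 5 ? 0 : -1;
      }

      int main(void)
      {
          struct psi_line p;
          const char *sample =
              "some avg10=0.00 avg60=1.11 avg300=0.58 total=1234567";

          assert(parse_psi_line(sample, &p) == 0);
          assert(strcmp(p.kind, "some") == 0);
          assert(p.total == 1234567ULL);
          printf("%s: avg10=%.2f total=%llu\n", p.kind, p.avg10, p.total);
          return 0;
      }
      ```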
    • sched: introduce this_rq_lock_irq() · 246b3b33
      Johannes Weiner authored
      
      
      do_sched_yield() disables IRQs, looks up this_rq() and locks it.  The next
      patch adds another site with the same pattern, so provide a
      convenience function for it.
      
      Link: http://lkml.kernel.org/r/20180828172258.3185-8-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Suren Baghdasaryan <surenb@google.com>
      Tested-by: Daniel Drake <drake@endlessm.com>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <jweiner@fb.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Enderborg <peter.enderborg@sony.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      246b3b33