  1. Oct 14, 2020
    • memblock: reduce number of parameters in for_each_mem_range() · 6e245ad4
      Mike Rapoport authored
      
      
      Currently the for_each_mem_range() and for_each_mem_range_rev() iterators
      are the most generic way to traverse memblock regions.  As such, they take
      8 parameters, which makes them inconvenient for users.  Most users choose
      one of their wrappers, and the only user that actually needs most of the
      parameters is memblock itself.
      
      To avoid yet another name for memblock iterators, rename the existing
      for_each_mem_range[_rev]() to __for_each_mem_range[_rev]() and add new
      for_each_mem_range[_rev]() wrappers that take only index, start and end
      parameters.
      
      The new wrapper nicely fits into init_unavailable_mem() and will be used
      in upcoming changes to simplify memblock traversals.
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	[MIPS]
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-11-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6e245ad4
    • memblock: make memblock_debug and related functionality private · 87c55870
      Mike Rapoport authored
      
      
      The only user of memblock_dbg() outside memblock was the s390 setup code,
      and it is converted to use pr_debug() instead.  This makes it possible to
      stop exposing memblock_debug and memblock_dbg() to the rest of the
      kernel.
      
      [akpm@linux-foundation.org: make memblock_dbg() safer and neater]
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-10-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      87c55870
    • memblock: make for_each_memblock_type() iterator private · cd991db8
      Mike Rapoport authored
      
      
      for_each_memblock_type() is not used outside mm/memblock.c, so move it
      there from include/linux/memblock.h.
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-9-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cd991db8
    • microblaze: drop unneeded NUMA and sparsemem initializations · 49645793
      Mike Rapoport authored
      
      
      microblaze supports neither NUMA nor SPARSEMEM, so there is no point in
      calling memblock_set_node() and
      sparse_memory_present_with_active_regions() during microblaze memory
      initialization.
      
      Remove these calls and the surrounding code.
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-8-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49645793
    • riscv: drop unneeded node initialization · c8e47018
      Mike Rapoport authored
      
      
      RISC-V does not (yet) support NUMA, and for UMA architectures node 0 is
      used implicitly during early memory initialization.
      
      There is no need to call memblock_set_node(); remove this call and the
      surrounding code.
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-7-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8e47018
    • h8300, nds32, openrisc: simplify detection of memory extents · 80c45744
      Mike Rapoport authored
      
      
      Instead of traversing memblock.memory regions to find memory_start and
      memory_end, simply query memblock_{start,end}_of_DRAM().
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Stafford Horne <shorne@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-6-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      80c45744
    • arm64: numa: simplify dummy_numa_init() · ab8f21aa
      Mike Rapoport authored
      
      
      dummy_numa_init() loops over memblock.memory and passes nid=0 to
      numa_add_memblk(), which essentially wraps memblock_set_node().  However,
      memblock_set_node() can cope with the entire memory span itself, so the
      loop over memblock.memory regions is redundant.
      
      Using a single call to memblock_set_node() rather than a loop also fixes
      an issue with a buggy ACPI firmware in which the SRAT table covers some
      but not all of the memory in the EFI memory map.
      
      Jonathan Cameron says:
      
        This issue can be easily triggered by having an SRAT table which fails
        to cover all elements of the EFI memory map.
      
        This firmware error is detected and a warning printed. e.g.
        "NUMA: Warning: invalid memblk node 64 [mem 0x240000000-0x27fffffff]"
        At that point we fall back to dummy_numa_init().
      
        However, the failed ACPI init has left us with our memblocks all broken
        up as we split them when trying to assign them to NUMA nodes.
      
        We then iterate over the memblocks and add them to node 0.
      
        numa_add_memblk() calls memblock_set_node() which merges regions that
        were previously split up during the earlier attempt to add them to
        different nodes during parsing of SRAT.
      
        This means elements are moved in the memblock array and we can end up
        in a different memblock after the call to numa_add_memblk().
        Result is:
      
        Unable to handle kernel paging request at virtual address 0000000000003a40
        Mem abort info:
          ESR = 0x96000004
          EC = 0x25: DABT (current EL), IL = 32 bits
          SET = 0, FnV = 0
          EA = 0, S1PTW = 0
        Data abort info:
          ISV = 0, ISS = 0x00000004
          CM = 0, WnR = 0
        [0000000000003a40] user address but active_mm is swapper
        Internal error: Oops: 96000004 [#1] PREEMPT SMP
      
        ...
      
        Call trace:
          sparse_init_nid+0x5c/0x2b0
          sparse_init+0x138/0x170
          bootmem_init+0x80/0xe0
          setup_arch+0x2a0/0x5fc
          start_kernel+0x8c/0x648
      
      Replace the loop with a single memblock_set_node() call covering the
      entire memory.
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-5-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ab8f21aa
    • arm, xtensa: simplify initialization of high memory pages · cddb5ddf
      Mike Rapoport authored
      
      
      free_highpages() in both arm and xtensa essentially open-codes the
      for_each_free_mem_range() loop to detect high memory pages that were not
      reserved and that should be initialized and passed to the buddy
      allocator.
      
      Replace the open-coded implementation with the memblock API to simplify
      the code.
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Max Filippov <jcmvbkbc@gmail.com>	[xtensa]
      Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>	[xtensa]
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-4-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cddb5ddf
    • dma-contiguous: simplify cma_early_percent_memory() · e9aa36cc
      Mike Rapoport authored
      
      
      The memory size calculation in cma_early_percent_memory() traverses
      memblock.memory rather than simply calling memblock_phys_mem_size().  The
      comment in that function suggests that at some point there should have
      been a call to memblock_analyze() before memblock_phys_mem_size() could
      be used.  As of now, there is no memblock_analyze() at all and
      memblock_phys_mem_size() can be used as soon as cold-plug memory is
      registered with memblock.
      
      Replace the loop over memblock.memory with a call to
      memblock_phys_mem_size().
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-3-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9aa36cc
    • KVM: PPC: Book3S HV: simplify kvm_cma_reserve() · 04ba0a92
      Mike Rapoport authored
      
      
      Patch series "memblock: seasonal cleaning^w cleanup", v3.
      
      These patches simplify several uses of memblock iterators and hide some of
      the memblock implementation details from the rest of the system.
      
      This patch (of 17):
      
      The memory size calculation in kvm_cma_reserve() traverses
      memblock.memory rather than simply calling memblock_phys_mem_size().  The
      comment in that function suggests that at some point there should have
      been a call to memblock_analyze() before memblock_phys_mem_size() could
      be used.  As of now, there is no memblock_analyze() at all and
      memblock_phys_mem_size() can be used as soon as cold-plug memory is
      registered with memblock.
      
      Replace the loop over memblock.memory with a call to
      memblock_phys_mem_size().
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Link: https://lkml.kernel.org/r/20200818151634.14343-1-rppt@kernel.org
      Link: https://lkml.kernel.org/r/20200818151634.14343-2-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04ba0a92
    • mm/mempool: add 'else' to split mutually exclusive case · 544941d7
      Miaohe Lin authored
      
      
      Add 'else' to split the mutually exclusive cases and avoid some
      unnecessary checks.  It doesn't seem to change code generation (the
      compiler is smart), but I think it helps readability.
      
      [akpm@linux-foundation.org: fix comment location]
      
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200924111641.28922-1-linmiaohe@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      544941d7
    • mm: remove unused alloc_page_vma_node() · f8fd5253
      Wei Yang authored
      
      
      No one uses this macro anymore.
      
      Also fix code style of policy_node().
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200921021401.84508-1-richard.weiyang@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f8fd5253
    • mm/mempolicy: remove or narrow the lock on current · 78b132e9
      Wei Yang authored
      
      
      It is not necessary to hold the lock on current when setting the
      nodemask of a new policy.
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200921040416.86185-1-richard.weiyang@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78b132e9
    • selftests/vm: 8x compaction_test speedup · 11002620
      John Hubbard authored
      
      
      This patch reduces the running time for compaction_test from about 27 sec,
      to 3.3 sec, which is about an 8x speedup.
      
      These numbers are for an Intel x86_64 system with 32 GB of DRAM.
      
      The compaction_test.c program was spending most of its time doing mmap(),
      1 MB at a time, on about 25 GB of memory.
      
      Instead, do the mmaps 100 MB at a time.  (Going past 100 MB doesn't make
      things go much faster, because other parts of the program are using the
      remaining time.)
      
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Sri Jayaramappa <sjayaram@akamai.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Link: https://lkml.kernel.org/r/20201002080621.551044-2-jhubbard@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      11002620
    • include/linux/compaction.h: clean code by removing unused enum value · 74c9da4e
      Mateusz Nosek authored
      
      
      The enum value 'COMPACT_INACTIVE' is never used so can be removed.
      
      Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200917110750.12015-1-mateusznosek0@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      74c9da4e
    • mm/compaction.c: micro-optimization remove unnecessary branch · 62b35fe0
      Mateusz Nosek authored
      
      
      The same code works for both 'zone->compact_considered > defer_limit'
      and 'zone->compact_considered >= defer_limit'.  The latter needs one
      branch fewer, which is slightly better for performance.
      
      Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Link: https://lkml.kernel.org/r/20200913190448.28649-1-mateusznosek0@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      62b35fe0
    • mm/zbud: remove redundant initialization · 18601294
      Xiang Chen authored
      
      
      zhdr is already initialized at the top of the function, so remove the
      redundant initialization here.
      
      Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Link: https://lkml.kernel.org/r/1600419885-191907-1-git-send-email-chenxiang66@hisilicon.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      18601294
    • mm/z3fold.c: use xx_zalloc instead of xx_alloc and memset · f94afee9
      Hui Su authored
      
      
      alloc_slots() allocates memory for slots using kmem_cache_alloc(), then
      memsets it.  We can just use kmem_cache_zalloc().
      
      Signed-off-by: Hui Su <sh_def@163.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200926100834.GA184671@rlk
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f94afee9
    • mm/vmscan: fix comments for isolate_lru_page() · 01c4776b
      Hui Su authored
      
      
      fix comments for isolate_lru_page():
      s/fundamentnal/fundamental
      
      Signed-off-by: Hui Su <sh_def@163.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200927173923.GA8058@rlk
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01c4776b
    • mm/vmscan: fix infinite loop in drop_slab_node · 069c411d
      Chunxin Zang authored
      
      
      We have observed that drop_caches can take a considerable amount of time
      (<put data here>), especially when many memcgs are involved, because each
      adds overhead.
      
      It is quite unfortunate that the operation currently cannot be
      interrupted by a signal.  Add a check for fatal signals into the main
      loop so that userspace can bail out early.
      
      There are two reasons the loop can run for so long:
      
      1. There are too many memcgs: even if each memcg frees only one object,
         the total freed count exceeds 10.
      
      2. A single traversal over all the memcgs takes a long time, so the
         memcgs traversed first have freed many objects again by the next
         pass, and the freed count exceeds 10 once more.
      
      We can get the following info through 'ps':
      
        root:~# ps -aux | grep drop
        root  357956 ... R    Aug25 21119854:55 echo 3 > /proc/sys/vm/drop_caches
        root 1771385 ... R    Aug16 21146421:17 echo 3 > /proc/sys/vm/drop_caches
        root 1986319 ... R    18:56 117:27 echo 3 > /proc/sys/vm/drop_caches
        root 2002148 ... R    Aug24 5720:39 echo 3 > /proc/sys/vm/drop_caches
        root 2564666 ... R    18:59 113:58 echo 3 > /proc/sys/vm/drop_caches
        root 2639347 ... R    Sep03 2383:39 echo 3 > /proc/sys/vm/drop_caches
        root 3904747 ... R    03:35 993:31 echo 3 > /proc/sys/vm/drop_caches
        root 4016780 ... R    Aug21 7882:18 echo 3 > /proc/sys/vm/drop_caches
      
      Using bpftrace to follow the 'freed' value in drop_slab_node:
      
        root:~# bpftrace -e 'kprobe:drop_slab_node+70 {@ret=hist(reg("bp")); }'
        Attaching 1 probe...
        ^B^C
      
        @ret:
        [64, 128)        1 |                                                    |
        [128, 256)      28 |                                                    |
        [256, 512)     107 |@                                                   |
        [512, 1K)      298 |@@@                                                 |
        [1K, 2K)       613 |@@@@@@@                                             |
        [2K, 4K)      4435 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
        [4K, 8K)       442 |@@@@@                                               |
        [8K, 16K)      299 |@@@                                                 |
        [16K, 32K)     100 |@                                                   |
        [32K, 64K)     139 |@                                                   |
        [64K, 128K)     56 |                                                    |
        [128K, 256K)    26 |                                                    |
        [256K, 512K)     2 |                                                    |
      
      In the while loop, check whether a fatal (TASK_KILLABLE) signal is
      pending and, if so, break out of the loop.
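The bounded loop with the early-bailout check can be sketched in plain C. This is a userspace analogue, not the kernel code: `freed_per_pass[]` stands in for the objects one `shrink_slab()` pass over all memcgs frees, and `signal_at` simulates the pass at which `fatal_signal_pending()` becomes true.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace sketch of the drop_slab_node() fix: keep reclaiming while a
 * pass freed more than 10 objects, but bail out early when a fatal
 * signal becomes pending.  freed_per_pass[] simulates how many objects
 * each shrink pass frees; signal_at is the pass index at which a fatal
 * signal arrives (use a large value for "never"). */
unsigned long drop_slab_sketch(const unsigned long *freed_per_pass,
                               size_t npasses, size_t signal_at)
{
    unsigned long total = 0, freed;
    size_t pass = 0;

    do {
        if (pass >= signal_at)   /* fatal_signal_pending() analogue */
            break;               /* the new early-bailout check */
        freed = pass < npasses ? freed_per_pass[pass] : 0;
        pass++;
        total += freed;
    } while (freed > 10);        /* the pre-existing termination test */

    return total;
}
```

With no signal the loop only stops once a pass frees 10 objects or fewer; with a pending signal it stops at the next iteration regardless of how much the previous pass freed.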
      
      Signed-off-by: Chunxin Zang <zangchunxin@bytedance.com>
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Chris Down <chris@chrisdown.name>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: https://lkml.kernel.org/r/20200909152047.27905-1-zangchunxin@bytedance.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      069c411d
    • hugetlb: add lockdep check for i_mmap_rwsem held in huge_pmd_share · 0bf7b64e
      Mike Kravetz authored
      As a debugging aid, huge_pmd_share should make sure i_mmap_rwsem is held
      if necessary.  To clarify the 'if necessary', expand the comment block at
      the beginning of huge_pmd_share.
      
      No functional change.  The added i_mmap_assert_locked() call is only
      enabled if CONFIG_LOCKDEP.
      
      Ideally, this should have been included with commit 34ae204f
      ("hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem").
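The pattern itself, an assertion compiled in only under a debug config, can be sketched in userspace C. `DEBUG_LOCKS`, `lock_is_held()` and `shared_op()` are illustrative stand-ins of my own, not kernel APIs; in the kernel the corresponding pieces are CONFIG_LOCKDEP and i_mmap_assert_locked().

```c
#include <assert.h>
#include <stdbool.h>

/* Assert that a lock is held, but only in debug builds (CONFIG_LOCKDEP
 * in the kernel).  In non-debug builds the check compiles to nothing. */
#ifdef DEBUG_LOCKS
#define assert_lock_held(l) assert(lock_is_held(l))
#else
#define assert_lock_held(l) ((void)(l))
#endif

struct rwsem { bool held; };

bool lock_is_held(const struct rwsem *l) { return l->held; }

/* An operation that documents (and, under DEBUG_LOCKS, enforces) its
 * locking requirement, in the spirit of the huge_pmd_share() change. */
int shared_op(struct rwsem *mapping_lock, int value)
{
    assert_lock_held(mapping_lock);  /* no-op unless DEBUG_LOCKS */
    return value * 2;                /* the real work */
}
```

Because the assertion vanishes in non-debug builds, it is a pure debugging aid with no functional change, matching the "No functional change" note above.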
      
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Link: https://lkml.kernel.org/r/20200911201248.88537-1-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0bf7b64e
    • mm/hugetlb: take the free hpage during the iteration directly · 6664bfc8
      Wei Yang authored
      
      
      Function dequeue_huge_page_node_exact() iterates the free list and
      returns the first valid free hpage.
      
      Instead of breaking out of the loop and checking the loop variable
      afterwards, we can return from within the loop directly.  This
      removes a redundant check.
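The refactor is the classic "return from the loop" pattern. A minimal C analogue (an int array stands in for the free list here, with negative values playing the role of unusable pages; `first_usable()` is my illustrative name):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the control-flow change in dequeue_huge_page_node_exact():
 * rather than break out of the loop and re-check the cursor afterwards,
 * return the first usable entry from inside the loop. */
int first_usable(const int *vals, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (vals[i] >= 0)
            return vals[i];   /* return directly, no post-loop check */
    }
    return -1;                /* list empty or nothing usable */
}
```

The break-then-check version needs an extra "did we actually find one?" test after the loop; returning directly makes that test unnecessary.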
      
      [mike.kravetz@oracle.com: points out a logic error]
      [richard.weiyang@linux.alibaba.com: v4]
        Link: https://lkml.kernel.org/r/20200901014636.29737-8-richard.weiyang@linux.alibaba.com
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Link: https://lkml.kernel.org/r/20200831022351.20916-8-richard.weiyang@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6664bfc8
    • mm/hugetlb: narrow the hugetlb_lock protection area during preparing huge page · 2f37511c
      Wei Yang authored
      
      
      set_hugetlb_cgroup_[rsvd] just manipulates page-local data, which
      does not need to be protected by hugetlb_lock.
      
      Let's move it out of the locked region.
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Link: https://lkml.kernel.org/r/20200831022351.20916-7-richard.weiyang@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2f37511c
    • mm/hugetlb: a page from buddy is not on any list · 15a8d68e
      Wei Yang authored
      
      
      A page allocated from the buddy allocator is not on any list, so
      just using list_add() is enough.
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Link: https://lkml.kernel.org/r/20200831022351.20916-6-richard.weiyang@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      15a8d68e
    • mm/hugetlb: count file_region to be added when regions_needed != NULL · 972a3da3
      Wei Yang authored
      
      
      Function add_reservation_in_range() has only two cases:
      
          * count the file_regions needed and return the number in
            regions_needed
          * do the real list operation without counting
      
      This means a second parameter is not necessary to distinguish these
      two cases.
      
      Just use regions_needed to separate them.
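The single-parameter convention can be sketched in C. This is an illustrative analogue, not the hugetlb code: `add_range_sketch()` and `total` are hypothetical names, with a non-NULL `regions_needed` selecting the dry-run counting mode.

```c
#include <assert.h>
#include <stddef.h>

/* One function, two modes, selected by a single pointer parameter:
 *   - regions_needed != NULL: dry run, only report how much would be
 *     added, do not touch the real state;
 *   - regions_needed == NULL: perform the real operation.
 * *total stands in for the reservation list being modified. */
long add_range_sketch(long *total, long to_add, long *regions_needed)
{
    if (regions_needed) {
        *regions_needed = to_add;   /* count only, no mutation */
        return 0;
    }
    *total += to_add;               /* the real operation */
    return to_add;
}
```

Callers that previously passed a separate "count only" flag now just pass or omit the out-pointer, which cannot get out of sync with the mode.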
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Link: https://lkml.kernel.org/r/20200831022351.20916-5-richard.weiyang@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      972a3da3
    • mm/hugetlb: use list_splice to merge two list at once · d3ec7b6e
      Wei Yang authored
      
      
      Instead of adding the allocated file_regions to region_cache one by
      one, we can use list_splice() to merge the two lists at once.
      
      Also, since we know the number of entries in the list, we can
      increase the counter directly.
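The O(1) splice-plus-counter idea can be sketched with a minimal singly-linked list. `struct slist` and `splice_batch()` are illustrative stand-ins; the kernel's list_splice() works on doubly-linked `struct list_head` nodes.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; };

struct slist {
    struct node *head;
    struct node *tail;
    unsigned long count;
};

/* Merge a whole batch of nodes into the cache in one O(1) operation and
 * add the known batch size to the counter directly, instead of moving
 * and counting entries one by one. */
void splice_batch(struct slist *cache, struct slist *batch)
{
    if (!batch->head)
        return;                          /* nothing to splice */
    if (cache->tail)
        cache->tail->next = batch->head; /* link batch after the tail */
    else
        cache->head = batch->head;       /* cache was empty */
    cache->tail = batch->tail;
    cache->count += batch->count;        /* bump by the known entry count */
    batch->head = batch->tail = NULL;    /* batch is now empty */
    batch->count = 0;
}
```

One pointer fixup and one addition replace a loop of per-entry insertions and increments.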
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Link: https://lkml.kernel.org/r/20200831022351.20916-4-richard.weiyang@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d3ec7b6e
    • mm/hugetlb: remove VM_BUG_ON(!nrg) in get_file_region_entry_from_cache() · a1ddc2e8
      Wei Yang authored
      
      
      We are sure to get a valid file_region, otherwise the
      VM_BUG_ON(resv->region_cache_count <= 0) at the very beginning would be
      triggered.
      
      Let's remove the redundant one.
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Link: https://lkml.kernel.org/r/20200831022351.20916-3-richard.weiyang@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a1ddc2e8
    • mm/hugetlb: not necessary to coalesce regions recursively · 7db5e7b6
      Wei Yang authored
      
      
      Patch series "mm/hugetlb: code refine and simplification", v4.
      
      Following are some cleanups for hugetlb.  Simple testing with
      tools/testing/selftests/vm/map_hugetlb passes.
      
      This patch (of 7):
      
      Per my understanding, we keep the regions ordered and always
      coalesce them properly, so to preserve this property it is enough
      to coalesce a newly added region with its immediate neighbours.
      
      Let's simplify this.
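The neighbour-only argument can be demonstrated over a sorted interval list in plain C. This is an illustrative analogue (parallel `start[]`/`end[]` arrays rather than the kernel's file_region list; the caller must leave room for one extra entry): because the list is always kept sorted and fully coalesced, an inserted range can merge with at most its left and right neighbour, so no recursive re-scan is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Insert [s, e] into a sorted, fully-coalesced interval list, merging
 * only with the immediate left and right neighbours.  Returns the new
 * interval count. */
size_t insert_coalesce(long *start, long *end, size_t n, long s, long e)
{
    size_t i = 0;

    while (i < n && start[i] < s)       /* find the insertion position */
        i++;

    if (i > 0 && end[i - 1] >= s) {     /* merge with the left neighbour */
        s = (start[i - 1] < s) ? start[i - 1] : s;
        e = (end[i - 1] > e) ? end[i - 1] : e;
        i--;
    } else {                            /* no left merge: make room */
        for (size_t j = n; j > i; j--) {
            start[j] = start[j - 1];
            end[j] = end[j - 1];
        }
        n++;
    }

    if (i + 1 < n && start[i + 1] <= e) {  /* merge with the right one */
        e = (end[i + 1] > e) ? end[i + 1] : e;
        for (size_t j = i + 1; j + 1 < n; j++) {
            start[j] = start[j + 1];
            end[j] = end[j + 1];
        }
        n--;
    }

    start[i] = s;
    end[i] = e;
    return n;
}
```

The invariant does the work: since no two existing intervals touch, at most one merge on each side can ever apply per insertion.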
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Link: https://lkml.kernel.org/r/20200901014636.29737-1-richard.weiyang@linux.alibaba.com
      Link: https://lkml.kernel.org/r/20200831022351.20916-1-richard.weiyang@linux.alibaba.com
      Link: https://lkml.kernel.org/r/20200831022351.20916-2-richard.weiyang@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7db5e7b6
    • doc/vm: fix typo in the hugetlb admin documentation · 540809be
      Baoquan He authored
      
      
      Change 'pecify' to 'Specify'.
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Link: https://lkml.kernel.org/r/20200723032248.24772-4-bhe@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      540809be
    • mm/hugetlb.c: remove the unnecessary non_swap_entry() · d79d176a
      Baoquan He authored
      
      
      If a swap entry tests positive for either is_[migration|hwpoison]_entry(),
      then its swap_type() is among SWP_MIGRATION_READ, SWP_MIGRATION_WRITE and
      SWP_HWPOISON.  All of these types are >= MAX_SWAPFILES, which is exactly
      what non_swap_entry() asserts.
      
      So checking non_swap_entry() in is_hugetlb_entry_migration() and
      is_hugetlb_entry_hwpoisoned() is redundant.
      
      Let's remove it to simplify the code.
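The implication can be shown with illustrative values (the real constants live in the kernel's swap headers; these enum values are not them, they merely reproduce the numbering relationship described above):

```c
#include <assert.h>
#include <stdbool.h>

/* Special entry types are numbered at or above MAX_TYPES, so any entry
 * that tests positive for a specific special type necessarily also
 * tests positive for the generic "not a normal entry" check.  The
 * values here are illustrative only. */
enum { MAX_TYPES = 32, TYPE_MIGRATION = 32, TYPE_HWPOISON = 33 };

bool non_normal_entry(int type)   { return type >= MAX_TYPES; }
bool is_migration_entry(int type) { return type == TYPE_MIGRATION; }

/* Before: non_normal_entry(t) && is_migration_entry(t).
 * After: the first check is implied by the second, so drop it. */
bool entry_is_migration(int type)
{
    return is_migration_entry(type);
}
```

Since `TYPE_MIGRATION >= MAX_TYPES` by construction, the dropped check can never change the result.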
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Link: https://lkml.kernel.org/r/20200723032248.24772-3-bhe@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d79d176a
    • mm/hugetlb.c: make is_hugetlb_entry_hwpoisoned return bool · 3e5c3600
      Baoquan He authored
      
      
      Patch series "mm/hugetlb: Small cleanup and improvement", v2.
      
      This patch (of 3):
      
      Just like its neighbour is_hugetlb_entry_migration() already does.
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Link: https://lkml.kernel.org/r/20200723032248.24772-1-bhe@redhat.com
      Link: https://lkml.kernel.org/r/20200723032248.24772-2-bhe@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3e5c3600
    • include/linux/gfp.h: clarify usage of GFP_ATOMIC in !preemptible contexts · ab00db21
      Michal Hocko authored
      
      
      There is a general understanding that GFP_ATOMIC/GFP_NOWAIT are to be
      used from atomic contexts, e.g. from within a spin lock or from the IRQ
      context.  This is correct, but there are some atomic contexts where the
      above doesn't hold.  One of them is the NMI context.  The page allocator
      has never supported it, and the general fear of this context has kept
      anybody from even trying to use the allocator there.  Good, but let's
      be more specific about that.
      
      Another such context, and one where people seem to be more daring,
      is raw_spin_lock.  Mostly because it simply resembles a regular spin
      lock, which is supported by the allocator, and there is no
      implementation difference on !RT kernels in the first place.  Be
      explicit that such a context is not supported by the allocator.  The
      underlying reason is that zone->lock would have to become a
      raw_spin_lock as well, and that has turned out to be a problem for RT
      (http://lkml.kernel.org/r/87mu305c1w.fsf@nanos.tec.linutronix.de).
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: https://lkml.kernel.org/r/20200929123010.5137-1-mhocko@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ab00db21
    • mm/page_alloc.c: fix freeing non-compound pages · e320d301
      Matthew Wilcox (Oracle) authored
      Here is a very rare race which leaks memory:
      
      Page P0 is allocated to the page cache.  Page P1 is free.
      
      Thread A                Thread B                Thread C
      find_get_entry():
      xas_load() returns P0
      						Removes P0 from page cache
      						P0 finds its buddy P1
      			alloc_pages(GFP_KERNEL, 1) returns P0
      			P0 has refcount 1
      page_cache_get_speculative(P0)
      P0 has refcount 2
      			__free_pages(P0)
      			P0 has refcount 1
      put_page(P0)
      P1 is not freed
      
      Fix this by freeing all the pages in __free_pages() that won't be freed
      by the call to put_page().  It's usually not a good idea to split a page,
      but this is a very unlikely scenario.
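A userspace sketch of the fix, not the page_alloc code itself: a "page" here is just a refcount, `*freed` counts what gets returned to the allocator, and `free_pages_sketch()` is my illustrative name.

```c
#include <assert.h>
#include <stdbool.h>

struct page { int refcount; };

/* Drop one reference; true when it was the last one. */
static bool put_testzero(struct page *p) { return --p->refcount == 0; }

/* Sketch of the __free_pages() fix: drop one reference on the head of a
 * (1 << order)-page non-compound allocation.  If someone else still
 * holds a reference, their eventual put_page() will only free the head
 * page, so we must free the tail sub-pages ourselves or they leak. */
void free_pages_sketch(struct page *pages, int order, int *freed)
{
    if (put_testzero(&pages[0])) {
        *freed += 1 << order;      /* normal case: free the whole block */
    } else if (order > 0) {
        /* Racing reference on the head page: split and free the
         * (1 << order) - 1 tail sub-pages individually. */
        for (int i = 1; i < (1 << order); i++)
            *freed += 1;           /* pages[i] freed individually */
    }
}
```

Without the `else` branch, the second test case below would leak the tail page, which is exactly the race in the table above with P0 as the head and P1 as its buddy.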
      
      Fixes: e286781d ("mm: speculative page references")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200926213919.26642-1-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e320d301
    • mm: move call to compound_head() in release_pages() · a9b576f7
      Ralph Campbell authored
      
      
      The function is_huge_zero_page() doesn't call compound_head() to make
      sure the page pointer is a head page.  The call to is_huge_zero_page()
      in release_pages() is made before compound_head() is called, so the
      test would fail if release_pages() were called with a tail page of the
      huge_zero_page, and put_page_testzero() would then be called, releasing
      the page.  This is unlikely to happen in normal use, or we would be
      seeing all sorts of process data corruption when accessing a THP zero
      page.
      
      Looking at other places where is_huge_zero_page() is called, all seem
      to only pass a head page, so I think the right solution is to move the
      call to compound_head() in release_pages() to a point before calling
      is_huge_zero_page().
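The ordering fix can be sketched in C. This is an analogue, not the kernel code: a tail "page" here just points at its head, `is_special` stands in for "is the huge zero page", and `release_check()` plays the role of the check in release_pages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page {
    struct page *head;   /* NULL when this is itself a head page */
    bool is_special;     /* only meaningful on head pages */
};

/* Canonicalise a page pointer to its head page. */
struct page *compound_head_sketch(struct page *p)
{
    return p->head ? p->head : p;
}

/* A head-only predicate must see the head page, so canonicalise first;
 * this mirrors moving compound_head() before is_huge_zero_page(). */
bool release_check(struct page *p)
{
    p = compound_head_sketch(p);  /* moved before the predicate */
    return p->is_special;         /* now correct for tail pages too */
}
```

Testing the predicate on the raw pointer would wrongly return false for a tail page of a special compound page, which is exactly the hazard described above.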
      
      Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: https://lkml.kernel.org/r/20200917173938.16420-1-rcampbell@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9b576f7
    • mmzone: clean code by removing unused macro parameter · 30d8ec73
      Mateusz Nosek authored
      
      
      Previously, the 'for_next_zone_zonelist_nodemask' macro parameter
      'zlist' was unused, so this patch removes it.
      
      Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200917211906.30059-1-mateusznosek0@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      30d8ec73
    • mm/page_alloc.c: __perform_reclaim should return 'unsigned long' · 2187e17b
      Yanfei Xu authored
      
      
      __perform_reclaim()'s single caller expects it to return 'unsigned long',
      hence change its return value and a local variable to 'unsigned long'.
      
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200916022138.16740-1-yanfei.xu@windriver.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2187e17b
    • mm/page_alloc.c: clean code by merging two functions · a0622d05
      Mateusz Nosek authored
      
      
      finalise_ac() is just an 'epilogue' for prepare_alloc_pages(), so
      there is no need to keep both: the content of finalise_ac() can be
      merged into prepare_alloc_pages().  This makes __alloc_pages_nodemask()
      cleaner when it comes to readability.
      
      Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Rapoport <rppt@kernel.org>
      Link: https://lkml.kernel.org/r/20200916110118.6537-1-mateusznosek0@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a0622d05
    • mm/page_alloc.c: fix early params garbage value accesses · fdd4fa1c
      Mateusz Nosek authored
      
      
      Previously, in '__init early_init_on_alloc' and '__init
      early_init_on_free' the return values from 'kstrtobool' were not
      handled properly, which could cause a garbage value to be read from
      the variable 'bool_result'.  This patch fixes the error handling.
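The bug class and the fix can be sketched in userspace C. `parse_bool()` is a tiny stand-in for kstrtobool() (it keeps the same contract: 0 on success, negative on error, out-value untouched on error); `early_param_sketch()` is my illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for kstrtobool(): 0 on success, -1 on invalid input, and on
 * error *res is left untouched (so it may hold garbage). */
int parse_bool(const char *s, bool *res)
{
    if (!strcmp(s, "1") || !strcmp(s, "on"))  { *res = true;  return 0; }
    if (!strcmp(s, "0") || !strcmp(s, "off")) { *res = false; return 0; }
    return -1;
}

/* Returns 1/0 for a valid flag, and `fallback` on a parse error instead
 * of acting on the uninitialised result. */
int early_param_sketch(const char *arg, int fallback)
{
    bool val;
    int err = parse_bool(arg, &val);

    if (err)              /* the previously missing error check */
        return fallback;  /* never read `val` when parsing failed */
    return val ? 1 : 0;
}
```

The broken version would read `val` unconditionally, which is undefined behaviour on a parse failure since the variable was never written.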
      
      Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200916214125.28271-1-mateusznosek0@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fdd4fa1c
    • mm/page_alloc.c: micro-optimization remove unnecessary branch · cfb4a541
      Mateusz Nosek authored
      
      
      Previously, the flags check was split into two separate checks with
      two separate branches.  Since the presence of either flag has the
      same effect on control flow, the checks can be merged and one branch
      avoided.
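The merge is the standard combined-mask test. A minimal sketch with illustrative flag values (not the real GFP bits):

```c
#include <assert.h>

/* Two flags with the same effect on control flow; values illustrative. */
enum { FLAG_A = 0x1, FLAG_B = 0x2 };

int classify(unsigned flags)
{
    /* Before:
     *   if (flags & FLAG_A) return 1;
     *   if (flags & FLAG_B) return 1;
     * After: one mask test, one branch. */
    if (flags & (FLAG_A | FLAG_B))
        return 1;
    return 0;
}
```

The OR of the masks is computed at compile time, so the merged form tests both flags with a single AND and a single conditional branch.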
      
      Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200911092310.31136-1-mateusznosek0@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cfb4a541
    • mm/page_alloc.c: clean code by removing unnecessary initialization · b630749f
      Mateusz Nosek authored
      
      
      Previously, the variable 'tmp' was initialized but not read before
      being reassigned, so the initialization can be removed.
      
      [akpm@linux-foundation.org: remove `tmp' altogether]
      
      Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20200904132422.17387-1-mateusznosek0@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b630749f