  1. Apr 11, 2020
    • selftests: kmod: test disabling module autoloading · 23756e55
      Eric Biggers authored
      
      
      Test that request_module() fails with -ENOENT when
      /proc/sys/kernel/modprobe contains (a) a nonexistent path, and (b) an
      empty path.
      
      Case (b) is a regression test for the patch "kmod: make request_module()
      return an error when autoloading is disabled".
      
      Tested with 'kmod.sh -t 0010 && kmod.sh -t 0011', and also simply with
      'kmod.sh' to run all kmod tests.
      
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: NeilBrown <neilb@suse.com>
      Link: http://lkml.kernel.org/r/20200312202552.241885-5-ebiggers@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • selftests: kmod: fix handling test numbers above 9 · 6d573a07
      Eric Biggers authored
      
      
      get_test_count() and get_test_enabled() were broken for test numbers
      above 9 due to awk interpreting a field specification like '$0010' as
      octal rather than decimal.  Fix it by stripping the leading zeroes.
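
      The same leading-zero pitfall exists in C when strtol() is asked to
      auto-detect the base, which gives a compact way to see both the bug and
      the fix.  This is an analogy only, not the awk code from kmod.sh, and
      both helper names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse with base auto-detection (base 0), where a
 * leading '0' selects octal, so "0010" is read as 8. */
static long parse_autodetect(const char *s)
{
	return strtol(s, NULL, 0);
}

/* Hypothetical helper: skip leading zeroes first (keeping the last digit so
 * that "0000" still parses as 0), then parse; "0010" is read as 10. */
static long parse_stripped(const char *s)
{
	while (s[0] == '0' && s[1] != '\0')
		s++;
	return strtol(s, NULL, 0);
}
```

      With the zeroes stripped, a test number such as 0010 selects the tenth
      entry rather than the eighth.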
      
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: NeilBrown <neilb@suse.com>
      Link: http://lkml.kernel.org/r/20200318230515.171692-5-ebiggers@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • docs: admin-guide: document the kernel.modprobe sysctl · 6e715825
      Eric Biggers authored
      
      
      Document the kernel.modprobe sysctl in the same place that all the other
      kernel.* sysctls are documented.  Make sure to mention how to use this
      sysctl to completely disable module autoloading, and how this sysctl
      relates to CONFIG_STATIC_USERMODEHELPER.
      
      [ebiggers@google.com: v5]
        Link: http://lkml.kernel.org/r/20200318230515.171692-4-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: NeilBrown <neilb@suse.com>
      Link: http://lkml.kernel.org/r/20200312202552.241885-4-ebiggers@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() · 26c5d78c
      Eric Biggers authored
      After request_module(), nothing is stopping the module from being
      unloaded until someone takes a reference to it via try_module_get().
      
      The WARN_ONCE() in get_fs_type() is thus user-reachable, via userspace
      running 'rmmod' concurrently.
      
      Since WARN_ONCE() is for kernel bugs only, not for user-reachable
      situations, downgrade this warning to pr_warn_once().
      
      Keep it printed once only, since the intent of this warning is to detect
      a bug in modprobe at boot time.  Printing the warning more than once
      wouldn't really provide any useful extra information.
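
      The print-once behaviour can be modelled in userspace with a static
      flag per call site.  This is a sketch under that assumption, not the
      kernel's pr_warn_once() implementation; warn_once_model(), warn_count
      and lookup_failed() are hypothetical names, and the flag is not
      thread-safe:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warn_count; /* number of messages actually emitted */

/* Userspace stand-in for a print-once helper: the static flag makes the
 * message appear only the first time this call site is reached. */
#define warn_once_model(msg)                          \
	do {                                          \
		static bool warned;                   \
		if (!warned) {                        \
			warned = true;                \
			warn_count++;                 \
			fprintf(stderr, "%s\n", (msg)); \
		}                                     \
	} while (0)

/* Hypothetical caller standing in for the failed lookup path. */
static void lookup_failed(void)
{
	warn_once_model("request_module succeeded, but still no fs?");
}
```

      Calling lookup_failed() repeatedly emits the message a single time,
      without the backtrace and taint that a WARN would add.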
      
      Fixes: 41124db8 ("fs: warn in case userspace lied about modprobe return")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Jessica Yu <jeyu@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Jessica Yu <jeyu@kernel.org...
    • kmod: make request_module() return an error when autoloading is disabled · d7d27cfc
      Eric Biggers authored
      
      
      Patch series "module autoloading fixes and cleanups", v5.
      
      This series fixes a bug where request_module() was reporting success to
      kernel code when module autoloading had been completely disabled via
      'echo > /proc/sys/kernel/modprobe'.
      
      It also addresses the issues raised on the original thread
      (https://lkml.kernel.org/lkml/20200310223731.126894-1-ebiggers@kernel.org/T/#u)
      by documenting the modprobe sysctl, adding a self-test for the empty path
      case, and downgrading a user-reachable WARN_ONCE().
      
      This patch (of 4):
      
      It's long been possible to disable kernel module autoloading completely
      (while still allowing manual module insertion) by setting
      /proc/sys/kernel/modprobe to the empty string.
      
      This can be preferable to setting it to a nonexistent file since it
      avoids the overhead of an attempted execve(), avoids potential
      deadlocks, and avoids the call to security_kernel_module_request() and
      thus on SELinux-based systems eliminates the need to write SELinux rules
      to dontaudit module_request.
      
      However, when module autoloading is disabled in this way,
      request_module() returns 0.  This is broken because callers expect 0 to
      mean that the module was successfully loaded.
      
      Apparently this was never noticed because this method of disabling
      module autoloading isn't used much, and also most callers don't use the
      return value of request_module() since it's always necessary to check
      whether the module registered its functionality or not anyway.
      
      But improperly returning 0 can indeed confuse a few callers, for example
      get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:
      
      	if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
      		fs = __get_fs_type(name, len);
      		WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
      	}
      
      This is easily reproduced with:
      
      	echo > /proc/sys/kernel/modprobe
      	mount -t NONEXISTENT none /
      
      It causes:
      
      	request_module fs-NONEXISTENT succeeded, but still no fs?
      	WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
      	[...]
      
      This should actually use pr_warn_once() rather than WARN_ONCE(), since
      it's also user-reachable if userspace immediately unloads the module.
      Regardless, request_module() should correctly return an error when it
      fails.  So let's make it return -ENOENT, which matches the error when
      the modprobe binary doesn't exist.
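
      The effect on a caller shaped like the get_fs_type() check quoted above
      can be modelled in userspace.  This is a minimal sketch, not kernel
      code: request_module_model() and would_warn() are hypothetical, the
      modprobe_path argument stands in for the contents of
      /proc/sys/kernel/modprobe, and the 'fixed' flag selects between the old
      and new return value:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Model of the autoload decision only. With an empty helper path, the old
 * behaviour reported success (0); the fixed behaviour reports -ENOENT. */
static int request_module_model(const char *modprobe_path, int fixed)
{
	if (modprobe_path[0] == '\0')
		return fixed ? -ENOENT : 0;
	return 0; /* pretend the helper ran and loaded the module */
}

/* The "succeeded, but still no fs?" path is reached only when the request
 * returned 0 and the filesystem is still not registered. */
static int would_warn(const char *modprobe_path, int fixed, int fs_found)
{
	if (request_module_model(modprobe_path, fixed) == 0)
		return !fs_found;
	return 0;
}
```

      In this model, an empty path triggers the warning path only when
      'fixed' is 0, which corresponds to the reproducer shown earlier.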
      
      I've also sent patches to document and test this case.
      
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Jessica Yu <jeyu@kernel.org>
      Acked-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Ben Hutchings <benh@debian.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200310223731.126894-1-ebiggers@kernel.org
      Link: http://lkml.kernel.org/r/20200312202552.241885-1-ebiggers@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memremap: set caching mode for PCI P2PDMA memory to UC · a50d8d98
      Logan Gunthorpe authored
      
      
      PCI BAR IO memory should never be mapped as WB, however prior to this
      the PAT bits were set WB and it was typically overridden by MTRR
      registers set by the firmware.
      
      Set PCI P2PDMA memory to be UC as this is what it currently, typically,
      ends up being mapped as on x86 after the MTRR registers override the
      cache setting.
      
      Future use-cases may need to generalize this by adding flags to select
      the caching type, as some P2PDMA cases may not want UC.  However, those
      use-cases are not upstream yet and this can be changed when they arrive.
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric Badger <ebadger@gigaio.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200306170846.9333-8-logang@deltatee.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: add pgprot_t to mhp_params · bfeb022f
      Logan Gunthorpe authored
      
      
      devm_memremap_pages() is currently used by the PCI P2PDMA code to create
      struct page mappings for IO memory.  At present, these mappings are
      created with PAGE_KERNEL which implies setting the PAT bits to be WB.
      However, on x86, an MTRR register will typically override this and force
      the cache type to be UC-.  If firmware doesn't set this register, the
      mapping is effectively WB, and accessing it will typically result in a
      machine check exception.
      
      Other arches are not currently likely to function correctly seeing they
      don't have any MTRR registers to fall back on.
      
      To solve this, provide a way to specify the pgprot value explicitly to
      arch_add_memory().
      
      Of the arches that support MEMORY_HOTPLUG: x86_64 and arm64 need a
      simple change to pass the pgprot_t down to their respective functions
      which set up the page tables.  For x86_32, set the page tables
      explicitly using __set_memory_prot() (seeing they are already mapped).
      
      For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this
      should be fine, for now, seeing these architectures don't support
      ZONE_DEVICE.
      
      A check in __add_pages() is also added to ensure the pgprot parameter
      was set for all arches.
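
      The two checks described above can be sketched with simplified
      stand-in types.  Everything suffixed _model, the two function names,
      and the numeric placeholder values are assumptions made for
      illustration; the real pgprot_t and PAGE_KERNEL are arch-defined:

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-ins; the real pgprot_t and PAGE_KERNEL are arch-defined. */
typedef struct { unsigned long pgprot; } pgprot_model_t;
#define PAGE_KERNEL_MODEL 0x163UL /* arbitrary non-zero placeholder value */

struct mhp_params_model {
	pgprot_model_t pgprot; /* protection to use for the new mapping */
};

/* Arch that cannot change its page tables: accept only the default. */
static int arch_add_memory_strict(const struct mhp_params_model *p)
{
	if (p->pgprot.pgprot != PAGE_KERNEL_MODEL)
		return -EINVAL;
	return 0;
}

/* Generic check: a zero pgprot means the caller forgot to set it. */
static int add_pages_check(const struct mhp_params_model *p)
{
	return p->pgprot.pgprot ? 0 : -EINVAL;
}
```

      A caller that leaves the field zero-initialised is rejected by the
      generic check, while the strict variant rejects anything other than
      the default protection.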
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Badger <ebadger@gigaio.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc/mm: thread pgprot_t through create_section_mapping() · 4e00c5af
      Logan Gunthorpe authored
      
      
      In preparation to support a pgprot_t argument for arch_add_memory().
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric Badger <ebadger@gigaio.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200306170846.9333-6-logang@deltatee.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86/mm: introduce __set_memory_prot() · 30796e18
      Logan Gunthorpe authored
      
      
      For use in the 32bit arch_add_memory() to set the pgprot type of the
      memory to add.
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric Badger <ebadger@gigaio.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200306170846.9333-5-logang@deltatee.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86/mm: thread pgprot_t through init_memory_mapping() · c164fbb4
      Logan Gunthorpe authored
      
      
      In preparation to support a pgprot_t argument for arch_add_memory().
      
      It's required to move the prototype of init_memory_mapping() seeing the
      original location came before the definition of pgprot_t.
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric Badger <ebadger@gigaio.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200306170846.9333-4-logang@deltatee.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: rename mhp_restrictions to mhp_params · f5637d3b
      Logan Gunthorpe authored
      
      
      The mhp_restrictions struct really doesn't specify anything resembling a
      restriction anymore so rename it to be mhp_params as it is a list of
      extended parameters.
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Badger <ebadger@gigaio.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200306170846.9333-3-logang@deltatee.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: drop the flags field from struct mhp_restrictions · 96c6b598
      Logan Gunthorpe authored
      
      
      Patch series "Allow setting caching mode in arch_add_memory() for
      P2PDMA", v4.
      
      Currently, the page tables created using memremap_pages() are always
      created with the PAGE_KERNEL caching mode.  However, the P2PDMA code is
      creating pages for PCI BAR memory which should never be accessed through
      the cache and instead use either WC or UC.  This still works in most
      cases, on x86, because the MTRR registers typically override the caching
      settings in the page tables for all of the IO memory to be UC-.
      However, this tends not to work so well on other arches or some rare x86
      machines that have firmware which does not set up the MTRR registers in
      this way.
      
      Instead of this, this series proposes a change to arch_add_memory() to
      take the pgprot required by the mapping which allows us to explicitly
      set pagetable entries for P2PDMA memory to UC.
      
      This change is pretty routine for most of the arches: x86_64, arm64 and
      powerpc simply need to thread the pgprot through to where the page
      tables are set up.  x86_32 unfortunately sets up the page tables at boot
      so must use __set_memory_prot() to change their caching mode.  ia64, s390
      and sh don't appear to have an easy way to change the page tables so,
      for now at least, we just return -EINVAL on such mappings and thus they
      will not support P2PDMA memory until the work for this is done.  This
      should be fine as they don't yet support ZONE_DEVICE.
      
      This patch (of 7):
      
      This variable is not used anywhere and should therefore be removed from
      the structure.
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Badger <ebadger@gigaio.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Link: http://lkml.kernel.org/r/20200306170846.9333-2-logang@deltatee.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/special: create generic fallbacks for pte_special() and pte_mkspecial() · 78e7c5af
      Anshuman Khandual authored
      
      
      Currently there are many platforms that don't enable ARCH_HAS_PTE_SPECIAL
      but are required to define quite similar fallback stubs for special page
      table entry helpers such as pte_special() and pte_mkspecial(), as they
      get built in generic MM without a config check.  This creates two
      generic fallback stub definitions for these helpers, eliminating much
      code duplication.
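
      The fallback pattern can be sketched as follows.  The pte_sketch_t type
      and the PTE_SPECIAL_BIT value are hypothetical stand-ins, and the
      config symbol is an ordinary macro here so the snippet builds outside
      the kernel:

```c
#include <assert.h>

/* Uncomment to model an architecture that has a "special" PTE bit. */
/* #define CONFIG_ARCH_HAS_PTE_SPECIAL 1 */

typedef struct { unsigned long val; } pte_sketch_t;

#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
#define PTE_SPECIAL_BIT 0x200UL /* hypothetical bit position */
static inline int pte_special(pte_sketch_t pte)
{
	return !!(pte.val & PTE_SPECIAL_BIT);
}
static inline pte_sketch_t pte_mkspecial(pte_sketch_t pte)
{
	pte.val |= PTE_SPECIAL_BIT;
	return pte;
}
#else
/* Generic fallbacks: no PTE is ever special, and marking one is a no-op. */
static inline int pte_special(pte_sketch_t pte)
{
	(void)pte;
	return 0;
}
static inline pte_sketch_t pte_mkspecial(pte_sketch_t pte)
{
	return pte;
}
#endif
```

      Keeping a single pair of stubs behind the config check means an
      architecture without the feature no longer has to carry its own
      copies.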
      
      The mips platform has a special case where pte_special() and pte_mkspecial()
      visibility is wider than what ARCH_HAS_PTE_SPECIAL enablement requires.
      This restricts the visibility of those symbols in order to avoid the
      redefinitions that these new generic stubs would otherwise expose, and the
      subsequent build failure.  The arm platform set_pte_at() definition needs
      to be moved into a C file just to prevent a build failure.
      
      [anshuman.khandual@arm.com: use defined(CONFIG_ARCH_HAS_PTE_SPECIAL) in mips per Thomas]
        Link: http://lkml.kernel.org/r/1583851924-21603-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Guo Ren <guoren@kernel.org>			[csky]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Acked-by: Stafford Horne <shorne@gmail.com>		[openrisc]
      Acked-by: Helge Deller <deller@gmx.de>			[parisc]
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Link: http://lkml.kernel.org/r/1583802551-15406-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vma: introduce VM_ACCESS_FLAGS · 6cb4d9a2
      Anshuman Khandual authored
      
      
      There are many places where all basic VMA access flags (read, write,
      exec) are initialized or checked against as a group.  One such example
      is during page fault.  The existing vma_is_accessible() wrapper already
      establishes the notion of VMA accessibility as a group of access permissions.

      Hence let's just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which
      will not only reduce code duplication but also extend the VMA
      accessibility concept in general.
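
      A small userspace sketch of the grouping.  The three bit values match
      those in include/linux/mm.h; flags_accessible() is a hypothetical,
      simplified stand-in for vma_is_accessible() that takes the flags
      directly rather than a VMA pointer:

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as in include/linux/mm.h. */
#define VM_READ  0x00000001UL
#define VM_WRITE 0x00000002UL
#define VM_EXEC  0x00000004UL

/* The grouping this commit introduces. */
#define VM_ACCESS_FLAGS (VM_READ | VM_WRITE | VM_EXEC)

/* True if at least one of read, write or exec is permitted. */
static bool flags_accessible(unsigned long vm_flags)
{
	return (vm_flags & VM_ACCESS_FLAGS) != 0;
}
```

      Call sites that previously spelled out the three flags can test or
      mask against the single macro instead.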
      
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rob Springer <rspringer@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS · c62da0c3
      Anshuman Khandual authored
      
      
      There are many platforms with the exact same value for VM_DATA_DEFAULT_FLAGS.
      This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the
      existing VM_STACK_DEFAULT_FLAGS.  While here, also define some more
      macros with standard VMA access flag combinations that are used
      frequently across many platforms.  Apart from simplification, this
      reduces code duplication as well.
      
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Chris Zankel <chris@zankel.net>
      Link: http://lkml.kernel.org/r/1583391014-8170-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory.c: add vm_insert_pages() · 8cd3984d
      Arjun Roy authored
      
      
      Add the ability to insert multiple pages at once to a user VM with fewer
      PTE spinlock operations.

      The intention of this patch-set is to reduce atomic ops for tcp zerocopy
      receives, which normally hit the same spinlock multiple times
      consecutively.
      
      [akpm@linux-foundation.org: pte_alloc() no longer takes the `addr' argument]
      [arjunroy@google.com: add missing page_count() check to vm_insert_pages()]
        Link: http://lkml.kernel.org/r/20200214005929.104481-1-arjunroy.kdev@gmail.com
      [arjunroy@google.com: vm_insert_pages() checks if pte_index defined]
        Link: http://lkml.kernel.org/r/20200228054714.204424-2-arjunroy.kdev@gmail.com
      Signed-off-by: Arjun Roy <arjunroy@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200128025958.43490-2-arjunroy.kdev@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: define pte_index as macro for x86 · c97078bd
      Arjun Roy authored
      
      
      pte_index() is either defined as a macro (e.g.  sparc64) or as an
      inlined function (e.g.  x86).  vm_insert_pages() depends on pte_index
      but it is not defined on all platforms (e.g.  m68k).
      
      To fix compilation of vm_insert_pages() on architectures not providing
      pte_index(), we perform the following fix:
      
      0. For platforms where it is meaningful, and defined as a macro, no
         change is needed.
      1. For platforms where it is meaningful and defined as an inlined
         function, and we want to use it with vm_insert_pages(), we define
         a degenerate macro of the form:  #define pte_index pte_index
      2. vm_insert_pages() checks for the existence of a pte_index macro
         definition. If found, it implements a batched insert. If not found,
         it devolves to calling vm_insert_page() in a loop.
      
      This patch implements step 1 for x86.
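
      Steps 1 and 2 can be sketched in plain C.  The page size, table size
      and the lock-counting function insert_pages_lock_ops() are assumptions
      made for illustration; the batched branch further assumes that all
      pages fall under one page table and ignores any cap on batch size:

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: the helper is an inline function, so also define a same-named
 * macro purely so that the preprocessor can detect it. */
static inline unsigned long pte_index(unsigned long address)
{
	return (address >> 12) & 511; /* assumes 4 KiB pages, 512 PTEs per table */
}
#define pte_index pte_index

/* Step 2: pick the implementation at compile time. Returns the number of
 * (simulated) lock acquisitions needed to insert n pages. */
static size_t insert_pages_lock_ops(size_t n)
{
#ifdef pte_index
	return n ? 1 : 0; /* batched: one lock for the whole run */
#else
	return n;         /* fallback: one lock per single-page insert */
#endif
}
```

      Because a macro is not expanded recursively inside its own expansion,
      calls to pte_index() still reach the inline function after the
      self-referential #define.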
      
      v3 of this patch fixes a compilation warning for an unused method.
      v2 of this patch moved a macro definition to a more readable location.
      
      Signed-off-by: Arjun Roy <arjunroy@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200228054714.204424-1-arjunroy.kdev@gmail.com
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c97078bd
    • Arjun Roy's avatar
      mm: bring sparc pte_index() semantics inline with other platforms · 251a0ffe
      Arjun Roy authored
      
      
      pte_index() on platforms other than sparc returns a numerical index.  On
      sparc, it returns a pte_t*.  This presents an issue for
      vm_insert_pages(), which relies on pte_index() to find the offset for a
      pte within a pmd, for batched inserts.
      
      This patch:
      1. Modifies pte_index() for sparc to return a numerical index, like
         other platforms,
      2. Defines pte_entry() for sparc which returns a pte_t*
         (as pte_index() used to),
      3. Converts existing sparc callers for pte_index() to use pte_entry().
      
      [sfr@canb.auug.org.au: remove pte_entry and just directly modified pte_offset_kernel instead]
      Signed-off-by: Arjun Roy <arjunroy@google.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Arjun Roy <arjunroy.kdev@gmail.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Link: http://lkml.kernel.org/r/20200227105045.6b421d9f@canb.auug.org.au
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      251a0ffe
    • Arjun Roy's avatar
      mm/memory.c: refactor insert_page to prepare for batched-lock insert · 8efd6f5b
      Arjun Roy authored
      
      
      Add helper methods for vm_insert_page()/insert_page() to prepare for
      vm_insert_pages(), which batch-inserts pages to reduce spinlock
      operations when inserting multiple consecutive pages into the user page
      table.
      
      The intention of this patch-set is to reduce atomic ops for TCP zerocopy
      receives, which normally hit the same spinlock multiple times
      consecutively.
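
      The saving can be illustrated with a toy model that only counts lock
      acquisitions. Everything here is hypothetical (struct counting_lock,
      insert_one_by_one(), insert_batched()), and the sketch assumes each
      batch starts at the beginning of a page table; it is not the code in
      mm/memory.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-table spinlock: it only counts
 * how many times it has been taken. */
struct counting_lock {
	unsigned long acquisitions;
	int held;
};

static void lock(struct counting_lock *l)   { l->held = 1; l->acquisitions++; }
static void unlock(struct counting_lock *l) { l->held = 0; }

/* One lock round trip per page, as with repeated single-page inserts. */
static unsigned long insert_one_by_one(struct counting_lock *l, size_t npages)
{
	for (size_t i = 0; i < npages; i++) {
		lock(l);
		/* ... install one page-table entry ... */
		unlock(l);
	}
	return l->acquisitions;
}

/* One lock round trip per group of pages sharing a page table. */
static unsigned long insert_batched(struct counting_lock *l, size_t npages,
				    size_t ptes_per_table)
{
	size_t done = 0;

	assert(ptes_per_table > 0);
	while (done < npages) {
		size_t batch = npages - done;

		if (batch > ptes_per_table)
			batch = ptes_per_table;
		lock(l);
		/* ... install `batch` consecutive entries ... */
		unlock(l);
		done += batch;
	}
	return l->acquisitions;
}
```

      In this model, 16 pages that fall into one table cost 16 lock
      acquisitions one by one and a single acquisition when batched.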
      
      Signed-off-by: Arjun Roy <arjunroy@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/20200128025958.43490-1-arjunroy.kdev@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8efd6f5b
    • Jaewon Kim's avatar
      mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area · 09ef5283
      Jaewon Kim authored
      When passing their requirements to vm_unmapped_area(),
      arch_get_unmapped_area() and arch_get_unmapped_area_topdown() did not
      set align_offset.  Internally, in both unmapped_area() and
      unmapped_area_topdown(), info->align_offset is meaningless when
      info->align_mask is 0.
      
      But commit df529cab ("mm: mmap: add trace point of
      vm_unmapped_area") always prints info->align_offset, even when it is
      uninitialized.
      
      Fix this uninitialized value issue by setting it to 0 explicitly.
      
      Before:
        vm_unmapped_area: addr=0x755b155000 err=0 total_vm=0x15aaf0 flags=0x1 len=0x109000 lo=0x8000 hi=0x75eed48000 mask=0x0 ofs=0x4022
      
      After:
        vm_unmapped_area: addr=0x74a4ca1000 err=0 total_vm=0x168ab1 flags=0x1 len=0x9000 lo=0x8000 hi=0x753d94b000 mask=0x0 ofs=0x0
      
      Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/20200409094035.19457-1-jaewon31.kim@samsung.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09ef5283
    • Roman Gushchin's avatar
      mm: hugetlb: optionally allocate gigantic hugepages using cma · cf11e85f
      Roman Gushchin authored
      Commit 944d9fec ("hugetlb: add support for gigantic page allocation
      at runtime") has added the run-time allocation of gigantic pages.
      
      However, it works only in the early stages after boot, when the
      majority of memory is free.  After some time the memory gets
      fragmented by non-movable pages, so the chance of finding a contiguous
      1GB block approaches zero.  Even dropping caches manually doesn't
      help a lot.
      
      At large scale, rebooting servers in order to allocate gigantic
      hugepages is quite expensive and complex.  At the same time, keeping
      some constant percentage of memory in reserved hugepages even if the
      workload isn't using it is a big waste: not all workloads can benefit
      from using 1GB pages.
      
      The following solution can solve the problem:
      1) At boot time a dedicated cma area* is reserved. The size is passed
         as a kernel argument.
      2) Run-time allocations of gigantic hugepages are performed using the
         cma allocator and the dedicated cma area.
      
      In this case gigantic hugepages can be allocated successfully with a
      high probability, however the memory isn't completely wasted if nobody
      is using 1GB hugepages: it can be used for pagecache, anon memory, THPs,
      etc.
      
      * On a multi-node machine a per-node cma area is allocated on each node.
        Subsequent gigantic hugetlb allocations use the first available
        NUMA node if the user doesn't specify a mask.
      
      Usage:
      1) configure the kernel to allocate a cma area for hugetlb allocations:
         pass hugetlb_cma=10G as a kernel argument
      
      2) allocate hugetlb pages as usual, e.g.
         echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
      
      If the option isn't enabled or the allocation of the cma area failed,
      the current behavior of the system is preserved.
      
      x86 and arm64 are covered by this patch; other architectures can be
      trivially added later.
      
      The patch contains clean-ups and fixes proposed and implemented by Aslan
      Bakirov and Randy Dunlap.  It also contains ideas and suggestions
      proposed by Rik van Riel, Michal Hocko and Mike Kravetz.  Thanks!
      
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Andreas Schaufler <andreas.schaufler@gmx.de>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@kernel.org>
      Cc: Aslan Bakirov <aslan@fb.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf11e85f
    • Aslan Bakirov's avatar
      mm: cma: NUMA node interface · 8676af1f
      Aslan Bakirov authored
      
      
      I've noticed that there is no interface exposed by CMA which would let
      me declare contiguous memory on a particular NUMA node.
      
      This patchset adds the ability to try to allocate contiguous memory on a
      specific node.  It will fall back to other nodes if the specified one
      doesn't work.
      
      Implement a new method for declaring contiguous memory on a particular
      node and keep cma_declare_contiguous() as a wrapper.
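
      The "new method plus wrapper" shape can be sketched as follows. All
      names and signatures in this block (struct region, NO_NODE,
      declare_contiguous_nid(), declare_contiguous()) are hypothetical
      stand-ins chosen for illustration; the real CMA functions take
      different arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for "no node preference". */
#define NO_NODE (-1)

/* Hypothetical record of a declared contiguous region. */
struct region {
	size_t size;
	int nid;
};

/* New, node-aware entry point (hypothetical signature). */
static int declare_contiguous_nid(size_t size, int nid, struct region *out)
{
	assert(out != NULL);
	if (size == 0)
		return -1;
	out->size = size;
	out->nid = nid;
	return 0;
}

/* The old entry point stays as a thin wrapper, so existing callers
 * keep working without stating a node preference. */
static int declare_contiguous(size_t size, struct region *out)
{
	return declare_contiguous_nid(size, NO_NODE, out);
}
```

      Keeping the old name as a wrapper means only callers that care about
      NUMA placement need to change.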
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Aslan Bakirov <aslan@fb.com>
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@kernel.org>
      Cc: Andreas Schaufler <andreas.schaufler@gmx.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Link: http://lkml.kernel.org/r/20200407163840.92263-2-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8676af1f
    • Changwei Ge's avatar
      ocfs2: no need try to truncate file beyond i_size · 783fda85
      Changwei Ge authored
      
      
      With Linux fallocate(2) in FALLOC_FL_PUNCH_HOLE mode, the offset can
      exceed the inode size.  Ocfs2 currently doesn't allow an offset beyond
      the inode size.  This restriction is not necessary and violates
      fallocate(2) semantics.
      
      If the fallocate(2) offset is beyond the inode size, just return
      success and do nothing further.
      
      Otherwise, ocfs2 will crash the kernel.
      
        kernel BUG at fs/ocfs2//alloc.c:7264!
         ocfs2_truncate_inline+0x20f/0x360 [ocfs2]
         ocfs2_remove_inode_range+0x23c/0xcb0 [ocfs2]
         __ocfs2_change_file_space+0x4a5/0x650 [ocfs2]
         ocfs2_fallocate+0x83/0xa0 [ocfs2]
         vfs_fallocate+0x148/0x230
         SyS_fallocate+0x48/0x80
         do_syscall_64+0x79/0x170
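
      The intended behaviour amounts to an early-exit check before any
      truncation work starts. The sketch below is a standalone
      illustration, not the ocfs2 patch: the enum, the function name, and
      the choice to treat an offset exactly at the inode size as "nothing
      to do" are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum punch_action {
	PUNCH_NOTHING_TO_DO = 0,	/* report success without touching the file */
	PUNCH_REMOVE_RANGE  = 1		/* proceed to remove the byte range */
};

/* Decide what a punch-hole request should do, given only the file size.
 * fallocate(2) rejects len <= 0 before a filesystem sees the request,
 * so a positive length is treated as a precondition here. */
static enum punch_action punch_hole_action(uint64_t i_size, uint64_t offset,
					   uint64_t len)
{
	assert(len > 0);
	if (offset >= i_size)
		return PUNCH_NOTHING_TO_DO;
	return PUNCH_REMOVE_RANGE;
}
```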
      
      Signed-off-by: Changwei Ge <chge@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200407082754.17565-1-chge@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      783fda85
    • Jason Yan's avatar
      mm/page_alloc: make pcpu_drain_mutex and pcpu_drain static · 8b885f53
      Jason Yan authored
      
      
      Fix the following sparse warning:
      
        mm/page_alloc.c:106:1: warning: symbol 'pcpu_drain_mutex' was not declared. Should it be static?
        mm/page_alloc.c:107:1: warning: symbol '__pcpu_scope_pcpu_drain' was not declared. Should it be static?
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20200407023925.46438-1-yanaijie@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8b885f53
    • Randy Dunlap's avatar
      mm/page_alloc.c: fix kernel-doc warning · e6a0a7ad
      Randy Dunlap authored
      
      
      Add description of function parameter 'mt' to fix kernel-doc warning:
      
        mm/page_alloc.c:3246: warning: Function parameter or member 'mt' not described in '__putback_isolated_page'
      
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Link: http://lkml.kernel.org/r/02998bd4-0b82-2f15-2570-f86130304d1e@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e6a0a7ad
    • Mauro Carvalho Chehab's avatar
      docs: mm: slab.h: fix a broken cross-reference · 2370ae4b
      Mauro Carvalho Chehab authored
      
      
      There is a typo at the cross-reference link, causing this warning:
      
        include/linux/slab.h:11: WARNING: undefined label: memory-allocation (if the link has no caption the label must precede a section header)
      
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/0aeac24235d356ebd935d11e147dcc6edbb6465c.1586359676.git.mchehab+huawei@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2370ae4b
    • Qiujun Huang's avatar
      mm, slab_common: fix a typo in comment "eariler"->"earlier" · b991cee5
      Qiujun Huang authored
      
      
      There is a typo in comment, fix it.
      s/eariler/earlier/
      
      Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Christoph Lameter <cl@linux.com>
      Link: http://lkml.kernel.org/r/20200405160544.1246-1-hqjagain@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b991cee5
    • Jakub Kicinski's avatar
      mm, memcg: do not high throttle allocators based on wraparound · 9b8b1754
      Jakub Kicinski authored
      If a cgroup violates its memory.high constraints, we may end up unduly
      penalising it.  For example, for the following hierarchy:
      
        A:   max high, 20 usage
        A/B: 9 high, 10 usage
        A/C: max high, 10 usage
      
      We would end up doing the following calculation when working out the
      high delay for A/B:
      
        A/B: 10 - 9 = 1...
        A:   20 - PAGE_COUNTER_MAX = 21, so set max_overage to 21.
      
      This gets worse with higher disparities in usage in the parent.
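
      The wraparound can be reproduced with plain unsigned arithmetic. This
      is a minimal sketch under stated assumptions, not the memcg code:
      COUNTER_MAX is set to ULONG_MAX so that the numbers match the example
      above (the kernel's actual constant may differ), and struct level and
      both functions are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed stand-in for "high = max", chosen to reproduce the 21 above. */
#define COUNTER_MAX ULONG_MAX

/* One cgroup on the path from the charged group up to the root. */
struct level {
	unsigned long usage;
	unsigned long high;
};

/* Subtracts unconditionally, so usage < high wraps around. */
static unsigned long max_overage_unchecked(const struct level *lv, size_t n)
{
	unsigned long max = 0;

	assert(lv != NULL);
	for (size_t i = 0; i < n; i++) {
		unsigned long overage = lv[i].usage - lv[i].high;

		if (overage > max)
			max = overage;
	}
	return max;
}

/* Skips levels that are within their limit before subtracting. */
static unsigned long max_overage_checked(const struct level *lv, size_t n)
{
	unsigned long max = 0;

	assert(lv != NULL);
	for (size_t i = 0; i < n; i++) {
		unsigned long overage;

		if (lv[i].usage <= lv[i].high)
			continue;
		overage = lv[i].usage - lv[i].high;
		if (overage > max)
			max = overage;
	}
	return max;
}
```

      For the path A/B then A, the unchecked version reports an overage of
      21, while the checked version reports 1, which is A/B's real overage.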
      
      I have no idea how this disappeared from the final version of the patch,
      but it is certainly Not Good(tm).  This wasn't obvious in testing because,
      for a simple cgroup hierarchy with only one child, the result is usually
      roughly the same.  It's only in more complex hierarchies that things go
      really awry (although still, the effects are limited to at most 2
      seconds in schedule_timeout_killable).
      
      [chris@chrisdown.name: changelog]
      Fixes: e26733e0 ("mm, memcg: throttle allocators based on ancestral memory.high")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Chris Down <chris@chrisdown.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[5.4.x]
      Link: http://lkml.kernel.org/r/20200331152424.GA1019937@chrisdown.name
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9b8b1754
    • Simon Gander's avatar
      hfsplus: fix crash and filesystem corruption when deleting files · 25efb2ff
      Simon Gander authored
      When removing files containing extended attributes, the hfsplus driver may
      remove the wrong entries from the attributes b-tree, causing major
      filesystem damage and in some cases even kernel crashes.
      
      To remove a file, all its extended attributes have to be removed as well.
      The driver does this by looking up all keys in the attributes b-tree with
      the cnid of the file.  Each of these entries then gets deleted using the
      key used for searching, which doesn't contain the attribute's name when it
      should.  Since the key doesn't contain the name, the deletion routine will
      not find the correct entry and instead remove the one in front of it.  If
      parent nodes have to be modified, these become corrupt as well.  This
      causes invalid links and unsorted entries that not even macOS's fsck_hfs
      is able to fix.
      
      To fix this, modify the search key before an entry is deleted from the
      attributes b-tree by copying the found entry's key into the search key,
      therefo...
      25efb2ff
  2. Apr 10, 2020
    • Linus Torvalds's avatar
      Merge tag 'modules-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · c0cc2711
      Linus Torvalds authored
      Pull module updates from Jessica Yu:
       "Only a small cleanup this time around: a trivial conversion of
        zero-length arrays to flexible arrays"
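
      For readers unfamiliar with the conversion, the pattern looks roughly
      like this in standalone C. struct note and note_new() are
      hypothetical names for illustration and are not taken from
      kernel/module.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A variable-length record.  The trailing member used to be written
 * as `data[0]` (a GNU extension); `data[]` is the standard C99 form. */
struct note {
	size_t len;
	unsigned char data[];	/* flexible-array member, must come last */
};

/* Allocate a note with room for `len` payload bytes and copy them in. */
static struct note *note_new(const void *src, size_t len)
{
	struct note *n;

	assert(src != NULL || len == 0);
	n = malloc(sizeof(*n) + len);
	if (!n)
		return NULL;
	n->len = len;
	if (len)
		memcpy(n->data, src, len);
	return n;
}
```

      The allocation pattern is the same for both spellings; the standard
      form additionally lets the compiler reject a flexible array that is
      not the last member of the struct.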
      
      * tag 'modules-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
        kernel: module: Replace zero-length array with flexible-array member
      c0cc2711
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 87ebc45d
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Ensure that the compiler and linker versions are aligned so that ld
         doesn't complain about not understanding a .note.gnu.property section
         (emitted when pointer authentication is enabled).
      
       - Force -mbranch-protection=none when the feature is not enabled, in
         case a compiler may choose a different default value.
      
       - Remove CONFIG_DEBUG_ALIGN_RODATA. It was never in defconfig and
         rarely enabled.
      
       - Fix the mask used to check for 16-bit Thumb-2 instructions in the
         emulation of the SETEND instruction (it could match the bottom half
         of a 32-bit Thumb-2 instruction).
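
      The SETEND mask problem in the last item can be shown with a small
      mask-and-compare sketch. The constants are assumptions for
      illustration (0xb650 as the 16-bit SETEND encoding, with bit 3
      selecting the endianness), and hook_matches() is a hypothetical
      helper rather than the kernel's undef hook code. The sketch also
      assumes a 32-bit instruction is presented with its first halfword in
      the upper 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 16-bit SETEND encoding; bit 3 (the endianness bit) is masked out. */
#define SETEND_VAL   0x0000b650u
/* A mask that ignores the upper halfword entirely. */
#define MASK_NARROW  0x0000fff7u
/* A mask that also requires the upper halfword to be zero. */
#define MASK_FULL    0xfffffff7u

static_assert(SETEND_VAL <= 0xffffu, "a 16-bit encoding fits in one halfword");

/* Generic "instruction matches hook" test: mask, then compare. */
static int hook_matches(uint32_t instr, uint32_t mask, uint32_t val)
{
	return (instr & mask) == val;
}
```

      With the narrow mask, a 32-bit instruction such as 0xf000b650 is
      wrongly treated as SETEND because only its bottom half is compared;
      the full mask rejects it while still matching 0xb650 and 0xb658.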
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
        arm64: remove CONFIG_DEBUG_ALIGN_RODATA feature
        arm64: Always force a branch protection mode when the compiler has one
        arm64: Kconfig: ptrauth: Add binutils version check to fix mismatch
        init/kconfig: Add LD_VERSION Kconfig
      87ebc45d
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · e4da01d8
      Linus Torvalds authored
      Pull more powerpc updates from Michael Ellerman:
       "The bulk of this is the series to make CONFIG_COMPAT user-selectable,
        it's been around for a long time but was blocked behind the
        syscall-in-C series.
      
        Plus there's also a few fixes and other minor things.
      
        Summary:
      
         - A fix for a crash in machine check handling on pseries (ie. guests)
      
         - A small series to make it possible to disable CONFIG_COMPAT, and
           turn it off by default for ppc64le where it's not used.
      
         - A few other miscellaneous fixes and small improvements.
      
        Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann,
        Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven,
        Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek,
        Nicholas Piggin, Stephen Boyd, Wen Xiong"
      
      * tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        selftests/powerpc: Always build the tm-poison test 64-bit
        powerpc: Improve ppc_save_regs()
        Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"
        powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h>
        powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory
        powerpc/perf: split callchain.c by bitness
        powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
        powerpc/64: make buildable without CONFIG_COMPAT
        powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
        powerpc/perf: consolidate read_user_stack_32
        powerpc: move common register copy functions from signal_32.c to signal.c
        powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
        powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig
        powerpc/ps3: Remove an unneeded NULL check
        powerpc/ps3: Remove duplicate error message
        powerpc/powernv: Re-enable imc trace-mode in kernel
        powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
        powerpc/pseries: Fix MCE handling on pseries
        selftests/eeh: Skip ahci adapters
        powerpc/64s: Fix doorbell wakeup msgclr optimisation
      e4da01d8
    • Linus Torvalds's avatar
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 6cff4821
      Linus Torvalds authored
      Pull m68knommu update from Greg Ungerer:
       "Only a single commit, to remove all use of the obsolete setup_irq()
        calls within the m68knommu architecture code"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68k: Replace setup_irq() by request_irq()
      6cff4821
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · eab40026
      Linus Torvalds authored
      Pull RISC-V updates from Palmer Dabbelt:
       "This contains a handful of new features:
      
         - Partial support for the Kendryte K210.
      
           There are still a few outstanding issues that I have patches for,
           but I don't actually have a board to test them so they're not
           included yet.
      
         - SBI v0.2 support.
      
         - Fixes to support for building with LLVM-based toolchains. The
           resulting images are known not to boot yet.
      
        I don't anticipate a part two, but I'll probably have something early
        in the RCs to finish up the K210 support"
      
      * tag 'riscv-for-linus-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (38 commits)
        riscv: create a loader.bin boot image for Kendryte SoC
        riscv: Kendryte K210 default config
        riscv: Add Kendryte K210 device tree
        riscv: Select required drivers for Kendryte SOC
        riscv: Add Kendryte K210 SoC support
        riscv: Add SOC early init support
        riscv: Unaligned load/store handling for M_MODE
        RISC-V: Support cpu hotplug
        RISC-V: Add supported for ordered booting method using HSM
        RISC-V: Add SBI HSM extension definitions
        RISC-V: Export SBI error to linux error mapping function
        RISC-V: Add cpu_ops and modify default booting method
        RISC-V: Move relocate and few other functions out of __init
        RISC-V: Implement new SBI v0.2 extensions
        RISC-V: Introduce a new config for SBI v0.1
        RISC-V: Add SBI v0.2 extension definitions
        RISC-V: Add basic support for SBI v0.2
        RISC-V: Mark existing SBI as 0.1 SBI.
        riscv: Use macro definition instead of magic number
        riscv: Add support to dump the kernel page tables
        ...
      eab40026
  3. Apr 09, 2020
    • Linus Torvalds's avatar
      Merge tag '9p-for-5.7-2' of git://github.com/martinetd/linux · 5d30bcac
      Linus Torvalds authored
      Pull 9p documentation update from Dominique Martinet:
       "Document the new O_NONBLOCK short read behavior"
      
      * tag '9p-for-5.7-2' of git://github.com/martinetd/linux:
        9p: document short read behaviour with O_NONBLOCK
      5d30bcac
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client · fcc95f06
      Linus Torvalds authored
      Pull ceph updates from Ilya Dryomov:
       "The main items are:
      
         - support for asynchronous create and unlink (Jeff Layton).
      
           Creates and unlinks are satisfied locally, without waiting for a
           reply from the MDS, provided the client has been granted
           appropriate caps (new in v15.y.z ("Octopus") release). This can be
           a big help for metadata heavy workloads such as tar and rsync.
           Opt-in with the new nowsync mount option.
      
         - multiple blk-mq queues for rbd (Hannes Reinecke and myself).
      
           When the driver was converted to blk-mq, we settled on a single
           blk-mq queue because of a global lock in libceph and some other
           technical debt. These have since been addressed, so allocate a
           queue per CPU to enhance parallelism.
      
         - don't hold onto caps that aren't actually needed (Zheng Yan).
      
           This has been our long-standing behavior, but it causes issues with
           some active/standby applications (synchronous I/O, stalls if the
           standby goes down, etc).
      
         - .snap directory timestamps consistent with ceph-fuse (Luis
           Henriques)"
      
      * tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client: (49 commits)
        ceph: fix snapshot directory timestamps
        ceph: wait for async creating inode before requesting new max size
        ceph: don't skip updating wanted caps when cap is stale
        ceph: request new max size only when there is auth cap
        ceph: cleanup return error of try_get_cap_refs()
        ceph: return ceph_mdsc_do_request() errors from __get_parent()
        ceph: check all mds' caps after page writeback
        ceph: update i_requested_max_size only when sending cap msg to auth mds
        ceph: simplify calling of ceph_get_fmode()
        ceph: remove delay check logic from ceph_check_caps()
        ceph: consider inode's last read/write when calculating wanted caps
        ceph: always renew caps if mds_wanted is insufficient
        ceph: update dentry lease for async create
        ceph: attempt to do async create when possible
        ceph: cache layout in parent dir on first sync create
        ceph: add new MDS req field to hold delegated inode number
        ceph: decode interval_sets for delegated inos
        ceph: make ceph_fill_inode non-static
        ceph: perform asynchronous unlink if we have sufficient caps
        ceph: don't take refs to want mask unless we have all bits
        ...
      fcc95f06
    • Linus Torvalds's avatar
      Merge tag 'ovl-update-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · c6b80eb8
      Linus Torvalds authored
      Pull overlayfs update from Miklos Szeredi:
      
       - Fix failure to copy-up files from certain NFSv4 mounts
      
       - Sort out inconsistencies between st_ino and i_ino (used in /proc/locks)
      
       - Allow consistent (POSIX-y) inode numbering in more cases
      
       - Allow virtiofs to be used as upper layer
      
       - Miscellaneous cleanups and fixes
      
      * tag 'ovl-update-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
        ovl: document xino expected behavior
        ovl: enable xino automatically in more cases
        ovl: avoid possible inode number collisions with xino=on
        ovl: use a private non-persistent ino pool
        ovl: fix WARN_ON nlink drop to zero
        ovl: fix a typo in comment
        ovl: replace zero-length array with flexible-array member
        ovl: ovl_obtain_alias(): don't call d_instantiate_anon() for old
        ovl: strict upper fs requirements for remote upper fs
        ovl: check if upper fs supports RENAME_WHITEOUT
        ovl: allow remote upper
        ovl: decide if revalidate needed on a per-dentry basis
        ovl: separate detection of remote upper layer from stacked overlay
        ovl: restructure dentry revalidation
        ovl: ignore failure to copy up unknown xattrs
        ovl: document permission model
        ovl: simplify i_ino initialization
        ovl: factor out helper ovl_get_root()
        ovl: fix out of date comment and unreachable code
        ovl: fix value of i_ino for lower hardlink corner case
      c6b80eb8
    • Linus Torvalds's avatar
      Merge tag 'iomap-5.7-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 9744b923
      Linus Torvalds authored
      Pull iomap fix from Darrick Wong:
       "Fix a problem in readahead where we can crash if we can't allocate a
        full bio due to GFP_NORETRY"
      
      * tag 'iomap-5.7-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: Handle memory allocation failure in readahead
      9744b923
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · d8fc9cde
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "This fixes a Kconfig dependency for hisilicon as well as a double free
        in marvell/octeontx"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: marvell/octeontx - fix double free of ptr
        crypto: hisilicon - Fix build error
      d8fc9cde
    • Linus Torvalds's avatar
      Merge tag 'linux-watchdog-5.7-rc1' of git://www.linux-watchdog.org/linux-watchdog · 5602b0af
      Linus Torvalds authored
      Pull watchdog updates from Wim Van Sebroeck:
      
       - add TI K3 RTI watchdog
      
       - add stop_on_reboot parameter to control reboot policy
      
       - wm831x_wdt: Remove GPIO handling
      
       - several small fixes, improvements and clean-ups
      
      * tag 'linux-watchdog-5.7-rc1' of git://www.linux-watchdog.org/linux-watchdog:
        watchdog: Add K3 RTI watchdog support
        dt-bindings: watchdog: Add support for TI K3 RTI watchdog
        watchdog: ziirave_wdt: change name to be more specific
        watchdog: orion: use 0 for unset heartbeat
        watchdog: npcm: remove whitespaces
        watchdog: reset last_hw_keepalive time at start
        watchdog: imx2_wdt: Drop .remove callback
        watchdog: Add stop_on_reboot parameter to control reboot policy
        watchdog: wm831x_wdt: Remove GPIO handling
        watchdog: imx7ulp: Remove unused include of init.h
        watchdog: imx_sc_wdt: Remove unused includes
        watchdog: qcom: Use irq flags from firmware
        watchdog: pm8916_wdt: Add system sleep callbacks
        watchdog: qcom-wdt: disable pretimeout on timer platform
      5602b0af