  1. Feb 27, 2021
    • Documentation: sysfs/memory: clarify some memory block device properties · a89107c0
      David Hildenbrand authored
      In commit 53cdc1cb ("drivers/base/memory.c: indicate all memory blocks
      as removable") we changed the output of the "removable" property of memory
      devices to return "1" if and only if the kernel supports memory offlining.
      
      Let's update the documentation, stating that the interface is legacy.  Also
      update the documentation of the "state" and "valid_zones" properties.
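
      A minimal sketch of reading the three properties named above from one
      memory block directory.  On a real system such directories live under
      /sys/devices/system/memory/; the make_fake_block() helper is purely
      hypothetical and only exists so the reader can be tried on a throwaway
      directory instead of real sysfs.

```python
import os
import tempfile


def read_memory_block(block_dir):
    """Return the 'state', 'removable' and 'valid_zones' values of one block.

    Attributes that cannot be read are reported as None instead of raising.
    """
    props = {}
    for name in ("state", "removable", "valid_zones"):
        try:
            with open(os.path.join(block_dir, name)) as f:
                props[name] = f.read().strip()
        except OSError:
            props[name] = None
    return props


def make_fake_block(root, index, state, removable, valid_zones):
    """Hypothetical helper: create a directory that mimics one memory block."""
    block_dir = os.path.join(root, "memory%d" % index)
    os.makedirs(block_dir)
    values = (("state", state), ("removable", removable),
              ("valid_zones", valid_zones))
    for name, value in values:
        with open(os.path.join(block_dir, name), "w") as f:
            f.write(value + "\n")
    return block_dir
```

      Note that, per the text above, a "removable" value of "1" only says that
      the kernel supports memory offlining, not that this block can be removed.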
      
      Link: https://lkml.kernel.org/r/20210201181347.13262-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/base/memory: don't store phys_device in memory blocks · e9a2e48e
      David Hildenbrand authored
      No need to store the value for each and every memory block, as we can
      easily query the value at runtime.  Reshuffle the members to optimize the
      memory layout.  Also, let's clarify what the interface once was used for
      and why it's legacy nowadays.
      
      "phys_device" was used on s390x in older versions of lsmem[2]/chmem[3],
      back when they were still part of s390-tools.  They were later replaced
      by the variants in util-linux.  For example, RHEL6 and RHEL7 contain
      lsmem/chmem from s390-utils.  RHEL8 switched to versions from util-linux
      on s390x [4].
      
      "phys_device" was added with sysfs support for memory hotplug in commit
      3947be19 ("[PATCH] memory hotplug: sysfs and add/remove functions") in
      2005.  It always returned 0.
      
      s390x started returning something != 0 on some setups (if sclp.rzm is set
      by HW) in 2010 via commit 57b552ba ("memory hotplug/s390: set
      phys_device").
      
      For s390x, it allowed for identifying which memory block devices belong to
      the same storage increment (RZM).  Only if all memory block devices
      comprising a single storage increment were offline, the memory could
      actually be removed in the hypervisor.
      
      Since commit e5d709bb ("s390/memory hotplug: provide
      memory_block_size_bytes() function") in 2013 a memory block device spans
      at least one storage increment - which is why the interface isn't really
      helpful/used anymore (except by old lsmem/chmem tools).
      
      There were once RFC patches to make use of "phys_device" in ACPI context;
      however, the underlying problem could be solved using different interfaces
      [1].
      
      [1] https://patchwork.kernel.org/patch/2163871/
      [2] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/lsmem
      [3] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/chmem
      [4] https://bugzilla.redhat.com/show_bug.cgi?id=1504134
      
      Link: https://lkml.kernel.org/r/20210201181347.13262-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: use helper function zone_end_pfn() to get end_pfn · 6c922cf7
      Miaohe Lin authored
      Commit 108bcc96 ("mm: add & use zone_end_pfn() and zone_spans_pfn()")
      introduced the helper zone_end_pfn() to calculate the zone end pfn.  But
      update_pgdat_span() forgot to use it.
      
      Use this helper and rename local variable zone_end_pfn to end_pfn to avoid
      a naming conflict with the existing zone_end_pfn().
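
      A rough userspace model of the idea, not the kernel code itself: the
      helper returns the exclusive end of a zone, and the span calculation
      keeps its local in a variable called end_pfn so it does not shadow the
      helper.  The Zone class and node_span() are simplified stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    zone_start_pfn: int
    spanned_pages: int


def zone_end_pfn(zone):
    """First pfn past the zone (exclusive end)."""
    return zone.zone_start_pfn + zone.spanned_pages


def node_span(zones):
    """Return (start_pfn, spanned_pages) covering all non-empty zones."""
    node_start_pfn, node_end_pfn = 0, 0
    for zone in zones:
        if not zone.spanned_pages:
            continue  # empty zones do not contribute to the span
        end_pfn = zone_end_pfn(zone)  # named end_pfn, not zone_end_pfn
        if node_end_pfn == 0:
            node_start_pfn, node_end_pfn = zone.zone_start_pfn, end_pfn
        else:
            node_start_pfn = min(node_start_pfn, zone.zone_start_pfn)
            node_end_pfn = max(node_end_pfn, end_pfn)
    return node_start_pfn, node_end_pfn - node_start_pfn
```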
      
      Link: https://lkml.kernel.org/r/20210127093211.37714-1-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE · 26011267
      David Hildenbrand authored
      
      
      Let's make "MEMHP_MERGE_RESOURCE" consistent with "MHP_NONE", "mhp_t" and
      "mhp_flags".  As discussed recently [1], "mhp" is our internal acronym for
      memory hotplug now.
      
      [1] https://lore.kernel.org/linux-mm/c37de2d0-28a1-4f7d-f944-cfd7d81c334d@redhat.com/
      
      Link: https://lkml.kernel.org/r/20210126115829.10909-1-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Wei Liu <wei.liu@kernel.org>
      Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: rename all existing 'memhp' into 'mhp' · 1adf8b46
      Anshuman Khandual authored
      
      
      This renames all 'memhp' instances to 'mhp' except for memhp_default_state,
      which is a kernel command line option.  This is just a cleanup and
      should not cause a functional change.  Let's make it consistent rather than
      mixing the two prefixes, in preparation for more users of the 'mhp'
      terminology.
      
      Link: https://lkml.kernel.org/r/1611554093-27316-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Suggested-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix memory_failure() handling of dax-namespace metadata · 34dc45be
      Dan Williams authored
      Given that 'struct dev_pagemap' spans both data pages and metadata pages,
      be careful to consult the altmap, if present, to delineate metadata.  In
      fact the pfn_first() helper already identifies the first valid data pfn,
      so export that helper for other code paths via pgmap_pfn_valid().
      
      Other usages of get_dev_pagemap() are not a concern because those are
      operating on known data pfns that have been looked up by get_user_pages().
      I.e., metadata pfns are never user mapped.
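
      The distinction between metadata and data pfns can be sketched as
      follows.  This is a simplified model, not the kernel implementation: the
      DevPagemap class and its metadata_pfns field are hypothetical stand-ins
      for the real structure and its altmap.

```python
from dataclasses import dataclass


@dataclass
class DevPagemap:
    start_pfn: int      # first pfn covered by the mapping
    end_pfn: int        # last pfn covered by the mapping (inclusive)
    metadata_pfns: int  # hypothetical: pfns at the start holding metadata


def pfn_first(pgmap):
    """First pfn that holds data rather than metadata."""
    return pgmap.start_pfn + pgmap.metadata_pfns


def pgmap_pfn_valid(pgmap, pfn):
    """True only for data pfns inside the mapping."""
    return pfn_first(pgmap) <= pfn <= pgmap.end_pfn
```

      A caller in the memory_failure() position would refuse to handle any pfn
      for which pgmap_pfn_valid() is false.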
      
      Link: https://lkml.kernel.org/r/161058501758.1840162.4239831989762604527.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: 6100e34b ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions · 1f90a347
      Dan Williams authored
      While pfn_to_online_page() is able to determine pfn_valid() at subsection
      granularity, it is not able to reliably determine if a given pfn is also
      online if the section mixes ZONE_{NORMAL,MOVABLE} with ZONE_DEVICE.
      This means that pfn_to_online_page() may return invalid @page objects.
      For example, with a memory map like:
      
      100000000-1fbffffff : System RAM
        142000000-143002e16 : Kernel code
        143200000-143713fff : Kernel rodata
        143800000-143b15b7f : Kernel data
        144227000-144ffffff : Kernel bss
      1fc000000-2fbffffff : Persistent Memory (legacy)
        1fc000000-2fbffffff : namespace0.0
      
      This command:
      
      echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page
      
      ...succeeds when it should fail.  When it succeeds it touches an
      uninitialized page and may crash or cause other damage (see
      dissolve_free_huge_page()).
      
      While the memory map above is contrived via the memmap=ss!nn kernel
      command line option, the collision happens in practice on shipping
      platforms.  The memory controller resources that decode spans of physical
      address space are a limited resource.  One technique platform firmware
      uses to conserve those resources is to share a decoder across 2 devices to
      keep the address range contiguous.  Unfortunately, the unit of operation of
      a decoder is 64MiB while the Linux section size is 128MiB.  This results
      in situations where, without subsection hotplug, memory mappings with
      different lifetimes collide into one object that can only express one
      lifetime.
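
      The collision in the example memory map can be checked with a little
      arithmetic.  The sketch below only uses the 128MiB section size and the
      two address ranges quoted above; the helper names are made up for
      illustration.

```python
SECTION_SIZE = 128 << 20  # 128MiB, the section size stated above


def section_nr(phys_addr):
    """Number of the section that contains a physical address."""
    return phys_addr // SECTION_SIZE


def shared_sections(range_a, range_b):
    """Section numbers touched by both inclusive (start, end) address ranges."""
    a = set(range(section_nr(range_a[0]), section_nr(range_a[1]) + 1))
    b = set(range(section_nr(range_b[0]), section_nr(range_b[1]) + 1))
    return sorted(a & b)


# Ranges taken from the example memory map.
system_ram = (0x100000000, 0x1fbffffff)
pmem = (0x1fc000000, 0x2fbffffff)
```

      Here shared_sections(system_ram, pmem) yields a single section, because
      the persistent memory range starts 64MiB into a section whose first half
      is System RAM.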
      
      Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a
      section that mixes ZONE_DEVICE pfns with other online pfns.  With
      SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall back
      to a slow-path check for ZONE_DEVICE pfns in an online section.  In the
      fast path online_section() for a full ZONE_DEVICE section returns false.
      
      Because the collision case is rare, and for simplicity, the
      SECTION_TAINT_ZONE_DEVICE flag is never cleared once set.
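
      A toy model of the fast path and slow path described above, assuming
      4KiB pages.  It is a sketch of the idea only: the Section class, the
      add_device_range() helper and the list of device ranges are hypothetical
      simplifications of the kernel's section flags and pgmap lookup.

```python
PAGES_PER_SECTION = (128 << 20) >> 12  # 128MiB sections, assuming 4KiB pages


class Section:
    def __init__(self, online=False):
        self.online = online            # True once the section holds online RAM
        self.taint_zone_device = False  # models SECTION_TAINT_ZONE_DEVICE


def add_device_range(sections, device_ranges, start_pfn, end_pfn):
    """Record ZONE_DEVICE pfns [start_pfn, end_pfn); taint partly covered sections."""
    device_ranges.append((start_pfn, end_pfn))
    for boundary in (start_pfn, end_pfn):
        if boundary % PAGES_PER_SECTION:
            nr = boundary // PAGES_PER_SECTION
            # The flag is only ever set, never cleared.
            sections.setdefault(nr, Section()).taint_zone_device = True


def pfn_to_online_page(sections, device_ranges, pfn):
    """Return True if pfn counts as online page-allocator memory in this model."""
    section = sections.get(pfn // PAGES_PER_SECTION)
    if section is None or not section.online:
        return False  # fast path: offline, or a section of ZONE_DEVICE only
    if not section.taint_zone_device:
        return True   # fast path: online section without ZONE_DEVICE pfns
    # Slow path: the section is mixed, so check the pfn against device ranges.
    return not any(start <= pfn < end for start, end in device_ranges)
```

      Only pfns in a tainted section pay for the range lookup; every other pfn
      is answered from the section state alone.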
      
      [dan.j.williams@intel.com: fix CONFIG_ZONE_DEVICE=n build]
        Link: https://lkml.kernel.org/r/CAPcyv4iX+7LAgAeSqx7Zw-Zd=ZV9gBv8Bo7oTbwCOOqJoZ3+Yg@mail.gmail.com
      
      Link: https://lkml.kernel.org/r/161058500675.1840162.7887862152161279354.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: ba72b4c8 ("mm/sparsemem: support sub-section hotplug")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reported-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: teach pfn_to_online_page() to consider subsection validity · 9f9b02e5
      Dan Williams authored
      pfn_to_online_page() is primarily used to filter out offline or fully
      uninitialized pages.  pfn_valid() and online_section_nr() both have a
      coarse, per-memory-section granularity.  If a section is shared with
      partially offline memory (e.g. part of ZONE_DEVICE), then
      pfn_to_online_page() would yield a false positive on some pfns.  Fix this
      by adding a pfn_section_valid() check, which is subsection aware.
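
      A sketch of what a subsection-aware check looks like, under stated
      assumptions: 4KiB pages, 128MiB sections and 2MiB subsections.  The
      integer bitmap is a simplified stand-in for the per-section subsection
      map.

```python
PAGE_SHIFT = 12        # assumes 4KiB pages
SECTION_SHIFT = 27     # assumes 128MiB sections
SUBSECTION_SHIFT = 21  # assumes 2MiB subsections

PFN_SUBSECTION_SHIFT = SUBSECTION_SHIFT - PAGE_SHIFT
SUBSECTIONS_PER_SECTION = 1 << (SECTION_SHIFT - SUBSECTION_SHIFT)


def subsection_index(pfn):
    """Index of the subsection, within its section, that contains pfn."""
    return (pfn >> PFN_SUBSECTION_SHIFT) % SUBSECTIONS_PER_SECTION


def pfn_section_valid(subsection_map, pfn):
    """Bit i of subsection_map set means subsection i of the section is populated."""
    return bool((subsection_map >> subsection_index(pfn)) & 1)
```

      With only the lower half of a section populated, a pfn in the upper half
      fails this check even though the section as a whole is valid.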
      
      [mhocko@kernel.org: changelog rewrite]
      
      Link: https://lkml.kernel.org/r/161058500148.1840162.4365921007820501696.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: b13bc351 ("mm/hotplug: invalid PFNs from pfn_to_online_page()")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: move pfn_to_online_page() out of line · 9f605f26
      Dan Williams authored
      
      
      Patch series "mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE", v4.
      
      A pfn-walker that uses pfn_to_online_page() may inadvertently translate a
      pfn as online and in the page allocator, when it is offline and managed by
      a ZONE_DEVICE mapping (details in Patch 3: ("mm: Teach pfn_to_online_page()
      about ZONE_DEVICE section collisions")).
      
      The 2 proposals under consideration are to teach pfn_to_online_page() to be
      precise in the presence of mixed-zone sections, or to teach the memory-add
      code to drop the System RAM associated with ZONE_DEVICE collisions.  In
      order to not regress memory capacity by a few 10s to 100s of MiB, the
      approach taken in this set is to add precision to pfn_to_online_page().
      
      In the course of validating pfn_to_online_page() a couple other fixes
      fell out:
      
      1/ soft_offline_page() fails to drop the reference taken in the
         madvise(..., MADV_SOFT_OFFLINE) case.
      
      2/ memory_failure() uses get_dev_pagemap() to look up ZONE_DEVICE pages,
         however that mapping may contain data pages and metadata raw pfns.
         Introduce pgmap_pfn_valid() to delineate the 2 types and fail the
         handling of raw metadata pfns.
      
      This patch (of 4):
      
      pfn_to_online_page() is already too large to be a macro or an inline
      function.  In anticipation of further logic changes / growth, move it out
      of line.
      
      No functional change, just code movement.
      
      Link: https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/161058499608.1840162.10165648147615238793.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Michal Hocko <mhocko@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmstat.c: erase latency in vmstat_shepherd · fbcc8183
      Jiang Biao authored
      
      
      Many 100us+ latencies have been detected in vmstat_shepherd() on a CPX
      platform which has 208 logical CPUs.  vmstat_shepherd() is also queued
      every second, which could make the case worse.
      
      Add a schedule point in vmstat_shepherd() to erase the latency.
      
      Link: https://lkml.kernel.org/r/20210111035526.1511-1-benbjiang@tencent.com
      Signed-off-by: Jiang Biao <benbjiang@tencent.com>
      Reported-by: Bin Lai <robinlai@tencent.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmstat: add some comments on internal storage of byte items · 629484ae
      Johannes Weiner authored
      
      
      Byte-accounted items are used for slab object accounting at the cgroup
      level, because the objects in a slab page can belong to different cgroups.
      At the global level these items always change in multiples of whole slab
      pages.  The vmstat code exploits this and stores these items as pages
      internally, which allows for more compact per-cpu data.
      
      This optimization isn't self-evident from the asserts and the division in
      the stat update functions.  Provide the reader with some context.
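
      The storage trick can be illustrated with a small model, assuming 4KiB
      pages.  The NodeStats class and the item names used with it are
      hypothetical; only the assert-and-divide pattern mirrors what the text
      describes.

```python
PAGE_SHIFT = 12  # assumes 4KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


class NodeStats:
    """Toy global counters; byte items are stored internally as pages."""

    def __init__(self, byte_items):
        self.byte_items = set(byte_items)
        self.counters = {}

    def mod_state(self, item, delta):
        if item in self.byte_items:
            # At the global level, byte items change by whole pages only.
            assert delta % PAGE_SIZE == 0, "byte item must change by whole pages"
            delta //= PAGE_SIZE
        self.counters[item] = self.counters.get(item, 0) + delta

    def state_pages(self, item):
        """Value as stored internally, always in pages."""
        return self.counters.get(item, 0)

    def state_bytes(self, item):
        """Value of a byte item converted back to bytes for reporting."""
        assert item in self.byte_items
        return self.counters.get(item, 0) * PAGE_SIZE
```

      Storing pages instead of bytes keeps the stored values small, which is
      what permits the more compact per-cpu data mentioned above.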
      
      Link: https://lkml.kernel.org/r/20210202184411.118614-1-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmstat: fix NOHZ wakeups for node stat changes · 2bbd00ae
      Johannes Weiner authored
      On NOHZ, the periodic vmstat flushers on each CPU can go to sleep and
      won't wake up until stat changes are detected in the per-cpu deltas of the
      zone vmstat counters.
      
      In commit 75ef7184 ("mm, vmstat: add infrastructure for per-node
      vmstats") per-node counters were introduced, and subsequently most stats
      were moved from the zone to the node level.  However, the node counters
      weren't added to the NOHZ wakeup detection.
      
      In theory this can cause per-cpu errors to remain in the user-reported
      stats indefinitely.  In practice this only affects a handful of sub
      counters (e.g. file_mapped, dirty and writeback) because other page state
      changes at the node level likely involve a change at the zone level as
      well (alloc and free, lru ops).  Also, nobody has complained.
      
      Fix it up for completeness: wake up vmstat refreshing on node changes.
      Also remove the BUILD_BUG_ONs that assert counter size; we haven't relied
      on it since we added sizeof() to the range calculation in commit
      13c9aaf7 ("mm/vmstat.c: fix NUMA statistics updates").
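
      In simplified form, the wakeup check has to look at both kinds of
      per-cpu deltas.  The function below is a sketch with plain lists standing
      in for the per-cpu delta arrays; it is not the kernel's data layout.

```python
def need_update(zone_diffs, node_diffs):
    """Return True if a CPU has pending per-cpu stat deltas.

    zone_diffs: one list of per-cpu deltas for each zone.
    node_diffs: one list of per-cpu deltas for each node.
    """
    if any(any(diffs) for diffs in zone_diffs):
        return True
    # The part the text above describes as missing: node-level deltas.
    return any(any(diffs) for diffs in node_diffs)
```

      Without the second check, a CPU holding only node-level deltas would be
      treated as idle and its flusher would stay asleep.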
      
      Link: https://lkml.kernel.org/r/20210202184342.118513-1-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: cma: print region name on failure · a052d4d1
      Patrick Daly authored
      
      
      Print the name of the CMA region for convenience.  This is useful
      information to have when cma_alloc() fails.
      
      [pdaly@codeaurora.org: print the "count" variable]
        Link: https://lkml.kernel.org/r/20210209142414.12768-1-georgi.djakov@linaro.org
      
      Link: https://lkml.kernel.org/r/20210208115200.20286-1-georgi.djakov@linaro.org
      Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
      Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo · 3c381db1
      David Hildenbrand authored
      
      
      Let's count the number of CMA pages per zone and print them in
      /proc/zoneinfo.
      
      Having access to the total number of CMA pages per zone is helpful for
      debugging purposes to know where exactly the CMA pages ended up, and to
      figure out how many pages of a zone might behave differently, even after
      some of these pages might already have been allocated.
      
      As one example, CMA pages that are part of a kernel zone cannot be used
      for ordinary kernel allocations but instead behave more like ZONE_MOVABLE.
      
      For now, we are only able to get the global nr+free cma pages from
      /proc/meminfo and the free cma pages per zone from /proc/zoneinfo.
      
      Example after this patch when booting a 6 GiB QEMU VM with
      "hugetlb_cma=2G":
        # cat /proc/zoneinfo | grep cma
                cma      0
              nr_free_cma  0
                cma      0
              nr_free_cma  0
                cma      524288
              nr_free_cma  493016
                cma      0
                cma      0
        # cat /proc/meminfo | grep Cma
        CmaTotal:        2097152 kB
        CmaFree:         1972064 kB
      
      Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way,
            one can be sure when spotting "cma 0", that there are definitely no
            CMA pages located in a zone.
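
      The example output can be cross-checked with a short script.  This is a
      sketch that assumes 4KiB pages; field_values() is a made-up helper and
      SAMPLE repeats the grep output quoted above.

```python
PAGE_KB = 4  # assumes 4KiB pages

SAMPLE = """\
        cma      0
      nr_free_cma  0
        cma      0
      nr_free_cma  0
        cma      524288
      nr_free_cma  493016
        cma      0
        cma      0
"""


def field_values(zoneinfo_text, key):
    """Return the integer values of all 'key value' lines, in order."""
    values = []
    for line in zoneinfo_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == key:
            values.append(int(fields[1]))
    return values


def total_kb(zoneinfo_text, key):
    """Sum a per-zone page counter and convert it to kB."""
    return sum(field_values(zoneinfo_text, key)) * PAGE_KB
```

      Summing the per-zone "cma" values and multiplying by the page size gives
      the CmaTotal figure from /proc/meminfo, and "nr_free_cma" gives CmaFree.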
      
      [david@redhat.com: v2]
        Link: https://lkml.kernel.org/r/20210128164533.18566-1-david@redhat.com
      [david@redhat.com: v3]
        Link: https://lkml.kernel.org/r/20210129113451.22085-1-david@redhat.com
      
      Link: https://lkml.kernel.org/r/20210127101813.6370-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/cma: expose all pages to the buddy if activation of an area fails · 072355c1
      David Hildenbrand authored
      
      
      Right now, if activation fails, we might already have exposed some pages
      to the buddy for CMA use (although they will never get actually used by
      CMA), and some pages won't be exposed to the buddy at all.
      
      Let's check for "single zone" early and on error, don't expose any pages
      for CMA use - instead, expose them to the buddy available for any use.
      Simply call free_reserved_page() on every single page - easier than going
      via free_reserved_area(), converting back and forth between pfns and virt
      addresses.
      
      In addition, make sure to fix up totalcma_pages properly.
      
      Example: 6 GiB QEMU VM with "... hugetlb_cma=2G movablecore=20% ...":
        [    0.006891] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
        [    0.006893] cma: Reserved 2048 MiB at 0x0000000100000000
        [    0.006893] hugetlb_cma: reserved 2048 MiB on node 0
        ...
        [    0.175433] cma: CMA area hugetlb0 could not be activated
      
      Before this patch:
        # cat /proc/meminfo
        MemTotal:        5867348 kB
        MemFree:         5692808 kB
        MemAvailable:    5542516 kB
        ...
        CmaTotal:        2097152 kB
        CmaFree:         1884160 kB
      
      After this patch:
        # cat /proc/meminfo
        MemTotal:        6077308 kB
        MemFree:         5904208 kB
        MemAvailable:    5747968 kB
        ...
        CmaTotal:              0 kB
        CmaFree:               0 kB
      
      Note: cma_init_reserved_mem() makes sure that we always cover full
      pageblocks / MAX_ORDER - 1 pages.
      
      Link: https://lkml.kernel.org/r/20210127101813.6370-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: cma: allocate cma areas bottom-up · df2ff39e
      Roman Gushchin authored
      
      
      Currently cma areas without a fixed base are allocated close to the end of
      the node.  This placement is sub-optimal because of compaction: it brings
      pages into the cma area.  In particular, it can bring in hot executable
      pages, even if there is plenty of free memory on the machine.  This
      results in cma allocation failures.
      
      Instead let's place cma areas close to the beginning of a node.  In this
      case the compaction will help to free cma areas, resulting in better cma
      allocation success rates.
      
      If there is enough memory let's try to allocate bottom-up starting with
      4GB to exclude any possible interference with DMA32.  On smaller machines
      or in case of a failure, stick with the old behavior.
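
      The placement policy can be sketched as a small function.  This is a
      deliberately simplified model: it treats the node as one free range and
      ignores existing reservations, and choose_cma_base() and its parameters
      are hypothetical names, not the memblock API.

```python
GIB = 1 << 30


def choose_cma_base(node_start, node_end, size, align):
    """Pick a base for a CMA area of `size` bytes inside [node_start, node_end).

    Returns None if the area does not fit at all.
    """
    # First try bottom-up, starting at 4GB to stay clear of DMA32.
    base = (max(node_start, 4 * GIB) + align - 1) // align * align
    if base + size <= node_end:
        return base
    # Fall back to the old behavior: place the area close to the end.
    base = (node_end - size) // align * align
    if base >= node_start:
        return base
    return None
```

      For a hypothetical node spanning 0 to 16GiB and a 2GiB area, the model
      picks 4GiB; for a node of only 3GiB it falls back to top-down placement.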
      
      16GB vm, 2GB cma area:
      With this patch:
      [    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
      [    0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
      [    0.002930] cma: Reserved 2048 MiB at 0x0000000100000000
      [    0.002931] hugetlb_cma: reserved 2048 MiB on node 0
      
      Without this patch:
      [    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
      [    0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
      [    0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000
      [    0.002934] hugetlb_cma: reserved 2048 MiB on node 0
      
      v2:
        - switched to memblock_set_bottom_up(true), by Mike
        - start with 4GB, by Mike
      
      [guro@fb.com: whitespace fix, per Mike]
        Link: https://lkml.kernel.org/r/20201221170551.GB3428478@carbon.DHCP.thefacebook.com
      [guro@fb.com: fix 32-bit warnings]
        Link: https://lkml.kernel.org/r/20201223163537.GA4011967@carbon.DHCP.thefacebook.com
      [guro@fb.com: fix 32-bit systems]
      [akpm@linux-foundation.org: build fix]
      
      Link: https://lkml.kernel.org/r/20201217201214.3414100-1-guro@fb.com
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Wonhyuk Yang <vvghjk1234@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm,shmem,thp: limit shmem THP allocations to requested zones · 187df5dd
      Rik van Riel authored
      
      
      Hugh pointed out that the gma500 driver uses shmem pages, but needs to
      limit them to the DMA32 zone.  Ensure the allocations resulting from the
      gfp_mask returned by limit_gfp_mask use the zone flags that were
      originally passed to shmem_getpage_gfp.
      
      Link: https://lkml.kernel.org/r/20210224121016.1314ed6d@imladris.surriel.com
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Suggested-by: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xu Yu <xuyu@linux.alibaba.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm,thp,shmem: make khugepaged obey tmpfs mount flags · cd89fb06
      Rik van Riel authored
      Currently if thp enabled=[madvise], mounting a tmpfs filesystem with
      huge=always and mmapping files from that tmpfs does not result in
      khugepaged collapsing those mappings, despite the mount flag indicating
      that it should.
      
      Fix that by breaking up the blocks of tests in hugepage_vma_check a little
      bit, and testing things in the correct order.
      
      Link: https://lkml.kernel.org/r/20201124194925.623931-4-riel@surriel.com
      Fixes: c2231020 ("mm: thp: register mm for khugepaged when merging vma for shmem")
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xu Yu <xuyu@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm,thp,shm: limit gfp mask to no more than specified · 78cc8cdc
      Rik van Riel authored
      
      
      Matthew Wilcox pointed out that the i915 driver opportunistically
      allocates tmpfs memory, but will happily reclaim some of its pool if no
      memory is available.
      
      Make sure the gfp mask used to opportunistically allocate a THP is always
      at least as restrictive as the original gfp mask.
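
      The "at least as restrictive" rule can be illustrated with a small
      userspace sketch. The flag names and bit values below are hypothetical
      stand-ins (the kernel's real gfp_t bits differ), and the helper is not
      claimed to be the code this commit adds. It only shows one way to
      combine a THP mask with a caller's mask so that permissions are
      intersected and restrictions are carried over.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration only; not the kernel's gfp_t values. */
typedef uint32_t gfp_sketch_t;

#define SK_ZONE_DMA        0x01u  /* zone selector bits */
#define SK_ZONE_HIGHMEM    0x02u
#define SK_ZONEMASK        (SK_ZONE_DMA | SK_ZONE_HIGHMEM)
#define SK_IO              0x04u  /* "allow" bits: permission to do more work */
#define SK_FS              0x08u
#define SK_DIRECT_RECLAIM  0x10u
#define SK_KSWAPD_RECLAIM  0x20u
#define SK_NOWARN          0x40u  /* "deny" bits: requests to do less */
#define SK_NORETRY         0x80u

/* Combine the THP allocation mask with the caller's original mask. */
static gfp_sketch_t limit_gfp_sketch(gfp_sketch_t huge_gfp, gfp_sketch_t limit_gfp)
{
    const gfp_sketch_t allow = SK_IO | SK_FS | SK_DIRECT_RECLAIM | SK_KSWAPD_RECLAIM;
    const gfp_sketch_t deny = SK_NOWARN | SK_NORETRY;

    /* Start from the THP mask minus the bits that are decided below. */
    gfp_sketch_t result = huge_gfp & ~(allow | SK_ZONEMASK);

    /* Zones come only from the caller's original mask. */
    result |= limit_gfp & SK_ZONEMASK;
    /* Any restriction the caller asked for is carried over. */
    result |= limit_gfp & deny;
    /* A permission survives only if both masks grant it. */
    result |= huge_gfp & limit_gfp & allow;

    return result;
}
```

      With this shape, a caller that did not permit direct reclaim in its
      original mask never gains it through the THP path, which matches the
      i915 situation described above.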
      
      Link: https://lkml.kernel.org/r/20201124194925.623931-3-riel@surriel.com
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xu Yu <xuyu@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78cc8cdc
    • Rik van Riel's avatar
      mm,thp,shmem: limit shmem THP alloc gfp_mask · 164cc4fe
      Rik van Riel authored
      
      
      Patch series "mm,thp,shm: limit shmem THP alloc gfp_mask", v6.
      
      The allocation flags of anonymous transparent huge pages can be controlled
      through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can
      help keep the system from getting bogged down in the page reclaim and
      compaction code when many THPs are getting allocated simultaneously.
      
      However, the gfp_mask for shmem THP allocations was not limited by those
      configuration settings, and some workloads ended up with all CPUs stuck on
      the LRU lock in the page reclaim code, trying to allocate dozens of THPs
      simultaneously.
      
      This patch applies the same configured limitation of THPs to shmem
      hugepage allocations, to prevent that from happening.
      
      This way a THP defrag setting of "never" or "defer+madvise" will result in
      quick allocation failures without direct reclaim when no 2MB free pages
      are available.
      
      With this patch applied, THP allocations for tmpfs will be a little more
      aggressive than today for files mmapped with MADV_HUGEPAGE, and a little
      less aggressive for files that are not mmapped or mapped without that
      flag.
      
      This patch (of 4):
      
      The allocation flags of anonymous transparent huge pages can be controlled
      through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can
      help keep the system from getting bogged down in the page reclaim and
      compaction code when many THPs are getting allocated simultaneously.
      
      However, the gfp_mask for shmem THP allocations was not limited by those
      configuration settings, and some workloads ended up with all CPUs stuck on
      the LRU lock in the page reclaim code, trying to allocate dozens of THPs
      simultaneously.
      
      This patch applies the same configured limitation of THPs to shmem
      hugepage allocations, to prevent that from happening.
      
      Controlling the gfp_mask of THP allocations through the knobs in sysfs
      allows users to determine the balance between how aggressively the system
      tries to allocate THPs at fault time, and how much the application may end
      up stalling attempting those allocations.
      
      This way a THP defrag setting of "never" or "defer+madvise" will result in
      quick allocation failures without direct reclaim when no 2MB free pages
      are available.
      
      With this patch applied, THP allocations for tmpfs will be a little more
      aggressive than today for files mmapped with MADV_HUGEPAGE, and a little
      less aggressive for files that are not mmapped or mapped without that
      flag.
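
      As a rough illustration of how the defrag setting and MADV_HUGEPAGE
      interact, the sketch below maps each setting to the amount of reclaim
      effort an allocation may trigger. The enum and function names are
      hypothetical, and the mapping is an assumption based on the documented
      meaning of the sysfs knob, not code taken from this patch.

```c
#include <stdbool.h>

/* Hypothetical names; values mirror the choices offered by the defrag knob. */
enum thp_defrag {
    DEFRAG_ALWAYS,
    DEFRAG_DEFER,
    DEFRAG_DEFER_MADVISE,
    DEFRAG_MADVISE,
    DEFRAG_NEVER
};

enum reclaim_effort {
    RECLAIM_NONE,         /* fail quickly if no huge page is free */
    RECLAIM_KSWAPD_ONLY,  /* wake background reclaim, do not stall */
    RECLAIM_DIRECT        /* the faulting task may stall in reclaim/compaction */
};

/* Assumed mapping from defrag setting and madvise state to reclaim effort. */
static enum reclaim_effort thp_reclaim_effort(enum thp_defrag setting, bool madvised)
{
    switch (setting) {
    case DEFRAG_ALWAYS:
        return RECLAIM_DIRECT;
    case DEFRAG_DEFER:
        return RECLAIM_KSWAPD_ONLY;
    case DEFRAG_DEFER_MADVISE:
        return madvised ? RECLAIM_DIRECT : RECLAIM_KSWAPD_ONLY;
    case DEFRAG_MADVISE:
        return madvised ? RECLAIM_DIRECT : RECLAIM_NONE;
    case DEFRAG_NEVER:
        return RECLAIM_NONE;
    }
    return RECLAIM_NONE;
}
```

      Under this assumed mapping, a file that is not mapped with
      MADV_HUGEPAGE never stalls in direct reclaim when the setting is
      "never" or "defer+madvise".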
      
      Link: https://lkml.kernel.org/r/20201124194925.623931-1-riel@surriel.com
      Link: https://lkml.kernel.org/r/20201124194925.623931-2-riel@surriel.com
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xu Yu <xuyu@linux.alibaba.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      164cc4fe
    • Matthew Wilcox (Oracle)'s avatar
      mm: remove pagevec_lookup_entries · a656a202
      Matthew Wilcox (Oracle) authored
      
      
      pagevec_lookup_entries() is now just a wrapper around find_get_entries()
      so remove it and convert all its callers.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-15-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a656a202
    • Matthew Wilcox (Oracle)'s avatar
      mm: pass pvec directly to find_get_entries · cf2039af
      Matthew Wilcox (Oracle) authored
      
      
      All callers of find_get_entries() use a pvec, so pass it directly instead
      of manipulating it in the caller.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-14-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf2039af
    • Matthew Wilcox (Oracle)'s avatar
      mm: remove nr_entries parameter from pagevec_lookup_entries · 38cefeb3
      Matthew Wilcox (Oracle) authored
      
      
      All callers want to fetch the full size of the pvec.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-13-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      38cefeb3
    • Matthew Wilcox (Oracle)'s avatar
      mm: add an 'end' parameter to pagevec_lookup_entries · 31d270fd
      Matthew Wilcox (Oracle) authored
      
      
      Simplifies the callers and uses the existing functionality in
      find_get_entries().  We can also drop the final argument of
      truncate_exceptional_pvec_entries() and simplify the logic in that
      function.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-12-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      31d270fd
    • Matthew Wilcox (Oracle)'s avatar
      mm: add an 'end' parameter to find_get_entries · ca122fe4
      Matthew Wilcox (Oracle) authored
      
      
      This simplifies the callers and leads to a more efficient implementation
      since the XArray has this functionality already.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-11-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ca122fe4
    • Matthew Wilcox (Oracle)'s avatar
      mm: add and use find_lock_entries · 5c211ba2
      Matthew Wilcox (Oracle) authored
      
      
      We have three functions (shmem_undo_range(), truncate_inode_pages_range()
      and invalidate_mapping_pages()) which want exactly this function, so add
      it to filemap.c.  Before this patch, shmem_undo_range() would split any
      compound page which overlaps either end of the range being punched in both
      the first and second loops through the address space.  After this patch,
      that functionality is left for the second loop, which is arguably more
      appropriate since the first loop is supposed to run through all the pages
      quickly, and splitting a page can sleep.
      
      [willy@infradead.org: add assertion]
        Link: https://lkml.kernel.org/r/20201124041507.28996-3-willy@infradead.org
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-10-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c211ba2
    • Matthew Wilcox (Oracle)'s avatar
      iomap: use mapping_seek_hole_data · 54fa39ac
      Matthew Wilcox (Oracle) authored
      
      
      Enhance mapping_seek_hole_data() to handle partially uptodate pages and
      convert the iomap seek code to call it.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-9-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54fa39ac
    • Matthew Wilcox (Oracle)'s avatar
      mm/filemap: add mapping_seek_hole_data · 41139aa4
      Matthew Wilcox (Oracle) authored
      
      
      Rewrite shmem_seek_hole_data() and move it to filemap.c.
      
      [willy@infradead.org: don't put an xa_is_value() page]
        Link: https://lkml.kernel.org/r/20201124041507.28996-4-willy@infradead.org
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-8-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      41139aa4
    • Matthew Wilcox (Oracle)'s avatar
      mm/filemap: add helper for finding pages · c7bad633
      Matthew Wilcox (Oracle) authored
      
      
      There is a lot of common code in find_get_entries(),
      find_get_pages_range() and find_get_pages_range_tag().  Factor out
      find_get_entry() which simplifies all three functions.
      
      [willy@infradead.org: remove VM_BUG_ON_PAGE()]
        Link: https://lkml.kernel.org/r/20201124041507.28996-2-willy@infradead.org
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-7-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c7bad633
    • Matthew Wilcox (Oracle)'s avatar
      mm/filemap: rename find_get_entry to mapping_get_entry · bc5a3011
      Matthew Wilcox (Oracle) authored
      
      
      find_get_entry doesn't "find" anything.  It returns the entry at a
      particular index.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-6-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bc5a3011
    • Matthew Wilcox (Oracle)'s avatar
      mm: add FGP_ENTRY · 44835d20
      Matthew Wilcox (Oracle) authored
      
      
      The functionality of find_lock_entry() and find_get_entry() can be
      provided by pagecache_get_page(), which lets us delete find_lock_entry()
      and make find_get_entry() static.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-5-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44835d20
    • Matthew Wilcox (Oracle)'s avatar
      mm/swap: optimise get_shadow_from_swap_cache · 8c647dd1
      Matthew Wilcox (Oracle) authored
      
      
      There's no need to get a reference to the page, just load the entry and
      see if it's a shadow entry.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-4-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c647dd1
    • Matthew Wilcox (Oracle)'s avatar
      mm/shmem: use pagevec_lookup in shmem_unlock_mapping · 96888e0a
      Matthew Wilcox (Oracle) authored
      
      
      The comment shows that the reason for using find_get_entries() is now
      stale; find_get_pages() will not return 0 if it hits a consecutive run of
      swap entries, and I don't believe it has since 2011.  pagevec_lookup() is
      a simpler function to use than find_get_pages(), so use it instead.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-3-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96888e0a
    • Matthew Wilcox (Oracle)'s avatar
      mm: make pagecache tagged lookups return only head pages · c49f50d1
      Matthew Wilcox (Oracle) authored
      
      
      Patch series "Overhaul multi-page lookups for THP", v4.
      
      This THP prep patchset changes several page cache iteration APIs to only
      return head pages.
      
       - It's only possible to tag head pages in the page cache, so only
         return head pages, not all their subpages.
       - Factor a lot of common code out of the various batch lookup routines
       - Add mapping_seek_hole_data()
       - Unify find_get_entries() and pagevec_lookup_entries()
       - Make find_get_entries only return head pages, like find_get_entry().
      
      These are only loosely connected, but they seem to make sense together as
      a series.
      
      This patch (of 14):
      
      Pagecache tags are used for dirty page writeback.  Since dirtiness is
      tracked on a per-THP basis, we only want to return the head page rather
      than each subpage of a tagged page.  All the filesystems which use huge
      pages today are in-memory, so there are no tagged huge pages today.
      
      Link: https://lkml.kernel.org/r/20201112212641.27837-2-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c49f50d1
  2. Feb 26, 2021
    • Linus Torvalds's avatar
      Merge tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 6fbd6cf8
      Linus Torvalds authored
      Pull Kbuild updates from Masahiro Yamada:
      
       - Fix false-positive build warnings for ARCH=ia64 builds
      
       - Optimize dictionary size for module compression with xz
      
       - Check the compiler and linker versions in Kconfig
      
       - Fix misuse of extra-y
      
       - Support DWARF v5 debug info
      
       - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
         exceeded the limit
      
       - Add generic syscall{tbl,hdr}.sh for cleanups across arches
      
       - Minor cleanups of genksyms
      
       - Minor cleanups of Kconfig
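
      The SUBLEVEL clamp in the list above is easiest to see with the
      arithmetic written out. The sketch below assumes the conventional
      packing of a kernel version into one integer, with one byte each for
      the patchlevel and sublevel fields. The function name is hypothetical
      and this is not the Makefile logic itself.

```c
/*
 * Pack a version triple as (version << 16) + (patchlevel << 8) + sublevel.
 * The sublevel is clamped to 255 so that it cannot spill into the
 * patchlevel byte.
 */
static unsigned int version_code_sketch(unsigned int version,
                                        unsigned int patchlevel,
                                        unsigned int sublevel)
{
    if (sublevel > 255)
        sublevel = 255;
    return (version << 16) + (patchlevel << 8) + sublevel;
}
```

      Without the clamp, a sublevel of 256 on a 4.9 kernel would produce the
      same packed value as 4.10.0. With it, every 4.9.x release still
      compares lower than 4.10.0.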
      
      * tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits)
        initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD
        kbuild: remove deprecated 'always' and 'hostprogs-y/m'
        kbuild: parse C= and M= before changing the working directory
        kbuild: reuse this-makefile to define abs_srctree
        kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig
        kconfig: omit --oldaskconfig option for 'make config'
        kconfig: fix 'invalid option' for help option
        kconfig: remove dead code in conf_askvalue()
        kconfig: clean up nested if-conditionals in check_conf()
        kconfig: Remove duplicate call to sym_get_string_value()
        Makefile: Remove # characters from compiler string
        Makefile: reuse CC_VERSION_TEXT
        kbuild: check the minimum linker version in Kconfig
        kbuild: remove ld-version macro
        scripts: add generic syscallhdr.sh
        scripts: add generic syscalltbl.sh
        arch: syscalls: remove $(srctree)/ prefix from syscall tables
        arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work
        gen_compile_commands: prune some directories
        kbuild: simplify access to the kernel's version
        ...
      6fbd6cf8
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 6f9972bb
      Linus Torvalds authored
      Pull ext4 updates from Ted Ts'o:
       "Miscellaneous ext4 cleanups and bug fixes. Pretty boring this cycle..."
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: add .kunitconfig fragment to enable ext4-specific tests
        ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it
        ext4: reset retry counter when ext4_alloc_file_blocks() makes progress
        ext4: fix potential htree index checksum corruption
        ext4: factor out htree rep invariant check
        ext4: Change list_for_each* to list_for_each_entry*
        ext4: don't try to processed freed blocks until mballoc is initialized
        ext4: use DEFINE_MUTEX() for mutex lock
      6f9972bb
    • Linus Torvalds's avatar
      Merge tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 5b47b10e
      Linus Torvalds authored
      Pull PCI updates from Bjorn Helgaas:
       "Enumeration:
         - Remove unnecessary locking around _OSC (Bjorn Helgaas)
         - Clarify message about _OSC failure (Bjorn Helgaas)
         - Remove notification of PCIe bandwidth changes (Bjorn Helgaas)
         - Tidy checking of syscall user config accessors (Heiner Kallweit)
      
        Resource management:
         - Decline to resize resources if boot config must be preserved (Ard
           Biesheuvel)
         - Fix pci_register_io_range() memory leak (Geert Uytterhoeven)
      
        Error handling (Keith Busch):
         - Clear error status from the correct device
         - Retain error recovery status so drivers can use it after reset
         - Log the type of Port (Root or Switch Downstream) that we reset
         - Always request a reset for Downstream Ports in frozen state
      
        Endpoint framework and NTB (Kishon Vijay Abraham I):
         - Make *_get_first_free_bar() take into account 64 bit BAR
         - Add helper API to get the 'next' unreserved BAR
         - Make *_free_bar() return error codes on failure
         - Remove unused pci_epf_match_device()
         - Add support to associate secondary EPC with EPF
         - Add support in configfs to associate two EPCs with EPF
         - Add pci_epc_ops to map MSI IRQ
         - Add pci_epf_ops to expose function-specific attrs
         - Allow user to create sub-directory of 'EPF Device' directory
         - Implement ->msi_map_irq() ops for cadence
         - Configure LM_EP_FUNC_CFG based on epc->function_num_map for cadence
         - Add EP function driver to provide NTB functionality
         - Add support for EPF PCI Non-Transparent Bridge
         - Add specification for PCI NTB function device
         - Add PCI endpoint NTB function user guide
         - Add configfs binding documentation for pci-ntb endpoint function
      
        Broadcom STB PCIe controller driver:
         - Add support for BCM4908 and external PERST# signal controller
           (Rafał Miłecki)
      
        Cadence PCIe controller driver:
         - Retrain Link to work around Gen2 training defect (Nadeem Athani)
         - Fix merge botch in cdns_pcie_host_map_dma_ranges() (Krzysztof
           Wilczyński)
      
        Freescale Layerscape PCIe controller driver:
         - Add LX2160A rev2 EP mode support (Hou Zhiqiang)
         - Convert to builtin_platform_driver() (Michael Walle)
      
        MediaTek PCIe controller driver:
         - Fix OF node reference leak (Krzysztof Wilczyński)
      
        Microchip PolarFire PCIe controller driver:
         - Add Microchip PolarFire PCIe controller driver (Daire McNamara)
      
        Qualcomm PCIe controller driver:
         - Use PHY_REFCLK_USE_PAD only for ipq8064 (Ansuel Smith)
         - Add support for ddrss_sf_tbu clock for sm8250 (Dmitry Baryshkov)
      
        Renesas R-Car PCIe controller driver:
         - Drop PCIE_RCAR config option (Lad Prabhakar)
         - Always allocate MSI addresses in 32bit space (Marek Vasut)
      
        Rockchip PCIe controller driver:
         - Add FriendlyARM NanoPi M4B DT binding (Chen-Yu Tsai)
         - Make 'ep-gpios' DT property optional (Chen-Yu Tsai)
      
        Synopsys DesignWare PCIe controller driver:
         - Work around ECRC configuration hardware defect (Vidya Sagar)
         - Drop support for config space in DT 'ranges' (Rob Herring)
         - Change size to u64 for EP outbound iATU (Shradha Todi)
         - Add upper limit address for outbound iATU (Shradha Todi)
         - Make dw_pcie ops optional (Jisheng Zhang)
         - Remove unnecessary dw_pcie_ops from al driver (Jisheng Zhang)
      
        Xilinx Versal CPM PCIe controller driver:
         - Fix OF node reference leak (Pan Bian)
      
        Miscellaneous:
         - Remove tango host controller driver (Arnd Bergmann)
         - Remove IRQ handler & data together (altera-msi, brcmstb, dwc)
           (Martin Kaiser)
         - Fix xgene-msi race in installing chained IRQ handler (Martin
           Kaiser)
         - Apply CONFIG_PCI_DEBUG to entire drivers/pci hierarchy (Junhao He)
         - Fix pci-bridge-emul array overruns (Russell King)
         - Remove obsolete uses of WARN_ON(in_interrupt()) (Sebastian Andrzej
           Siewior)"
      
      * tag 'pci-v5.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (69 commits)
        PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
        PCI: qcom: Add support for ddrss_sf_tbu clock
        dt-bindings: PCI: qcom: Document ddrss_sf_tbu clock for sm8250
        PCI: al: Remove useless dw_pcie_ops
        PCI: dwc: Don't assume the ops in dw_pcie always exist
        PCI: dwc: Add upper limit address for outbound iATU
        PCI: dwc: Change size to u64 for EP outbound iATU
        PCI: dwc: Drop support for config space in 'ranges'
        PCI: layerscape: Convert to builtin_platform_driver()
        PCI: layerscape: Add LX2160A rev2 EP mode support
        dt-bindings: PCI: layerscape: Add LX2160A rev2 compatible strings
        PCI: dwc: Work around ECRC configuration issue
        PCI/portdrv: Report reset for frozen channel
        PCI/AER: Specify the type of Port that was reset
        PCI/ERR: Retain status from error notification
        PCI/AER: Clear AER status from Root Port when resetting Downstream Port
        PCI/ERR: Clear status of the reporting device
        dt-bindings: arm: rockchip: Add FriendlyARM NanoPi M4B
        PCI: rockchip: Make 'ep-gpios' DT property optional
        Documentation: PCI: Add PCI endpoint NTB function user guide
        ...
      5b47b10e
    • Linus Torvalds's avatar
      Merge tag 'nds32-for-linux-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux · 6c15f9e8
      Linus Torvalds authored
      Pull nds32 updates from Greentime Hu:
       "Code clean-up and refinement"
      
      * tag 'nds32-for-linux-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
        nds32: Fix bogus reference to <asm/procinfo.h>
        nds32: use get_kernel_nofault in dump_mem
        nds32: remove dump_instr
        nds32: configs: Cleanup CONFIG_CROSS_COMPILE
        nds32: Replace <linux/clk-provider.h> by <linux/of_clk.h>
      6c15f9e8
  3. Feb 25, 2021