  1. Jan 15, 2022
    • kasan: fix quarantine conflicting with init_on_free · 26dca996
      Andrey Konovalov authored
      KASAN's quarantine might save its metadata inside freed objects.  As
      this happens after the memory is zeroed by the slab allocator when
      init_on_free is enabled, the memory coming out of quarantine is not
      properly zeroed.
      
      This causes lib/test_meminit.c tests to fail with Generic KASAN.
      
      Zero the metadata when the object is removed from quarantine.
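
      A minimal sketch of the shape of such a fix, assuming KASAN quarantine
      internals such as qlink_to_object()/kasan_get_free_meta() and the
      slab_want_init_on_free() check (illustrative, not the exact patch):

        static void qlink_free(struct qlist_node *qlink, struct kmem_cache *cache)
        {
        	void *object = qlink_to_object(qlink, cache);
        	struct kasan_free_meta *meta = kasan_get_free_meta(cache, object);
        	unsigned long flags;

        	if (IS_ENABLED(CONFIG_SLAB))
        		local_irq_save(flags);

        	/*
        	 * init_on_free already zeroed the object when the slab
        	 * allocator freed it, but the quarantine then stored its free
        	 * meta inside the object; re-zero that meta before handing
        	 * the object back so it really comes out zeroed.
        	 */
        	if (slab_want_init_on_free(cache) && meta)
        		memzero_explicit(meta, sizeof(*meta));

        	___cache_free(cache, object, _THIS_IP_);

        	if (IS_ENABLED(CONFIG_SLAB))
        		local_irq_restore(flags);
        }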
      
      Link: https://lkml.kernel.org/r/2805da5df4b57138fdacd671f5d227d58950ba54.1640037083.git.andreyknvl@google.com
      Fixes: 6471384a ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26dca996
    • kasan: test: add test case for double-kmem_cache_destroy() · f98f966c
      Marco Elver authored
      
      
      Add a test case for double-kmem_cache_destroy() detection.
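
      A hedged sketch of what such a KUnit test case can look like (the test
      and cache names here are illustrative; KUNIT_EXPECT_KASAN_FAIL() is the
      KASAN test suite's expect-a-report macro):

        static void kmem_cache_double_destroy(struct kunit *test)
        {
        	struct kmem_cache *cache;

        	cache = kmem_cache_create("test_cache", 200, 0, 0, NULL);
        	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, cache);

        	kmem_cache_destroy(cache);
        	/* The second destroy must be flagged as a bug. */
        	KUNIT_EXPECT_KASAN_FAIL(test, kmem_cache_destroy(cache));
        }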
      
      Link: https://lkml.kernel.org/r/20211119142219.1519617-2-elver@google.com
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f98f966c
    • kasan: add ability to detect double-kmem_cache_destroy() · bed0a9b5
      Marco Elver authored
      
      
      Because mm/slab_common.c is not instrumented with software KASAN modes,
      it is not possible to detect use-after-free of the kmem_cache passed
      into kmem_cache_destroy().  In particular, because of the s->refcount--
      and subsequent early return if non-zero, KASAN would never be able to
      see the double-free via kmem_cache_free(kmem_cache, s).  To be able to
      detect a double-kmem_cache_destroy(), check accessibility of the
      kmem_cache, and in case of failure return early.
      
      While KASAN_HW_TAGS is able to detect such bugs, by checking
      accessibility and returning early we fail more gracefully and also avoid
      corrupting reused objects (where tags mismatch).
      
      A recent case of a double-kmem_cache_destroy() was detected by KFENCE:
      https://lkml.kernel.org/r/0000000000003f654905c168b09d@google.com, which
      was not detectable by software KASAN modes.
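
      A minimal sketch of the idea, assuming kasan_check_byte() as the
      accessibility probe (it returns false and reports if the address is
      poisoned); the rest of kmem_cache_destroy() is elided:

        void kmem_cache_destroy(struct kmem_cache *s)
        {
        	/*
        	 * If the kmem_cache was already freed, KASAN reports the
        	 * use-after-free here and we return early instead of
        	 * corrupting a possibly reused object via s->refcount--.
        	 */
        	if (unlikely(!s) || !kasan_check_byte(s))
        		return;

        	/* ... normal refcount handling and shutdown ... */
        }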
      
      Link: https://lkml.kernel.org/r/20211119142219.1519617-1-elver@google.com
      Signed-off-by: Marco Elver <elver@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bed0a9b5
    • kasan: test: add globals left-out-of-bounds test · e5f47287
      Marco Elver authored
      
      
      Add a test checking that KASAN generic can also detect out-of-bounds
      accesses to the left of globals.
      
      Unfortunately it seems that GCC doesn't catch this (tested GCC 10, 11).
      The main difference between GCC's globals redzoning and Clang's is that
      GCC relies on using increased alignment to produce padding, whereas
      Clang's redzoning implementation actually adds real data after the
      global and doesn't rely on alignment to produce padding.  I believe this
      is the main reason why GCC can't reliably catch globals out-of-bounds in
      this case.
      
      Given this is now a known issue, to avoid failing the whole test suite,
      skip this test case with GCC.
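
      A hedged sketch of the test (the global/array names and the skip macro
      follow the KASAN KUnit test conventions but are illustrative rather
      than necessarily the exact code):

        static char global_array[10];

        static void kasan_global_oob_left(struct kunit *test)
        {
        	char *volatile array = global_array;
        	char *p = array - 3;

        	/* Known to be missed by GCC-built kernels; run under Clang only. */
        	KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_CC_IS_CLANG);
        	KUNIT_EXPECT_KASAN_FAIL(test, *(volatile char *)p);
        }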
      
      Link: https://lkml.kernel.org/r/20211117130714.135656-1-elver@google.com
      Signed-off-by: Marco Elver <elver@google.com>
      Reported-by: Kaiwan N Billimoria <kaiwan.billimoria@gmail.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Kaiwan N Billimoria <kaiwan.billimoria@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e5f47287
    • device-dax: compound devmap support · 14606001
      Joao Martins authored
      
      
      Use the newly added compound devmap facility which maps the assigned dax
      ranges as compound pages at a page size of @align.
      
      dax devices are created with a fixed @align (huge page size), which is
      also enforced at mmap() of the device.  Faults consequently happen at
      the @align specified at creation time and do not change throughout the
      dax device's lifetime.  MCEs unmap a whole dax huge page, and splits
      occur at the configured page size as well.
      
      Performance measured by gup_test improves considerably for
      unpin_user_pages() and altmap with NVDIMMs:
      
        $ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
        (pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms
        [altmap]
        (pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -> get: ~127ms put:~71ms
      
         $ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
        (pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms
        [altmap with -m 127004]
        (pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec put:~563ms
      
      .. as well as unpin_user_page_range_dirty_lock() being just as effective
      as THP/hugetlb[0] pages.
      
      [0] https://lore.kernel.org/linux-mm/20210212130843.13865-5-joao.m.martins@oracle.com/
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-12-joao.m.martins@oracle.com
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      14606001
    • device-dax: remove pfn from __dev_dax_{pte,pmd,pud}_fault() · 6ec228b6
      Joao Martins authored
      
      
      After moving the page mapping to be set prior to pte insertion, the pfn
      in dev_dax_huge_fault() is no longer necessary.  Remove it, as well as
      the @pfn argument passed to the internal fault handler helpers.
      
      [akpm@linux-foundation.org: fix CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=n build]
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-11-joao.m.martins@oracle.com
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6ec228b6
    • device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}() · 0e7325f0
      Joao Martins authored
      
      
      Normally, the @page mapping is set prior to inserting the page into a
      page table entry.  Make device-dax adhere to the same ordering, rather
      than setting mapping after the PTE is inserted.
      
      The address_space never changes and it is always associated with the
      same inode and underlying pages.  So, the page mapping is set once but
      cleared when the struct pages are removed/freed (i.e.  after
      {devm_}memunmap_pages()).
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-10-joao.m.martins@oracle.com
      Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0e7325f0
    • device-dax: factor out page mapping initialization · a0fb038e
      Joao Martins authored
      
      
      Move initialization of page->mapping into a separate helper.
      
      This is in preparation to move the mapping set to be prior to inserting
      the page table entry and also for tidying up compound page handling into
      one helper.
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-9-joao.m.martins@oracle.com
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a0fb038e
    • device-dax: ensure dev_dax->pgmap is valid for dynamic devices · fc65c4eb
      Joao Martins authored
      
      
      Right now, only static dax regions have a valid @pgmap pointer in their
      struct dev_dax; the dynamic dax case does not.

      In preparation for device-dax compound devmap support, make sure the
      dev_dax pgmap field is set after it has been allocated and initialized.

      For dynamic dax devices the @pgmap is allocated at probe() and is
      managed by devm (in contrast to static dax regions, where a pgmap is
      provided and the dax core kfrees it).  So in addition to ensuring a
      valid @pgmap, clear the pgmap when the dynamic dax device is released,
      to avoid the same pgmap ranges being re-requested across multiple
      region device reconfigs.

      Add a static_dev_dax() helper and use it in dev_dax_probe() to make the
      initialization differences between dynamic and static regions more
      explicit.  While at it, consolidate the ranges initialization when we
      allocate the @pgmap for the dynamic dax region case.  Also take the
      opportunity to document the differences between static and dynamic dax
      regions.
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-8-joao.m.martins@oracle.com
      Suggested-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fc65c4eb
    • device-dax: use struct_size() · 09b80137
      Joao Martins authored
      
      
      Use the struct_size() helper for the size of a struct with variable
      array member at the end, rather than manually calculating it.
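
      A generic illustration of the conversion (hypothetical structure and
      allocation site, not the actual device-dax code):

        struct dax_ranges {
        	int nr;
        	struct range r[];	/* flexible array member */
        };

        static struct dax_ranges *alloc_ranges(unsigned int nr)
        {
        	struct dax_ranges *dr;

        	/* Before: kzalloc(sizeof(*dr) + nr * sizeof(*dr->r), GFP_KERNEL); */
        	dr = kzalloc(struct_size(dr, r, nr), GFP_KERNEL);
        	if (dr)
        		dr->nr = nr;
        	return dr;
        }

      struct_size() computes the same size but saturates on overflow instead
      of silently wrapping.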
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-7-joao.m.martins@oracle.com
      Suggested-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09b80137
    • device-dax: use ALIGN() for determining pgoff · b9b5777f
      Joao Martins authored
      
      
      Rather than calculating @pgoff manually, switch to ALIGN() instead.
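
      A small sketch of the style change (function and variable names are
      illustrative): ALIGN() replaces the open-coded rounding arithmetic used
      to derive the page offset for a fault of size @fault_size:

        static pgoff_t dax_fault_pgoff(struct vm_area_struct *vma,
        			       unsigned long addr, unsigned long fault_size)
        {
        	/* ALIGN() rounds @addr up to a multiple of @fault_size. */
        	return linear_page_index(vma, ALIGN(addr, fault_size));
        }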
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-6-joao.m.martins@oracle.com
      Suggested-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b9b5777f
    • mm/memremap: add ZONE_DEVICE support for compound pages · c4386bd8
      Joao Martins authored
      Add a new @vmemmap_shift property for struct dev_pagemap which specifies
      that a devmap is composed of a set of compound pages of order
      @vmemmap_shift, instead of base pages.  When a compound page devmap is
      requested, all but the first page are initialised as tail pages instead
      of order-0 pages.
      
      For certain ZONE_DEVICE users like device-dax which have a fixed page
      size, this creates an opportunity to optimize GUP and GUP-fast walkers,
      treating it the same way as THP or hugetlb pages.
      
      Additionally, commit 7118fc29 ("hugetlb: address ref count racing in
      prep_compound_gigantic_page") removed set_page_count() because setting
      the page ref count to zero was redundant.  devmap pages don't come from
      the page allocator, though, and only the head page refcount is used for
      compound pages, hence initialize the tail page count to zero.
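
      A hedged sketch of how a ZONE_DEVICE user opts in (field values are
      illustrative; the actual device-dax wiring comes later in the series):

        static void *map_compound_devmap(struct device *dev,
        				 struct dev_pagemap *pgmap)
        {
        	pgmap->type = MEMORY_DEVICE_GENERIC;
        	/*
        	 * Ask memmap_init_zone_device() to initialise the range as
        	 * compound pages of this order, e.g. 9 for 2M pages with a
        	 * 4K base page size, instead of individual order-0 pages.
        	 */
        	pgmap->vmemmap_shift = PMD_SHIFT - PAGE_SHIFT;
        	return devm_memremap_pages(dev, pgmap);
        }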
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-5-joao.m.martins@oracle.com
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c4386bd8
    • mm/page_alloc: refactor memmap_init_zone_device() page init · 46487e00
      Joao Martins authored
      
      
      Move struct page init to a helper function, __init_zone_device_page().
      
      This is in preparation for sharing the storage for compound page
      metadata.
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-4-joao.m.martins@oracle.com
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      46487e00
    • mm/page_alloc: split prep_compound_page into head and tail subparts · 5b24eeef
      Joao Martins authored
      Patch series "mm, device-dax: Introduce compound pages in devmap", v7.
      
      This series converts device-dax to use compound pages, and moves away
      from the 'struct page per basepage on PMD/PUD' that is done today.
      
      Doing so
       1) unlocks a few noticeable improvements on unpin_user_pages() and
          makes the device-dax+altmap case 4x faster in pinning (numbers
          below and in the last patch)
       2) as mentioned in various other threads, it is one important step
          towards cleaning up ZONE_DEVICE refcounting.

      I've split the compound-pages-on-devmap part from the rest based on
      recent discussions on pending and planned devmap work[5][6].  There is
      consensus that device-dax should be using compound pages to represent
      its PMD/PUDs just like HugeTLB and THP, and that leads to less
      specialization of the dax parts.  I will pursue the rest of the work in
      parallel once this part is merged, in particular the GUP-{slow,fast}
      improvements[7] and the tail struct page deduplication memory savings
      part[8].
      
      To summarize what the series does:
      
      Patch 1: Prepare hwpoisoning to work with dax compound pages.
      
      Patches 2-3: Split the current utility function of prep_compound_page()
      into head and tail and use those two helpers where appropriate to take
      advantage of caches being warm after __init_single_page().  This is used
      when initializing zone device when we bring up device-dax namespaces.
      
      Patches 4-10: Add devmap support for compound pages in device-dax.
      memmap_init_zone_device() initializes its metadata as compound pages,
      and a new devmap property, vmemmap_shift, is introduced to describe how
      the vmemmap is structured (defaulting to base pages as done today).
      The property essentially describes the page order of the metadata.
      While at it, do a few cleanups in device-dax in patches 5-9.  Finally,
      set device-dax's devmap @vmemmap_shift to a value based on its own
      @align property.  @vmemmap_shift is 0 by default (today's case of base
      pages in devmap, as with fsdax and the others) and usage of the
      compound devmap is optional.  Starting with device-dax (*not* fsdax) we
      enable it by default.  There are a few pinning improvements, in
      particular on the unpinning case and altmap, as well as
      unpin_user_page_range_dirty_lock() being just as effective as
      THP/hugetlb[0] pages.
      
          $ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
          (pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms
          [altmap]
          (pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -> get: ~127ms put:~71ms
      
           $ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
          (pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms
          [altmap with -m 127004]
          (pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec put:~563ms
      
      Tested on x86 with 1Tb+ of pmem (alongside registering it with RDMA with
      and without altmap), alongside gup_test selftests with dynamic dax
      regions and static dax regions.  Coupled with ndctl unit tests for
      dynamic dax devices that exercise all of this.  Note, for dynamic dax
      regions I had to revert commit 8aa83e63 ("x86/setup: Call
      early_reserve_memory() earlier"); it is a known issue that this commit
      broke efi_fake_mem=.
      
      This patch (of 11):
      
      Split the utility function prep_compound_page() into head and tail
      counterparts, and use them accordingly.
      
      This is in preparation for sharing the storage for compound page
      metadata.
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-1-joao.m.martins@oracle.com
      Link: https://lkml.kernel.org/r/20211202204422.26777-3-joao.m.martins@oracle.com
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b24eeef
    • mm: defer kmemleak object creation of module_alloc() · 60115fa5
      Kefeng Wang authored
      Yongqiang reports a kmemleak panic when a module is insmod/rmmod'ed
      with KASAN enabled (without KASAN_VMALLOC) on x86[1].

      When the module area allocates memory, its kmemleak object is created
      successfully, but the KASAN shadow memory for the module allocation is
      not ready yet, so when kmemleak scans the module's pointers, the KASAN
      check panics because there is no shadow memory.
      
        module_alloc
          __vmalloc_node_range
            kmemleak_vmalloc
      				kmemleak_scan
      				  update_checksum
          kasan_module_alloc
            kmemleak_ignore
      
      Note, there is no problem if KASAN_VMALLOC is enabled, since the module
      area's entire shadow memory is preallocated.  Thus, the bug only exists
      on architectures that dynamically allocate module shadow memory per
      module load; for now, only x86/arm64/s390 are involved.

      Add a VM_DEFER_KMEMLEAK flag and defer the kmemleak registration of the
      vmalloc'ed object in module_alloc() to fix this issue.
      
      [1] https://lore.kernel.org/all/6d41e2b9-4692-5ec4-b1cd-cbe29ae89739@huawei.com/
      
      [wangkefeng.wang@huawei.com: fix build]
        Link: https://lkml.kernel.org/r/20211125080307.27225-1-wangkefeng.wang@huawei.com
      [akpm@linux-foundation.org: simplify ifdefs, per Andrey]
        Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com
      
      Link: https://lkml.kernel.org/r/20211124142034.192078-1-wangkefeng.wang@huawei.com
      Fixes: 793213a8 ("s390/kasan: dynamic shadow mem allocation for modules")
      Fixes: 39d114dd ("arm64: add KASAN support")
      Fixes: bebf56a1 ("kasan: enable instrumentation of global variables")
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reported-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      60115fa5
    • mm: kmemleak: alloc gray object for reserved region with direct map · 972fa3a7
      Calvin Zhang authored
      
      
      Reserved regions with a direct mapping may contain references to other
      regions.  A CMA region with a fixed location is reserved without a
      kmemleak object being created for it.

      So add such regions as gray kmemleak objects.
      
      Link: https://lkml.kernel.org/r/20211123090641.3654006-1-calvinzhang.cool@gmail.com
      Signed-off-by: Calvin Zhang <calvinzhang.cool@gmail.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      972fa3a7
    • kmemleak: fix kmemleak false positive report with HW tag-based kasan enable · ad1a3e15
      Kuan-Ying Lee authored
      
      
      With HW tag-based KASAN enabled, we get a warning when we free an
      object whose address starts with 0xFF.

      This is because the kmemleak rbtree stores the tagged object, and the
      tag of the object being freed does not match the rbtree object's tag.

      In the example below, the kmemleak rbtree stores the tagged object from
      the kmalloc(), while kfree() gets the pointer with the 0xFF tag.
      
      Call sequence:
          ptr = kmalloc(size, GFP_KERNEL);
          page = virt_to_page(ptr);
          offset = offset_in_page(ptr);
          kfree(page_address(page) + offset);
          ptr = kmalloc(size, GFP_KERNEL);
      
      A sequence like that may cause the warning as following:
      
       1) Freeing unknown object:
      
          In kfree(), we will get free unknown object warning in
          kmemleak_free(). Because object(0xFx) in kmemleak rbtree and
          pointer(0xFF) in kfree() have different tag.
      
       2) Overlap existing:
      
          When we allocate that object with the same hw-tag again, we will
          find the overlap in the kmemleak rbtree and kmemleak thread will be
          killed.
      
      	kmemleak: Freeing unknown object at 0xffff000003f88000
      	CPU: 5 PID: 177 Comm: cat Not tainted 5.16.0-rc1-dirty #21
      	Hardware name: linux,dummy-virt (DT)
      	Call trace:
      	 dump_backtrace+0x0/0x1ac
      	 show_stack+0x1c/0x30
      	 dump_stack_lvl+0x68/0x84
      	 dump_stack+0x1c/0x38
      	 kmemleak_free+0x6c/0x70
      	 slab_free_freelist_hook+0x104/0x200
      	 kmem_cache_free+0xa8/0x3d4
      	 test_version_show+0x270/0x3a0
      	 module_attr_show+0x28/0x40
      	 sysfs_kf_seq_show+0xb0/0x130
      	 kernfs_seq_show+0x30/0x40
      	 seq_read_iter+0x1bc/0x4b0
      	 seq_read_iter+0x1bc/0x4b0
      	 kernfs_fop_read_iter+0x144/0x1c0
      	 generic_file_splice_read+0xd0/0x184
      	 do_splice_to+0x90/0xe0
      	 splice_direct_to_actor+0xb8/0x250
      	 do_splice_direct+0x88/0xd4
      	 do_sendfile+0x2b0/0x344
      	 __arm64_sys_sendfile64+0x164/0x16c
      	 invoke_syscall+0x48/0x114
      	 el0_svc_common.constprop.0+0x44/0xec
      	 do_el0_svc+0x74/0x90
      	 el0_svc+0x20/0x80
      	 el0t_64_sync_handler+0x1a8/0x1b0
      	 el0t_64_sync+0x1ac/0x1b0
      	...
      	kmemleak: Cannot insert 0xf2ff000003f88000 into the object search tree (overlaps existing)
      	CPU: 5 PID: 178 Comm: cat Not tainted 5.16.0-rc1-dirty #21
      	Hardware name: linux,dummy-virt (DT)
      	Call trace:
      	 dump_backtrace+0x0/0x1ac
      	 show_stack+0x1c/0x30
      	 dump_stack_lvl+0x68/0x84
      	 dump_stack+0x1c/0x38
      	 create_object.isra.0+0x2d8/0x2fc
      	 kmemleak_alloc+0x34/0x40
      	 kmem_cache_alloc+0x23c/0x2f0
      	 test_version_show+0x1fc/0x3a0
      	 module_attr_show+0x28/0x40
      	 sysfs_kf_seq_show+0xb0/0x130
      	 kernfs_seq_show+0x30/0x40
      	 seq_read_iter+0x1bc/0x4b0
      	 kernfs_fop_read_iter+0x144/0x1c0
      	 generic_file_splice_read+0xd0/0x184
      	 do_splice_to+0x90/0xe0
      	 splice_direct_to_actor+0xb8/0x250
      	 do_splice_direct+0x88/0xd4
      	 do_sendfile+0x2b0/0x344
      	 __arm64_sys_sendfile64+0x164/0x16c
      	 invoke_syscall+0x48/0x114
      	 el0_svc_common.constprop.0+0x44/0xec
      	 do_el0_svc+0x74/0x90
      	 el0_svc+0x20/0x80
      	 el0t_64_sync_handler+0x1a8/0x1b0
      	 el0t_64_sync+0x1ac/0x1b0
      	kmemleak: Kernel memory leak detector disabled
      	kmemleak: Object 0xf2ff000003f88000 (size 128):
      	kmemleak:   comm "cat", pid 177, jiffies 4294921177
      	kmemleak:   min_count = 1
      	kmemleak:   count = 0
      	kmemleak:   flags = 0x1
      	kmemleak:   checksum = 0
      	kmemleak:   backtrace:
      	     kmem_cache_alloc+0x23c/0x2f0
      	     test_version_show+0x1fc/0x3a0
      	     module_attr_show+0x28/0x40
      	     sysfs_kf_seq_show+0xb0/0x130
      	     kernfs_seq_show+0x30/0x40
      	     seq_read_iter+0x1bc/0x4b0
      	     kernfs_fop_read_iter+0x144/0x1c0
      	     generic_file_splice_read+0xd0/0x184
      	     do_splice_to+0x90/0xe0
      	     splice_direct_to_actor+0xb8/0x250
      	     do_splice_direct+0x88/0xd4
      	     do_sendfile+0x2b0/0x344
      	     __arm64_sys_sendfile64+0x164/0x16c
      	     invoke_syscall+0x48/0x114
      	     el0_svc_common.constprop.0+0x44/0xec
      	     do_el0_svc+0x74/0x90
      	kmemleak: Automatic memory scanning thread ended
      
      [akpm@linux-foundation.org: whitespace tweak]
      
      Link: https://lkml.kernel.org/r/20211118054426.4123-1-Kuan-Ying.Lee@mediatek.com
      Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Doug Berger <opendmb@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ad1a3e15
    • mm: slab: make slab iterator functions static · c29b5b3d
      Muchun Song authored
      
      
      There are no external users of slab_start/next/stop(), so make them
      static.  And memory.kmem.slabinfo is deprecated and now outputs
      nothing, so move memcg_slab_show() into mm/memcontrol.c and rename it
      to mem_cgroup_slab_show() to be consistent with other function names.
      
      Link: https://lkml.kernel.org/r/20211109133359.32881-1-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c29b5b3d
    • mm/slab_common: use WARN() if cache still has objects on destroy · 7302e91f
      Marco Elver authored
      Calling kmem_cache_destroy() while the cache still has objects allocated
      is a kernel bug, and will usually result in the entire cache being
      leaked.  While the message in kmem_cache_destroy() resembles a warning,
      it is currently not implemented using a real WARN().
      
      This is problematic for infrastructure testing the kernel, all of which
      rely on the specific format of WARN()s to pick up on bugs.
      
      Some 13 years ago this used to be a simple WARN_ON() in slub, but
      commit d629d819 ("slub: improve kmem_cache_destroy() error message")
      changed it into an open-coded warning to avoid confusion with a bug in
      slub itself.
      
      Instead, turn the open-coded warning into a real WARN() with the message
      preserved, so that test systems can actually identify these issues, and
      we get all the other benefits of using a normal WARN().  The warning
      message is extended with "when called from <caller-ip>" to make it even
      clearer where the fault lies.
      
      For most configurations this is only a cosmetic change, however, note
      that WARN() here will now also respect panic_on_warn.
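
      A minimal sketch of the reporting change, assuming the existing
      shutdown_cache() return value; the message text is approximate:

        int err = shutdown_cache(s);	/* non-zero if objects remain */

        /* Before: an open-coded, WARN-like message. */
        if (err) {
        	pr_err("%s %s: Slab cache still has objects\n", __func__, s->name);
        	dump_stack();
        }

        /* After: a real WARN() that test infrastructure and panic_on_warn see. */
        WARN(err, "%s %s: Slab cache still has objects when called from %pS",
             __func__, s->name, (void *)_RET_IP_);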
      
      Link: https://lkml.kernel.org/r/20211102170733.648216-1-elver@google.com
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7302e91f
    • fs/ioctl: remove unnecessary __user annotation · a12cf8b3
      Amit Daniel Kachhap authored
      
      
      __user annotations are used by the checker (e.g sparse) to mark user
      pointers.  However here __user is applied to a struct directly, without a
      pointer being directly involved.
      
      Although the presence of __user does not cause sparse to emit a warning,
      __user should be removed for consistency with other uses of offsetof().
      
      Note: No functional changes intended.
      
      Link: https://lkml.kernel.org/r/20211122101256.7875-1-amit.kachhap@arm.com
      Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
      Cc: Kevin Brodsky <Kevin.Brodsky@arm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a12cf8b3
    • ocfs2: remove redundant assignment to variable free_space · 9a25d051
      Colin Ian King authored
      
      
      The variable 'free_space' is being initialized with a value that is
      never read; it is re-assigned later in the two paths of an if
      statement.  The early initialization is redundant and can be removed.
      
      Link: https://lkml.kernel.org/r/20220112230411.1090761-1-colin.i.king@gmail.com
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a25d051
    • ocfs2: cluster: use default_groups in kobj_type · d141b39b
      Greg Kroah-Hartman authored
      There are currently two ways to create a set of sysfs files for a
      kobj_type, through the default_attrs field, and the default_groups
      field.
      
      Move the ocfs2 cluster sysfs code to use the default_groups field,
      which has been the preferred way since commit aa30f47c ("kobject: Add
      support for default attribute groups to kobj_type"), so that we can
      soon get rid of the obsolete default_attrs field.
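
      A generic sketch of the conversion (attribute names illustrative):

        static struct attribute *o2cb_attrs[] = {
        	&attr_version.attr,		/* example attribute */
        	NULL,
        };
        ATTRIBUTE_GROUPS(o2cb);			/* generates o2cb_groups */

        static struct kobj_type o2cb_ktype = {
        	.sysfs_ops	= &kobj_sysfs_ops,
        	/* .default_attrs = o2cb_attrs,	   old, obsolete field */
        	.default_groups	= o2cb_groups,	/* preferred mechanism */
        };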
      
      Link: https://lkml.kernel.org/r/20220106102028.3345634-1-gregkh@linuxfoundation.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d141b39b
    • ocfs2: remove redundant assignment to pointer root_bh · f018844f
      Colin Ian King authored
      
      
      The variable 'root_bh' is being initialized with a value that is never
      read; it is re-assigned later on, closer to its use.  The early
      initialization is redundant and can be removed.
      
      Link: https://lkml.kernel.org/r/20211228013719.620923-1-colin.i.king@gmail.com
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f018844f
    • ocfs2: use default_groups in kobj_type · 59430cc1
      Greg Kroah-Hartman authored
      There are currently two ways to create a set of sysfs files for a
      kobj_type, through the default_attrs field, and the default_groups
      field.
      
      Move the ocfs2 code to use the default_groups field, which has been the
      preferred way since commit aa30f47c ("kobject: Add support for default
      attribute groups to kobj_type"), so that we can soon get rid of the
      obsolete default_attrs field.
      
      Link: https://lkml.kernel.org/r/20211228144517.391660-1-gregkh@linuxfoundation.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      59430cc1
    • ocfs2: clearly handle ocfs2_grab_pages_for_write() return value · e07bf00c
      Joseph Qi authored
      
      
      ocfs2_grab_pages_for_write() may return -EAGAIN if write context type is
      mmap and it could not lock the target page.  In this case, we exit with
      no error and no target page.  And then trigger the caller page_mkwrite()
      to retry.
      
      Since there are other caller types, e.g.  buffer and direct io, make the
      return value handling more clear.
      
      Link: https://lkml.kernel.org/r/20211206065051.103353-1-joseph.qi@linux.alibaba.com
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e07bf00c
    • ocfs2: use BUG_ON instead of if condition followed by BUG. · 783cc68d
      Zhang Mingyu authored
      
      
      This issue was detected with the help of Coccinelle.
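
      The pattern being cleaned up, shown with an illustrative condition
      rather than the actual ocfs2 call site:

        /* Before: an if condition followed by BUG(). */
        if (status < 0)
        	BUG();

        /* After: the equivalent one-liner, as suggested by Coccinelle. */
        BUG_ON(status < 0);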
      
      Link: https://lkml.kernel.org/r/20211105014424.75372-1-zhang.mingyu@zte.com.cn
      Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      783cc68d
    • squashfs: provide backing_dev_info in order to disable read-ahead · 9eec1d89
      Zheng Liang authored
      Commit c1f6925e ("mm: put readahead pages in cache earlier") causes
      the read performance of squashfs to deteriorate.Through testing, we find
      that the performance will be back by closing the readahead of squashfs.
      
      So we want to learn the way of ubifs, provides backing_dev_info and
      disable read-ahead
      
      We measured the following data with fio.
      squashfs image blocksize=128K
      test command:
      
        fio --name basic --bs=? --filename="/mnt/test_file" --rw=? --iodepth=1 --ioengine=psync --runtime=200 --time_based
      
        turn on squashfs readahead in 5.10 kernel
        bs(k)      read/randread           MB/s
        4            randread              271
        128          randread              231
        1024         randread              246
        4            read                  310
        128          read                  245
        1024         read                  247
      
        turn off squashfs readahead in 5.10 kernel
        bs(k)      read/randread           MB/s
        4            randread              293
        128          randread              330
        1024         randread              363
        4            read                  338
        128          read                  360
        1024         read                  365
      
        turn on squashfs readahead and revert commit c1f6925e
        ("mm: put readahead pages in cache earlier") in 5.10 kernel
        bs(k)      read/randread           MB/s
        4           randread               289
        128         randread               306
        1024        randread               335
        4           read                   337
        128         read                   336
        1024        read                   338
      
      Link: https://lkml.kernel.org/r/20211116113141.1391026-1-zhengliang6@huawei.com
      Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
      Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Cc: Hou Tao <houtao1@huawei.com>
      Cc: Miao Xie <miaoxie@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9eec1d89
    • fs/ntfs/attrib.c: fix one kernel-doc comment · 7e0af978
      Yang Li authored
      
      
      The comments for the file should not be in kernel-doc format:
      
      /**
       * attrib.c - NTFS attribute operations.  Part of the Linux-NTFS
      
      as it causes the comment to be incorrectly attributed to the function
      ntfs_map_runlist_nolock(), resulting in warnings from running
      scripts/kernel-doc:
      
        fs/ntfs/attrib.c:25: warning: Incorrect use of kernel-doc format:  * ntfs_map_runlist_nolock - map (a part of) a runlist of an ntfs inode
        fs/ntfs/attrib.c:71: warning: Function parameter or member 'ni' not described in 'ntfs_map_runlist_nolock'
        fs/ntfs/attrib.c:71: warning: Function parameter or member 'vcn' not described in 'ntfs_map_runlist_nolock'
        fs/ntfs/attrib.c:71: warning: Function parameter or member 'ctx' not described in 'ntfs_map_runlist_nolock'
        fs/ntfs/attrib.c:71: warning: expecting prototype for attrib.c - NTFS attribute operations.  Part of the Linux(). Prototype was for ntfs_map_runlist_nolock() instead
      
      Link: https://lkml.kernel.org/r/20220106015145.67067-1-yang.lee@linux.alibaba.com
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7e0af978
    • scripts/spelling.txt: add "oveflow" · 9a69f2b0
      Drew Fustini authored
      
      
      Add typo "oveflow" for "overflow".  This typo was found and fixed in
      tools/testing/selftests/bpf/prog_tests/btf_dump.c
      
      Link: https://lore.kernel.org/all/20211122070528.837806-1-dfustini@baylibre.com/
      Link: https://lkml.kernel.org/r/20211122072302.839102-1-dfustini@baylibre.com
      Signed-off-by: Drew Fustini <dfustini@baylibre.com>
      Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Colin Ian King <colin.king@intel.com>
      Cc: Drew Fustini <dfustini@baylibre.com>
      Cc: zuoqilin <zuoqilin@yulong.com>
      Cc: Tom Saeger <tom.saeger@oracle.com>
      Cc: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a69f2b0
    • ia64: topology: use default_groups in kobj_type · a7eddfc9
      Greg Kroah-Hartman authored
      There are currently two ways to create a set of sysfs files for a kobj_type,
      through the default_attrs field, and the default_groups field.
      
      Move the ia64 topology sysfs code to use the default_groups field,
      which has been the preferred way since commit aa30f47c ("kobject: Add
      support for default attribute groups to kobj_type"), so that we can
      soon get rid of the obsolete default_attrs field.
      
      Link: https://lkml.kernel.org/r/20220104154800.1287947-1-gregkh@linuxfoundation.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a7eddfc9
    • ia64: fix typo in a comment · c5c21354
      Jason Wang authored
      
      
      The double `the' in a comment is repeated, thus it should be removed.
      
      Link: https://lkml.kernel.org/r/20211113030316.22650-1-wangborong@cdjrlc.com
      Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c5c21354
    • arch/ia64/kernel/setup.c: use swap() to make code cleaner · 6c4420b0
      Yang Guang authored
      
      
      Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
      opencoding it.
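
      The pattern being replaced, with illustrative variables:

        /* Before: an open-coded three-assignment swap via a temporary. */
        tmp = lo;
        lo  = hi;
        hi  = tmp;

        /* After: the swap() helper from include/linux/minmax.h. */
        swap(lo, hi);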
      
      Link: https://lkml.kernel.org/r/20211104001908.695110-1-yang.guang5@zte.com.cn
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
      Cc: David Yang <davidcomponentone@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6c4420b0
    • ia64: module: use swap() to make code cleaner · f2fed022
      Yang Guang authored
      
      
      Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
      opencoding it.
      
      Link: https://lkml.kernel.org/r/20211104062642.1506539-1-yang.guang5@zte.com.cn
      Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Cc: David Yang <davidcomponentone@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f2fed022
    • trace/hwlat: make use of the helper function kthread_run_on_cpu() · ff78f667
      Cai Huoqing authored
      
      
      Replace kthread_create_on_cpu/wake_up_process() with kthread_run_on_cpu()
      to simplify the code.
      
      Link: https://lkml.kernel.org/r/20211022025711.3673-7-caihuoqing@baidu.com
      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Cc: Bernard Metzler <bmt@zurich.ibm.com>
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Paul E . McKenney" <paulmck@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ff78f667
    • trace/osnoise: make use of the helper function kthread_run_on_cpu() · 11e4e352
      Cai Huoqing authored
      
      
      Replace kthread_create_on_cpu/wake_up_process() with kthread_run_on_cpu()
      to simplify the code.
      
      Link: https://lkml.kernel.org/r/20211022025711.3673-6-caihuoqing@baidu.com
      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Cc: Bernard Metzler <bmt@zurich.ibm.com>
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Paul E . McKenney" <paulmck@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      11e4e352
    • rcutorture: make use of the helper function kthread_run_on_cpu() · 3b9cb4ba
      Cai Huoqing authored
      
      
      Replace kthread_create_on_node/kthread_bind/wake_up_process() with
      kthread_run_on_cpu() to simplify the code.
      
      Link: https://lkml.kernel.org/r/20211022025711.3673-5-caihuoqing@baidu.com
      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Cc: Bernard Metzler <bmt@zurich.ibm.com>
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Paul E . McKenney" <paulmck@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3b9cb4ba
    • ring-buffer: make use of the helper function kthread_run_on_cpu() · 64ed3a04
      Cai Huoqing authored
      
      
      Replace kthread_create/kthread_bind/wake_up_process() with
      kthread_run_on_cpu() to simplify the code.
      
      Link: https://lkml.kernel.org/r/20211022025711.3673-4-caihuoqing@baidu.com
      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Cc: Bernard Metzler <bmt@zurich.ibm.com>
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Paul E . McKenney" <paulmck@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64ed3a04
    • RDMA/siw: make use of the helper function kthread_run_on_cpu() · e0850113
      Cai Huoqing authored
      
      
      Replace kthread_create/kthread_bind/wake_up_process() with
      kthread_run_on_cpu() to simplify the code.
      
      Link: https://lkml.kernel.org/r/20211022025711.3673-3-caihuoqing@baidu.com
      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Cc: Bernard Metzler <bmt@zurich.ibm.com>
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Paul E . McKenney" <paulmck@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e0850113
    • kthread: add the helper function kthread_run_on_cpu() · 800977f6
      Cai Huoqing authored
      
      
      Add a new helper function, kthread_run_on_cpu(), which combines
      kthread_create_on_cpu() and wake_up_process().

      In some cases, kthread_run_on_cpu() can then be used directly instead
      of kthread_create_on_node/kthread_bind/wake_up_process() or
      kthread_create_on_cpu/wake_up_process() or
      kthread_create/kthread_bind/wake_up_process() to simplify the code.
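
      A sketch of what the new helper reduces to, plus a typical call-site
      conversion (worker_fn/data/cpu are illustrative):

        static inline struct task_struct *
        kthread_run_on_cpu(int (*threadfn)(void *data), void *data,
        		   unsigned int cpu, const char *namefmt)
        {
        	struct task_struct *p;

        	p = kthread_create_on_cpu(threadfn, data, cpu, namefmt);
        	if (!IS_ERR(p))
        		wake_up_process(p);

        	return p;
        }

        /* Call sites then collapse from create-on-cpu + wake_up_process to: */
        tsk = kthread_run_on_cpu(worker_fn, data, cpu, "worker/%u");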
      
      [akpm@linux-foundation.org: export kthread_create_on_cpu to modules]
      
      Link: https://lkml.kernel.org/r/20211022025711.3673-2-caihuoqing@baidu.com
      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Cc: Bernard Metzler <bmt@zurich.ibm.com>
      Cc: Cai Huoqing <caihuoqing@baidu.com>
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Paul E . McKenney" <paulmck@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      800977f6
  2. Jan 10, 2022