  1. Feb 10, 2023
    • of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem · ce4d9a1e
      Isaac J. Manjarres authored
      Patch series "Fix kmemleak crashes when scanning CMA regions", v2.
      
      When trying to boot a device with an ARM64 kernel with the following
      config options enabled:
      
      CONFIG_DEBUG_PAGEALLOC=y
      CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
      CONFIG_DEBUG_KMEMLEAK=y
      
      a crash is encountered when kmemleak starts to scan the list of gray
      or allocated objects that it maintains. Upon closer inspection, it was
      observed that these page-faults always occurred when kmemleak attempted
      to scan a CMA region.
      
      At the moment, kmemleak is made aware of CMA regions that are specified
      through the devicetree to be dynamically allocated within a range of
      addresses. However, kmemleak should not need to scan CMA regions or any
      reserved memory region, as those regions can be used for DMA transfers
      between drivers and peripherals, and thus wouldn't contain anything
      useful for kmemleak.
      
      Additionally, since CMA regions are unmapped from the kernel's address
      space when they are freed to the buddy allocator at boot when
      CONFIG_DEBUG_PAGEALLOC is enabled, kmemleak shouldn't attempt to access
      those memory regions, as that will trigger a crash. Thus, kmemleak
      should ignore all dynamically allocated reserved memory regions.
      
      
      This patch (of 1):
      
      Currently, kmemleak ignores dynamically allocated reserved memory regions
      that don't have a kernel mapping.  However, regions that do retain a
      kernel mapping (e.g.  CMA regions) do get scanned by kmemleak.
      
      This is not ideal for two reasons:
      
      1  kmemleak works by scanning memory regions for pointers to allocated
         objects to determine if those objects have been leaked or not. 
         However, reserved memory regions can be used between drivers and
         peripherals for DMA transfers, and thus, would not contain pointers to
         allocated objects, making it unnecessary for kmemleak to scan these
         reserved memory regions.
      
      2  When CONFIG_DEBUG_PAGEALLOC is enabled, along with kmemleak, the
         CMA reserved memory regions are unmapped from the kernel's address
         space when they are freed to buddy at boot.  These CMA reserved regions
         are still tracked by kmemleak, however, and when kmemleak attempts to
         scan them, a crash will happen, as accessing the CMA region will result
         in a page-fault, since the regions are unmapped.
      
      Thus, use kmemleak_ignore_phys() for all dynamically allocated reserved
      memory regions, instead of those that do not have a kernel mapping
      associated with them.
      
      Link: https://lkml.kernel.org/r/20230208232001.2052777-1-isaacmanjarres@google.com
      Link: https://lkml.kernel.org/r/20230208232001.2052777-2-isaacmanjarres@google.com
       Fixes: a7259df7 ("memblock: make memblock_find_in_range method private")
       Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
       Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
       Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
      Cc: Nick Kossifidis <mick@ics.forth.gr>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: <stable@vger.kernel.org>	[5.15+]
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • scripts/gdb: fix 'lx-current' for x86 · c16a3b11
      Jeff Xie authored
       When printing the name of the current process, gdb reports an error:

         (gdb) p $lx_current().comm
         Python Exception <class 'gdb.error'> No symbol "current_task" in current context.:
         Error occurred in Python: No symbol "current_task" in current context.
      
       Commit e57ef2ed ("x86: Put hot per CPU variables into a struct")
       changed where current_task is stored, so the old symbol lookup no
       longer resolves.
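
The fix can be modeled as a fallback lookup. The dict below is a stand-in for gdb's symbol table (the real helper would go through gdb's Python API, e.g. gdb.lookup_global_symbol(), not a dict):

```python
# Simplified model: after e57ef2ed, "current_task" lives inside the
# per-CPU struct, so the helper must try the new location first and
# fall back to the old plain symbol on pre-6.2 kernels.
def lookup_current_task(symtab):
    if "pcpu_hot" in symtab:             # newer kernels: struct member
        return symtab["pcpu_hot"]["current_task"]
    if "current_task" in symtab:         # older kernels: plain per-CPU var
        return symtab["current_task"]
    raise LookupError("no current_task symbol in this kernel")

new_kernel = {"pcpu_hot": {"current_task": "task A"}}
old_kernel = {"current_task": "task B"}
print(lookup_current_task(new_kernel))  # task A
print(lookup_current_task(old_kernel))  # task B
```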
      
      Link: https://lkml.kernel.org/r/20230204090139.1789264-1-xiehuan09@gmail.com
       Fixes: e57ef2ed ("x86: Put hot per CPU variables into a struct")
       Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
       Cc: Jan Kiszka <jan.kiszka@siemens.com>
       Cc: <stable@vger.kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lib: parser: optimize match_NUMBER apis to use local array · 67222c4b
      Li Lingfeng authored
       match_strdup() allocates memory to store the substring_t, which means
       its caller may be scheduled out to wait for memory to be reclaimed.
       smatch complains that this can cause sleeping in an atomic context.

       Use a local array to store the substring_t instead, removing that
       restriction.
      
       Link: https://lkml.kernel.org/r/20230120032352.242767-1-lilingfeng3@huawei.com
       Link: https://lore.kernel.org/all/20221104023938.2346986-5-yukuai1@huaweicloud.com/
       Fixes: 2c064798 ("blk-iocost: don't release 'ioc->lock' while updating params")
       Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
       Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
       Acked-by: Tejun Heo <tj@kernel.org>
      Cc: BingJing Chang <bingjingc@synology.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Hou Tao <houtao1@huawei.com>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: yangerkun <yangerkun@huawei.com>
      Cc: Zhang Yi <yi.zhang@huawei.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: shrinkers: fix deadlock in shrinker debugfs · badc28d4
      Qi Zheng authored
       debugfs_remove_recursive() is invoked by unregister_shrinker(), which
       holds the write lock of shrinker_rwsem, and it waits for the handlers
       of the debugfs files to complete.  A handler, in turn, needs to take
       the read lock of shrinker_rwsem, which can lead to the following
       deadlock:
      
       	CPU0				CPU1
      
      debugfs_file_get()
      shrinker_debugfs_count_show()/shrinker_debugfs_scan_write()
      
           				unregister_shrinker()
      				--> down_write(&shrinker_rwsem);
      				    debugfs_remove_recursive()
      					// wait for (A)
      				    --> wait_for_completion();
      
          // wait for (B)
      --> down_read_killable(&shrinker_rwsem)
      debugfs_file_put() -- (A)
      
      				    up_write() -- (B)
      
       The down_read_killable() can be killed, so the deadlock above can be
       recovered from.  But that requires an explicit kill action, and until
       then all subsequent shrinker-related operations are blocked, so it is
       better to fix the deadlock itself.
      
      [akpm@linux-foundation.org: fix CONFIG_SHRINKER_DEBUG=n stub]
      Link: https://lkml.kernel.org/r/20230202105612.64641-1-zhengqi.arch@bytedance.com
       Fixes: 5035ebc6 ("mm: shrinkers: introduce debugfs interface for memory shrinkers")
       Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
       Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: hwpoison: support recovery from ksm_might_need_to_copy() · 6b970599
      Kefeng Wang authored
       When the kernel copies a page in ksm_might_need_to_copy() and runs into
       an uncorrectable memory error, it crashes, since the poisoned page is
       consumed by the kernel.  This is similar to the issue recently fixed by
       copy-on-write poison recovery.

       When an error is detected during the page copy, return VM_FAULT_HWPOISON
       in do_swap_page(), and install a hwpoison entry in unuse_pte() during
       swapoff, which avoids the system crash.  Note that a memory failure on a
       KSM page will be skipped, but memory_failure_queue() is still called to
       stay consistent with the general memory-failure process; KSM page
       recovery could be supported in the future.
      
      [wangkefeng.wang@huawei.com: enhance unuse_pte(), fix issue found by lkp]
        Link: https://lkml.kernel.org/r/20221213120523.141588-1-wangkefeng.wang@huawei.com
      [wangkefeng.wang@huawei.com: update changelog, alter ksm_might_need_to_copy(), restore unlikely() in unuse_pte()]
        Link: https://lkml.kernel.org/r/20230201074433.96641-1-wangkefeng.wang@huawei.com
      Link: https://lkml.kernel.org/r/20221209072801.193221-1-wangkefeng.wang@huawei.com
      
      
       Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
       Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Tony Luck <tony.luck@intel.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kasan: fix Oops due to missing calls to kasan_arch_is_ready() · 55d77bae
      Christophe Leroy authored
       On powerpc64, a kernel can be built with KASAN as soon as it is built
       with Radix MMU support.  However, if the CPU doesn't have the Radix
       MMU, KASAN isn't enabled at init and the following Oops is encountered:
      
        [    0.000000][    T0] KASAN not enabled as it requires radix!
      
        [    4.484295][   T26] BUG: Unable to handle kernel data access at 0xc00e000000804a04
        [    4.485270][   T26] Faulting instruction address: 0xc00000000062ec6c
        [    4.485748][   T26] Oops: Kernel access of bad area, sig: 11 [#1]
        [    4.485920][   T26] BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        [    4.486259][   T26] Modules linked in:
        [    4.486637][   T26] CPU: 0 PID: 26 Comm: kworker/u2:2 Not tainted 6.2.0-rc3-02590-gf8a023b0a805 #249
        [    4.486907][   T26] Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,HEAD pSeries
        [    4.487445][   T26] Workqueue: eval_map_wq .tracer_init_tracefs_work_func
        [    4.488744][   T26] NIP:  c00000000062ec6c LR: c00000000062bb84 CTR: c0000000002ebcd0
        [    4.488867][   T26] REGS: c0000000049175c0 TRAP: 0380   Not tainted  (6.2.0-rc3-02590-gf8a023b0a805)
        [    4.489028][   T26] MSR:  8000000002009032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 44002808  XER: 00000000
        [    4.489584][   T26] CFAR: c00000000062bb80 IRQMASK: 0
        [    4.489584][   T26] GPR00: c0000000005624d4 c000000004917860 c000000001cfc000 1800000000804a04
        [    4.489584][   T26] GPR04: c0000000003a2650 0000000000000cc0 c00000000000d3d8 c00000000000d3d8
        [    4.489584][   T26] GPR08: c0000000049175b0 a80e000000000000 0000000000000000 0000000017d78400
        [    4.489584][   T26] GPR12: 0000000044002204 c000000003790000 c00000000435003c c0000000043f1c40
        [    4.489584][   T26] GPR16: c0000000043f1c68 c0000000043501a0 c000000002106138 c0000000043f1c08
        [    4.489584][   T26] GPR20: c0000000043f1c10 c0000000043f1c20 c000000004146c40 c000000002fdb7f8
        [    4.489584][   T26] GPR24: c000000002fdb834 c000000003685e00 c000000004025030 c000000003522e90
        [    4.489584][   T26] GPR28: 0000000000000cc0 c0000000003a2650 c000000004025020 c000000004025020
        [    4.491201][   T26] NIP [c00000000062ec6c] .kasan_byte_accessible+0xc/0x20
        [    4.491430][   T26] LR [c00000000062bb84] .__kasan_check_byte+0x24/0x90
        [    4.491767][   T26] Call Trace:
        [    4.491941][   T26] [c000000004917860] [c00000000062ae70] .__kasan_kmalloc+0xc0/0x110 (unreliable)
        [    4.492270][   T26] [c0000000049178f0] [c0000000005624d4] .krealloc+0x54/0x1c0
        [    4.492453][   T26] [c000000004917990] [c0000000003a2650] .create_trace_option_files+0x280/0x530
        [    4.492613][   T26] [c000000004917a90] [c000000002050d90] .tracer_init_tracefs_work_func+0x274/0x2c0
        [    4.492771][   T26] [c000000004917b40] [c0000000001f9948] .process_one_work+0x578/0x9f0
        [    4.492927][   T26] [c000000004917c30] [c0000000001f9ebc] .worker_thread+0xfc/0x950
        [    4.493084][   T26] [c000000004917d60] [c00000000020be84] .kthread+0x1a4/0x1b0
        [    4.493232][   T26] [c000000004917e10] [c00000000000d3d8] .ret_from_kernel_thread+0x58/0x60
        [    4.495642][   T26] Code: 60000000 7cc802a6 38a00000 4bfffc78 60000000 7cc802a6 38a00001 4bfffc68 60000000 3d20a80e 7863e8c2 792907c6 <7c6348ae> 20630007 78630fe0 68630001
        [    4.496704][   T26] ---[ end trace 0000000000000000 ]---
      
       The Oops is due to kasan_byte_accessible() not checking the readiness
       of KASAN.  Add the missing call to kasan_arch_is_ready() and bail out
       when not ready.  The same problem is observed with
       ____kasan_kfree_large(), so fix it the same way.
      
       Also, as KASAN is not available and no shadow area is allocated for
       the linear memory mapping, there is no point in allocating shadow
       memory for vmalloc memory, as shown below in
       /sys/kernel/debug/kernel_page_tables:
      
        ---[ kasan shadow mem start ]---
        0xc00f000000000000-0xc00f00000006ffff  0x00000000040f0000       448K         r  w       pte  valid  present        dirty  accessed
        0xc00f000000860000-0xc00f00000086ffff  0x000000000ac10000        64K         r  w       pte  valid  present        dirty  accessed
        0xc00f3ffffffe0000-0xc00f3fffffffffff  0x0000000004d10000       128K         r  w       pte  valid  present        dirty  accessed
        ---[ kasan shadow mem end ]---
      
      So, also verify KASAN readiness before allocating and poisoning
      shadow mem for VMAs.
      
      Link: https://lkml.kernel.org/r/150768c55722311699fdcf8f5379e8256749f47d.1674716617.git.christophe.leroy@csgroup.eu
       Fixes: 41b7a347 ("powerpc: Book3S 64-bit outline-only KASAN support")
       Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
       Reported-by: Nathan Lynch <nathanl@linux.ibm.com>
       Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: <stable@vger.kernel.org>	[5.19+]
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. Jan 19, 2023
    • selftests/vm: remove __USE_GNU in hugetlb-madvise.c · 0ca2c535
      Peter Xu authored
       __USE_GNU is an internal macro that should only be used inside glibc.
       Either memfd_create() or fallocate() requires _GNU_SOURCE per the man
       page; __USE_GNU is then defined by glibc's include/features.h:
      
        #ifdef _GNU_SOURCE
        # define __USE_GNU	1
        #endif
      
      This fixes:
      
         >> hugetlb-madvise.c:20: warning: "__USE_GNU" redefined
            20 | #define __USE_GNU
               |
         In file included from /usr/include/x86_64-linux-gnu/bits/libc-header-start.h:33,
                          from /usr/include/stdlib.h:26,
                          from hugetlb-madvise.c:16:
         /usr/include/features.h:407: note: this is the location of the previous definition
           407 | # define __USE_GNU      1
               |
      
      Link: https://lkml.kernel.org/r/Y8V9z+z6Tk7NetI3@x1n
      
      
       Signed-off-by: Peter Xu <peterx@redhat.com>
       Reported-by: kernel test robot <lkp@intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: fix a few rare cases of using swapin error pte marker · 7e3ce3f8
      Peter Xu authored
       This patch hardens commit 15520a3f ("mm: use pte markers for
       swap errors") on using pte markers for swapin errors in a few corner
       cases.
      
       1. Propagate swapin errors across fork(): if there are swapin errors in
          the parent mm, the child should also sigbus when an error page is
          accessed after fork().
      
       2. Fix a rare race in pte_marker_clear() where a uffd-wp pte marker can
          be quickly switched to a swapin error.
      
      3. Explicitly ignore swapin error pte markers in change_protection().
      
       Cases (2) and (3) are mostly precautionary, but we should still have
       them.  Case (1) is special because it can potentially cause silent data
       corruption in the child when the parent has a swapin error triggered
       with swapoff; but since swapin errors are already rare, it is probably
       not easy to trigger either.
      
      Currently there is a priority difference between the uffd-wp bit and the
      swapin error entry, in which the swapin error always has higher priority
      (e.g.  we don't need to wr-protect a swapin error pte marker).
      
       If a 3rd bit is introduced, we'll probably need to consider a more
       involved approach and start operating on the bits directly.  Let's
       leave that for later.
      
       This patch is tested with case (1) explicitly: previously the child
       would see corrupted data if swapin error pte markers existed, and with
       the patch applied the child is rightfully killed instead.

       We don't need to copy stable for this one, since 15520a3f just landed
       as part of v6.2-rc1; only the "Fixes" tag is applied.
      
      Link: https://lkml.kernel.org/r/20221214200453.1772655-3-peterx@redhat.com
       Fixes: 15520a3f ("mm: use pte markers for swap errors")
       Signed-off-by: Peter Xu <peterx@redhat.com>
       Acked-by: David Hildenbrand <david@redhat.com>
       Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Pengfei Xu <pengfei.xu@intel.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>