  1. Mar 16, 2022
  2. Mar 15, 2022
  3. Mar 09, 2022
  4. Feb 28, 2022
    • blktrace: fix use after free for struct blk_trace · 30939293
      Yu Kuai authored
      
      
      When tracing the whole disk, 'dropped' and 'msg' are created under
      'q->debugfs_dir' and 'bt->dir' is NULL, so blk_trace_free() does not
      remove those files. Worse, accessing the stale 'dropped' and 'msg'
      files can trigger the following UAF:
      
      ==================================================================
      BUG: KASAN: use-after-free in blk_dropped_read+0x89/0x100
      Read of size 4 at addr ffff88816912f3d8 by task blktrace/1188
      
      CPU: 27 PID: 1188 Comm: blktrace Not tainted 5.17.0-rc4-next-20220217+ #469
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       print_address_description.constprop.0.cold+0xab/0x381
       ? blk_dropped_read+0x89/0x100
       ? blk_dropped_read+0x89/0x100
       kasan_report.cold+0x83/0xdf
       ? blk_dropped_read+0x89/0x100
       kasan_check_range+0x140/0x1b0
       blk_dropped_read+0x89/0x100
       ? blk_create_buf_file_callback+0x20/0x20
       ? kmem_cache_free+0xa1/0x500
       ? do_sys_openat2+0x258/0x460
       full_proxy_read+0x8f/0xc0
       vfs_read+0xc6/0x260
       ksys_read+0xb9/0x150
       ? vfs_write+0x3d0/0x3d0
       ? fpregs_assert_state_consistent+0x55/0x60
       ? exit_to_user_mode_prepare+0x39/0x1e0
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7fbc080d92fd
      Code: ce 20 00 00 75 10 b8 00 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 1
      RSP: 002b:00007fbb95ff9cb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
      RAX: ffffffffffffffda RBX: 00007fbb95ff9dc0 RCX: 00007fbc080d92fd
      RDX: 0000000000000100 RSI: 00007fbb95ff9cc0 RDI: 0000000000000045
      RBP: 0000000000000045 R08: 0000000000406299 R09: 00000000fffffffd
      R10: 000000000153afa0 R11: 0000000000000293 R12: 00007fbb780008c0
      R13: 00007fbb78000938 R14: 0000000000608b30 R15: 00007fbb780029c8
       </TASK>
      
      Allocated by task 1050:
       kasan_save_stack+0x1e/0x40
       __kasan_kmalloc+0x81/0xa0
       do_blk_trace_setup+0xcb/0x410
       __blk_trace_setup+0xac/0x130
       blk_trace_ioctl+0xe9/0x1c0
       blkdev_ioctl+0xf1/0x390
       __x64_sys_ioctl+0xa5/0xe0
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 1050:
       kasan_save_stack+0x1e/0x40
       kasan_set_track+0x21/0x30
       kasan_set_free_info+0x20/0x30
       __kasan_slab_free+0x103/0x180
       kfree+0x9a/0x4c0
       __blk_trace_remove+0x53/0x70
       blk_trace_ioctl+0x199/0x1c0
       blkdev_common_ioctl+0x5e9/0xb30
       blkdev_ioctl+0x1a5/0x390
       __x64_sys_ioctl+0xa5/0xe0
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff88816912f380
       which belongs to the cache kmalloc-96 of size 96
      The buggy address is located 88 bytes inside of
       96-byte region [ffff88816912f380, ffff88816912f3e0)
      The buggy address belongs to the page:
      page:000000009a1b4e7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0f
      flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
      raw: 0017ffffc0000200 ffffea00044f1100 dead000000000002 ffff88810004c780
      raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88816912f280: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
       ffff88816912f300: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      >ffff88816912f380: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                                                          ^
       ffff88816912f400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
       ffff88816912f480: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      ==================================================================
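The lifetime problem described above can be modeled in user space. This is only a sketch under stated assumptions: `struct dir`, `struct trace`, `trace_setup()` and `trace_free()` are hypothetical stand-ins, not the kernel's debugfs or blktrace API, and the cleanup shown is one plausible way to close the gap rather than the change the patch itself makes. The point is that entries registered in a shared directory must be unregistered before the object they point at is freed.

```c
#include <stdlib.h>

#define MAX_FILES 8

/* A directory entry that hands its private pointer to whoever reads the file. */
struct file_entry {
    const char *name;
    void *private_data;
};

struct dir {
    struct file_entry files[MAX_FILES];
    int nr;
};

/* Hypothetical stand-in for struct blk_trace; own_dir is NULL for whole-disk tracing. */
struct trace {
    int dropped;
    struct dir *own_dir;
};

static int dir_add(struct dir *d, const char *name, void *data)
{
    if (d->nr == MAX_FILES)
        return -1;
    d->files[d->nr].name = name;
    d->files[d->nr].private_data = data;
    d->nr++;
    return 0;
}

/* Drop every entry whose private pointer is `data`; returns how many were removed. */
static int dir_remove_owned_by(struct dir *d, const void *data)
{
    int kept = 0, removed = 0;

    for (int i = 0; i < d->nr; i++) {
        if (d->files[i].private_data == data)
            removed++;
        else
            d->files[kept++] = d->files[i];
    }
    d->nr = kept;
    return removed;
}

/* Register 'dropped' and 'msg' in own_dir if present, else in the shared queue dir. */
static struct trace *trace_setup(struct dir *queue_dir, struct dir *own_dir)
{
    struct trace *t = calloc(1, sizeof(*t));
    struct dir *target = own_dir ? own_dir : queue_dir;

    if (!t)
        return NULL;
    t->own_dir = own_dir;
    if (dir_add(target, "dropped", t) || dir_add(target, "msg", t)) {
        dir_remove_owned_by(target, t);
        free(t);
        return NULL;
    }
    return t;
}

/* Unregister the entries from whichever directory holds them, then free. */
static void trace_free(struct dir *queue_dir, struct trace *t)
{
    struct dir *target = t->own_dir ? t->own_dir : queue_dir;

    dir_remove_owned_by(target, t);
    free(t);
}
```

In this model, skipping `dir_remove_owned_by()` when `own_dir` is NULL would leave two entries in the shared directory whose `private_data` points at freed memory, which mirrors the stale 'dropped' read in the report.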
      
      Fixes: c0ea5760 ("blktrace: remove debugfs file dentries from struct blk_trace")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/r/20220228034354.4047385-1-yukuai3@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. Feb 24, 2022
  6. Feb 23, 2022
  7. Feb 22, 2022
  8. Feb 17, 2022
  9. Feb 12, 2022
  10. Feb 11, 2022
  11. Feb 10, 2022
  12. Feb 09, 2022
  13. Feb 04, 2022
  14. Feb 03, 2022
    • nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts() · 6a51abde
      Uday Shankar authored
      
      
      A controller deletion or reset that is immediately followed by, or
      concurrent with, a reconnect hard-fails the connect attempt, resulting
      in a complete loss of connectivity to the controller.
      
      In the connect request, fabrics looks for an existing controller with
      the same address components and aborts the connect if a controller
      already exists and the duplicate connect option isn't set. The match
      routine filters out controllers that are dead or dying, so they don't
      interfere with the new connect request.
      
      When NVME_CTRL_DELETING_NOIO was added, the state filters in the
      nvmf_ctlr_matches_baseopts() routine were not updated. Thus, a
      controller in this new state is seen as live and fails the connect
      request.

      Correct this by adding the DELETING_NOIO state to the match checks.
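The filter described above can be sketched as a small predicate. This is a simplified user-space model, not the kernel routine: the `CTRL_*` enum and `state_blocks_new_connect()` are hypothetical names chosen for illustration, and the exact set of states the real match routine filters is an assumption beyond the ones the message names.

```c
#include <stdbool.h>

/* Simplified stand-in for the controller states mentioned in the message. */
enum ctrl_state {
    CTRL_NEW,
    CTRL_LIVE,
    CTRL_RESETTING,
    CTRL_CONNECTING,
    CTRL_DELETING,
    CTRL_DELETING_NOIO,
    CTRL_DEAD,
};

/*
 * Returns true when an existing controller in `state` should count as a
 * match and therefore block a new connect to the same address.
 * Dead or dying controllers are filtered out.
 */
static bool state_blocks_new_connect(enum ctrl_state state)
{
    switch (state) {
    case CTRL_DELETING:
    case CTRL_DELETING_NOIO: /* the state the filter previously missed */
    case CTRL_DEAD:
        return false;
    default:
        return true;
    }
}
```

Without the `CTRL_DELETING_NOIO` case, a controller on its way out would fall through to the default branch and be treated as a live duplicate.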
      
      Fixes: ecca390e ("nvme: fix deadlock in disconnect during scan_work and/or ana_work")
      Cc: <stable@vger.kernel.org> # v5.7+
      Signed-off-by: Uday Shankar <ushankar@purestorage.com>
      Reviewed-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • md: fix NULL pointer deref with nowait but no mddev->queue · 0f9650bd
      Song Liu authored
      
      
      Leon reported a NULL pointer dereference with nowait support:
      
      [   15.123761] device-mapper: raid: Loading target version 1.15.1
      [   15.124185] device-mapper: raid: Ignoring chunk size parameter for RAID 1
      [   15.124192] device-mapper: raid: Choosing default region size of 4MiB
      [   15.129524] BUG: kernel NULL pointer dereference, address: 0000000000000060
      [   15.129530] #PF: supervisor write access in kernel mode
      [   15.129533] #PF: error_code(0x0002) - not-present page
      [   15.129535] PGD 0 P4D 0
      [   15.129538] Oops: 0002 [#1] PREEMPT SMP NOPTI
      [   15.129541] CPU: 5 PID: 494 Comm: ldmtool Not tainted 5.17.0-rc2-1-mainline #1 9fe89d43dfcb215d2731e6f8851740520778615e
      [   15.129546] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F36e 10/14/2021
      [   15.129549] RIP: 0010:blk_queue_flag_set+0x7/0x20
      [   15.129555] Code: 00 00 00 0f 1f 44 00 00 48 8b 35 e4 e0 04 02 48 8d 57 28 bf 40 01 \
             00 00 e9 16 c1 be ff 66 0f 1f 44 00 00 0f 1f 44 00 00 89 ff <f0> 48 0f ab 7e 60 \
             31 f6 89 f7 c3 66 66 2e 0f 1f 84 00 00 00 00 00
      [   15.129559] RSP: 0018:ffff966b81987a88 EFLAGS: 00010202
      [   15.129562] RAX: ffff8b11c363a0d0 RBX: ffff8b11e294b070 RCX: 0000000000000000
      [   15.129564] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001d
      [   15.129566] RBP: ffff8b11e294b058 R08: 0000000000000000 R09: 0000000000000000
      [   15.129568] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b11e294b070
      [   15.129570] R13: 0000000000000000 R14: ffff8b11e294b000 R15: 0000000000000001
      [   15.129572] FS:  00007fa96e826780(0000) GS:ffff8b18deb40000(0000) knlGS:0000000000000000
      [   15.129575] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   15.129577] CR2: 0000000000000060 CR3: 000000010b8ce000 CR4: 00000000003506e0
      [   15.129580] Call Trace:
      [   15.129582]  <TASK>
      [   15.129584]  md_run+0x67c/0xc70 [md_mod 1e470c1b6bcf1114198109f42682f5a2740e9531]
      [   15.129597]  raid_ctr+0x134a/0x28ea [dm_raid 6a645dd7519e72834bd7e98c23497eeade14cd63]
      [   15.129604]  ? dm_split_args+0x63/0x150 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129615]  dm_table_add_target+0x188/0x380 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129625]  table_load+0x13b/0x370 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129635]  ? dev_suspend+0x2d0/0x2d0 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129644]  ctl_ioctl+0x1bd/0x460 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129655]  dm_ctl_ioctl+0xa/0x20 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129663]  __x64_sys_ioctl+0x8e/0xd0
      [   15.129667]  do_syscall_64+0x5c/0x90
      [   15.129672]  ? syscall_exit_to_user_mode+0x23/0x50
      [   15.129675]  ? do_syscall_64+0x69/0x90
      [   15.129677]  ? do_syscall_64+0x69/0x90
      [   15.129679]  ? syscall_exit_to_user_mode+0x23/0x50
      [   15.129682]  ? do_syscall_64+0x69/0x90
      [   15.129684]  ? do_syscall_64+0x69/0x90
      [   15.129686]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   15.129689] RIP: 0033:0x7fa96ecd559b
      [   15.129692] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c \
          c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff \
          ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48
      [   15.129696] RSP: 002b:00007ffcaf85c258 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
      [   15.129699] RAX: ffffffffffffffda RBX: 00007fa96f1b48f0 RCX: 00007fa96ecd559b
      [   15.129701] RDX: 00007fa97017e610 RSI: 00000000c138fd09 RDI: 0000000000000003
      [   15.129702] RBP: 00007fa96ebab583 R08: 00007fa97017c9e0 R09: 00007ffcaf85bf27
      [   15.129704] R10: 0000000000000001 R11: 0000000000000206 R12: 00007fa97017e610
      [   15.129706] R13: 00007fa97017e640 R14: 00007fa97017e6c0 R15: 00007fa97017e530
      [   15.129709]  </TASK>
      
      This is caused by a missing mddev->queue check when setting
      QUEUE_FLAG_NOWAIT. Fix this by moving the QUEUE_FLAG_NOWAIT logic
      under the mddev->queue check.
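The shape of the fix can be illustrated with a small user-space model. The types and names here (`struct queue_model`, `struct mddev_model`, `want_nowait`, `apply_nowait()`) are hypothetical and only mirror the idea in the message: the flag is touched only after the queue pointer has been checked, since some users of the md core run without a queue of their own.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_NOWAIT_BIT (1UL << 0)

/* Hypothetical stand-in for a request queue holding flag bits. */
struct queue_model {
    unsigned long flags;
};

/* Hypothetical stand-in for mddev; queue may legitimately be NULL. */
struct mddev_model {
    struct queue_model *queue;
    bool want_nowait;
};

/* Set or clear the nowait flag, but only when a queue actually exists. */
static void apply_nowait(struct mddev_model *mddev)
{
    if (!mddev->queue)
        return; /* nothing to flag; dereferencing here was the reported bug */

    if (mddev->want_nowait)
        mddev->queue->flags |= QUEUE_NOWAIT_BIT;
    else
        mddev->queue->flags &= ~QUEUE_NOWAIT_BIT;
}
```

Calling `apply_nowait()` on a model with `queue == NULL` simply returns, whereas an unguarded flag update would write through a NULL pointer at a small offset, which matches the faulting address 0x60 in the log.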
      
      Fixes: f51d46d0 ("md: add support for REQ_NOWAIT")
      Reported-by: Leon Möller <jkhsjdhjs@totally.rip>
      Tested-by: Leon Möller <jkhsjdhjs@totally.rip>
      Cc: Vishal Verma <vverma@digitalocean.com>
      Signed-off-by: Song Liu <song@kernel.org>
  15. Feb 02, 2022
    • block: fix DIO handling regressions in blkdev_read_iter() · 3e1f941d
      Ilya Dryomov authored
      
      
      Commit ceaa7625 ("block: move direct_IO into our own read_iter
      handler") introduced several regressions for bdev DIO:
      
      1. read spanning EOF always returns 0 instead of the number of bytes
         read.  This is because "count" is assigned early and isn't updated
         when the iterator is truncated:
      
           $ lsblk -o name,size /dev/vdb
           NAME SIZE
           vdb    1G
           $ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb
           read 0/4194304 bytes at offset 1070596096
           0.000000 bytes, 0 ops; 0.0007 sec (0.000000 bytes/sec and 0.0000 ops/sec)
      
           instead of
      
           $ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb
           read 3145728/4194304 bytes at offset 1070596096
           3 MiB, 1 ops; 0.0007 sec (3.865 GiB/sec and 1319.2612 ops/sec)
      
      2. truncated iterator isn't reexpanded
      3. iterator isn't reverted on blkdev_direct_IO() error
      4. zero size read no longer skips atime update
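The arithmetic behind item 1 can be checked with a short helper. This is a sketch only: `bytes_readable()` is a hypothetical user-space function, not part of the block layer, and it models just the expected byte count for a read that is truncated at end of device.

```c
#include <stddef.h>

/*
 * Number of bytes a read of `count` bytes at offset `pos` can return on a
 * device of `dev_size` bytes. The request is truncated at end of device,
 * and the truncated length, not the original one, is what gets reported.
 */
static size_t bytes_readable(unsigned long long dev_size,
                             unsigned long long pos, size_t count)
{
    unsigned long long remaining;

    if (pos >= dev_size)
        return 0;

    remaining = dev_size - pos;
    return remaining < count ? (size_t)remaining : count;
}
```

For the 1 GiB device in the example, `bytes_readable(1024ULL << 20, 1021ULL << 20, 4u << 20)` evaluates to 3145728, the 3 MiB shown in the expected xfs_io output.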
      
      Fixes: ceaa7625 ("block: move direct_IO into our own read_iter handler")
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220201100420.25875-1-idryomov@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-rdma: fix possible use-after-free in transport error_recovery work · b6bb1722
      Sagi Grimberg authored
      
      
      nvme_rdma_submit_async_event_work checks the ctrl and queue state
      before preparing the AER command and scheduling io_work, but that
      check alone is not reliable. To fully close the race, the error
      recovery work must flush async_event_work after setting the ctrl
      state to RESETTING and before destroying the admin queue, so that
      there is no race between .submit_async_event and the error recovery
      handler itself changing the ctrl state.
      
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme-tcp: fix possible use-after-free in transport error_recovery work · ff9fc7eb
      Sagi Grimberg authored
      
      
      nvme_tcp_submit_async_event_work checks the ctrl and queue state
      before preparing the AER command and scheduling io_work, but that
      check alone is not reliable. To fully close the race, the error
      recovery work must flush async_event_work after setting the ctrl
      state to RESETTING and before destroying the admin queue, so that
      there is no race between .submit_async_event and the error recovery
      handler itself changing the ctrl state.
      
      Tested-by: Chris Leech <cleech@redhat.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme: fix a possible use-after-free in controller reset during load · 0fa0f99f
      Sagi Grimberg authored
      
      
      Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl
      readiness for AER submission. This may lead to a use-after-free
      condition that was observed with nvme-tcp.
      
      The race condition may happen in the following scenario:
      1. driver executes its reset_ctrl_work
      2. -> nvme_stop_ctrl - flushes ctrl async_event_work
      3. ctrl sends AEN which is received by the host, which in turn
         schedules AEN handling
      4. teardown admin queue (which releases the queue socket)
      5. AEN processed, submits another AER, calling the driver to submit
      6. driver attempts to send the cmd
      ==> use-after-free
      
      To fix that, add a ctrl state check to validate that the ctrl
      is actually able to accept the AER submission.
      
      This addresses the above race in controller resets because the driver
      during teardown should:
      1. change ctrl state to RESETTING
      2. flush async_event_work (as well as other async work elements)
      
      So after steps 1 and 2, any further AER command will find the
      ctrl state to be RESETTING and bail out without submitting the AER.
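The ordering argument above can be sketched as a single-threaded user-space model. All names (`struct ctrl_model`, `submit_async_event()`, `reset_teardown()`) are hypothetical, and the flush step is represented only by a comment, since a real flush waits for queued work on another thread.

```c
#include <stdbool.h>

enum ctrl_state { CTRL_LIVE, CTRL_RESETTING };

/* Hypothetical stand-in for the controller and its admin queue. */
struct ctrl_model {
    enum ctrl_state state;
    bool admin_queue_valid;
    int aers_submitted;
};

/* Submit an AER only if the controller is in a state that can accept it. */
static bool submit_async_event(struct ctrl_model *ctrl)
{
    if (ctrl->state != CTRL_LIVE)
        return false; /* bail out instead of touching a torn-down queue */

    ctrl->aers_submitted++; /* in a real driver this would use the admin queue */
    return true;
}

/* Teardown in the order the message describes. */
static void reset_teardown(struct ctrl_model *ctrl)
{
    ctrl->state = CTRL_RESETTING;    /* step 1: publish the new state */
    /* step 2: flush async_event_work here, so in-flight work has finished */
    ctrl->admin_queue_valid = false; /* only now release the admin queue */
}
```

Once `reset_teardown()` has run, any later `submit_async_event()` call returns false and leaves `aers_submitted` unchanged, which is the bail-out behavior the state check is meant to provide.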
      
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  16. Jan 29, 2022
  17. Jan 28, 2022
  18. Jan 27, 2022
  19. Jan 26, 2022
  20. Jan 24, 2022
  21. Jan 23, 2022
    • Merge tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · dd81e1c7
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - A series of bpf fixes, including an oops fix and some codegen fixes.
      
       - Fix a regression in syscall_get_arch() for compat processes.
      
       - Fix boot failure on some 32-bit systems with KASAN enabled.
      
       - A couple of other build/minor fixes.
      
      Thanks to Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa,
      Johan Almbladh, Maxime Bizon, Naveen N. Rao, and Nicholas Piggin.
      
      * tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64s: Mask SRR0 before checking against the masked NIP
        powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64
        powerpc/32s: Fix kasan_init_region() for KASAN
        powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32
        powerpc/audit: Fix syscall_get_arch()
        powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
        tools/bpf: Rename 'struct event' to avoid naming conflict
        powerpc/bpf: Update ldimm64 instructions during extra pass
        powerpc32/bpf: Fix codegen for bpf-to-bpf calls
        bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
    • Merge tag 'irq_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ac5a9bb6
      Linus Torvalds authored
      Pull irq fix from Borislav Petkov:
       "A single use-after-free fix in the PCI MSI irq domain allocation path"
      
      * tag 'irq_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        PCI/MSI: Prevent UAF in error path
    • Merge tag 'sched_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 10c64a0f
      Linus Torvalds authored
      Pull scheduler fixes from Borislav Petkov:
       "A bunch of fixes: forced idle time accounting, utilization values
        propagation in the sched hierarchies and other minor cleanups and
        improvements"
      
      * tag 'sched_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        kernel/sched: Remove dl_boosted flag comment
        sched: Avoid double preemption in __cond_resched_*lock*()
        sched/fair: Fix all kernel-doc warnings
        sched/core: Accounting forceidle time for all tasks except idle task
        sched/pelt: Relax the sync of load_sum with load_avg
        sched/pelt: Relax the sync of runnable_sum with runnable_avg
        sched/pelt: Continue to relax the sync of util_sum with util_avg
        sched/pelt: Relax the sync of util_sum with util_avg
        psi: Fix uaf issue when psi trigger is destroyed while being polled