Skip to content
  1. Feb 10, 2021
  2. Feb 08, 2021
  3. Feb 05, 2021
  4. Feb 04, 2021
  5. Feb 02, 2021
    • Pavel Begunkov's avatar
      io_uring: simplify do_read return parsing · 57cd657b
      Pavel Begunkov authored
      
      
      do_read() returning 0 bytes read (not -EAGAIN/etc.) is not an important
      enough of a case to prioritise it. Fold it into ret < 0 check, so we get
      rid of an extra if and make it a bit more readable.
      
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      57cd657b
    • Pavel Begunkov's avatar
      io_uring: deduplicate adding to REQ_F_INFLIGHT · ce3d5aae
      Pavel Begunkov authored
      
      
      We don't know for how long REQ_F_INFLIGHT is going to stay, cleaner to
      extract a helper for marking requests as so.
      
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ce3d5aae
    • Pavel Begunkov's avatar
      io_uring: remove work flags after cleanup · e86d0047
      Pavel Begunkov authored
      
      
      Shouldn't be a problem now, but it's better to clean
      REQ_F_WORK_INITIALIZED and work->flags only after relevant resources are
      killed, so cancellation see them.
      
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e86d0047
    • Pavel Begunkov's avatar
      io_uring: inline io_req_drop_files() · 34e08fed
      Pavel Begunkov authored
      
      
      req->files now have same lifetime as all other iowq-work resources,
      inline io_req_drop_files() for consistency. Moreover, since
      REQ_F_INFLIGHT is no more files specific, the function name became
      very confusing.
      
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      34e08fed
    • Pavel Begunkov's avatar
      io_uring: kill not used needs_file_no_error · ba13e23f
      Pavel Begunkov authored
      
      
      We have no request types left using needs_file_no_error, remove it.
      
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ba13e23f
    • Pavel Begunkov's avatar
      io_uring: fix inconsistent lock state · 9ae1f8dd
      Pavel Begunkov authored
      
      
      WARNING: inconsistent lock state
      
      inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
      syz-executor217/8450 [HC1[1]:SC0[0]:HE0:SE1] takes:
      ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
      ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_req_clean_work fs/io_uring.c:1398 [inline]
      ffff888023d6e620 (&fs->lock){?.+.}-{2:2}, at: io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&fs->lock);
        <Interrupt>
          lock(&fs->lock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor217/8450:
       #0: ffff88802417c3e8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x1071/0x1f30 fs/io_uring.c:9442
      
      stack backtrace:
      CPU: 1 PID: 8450 Comm: syz-executor217 Not tainted 5.11.0-rc5-next-20210129-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
      [...]
       _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:354 [inline]
       io_req_clean_work fs/io_uring.c:1398 [inline]
       io_dismantle_req+0x66f/0xf60 fs/io_uring.c:2029
       __io_free_req+0x3d/0x2e0 fs/io_uring.c:2046
       io_free_req fs/io_uring.c:2269 [inline]
       io_double_put_req fs/io_uring.c:2392 [inline]
       io_put_req+0xf9/0x570 fs/io_uring.c:2388
       io_link_timeout_fn+0x30c/0x480 fs/io_uring.c:6497
       __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
       __hrtimer_run_queues+0x609/0xe40 kernel/time/hrtimer.c:1583
       hrtimer_interrupt+0x334/0x940 kernel/time/hrtimer.c:1645
       local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1085 [inline]
       __sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1102
       asm_call_irq_on_stack+0xf/0x20
       </IRQ>
       __run_sysvec_on_irqstack arch/x86/include/asm/irq_stack.h:37 [inline]
       run_sysvec_on_irqstack_cond arch/x86/include/asm/irq_stack.h:89 [inline]
       sysvec_apic_timer_interrupt+0xbd/0x100 arch/x86/kernel/apic/apic.c:1096
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629
      RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
      RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:199
       spin_unlock_irq include/linux/spinlock.h:404 [inline]
       io_queue_linked_timeout+0x194/0x1f0 fs/io_uring.c:6525
       __io_queue_sqe+0x328/0x1290 fs/io_uring.c:6594
       io_queue_sqe+0x631/0x10d0 fs/io_uring.c:6639
       io_queue_link_head fs/io_uring.c:6650 [inline]
       io_submit_sqe fs/io_uring.c:6697 [inline]
       io_submit_sqes+0x19b5/0x2720 fs/io_uring.c:6960
       __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9443
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Don't free requests from under hrtimer context (softirq) as it may sleep
      or take spinlocks improperly (e.g. non-irq versions).
      
      Cc: stable@vger.kernel.org # 5.6+
      Reported-by: default avatar <syzbot+81d17233a2b02eafba33@syzkaller.appspotmail.com>
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9ae1f8dd
    • Dan Carpenter's avatar
      io_uring: Fix NULL dereference in error in io_sqe_files_register() · 13770a71
      Dan Carpenter authored
      If we hit a "goto out_free;" before the "ctx->file_data" pointer has
      been assigned then it leads to a NULL derefence when we call:
      
      	free_fixed_rsrc_data(ctx->file_data);
      
      We can fix this by moving the assignment earlier.
      
      Fixes: 1ad555c6
      
       ("io_uring: create common fixed_rsrc_data allocation routines")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      13770a71
    • Hao Xu's avatar
      io_uring: check kthread parked flag before sqthread goes to sleep · 8b28fdf2
      Hao Xu authored
      
      
      Abaci reported this issue:
      
      #[  605.170872] INFO: task kworker/u4:1:53 blocked for more than 143 seconds.
      [  605.172123]       Not tainted 5.10.0+ #1
      [  605.172811] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  605.173915] task:kworker/u4:1    state:D stack:    0 pid:   53 ppid:     2 flags:0x00004000
      [  605.175130] Workqueue: events_unbound io_ring_exit_work
      [  605.175931] Call Trace:
      [  605.176334]  __schedule+0xe0e/0x25a0
      [  605.176971]  ? firmware_map_remove+0x1a1/0x1a1
      [  605.177631]  ? write_comp_data+0x2a/0x80
      [  605.178272]  schedule+0xd0/0x270
      [  605.178811]  schedule_timeout+0x6b6/0x940
      [  605.179415]  ? mark_lock.part.0+0xca/0x1420
      [  605.180062]  ? usleep_range+0x170/0x170
      [  605.180684]  ? wait_for_completion+0x16d/0x280
      [  605.181392]  ? mark_held_locks+0x9e/0xe0
      [  605.182079]  ? rwlock_bug.part.0+0x90/0x90
      [  605.182853]  ? lockdep_hardirqs_on_prepare+0x286/0x400
      [  605.183817]  wait_for_completion+0x175/0x280
      [  605.184713]  ? wait_for_completion_interruptible+0x340/0x340
      [  605.185611]  ? _raw_spin_unlock_irq+0x24/0x30
      [  605.186307]  ? migrate_swap_stop+0x9c0/0x9c0
      [  605.187046]  kthread_park+0x127/0x1c0
      [  605.187738]  io_sq_thread_stop+0xd5/0x530
      [  605.188459]  io_ring_exit_work+0xb1/0x970
      [  605.189207]  process_one_work+0x92c/0x1510
      [  605.189947]  ? pwq_dec_nr_in_flight+0x360/0x360
      [  605.190682]  ? rwlock_bug.part.0+0x90/0x90
      [  605.191430]  ? write_comp_data+0x2a/0x80
      [  605.192207]  worker_thread+0x9b/0xe20
      [  605.192900]  ? process_one_work+0x1510/0x1510
      [  605.193599]  kthread+0x353/0x460
      [  605.194154]  ? _raw_spin_unlock_irq+0x24/0x30
      [  605.194910]  ? kthread_create_on_node+0x100/0x100
      [  605.195821]  ret_from_fork+0x1f/0x30
      [  605.196605]
      [  605.196605] Showing all locks held in the system:
      [  605.197598] 1 lock held by khungtaskd/25:
      [  605.198301]  #0: ffffffff8b5f76a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.0+0x0/0x30
      [  605.199914] 3 locks held by kworker/u4:1/53:
      [  605.200609]  #0: ffff888100109938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x82a/0x1510
      [  605.202108]  #1: ffff888100e47dc0 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x85e/0x1510
      [  605.203681]  #2: ffff888116931870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park.part.0+0x19/0x50
      [  605.205183] 3 locks held by systemd-journal/161:
      [  605.206037] 1 lock held by syslog-ng/254:
      [  605.206674] 2 locks held by agetty/311:
      [  605.207292]  #0: ffff888101097098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x27/0x80
      [  605.208715]  #1: ffffc900000332e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x222/0x1bb0
      [  605.210131] 2 locks held by bash/677:
      [  605.210723]  #0: ffff88810419a098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x27/0x80
      [  605.212105]  #1: ffffc900000512e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x222/0x1bb0
      [  605.213777]
      [  605.214151] =============================================
      
      I believe this is caused by the follow race:
      
      (ctx_list is empty now)
      => io_put_sq_data               |
      ==> kthread_park(sqd->thread);  |
      ====> set KTHREAD_SHOULD_PARK	|
      ====> wake_up_process(k)        | sq thread is running
      				|
      				|
      				| needs_sched is true since no ctx,
      				| so TASK_INTERRUPTIBLE set and schedule
      				| out then never wake up again
      				|
      ====> wait_for_completion	|
      	(stuck here)
      
      So check if sqthread gets park flag right before schedule().
      since ctx_list is always empty when this problem happens, here I put
      kthread_should_park() before setting the wakeup flag(ctx_list is empty
      so this for loop is fast), where is close enough to schedule(). The
      problem doesn't show again in my repro testing after this fix.
      
      Reported-by: default avatarAbaci <abaci@linux.alibaba.com>
      Signed-off-by: default avatarHao Xu <haoxu@linux.alibaba.com>
      Reviewed-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8b28fdf2
    • Pavel Begunkov's avatar
      MAINTAINERS: update io_uring section · 090da7d5
      Pavel Begunkov authored
      
      
      Add the missing kernel io_uring header, add Pavel as a reviewer, and
      exclude io_uring from the FILESYSTEMS section to avoid keep spamming Al
      (mainly) with bug reports, patches, etc.
      
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      090da7d5