  1. Mar 27, 2021
    • kernel: don't call do_exit() for PF_IO_WORKER threads · 10442994
      Jens Axboe authored
      
      
      Right now we're never calling get_signal() from PF_IO_WORKER threads, but
      in preparation for doing so, don't handle a fatal signal for them. The
      workers have state they need to clean up when exiting, so just return
      instead of calling do_exit() on their behalf. The threads themselves will
      detect a fatal signal and do proper shutdown.
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
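
      As a rough sketch of what this looks like in the fatal-signal path of
      get_signal() (schematic only; the exact label and return handling are
      abbreviated here):

        /* kernel/signal.c, get_signal(), fatal-signal handling (sketch) */
        if (current->flags & PF_IO_WORKER) {
                /*
                 * io_uring workers have per-thread state to clean up on
                 * exit, so don't do_exit()/do_group_exit() on their
                 * behalf; bail out and let the worker detect the fatal
                 * signal and shut itself down.
                 */
                goto out;       /* placement of the label is assumed here */
        }
        do_group_exit(ksig->info.si_signo);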
  2. Mar 26, 2021
    • io_uring: maintain CQE order of a failed link · 90b87490
      Pavel Begunkov authored
      Arguably we want CQEs of linked requests to be in strict submission
      order, as they always were. Currently, if init of a request fails, its
      CQE may be posted before those of all prior linked requests, including
      the head of the link. Fix it by failing it last.
      
      Fixes: de59bc10 ("io_uring: fail links more in io_submit_sqe()")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/b7a96b05832e7ab23ad55f84092a2548c4a888b0.1616699075.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
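
      A schematic of the intended ordering in the submission error path; the
      helper names below are illustrative, not necessarily the exact
      functions in fs/io_uring.c:

        ret = io_init_req(ctx, req, sqe);
        if (unlikely(ret)) {
                /* post CQEs for the already-queued part of the link first... */
                if (link->head) {
                        io_fail_link(link->head);       /* illustrative helper */
                        link->head = NULL;
                }
                /* ...and only then the CQE of the request whose init failed,
                 * so completions keep strict submission order */
                io_req_complete_failed(req, ret);
        }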
    • io-wq: fix race around pending work on teardown · f5d2d23b
      Jens Axboe authored
      
      
      syzbot reports that it's triggering the warning condition on having
      pending work on shutdown:
      
      WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_destroy fs/io-wq.c:1061 [inline]
      WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_put+0x153/0x260 fs/io-wq.c:1072
      Modules linked in:
      CPU: 1 PID: 12346 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:io_wq_destroy fs/io-wq.c:1061 [inline]
      RIP: 0010:io_wq_put+0x153/0x260 fs/io-wq.c:1072
      Code: 8d e8 71 90 ea 01 49 89 c4 41 83 fc 40 7d 4f e8 33 4d 97 ff 42 80 7c 2d 00 00 0f 85 77 ff ff ff e9 7a ff ff ff e8 1d 4d 97 ff <0f> 0b eb b9 8d 6b ff 89 ee 09 de bf ff ff ff ff e8 18 51 97 ff 09
      RSP: 0018:ffffc90001ebfb08 EFLAGS: 00010293
      RAX: ffffffff81e16083 RBX: ffff888019038040 RCX: ffff88801e86b780
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000040
      RBP: 1ffff1100b2f8a80 R08: ffffffff81e15fce R09: ffffed100b2f8a82
      R10: ffffed100b2f8a82 R11: 0000000000000000 R12: 0000000000000000
      R13: dffffc0000000000 R14: ffff8880597c5400 R15: ffff888019038000
      FS:  00007f8dcd89c700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055e9a054e160 CR3: 000000001dfb8000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       io_uring_clean_tctx+0x1b7/0x210 fs/io_uring.c:8802
       __io_uring_files_cancel+0x13c/0x170 fs/io_uring.c:8820
       io_uring_files_cancel include/linux/io_uring.h:47 [inline]
       do_exit+0x258/0x2340 kernel/exit.c:780
       do_group_exit+0x168/0x2d0 kernel/exit.c:922
       get_signal+0x1734/0x1ef0 kernel/signal.c:2773
       arch_do_signal_or_restart+0x3c/0x610 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:208
       __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
       syscall_exit_to_user_mode+0x48/0x180 kernel/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x465f69
      
      which shouldn't happen, but seems to be possible due to a race on
      whether the io-wq manager or the io-wq workers see a fatal signal
      first. If work is queued and a fatal signal is then sent to the owning
      task, and an io-wq worker sees that signal before the manager sets
      IO_WQ_BIT_EXIT, the worker can exit and leave work behind.
      
      Just turn the WARN_ON_ONCE() into a cancellation condition instead.
      
      Reported-by: syzbot+77a738a6bc947bf639ca@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
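
      Schematically, the teardown path now cancels leftover work instead of
      warning about it; the helper name below is an illustrative stand-in
      for the fs/io-wq.c internals:

        for_each_node(node) {
                struct io_wqe *wqe = wq->wqes[node];

                /* was: WARN_ON_ONCE(!wq_list_empty(&wqe->work_list));
                 * a worker may have seen the fatal signal and exited before
                 * the manager set IO_WQ_BIT_EXIT, leaving work queued, so
                 * cancel whatever is still pending instead of warning */
                io_wqe_cancel_pending_work(wqe);        /* illustrative */
        }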
  3. Mar 24, 2021
    • io_uring: do ctx sqd ejection in a clear context · a185f1db
      Pavel Begunkov authored
      WARNING: CPU: 1 PID: 27907 at fs/io_uring.c:7147 io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
      CPU: 1 PID: 27907 Comm: iou-sqp-27905 Not tainted 5.12.0-rc4-syzkaller #0
      RIP: 0010:io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
      Call Trace:
       io_ring_ctx_wait_and_kill+0x214/0x700 fs/io_uring.c:8619
       io_uring_release+0x3e/0x50 fs/io_uring.c:8646
       __fput+0x288/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x1a0 kernel/task_work.c:140
       io_run_task_work fs/io_uring.c:2238 [inline]
       io_run_task_work fs/io_uring.c:2228 [inline]
       io_uring_try_cancel_requests+0x8ec/0xc60 fs/io_uring.c:8770
       io_uring_cancel_sqpoll+0x1cf/0x290 fs/io_uring.c:8974
       io_sqpoll_cancel_cb+0x87/0xb0 fs/io_uring.c:8907
       io_run_task_work_head+0x58/0xb0 fs/io_uring.c:1961
       io_sq_thread+0x3e2/0x18d0 fs/io_uring.c:6763
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      It may happen that the last ctx ref is put in io_uring_cancel_sqpoll(),
      so the fput callback (i.e. io_uring_release()) is enqueued through
      task_work and run by the same cancellation. As that's deeply nested, we
      can't park or take sqd->lock there, because its state is unclear. So
      avoid ejecting the ctx from the sqd list in io_ring_ctx_wait_and_kill()
      and do it in a clear context in io_ring_exit_work().
      
      Fixes: f6d54255 ("io_uring: halt SQO submission on ctx exit")
      Reported-by: syzbot+e3a3f84f5cecf61f0583@syzkaller.appspotmail.com
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/e90df88b8ff2cabb14a7534601d35d62ab4cb8c7.1616496707.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
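
      A sketch of where the ejection ends up (schematic; the surrounding
      cancellation work in io_ring_exit_work() is abbreviated):

        static void io_ring_exit_work(struct work_struct *work)
        {
                struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);

                /* ... request cancellation, reference draining ... */

                if (ctx->sq_data) {
                        struct io_sq_data *sqd = ctx->sq_data;

                        /* safe here: not nested inside a cancellation */
                        io_sq_thread_park(sqd);
                        list_del_init(&ctx->sqd_list);  /* eject from sqd->ctx_list */
                        io_sq_thread_unpark(sqd);
                }
        }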
  4. Mar 14, 2021
    • io_uring: convert io_buffer_idr to XArray · 9e15c3a0
      Jens Axboe authored
      Like we did for the personality idr, convert the IO buffer idr to use
      XArray. This avoids a use-after-free on removal of entries, since idr
      doesn't like doing so from inside an iterator, and it nicely reduces
      the amount of code we need to support this feature.
      
      Fixes: 5a2e745d ("io_uring: buffer registration infrastructure")
      Cc: stable@vger.kernel.org
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: yangerkun <yangerkun@huawei.com>
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
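
      Roughly, the conversion swaps the idr calls for their XArray
      counterparts; a simplified sketch, where the io_buffers field name and
      the call sites shown are illustrative:

        /* was: struct idr io_buffer_idr;  now: */
        struct xarray io_buffers;

        /* lookup of a buffer group by bgid */
        struct io_buffer *head = xa_load(&ctx->io_buffers, bgid);

        /* insertion */
        ret = xa_err(xa_store(&ctx->io_buffers, bgid, head, GFP_KERNEL));

        /* teardown: erasing while iterating is fine with an XArray */
        xa_for_each(&ctx->io_buffers, index, buf)
                xa_erase(&ctx->io_buffers, index);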
  5. Mar 13, 2021
    • io_uring: allow IO worker threads to be frozen · 16efa4fc
      Jens Axboe authored
      
      
      With the freezer using the proper signaling to notify us of when it's
      time to freeze a thread, we can re-enable normal freezer usage for the
      IO threads. Ensure that SQPOLL, io-wq, and the io-wq manager call
      try_to_freeze() appropriately, and remove the default setting of
      PF_NOFREEZE from create_io_thread().
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
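
      In sketch form, the thread loops gain a freeze point and the thread
      creation path stops opting out of the freezer (illustrative; the loop
      shown is not the exact SQPOLL/io-wq loop):

        /* worker/SQPOLL/manager main loop, schematically */
        while (!should_stop()) {        /* illustrative loop condition */
                /* ... process work, handle signals ... */
                try_to_freeze();        /* now called by SQPOLL, io-wq workers and the manager */
        }

        /* and in create_io_thread(): the default
         *     p->flags |= PF_NOFREEZE;
         * is removed, so IO threads participate in freezing again. */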
    • kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing · 15b2219f
      Jens Axboe authored
      
      
      Don't send fake signals to PF_IO_WORKER threads; they don't accept
      signals. Just treat them like kthreads in this regard: all they need
      is a wakeup, as no forced kernel/user transition is needed.
      
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
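
      The freezer's wake-up decision then becomes, roughly (a sketch of the
      kernel/freezer.c check; exact surroundings abbreviated):

        /* freeze_task(): io_uring workers are woken like kthreads, no fake signal */
        if (p->flags & (PF_KTHREAD | PF_IO_WORKER))
                wake_up_process(p);
        else
                fake_signal_wake_up(p);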
    • io_uring: fix OP_ASYNC_CANCEL across tasks · 58f99373
      Pavel Begunkov authored
      
      
      IORING_OP_ASYNC_CANCEL tries io-wq cancellation only for the current
      task. If that fails, go over tctx_list and try it for every single tctx.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
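
      Schematically, the cancellation now falls back to the other tasks
      attached to the ring; the helper name and node fields below are
      illustrative:

        /* first try the io-wq of the task issuing IORING_OP_ASYNC_CANCEL */
        ret = io_try_cancel(current->io_uring, sqe_addr);       /* illustrative helper */
        if (ret != -ENOENT)
                goto done;

        /* then walk every tctx registered on this ctx and retry there */
        list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
                ret = io_try_cancel(node->task->io_uring, sqe_addr);
                if (ret != -ENOENT)
                        break;
        }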
    • io_uring: cancel sqpoll via task_work · 521d6a73
      Pavel Begunkov authored
      
      
      1) The first problem is io_uring_cancel_sqpoll() ->
      io_uring_cancel_task_requests() basically doing park(); park(); and so
      hanging.
      
      2) Another one is more subtle: the master task is doing cancellations,
      but the SQPOLL task submits, in between the end of the cancellation and
      finish(), requests taking a ref to the ctx, and so eternally locks it
      up.
      
      3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll()
      while the owner task does the same io_uring_cancel_sqpoll(); they race
      for tctx->wait events. And there are probably more of them.
      
      Instead, do SQPOLL cancellations from within SQPOLL task context via
      task_work, see io_sqpoll_cancel_sync(). With that we don't need the
      temporary park()/unpark() during cancellation, which is ugly, subtle
      and in any case doesn't allow doing io_run_task_work() properly.
      
      io_uring_cancel_sqpoll() is called only from SQPOLL task context and
      under sqd locking, so all parking is removed from there. As a result,
      io_sq_thread_[un]park() and io_sq_thread_stop() are no longer used by
      the SQPOLL task, which spares us some headache.
      
      Also remove the ctx from the sqd list early to avoid 2). And kill
      tctx->sqpoll, which is not used anymore.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
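
      A sketch of the synchronous-cancel entry point the commit refers to
      (schematic; the container struct and callback wiring are simplified
      and their names assumed, except io_sqpoll_cancel_cb, which appears in
      the trace above):

        struct io_sqpoll_cancel {               /* illustrative container */
                struct callback_head task_work;
                struct completion done;
                struct io_ring_ctx *ctx;
        };

        static void io_sqpoll_cancel_sync(struct io_ring_ctx *ctx)
        {
                struct io_sq_data *sqd = ctx->sq_data;
                struct io_sqpoll_cancel work = { .ctx = ctx };

                init_completion(&work.done);
                init_task_work(&work.task_work, io_sqpoll_cancel_cb);
                /* the callback runs io_uring_cancel_sqpoll() in SQPOLL task context */
                if (sqd->thread && !task_work_add(sqd->thread, &work.task_work, TWA_SIGNAL))
                        wait_for_completion(&work.done);
        }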
    • io_uring: prevent racy sqd->thread checks · 26984fbf
      Pavel Begunkov authored
      
      
      The SQPOLL thread to which we're trying to attach may be going away.
      That's not nice in itself, but a more serious problem is if
      io_sq_offload_create() sees sqd->thread == NULL and tries to init it
      with a new thread. There are tons of ways this can be exploited or
      fail.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. Mar 10, 2021
    • io_uring: remove indirect ctx into sqo injection · 7d41e854
      Pavel Begunkov authored
      
      
      We use ->ctx_new_list to notify the SQPOLL thread about a new pending
      ctx; the SQPOLL thread should then stop and splice it onto its
      sqd->ctx_list, paired with ->sq_thread_comp.
      
      The latter is broken because nobody reinitialises it, and trying to fix
      that would only add more complexity and bugs. The former isn't really
      needed, as the whole thing is done under park(), which protects from
      races well. Add the ctx to sqd->ctx_list directly (under park()); it's
      much simpler and lets us kill both ctx_new_list and sq_thread_comp.
      
      Note: apparently there is no real problem at the moment, because
      sq_thread_comp is used only by io_sq_thread_finish() followed by
      parking, where list_del(&ctx->sqd_list) removes it correctly regardless
      of whether it's on the new or the active list.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
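
      The direct attach described above is then just (a sketch, following
      the commit text):

        io_sq_thread_park(sqd);
        list_add(&ctx->sqd_list, &sqd->ctx_list);
        io_sq_thread_unpark(sqd);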