  1. Mar 22, 2021
    • io_uring: correct io_queue_async_work() traces · d07f1e8a
      Pavel Begunkov authored
      
      
       A request's io-wq work is hashed in io_prep_async_link(), so since
       trace_io_uring_queue_async_work() looks at that work, the trace should
       come after the prep has been done.
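       A minimal sketch of the resulting ordering (the tracepoint arguments
       and the io-wq lookup are simplified here, not the exact fs/io_uring.c
       code):

           static void io_queue_async_work(struct io_kiocb *req)
           {
                   struct io_ring_ctx *ctx = req->ctx;

                   /* prep first: this is where the io-wq work may get hashed */
                   io_prep_async_link(req);

                   /* trace after prep so it sees the final, possibly hashed, work */
                   trace_io_uring_queue_async_work(ctx, req);

                   /* hand the work off to io-wq (exact io_wq lookup simplified) */
                   io_wq_enqueue(ctx->io_wq, &req->work);
           }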
      
       Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/709c9f872f4d2e198c7aed9c49019ca7095dd24d.1616366969.git.asml.silence@gmail.com
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d07f1e8a
    • io_uring: don't use {test,clear}_tsk_thread_flag() for current · 0b8cfa97
      Jens Axboe authored
      
      
      Linus correctly points out that this is both unnecessary and generates
      much worse code on some archs as going from current to thread_info is
      actually backwards - and obviously just wasteful, since the thread_info
      is what we care about.
      
      Since io_uring only operates on current for these operations, just use
      test_thread_flag() instead. For io-wq, we can further simplify and use
      tracehook_notify_signal() to handle the TIF_NOTIFY_SIGNAL work and clear
      the flag. The latter isn't an actual bug right now, but it may very well
      be in the future if we place other work items under TIF_NOTIFY_SIGNAL.
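       In sketch form (the helper names below are made up for illustration;
       test_thread_flag(), TIF_NOTIFY_SIGNAL and tracehook_notify_signal()
       are the kernel interfaces the message refers to):

           /* io_uring: we only ever operate on current, so the cheap
            * per-current accessor is enough.
            */
           static inline void io_uring_check_notify(void)
           {
                   if (test_thread_flag(TIF_NOTIFY_SIGNAL))
                           io_run_task_work();
           }

           /* io-wq: tracehook_notify_signal() clears TIF_NOTIFY_SIGNAL and
            * runs any pending work, so future users of the flag are covered.
            */
           static inline void io_wq_check_notify(void)
           {
                   if (test_thread_flag(TIF_NOTIFY_SIGNAL))
                           tracehook_notify_signal();
           }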
      
       Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/io-uring/CAHk-=wgYhNck33YHKZ14mFB5MzTTk8gqXHcfj=RWTAXKwgQJgg@mail.gmail.com/
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0b8cfa97
  2. Mar 21, 2021
    • io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL · 0031275d
      Stefan Metzmacher authored
      
      
       Without this it's not safe to use them in a linked combination with
       other requests.
      
      Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
      should be possible.
      
      We already handle short reads and writes for the following opcodes:
      
      - IORING_OP_READV
      - IORING_OP_READ_FIXED
      - IORING_OP_READ
      - IORING_OP_WRITEV
      - IORING_OP_WRITE_FIXED
      - IORING_OP_WRITE
      - IORING_OP_SPLICE
      - IORING_OP_TEE
      
      Now we have it for these as well:
      
      - IORING_OP_SENDMSG
      - IORING_OP_SEND
      - IORING_OP_RECVMSG
      - IORING_OP_RECV
      
      For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
      flags in order to call req_set_fail_links().
      
       There might be applications around that depend on the behavior that
       even short send[msg]()/recv[msg]() returns continue an IOSQE_IO_LINK
       chain.
      
      It's very unlikely that such applications pass in MSG_WAITALL,
      which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.
      
       It's expected that the low-level sock_sendmsg() call just ignores
       MSG_WAITALL, just as MSG_ZEROCOPY is ignored unless SO_ZEROCOPY has
       been explicitly set.
      
      We also expect the caller to know about the implicit truncation to
      MAX_RW_COUNT, which we don't detect.
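       A hypothetical helper (names and parameters are illustrative, not the
       actual io_uring send/recv handlers) showing the rule this adds:

           static void io_net_check_short(struct io_kiocb *req, ssize_t ret,
                                          size_t expected, unsigned int flags,
                                          unsigned int msg_flags, bool is_recvmsg)
           {
                   if (!(flags & MSG_WAITALL))
                           return;

                   /* short send/recv: fail the request so a linked SQE won't run */
                   if (ret >= 0 && (size_t)ret < expected)
                           req_set_fail_links(req);

                   /* recvmsg only: payload or control data was truncated */
                   if (is_recvmsg && (msg_flags & (MSG_TRUNC | MSG_CTRUNC)))
                           req_set_fail_links(req);
           }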
      
      cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org
       Signed-off-by: Stefan Metzmacher <metze@samba.org>
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0031275d
    • io-wq: ensure task is running before processing task_work · 00ddff43
      Jens Axboe authored
      Mark the current task as running if we need to run task_work from the
      io-wq threads as part of work handling. If that is the case, then return
      as such so that the caller can appropriately loop back and reset if it
      was part of a going-to-sleep flush.
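       Roughly (a simplified sketch of the io-wq flush path, not the exact
       hunk):

           static bool io_flush_signals(void)
           {
                   if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL) ||
                                current->task_works)) {
                           /* we may already be TASK_INTERRUPTIBLE on the way to
                            * sleep; task_work must run as TASK_RUNNING
                            */
                           __set_current_state(TASK_RUNNING);
                           task_work_run();
                           clear_thread_flag(TIF_NOTIFY_SIGNAL);
                           return true;    /* caller re-checks its sleep conditions */
                   }
                   return false;
           }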
      
       Fixes: 3bfe6106 ("io-wq: fork worker threads from original task")
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      00ddff43
    • signal: don't allow STOP on PF_IO_WORKER threads · 4db4b1a0
      Eric W. Biederman authored
      
      
      Just like we don't allow normal signals to IO threads, don't deliver a
      STOP to a task that has PF_IO_WORKER set. The IO threads don't take
      signals in general, and have no means of flushing out a stop either.
      
      Longer term, we may want to look into allowing stop of these threads,
      as it relates to eg process freezing. For now, this prevents a spin
      issue if a SIGSTOP is delivered to the parent task.
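       The shape of the check (a hypothetical helper; the real hunk sits in
       the job-control signal path in kernel/signal.c):

           static bool task_may_job_stop(struct task_struct *task)
           {
                   /* IO workers take no signals and have no way to flush out a
                    * stop, so never queue job-control stop for them
                    */
                   return !(task->flags & PF_IO_WORKER);
           }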
      
       Reported-by: Stefan Metzmacher <metze@samba.org>
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
       Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      4db4b1a0
    • signal: don't allow sending any signals to PF_IO_WORKER threads · 5be28c8f
      Jens Axboe authored
      
      
      They don't take signals individually, and even if they share signals with
      the parent task, don't allow them to be delivered through the worker
      thread. Linux does allow this kind of behavior for regular threads, but
       it's really a compatibility thing that we need not care about for the IO
      threads.
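       Conceptually (again a hypothetical helper, not the exact kernel/signal.c
       change):

           static bool task_accepts_signal(struct task_struct *t)
           {
                   /* PF_IO_WORKER threads share the parent's signal bookkeeping,
                    * but nothing is ever delivered through the worker itself
                    */
                   return !(t->flags & PF_IO_WORKER);
           }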
      
       Reported-by: Stefan Metzmacher <metze@samba.org>
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5be28c8f
  3. Mar 18, 2021
  4. Mar 15, 2021
  5. Mar 14, 2021
    • io_uring: convert io_buffer_idr to XArray · 9e15c3a0
      Jens Axboe authored
      Like we did for the personality idr, convert the IO buffer idr to use
      XArray. This avoids a use-after-free on removal of entries, since idr
      doesn't like doing so from inside an iterator, and it nicely reduces
      the amount of code we need to support this feature.
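       The XArray pattern this switches to looks roughly like the following
       (field and type names assumed from the commit message):

           #include <linux/xarray.h>

           /* lookup by buffer group id: no idr iterator involved, so freeing
            * the entry on removal is safe
            */
           static struct io_buffer *io_buffer_lookup(struct xarray *io_buffers,
                                                     u32 bgid)
           {
                   return xa_load(io_buffers, bgid);
           }

           static void io_buffer_remove(struct xarray *io_buffers, u32 bgid)
           {
                   struct io_buffer *buf = xa_erase(io_buffers, bgid);

                   kfree(buf);     /* kfree(NULL) is fine if the id wasn't there */
           }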
      
       Fixes: 5a2e745d ("io_uring: buffer registration infrastructure")
      Cc: stable@vger.kernel.org
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: yangerkun <yangerkun@huawei.com>
       Reported-by: Hulk Robot <hulkci@huawei.com>
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9e15c3a0
  6. Mar 13, 2021
    • io_uring: allow IO worker threads to be frozen · 16efa4fc
      Jens Axboe authored
      
      
      With the freezer using the proper signaling to notify us of when it's
      time to freeze a thread, we can re-enable normal freezer usage for the
      IO threads. Ensure that SQPOLL, io-wq, and the io-wq manager call
      try_to_freeze() appropriately, and remove the default setting of
      PF_NOFREEZE from create_io_thread().
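       In sketch form (simplified; the real calls are in the SQPOLL, io-wq
       worker and io-wq manager loops):

           #include <linux/freezer.h>

           static void io_worker_wait(void)
           {
                   set_current_state(TASK_INTERRUPTIBLE);
                   schedule();
                   /* with PF_NOFREEZE no longer set by create_io_thread(), the
                    * thread must cooperate with the freezer when it wakes up
                    */
                   try_to_freeze();
           }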
      
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      16efa4fc
    • kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing · 15b2219f
      Jens Axboe authored
      
      
      Don't send fake signals to PF_IO_WORKER threads, they don't accept
      signals. Just treat them like kthreads in this regard, all they need
      is a wakeup as no forced kernel/user transition is needed.
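       The distinction, sketched as a simplified stand-in for the freezer's
       wakeup decision:

           static void freezer_wake(struct task_struct *p)
           {
                   if (p->flags & (PF_KTHREAD | PF_IO_WORKER))
                           wake_up_process(p);     /* a plain wakeup is enough */
                   else
                           fake_signal_wake_up(p); /* force a kernel/user transition */
           }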
      
       Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      15b2219f
    • io_uring: fix OP_ASYNC_CANCEL across tasks · 58f99373
      Pavel Begunkov authored
      
      
       IORING_OP_ASYNC_CANCEL tries io-wq cancellation only for the current
       task. If that fails, go over the tctx_list and try it for every single
       tctx.
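       Sketched below (structure and helper names assumed from the surrounding
       io_uring code; locking omitted):

           static int io_async_cancel_all_tctx(struct io_ring_ctx *ctx, u64 sqe_addr)
           {
                   struct io_tctx_node *node;
                   int ret = -ENOENT;

                   /* the current task's io-wq didn't have it: try every attached
                    * task context on this ring
                    */
                   list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
                           ret = io_async_cancel_one(node->task->io_uring, sqe_addr);
                           if (ret != -ENOENT)
                                   break;
                   }
                   return ret;
           }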
      
       Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      58f99373
    • io_uring: cancel sqpoll via task_work · 521d6a73
      Pavel Begunkov authored
      
      
      1) The first problem is io_uring_cancel_sqpoll() ->
      io_uring_cancel_task_requests() basically doing park(); park(); and so
      hanging.
      
       2) Another one is more subtle: the master task is doing cancellations,
       but the SQPOLL task submits requests in between the end of the
       cancellation and finish(); those requests take a ref to the ctx and so
       lock it up eternally.
      
       3) Yet another: a dying SQPOLL task doing io_uring_cancel_sqpoll() and
       the same io_uring_cancel_sqpoll() from the owner task race for
       tctx->wait events. And there are probably more of them.
      
       Instead, do SQPOLL cancellations from within SQPOLL task context via
       task_work, see io_sqpoll_cancel_sync(). With that we don't need the
       temporary park()/unpark() during cancellation, which is ugly, subtle
       and in any case doesn't allow us to do io_run_task_work() properly.
      
       io_uring_cancel_sqpoll() is called only from SQPOLL task context and
       under sqd locking, so all parking is removed from there. As a result,
       io_sq_thread_[un]park() and io_sq_thread_stop() are no longer used by
       the SQPOLL task, which spares us some headache.
      
      Also remove ctx->sqd_list early to avoid 2). And kill tctx->sqpoll,
      which is not used anymore.
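       The rough shape of the io_sqpoll_cancel_sync() idea (types, fields and
       error handling simplified):

           struct io_sqpoll_cancel_work {
                   struct callback_head    cb;
                   struct io_ring_ctx      *ctx;
                   struct completion       done;
           };

           static void io_sqpoll_cancel_cb(struct callback_head *cb)
           {
                   struct io_sqpoll_cancel_work *work =
                           container_of(cb, struct io_sqpoll_cancel_work, cb);

                   /* runs in SQPOLL task context: do the actual cancellation here */
                   complete(&work->done);
           }

           static void io_sqpoll_cancel_sync(struct io_ring_ctx *ctx,
                                             struct task_struct *sqpoll_task)
           {
                   struct io_sqpoll_cancel_work work = { .ctx = ctx };

                   init_completion(&work.done);
                   init_task_work(&work.cb, io_sqpoll_cancel_cb);
                   /* no park()/unpark(): queue the work and wait for SQPOLL */
                   if (!task_work_add(sqpoll_task, &work.cb, TWA_SIGNAL))
                           wait_for_completion(&work.done);
           }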
      
       Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      521d6a73
    • io_uring: prevent racy sqd->thread checks · 26984fbf
      Pavel Begunkov authored
      
      
       The SQPOLL thread to which we're trying to attach may be going away.
       That's not nice in itself, but a more serious problem is if
       io_sq_offload_create() sees sqd->thread == NULL and tries to init it
       with a new thread. There are tons of ways that can be exploited or fail.
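       The kind of check implied here, sketched (sqd locking per the commit
       message; details simplified):

           static int io_sqd_check_thread(struct io_sq_data *sqd)
           {
                   int ret = 0;

                   mutex_lock(&sqd->lock);
                   /* only trust sqd->thread while holding the lock; NULL means the
                    * thread is going away, so don't try to re-init a new one here
                    */
                   if (!sqd->thread)
                           ret = -ENXIO;
                   mutex_unlock(&sqd->lock);

                   return ret;
           }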
      
       Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      26984fbf
  7. Mar 12, 2021
  8. Mar 10, 2021