  1. Jul 09, 2021
    • io_uring: remove dead non-zero 'poll' check · 9ce85ef2
      Jens Axboe authored
      
      
      Colin reports that Coverity complains about checking for poll being
      non-zero after having dereferenced it multiple times. This is a valid
      complaint, and actually a leftover from back when this code was based
      on the aio poll code.
      
      Kill the redundant check.
      
      Link: https://lore.kernel.org/io-uring/fe70c532-e2a7-3722-58a1-0fa4e5c5ff2c@canonical.com/
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: mitigate unlikely iopoll lag · 8f487ef2
      Pavel Begunkov authored
      
      
      We have requests like IORING_OP_FILES_UPDATE that don't go through
      ->iopoll_list but get completed in place under ->uring_lock, and so
      after dropping the lock io_iopoll_check() should expect that some CQEs
      might have been completed in the meantime.
      
      Currently such events won't be accounted in @nr_events, and the loop
      will continue to poll even if there are enough CQEs. It shouldn't be a
      problem, as it's not likely to happen, but it's not nice either. Just
      return earlier in this case, it should be enough.
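
The early return described above can be illustrated with a minimal user-space sketch. This is not the kernel code: `struct ring_model`, `relock()` and `poll_rounds()` are hypothetical names made up for the illustration. The only idea taken from the commit message is that CQEs may be posted in place while the lock is dropped, so the polling loop should also look at the completion queue itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a ring, not the kernel's struct io_ring_ctx. */
struct ring_model {
    unsigned cq_ready;        /* CQEs already visible in the completion queue */
    unsigned inplace_pending; /* in-place completions that land while unlocked */
};

/* Model of dropping and retaking the lock: in-place completions get posted. */
static void relock(struct ring_model *r)
{
    r->cq_ready += r->inplace_pending;
    r->inplace_pending = 0;
}

/*
 * Poll until at least `min` events were reaped from the poll list (which is
 * empty in this model), giving up after `max_rounds`. With `check_cq` set,
 * the loop also stops once the CQ itself already holds `min` entries.
 * Returns the number of polling rounds spent.
 */
static unsigned poll_rounds(struct ring_model *r, unsigned min,
                            unsigned max_rounds, bool check_cq)
{
    unsigned nr_events = 0; /* stays 0: nothing sits on the poll list here */
    unsigned rounds = 0;

    assert(min > 0);
    while (nr_events < min && rounds < max_rounds) {
        rounds++;
        relock(r);
        if (check_cq && r->cq_ready >= min)
            break; /* enough CQEs arrived by other means */
    }
    return rounds;
}
```

In this model, a ring with two pending in-place completions leaves the loop after one round when `check_cq` is set, and otherwise spins until `max_rounds` is exhausted.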
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/66ef932cc66a34e3771bbae04b2953a8058e9d05.1625747741.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. Jul 08, 2021
  3. Jul 02, 2021
    • io_uring: fix exiting io_req_task_work_add leaks · e09ee510
      Pavel Begunkov authored
      If one entered io_req_task_work_add() not seeing PF_EXITING, it will set
      a ->task_state bit and try task_work_add(), which may fail by that
      moment. If that happens the function would try to cancel the request.
      
      However, in the meantime other io_req_task_work_add() callers may come
      in, see the bit set, and leave their requests in the list, where they
      will never be executed.
      
      Don't propagate an error; instead clear the bit first and then fall back
      all requests that we can splice from the list. The callback functions
      have to be able to deal with PF_EXITING, so poll and apoll were modified
      via changing io_poll_rewait().
      
      Fixes: 7cbf1722 ("io_uring: provide FIFO ordering for task_work")
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/060002f19f1fdbd130ba24aef818ea4d3080819b.1625142209.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: simplify task_work func · 5b0a6acc
      Pavel Begunkov authored
      
      
      Since we don't really use req->task_work anymore, get rid of it together
      with the nasty ->func aliasing between ->io_task_work and ->task_work,
      and hide ->fallback_node inside of io_task_work.
      
      Also, as task_work is gone now, replace the callback type from
      task_work_func_t to a function taking io_kiocb to avoid casting and
      simplify code.
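
As a rough illustration of why a callback that takes the request directly is simpler, the sketch below contrasts the two styles in plain user-space C. All type and function names here (`callback_head_model`, `req_model`, `req_func_model_t`, and so on) are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel structures have many more fields. */
struct callback_head_model {
    void (*func)(struct callback_head_model *cb);
};

struct req_model {
    int result;
    struct callback_head_model work; /* generic node embedded in the request */
};

/* Old style: generic callback type, request recovered by pointer arithmetic. */
static void old_style_cb(struct callback_head_model *cb)
{
    struct req_model *req = (struct req_model *)
        ((char *)cb - offsetof(struct req_model, work));

    req->result = 1;
}

/* New style: the callback type takes the request itself, so no cast is needed. */
typedef void (*req_func_model_t)(struct req_model *req);

static void new_style_cb(struct req_model *req)
{
    req->result = 2;
}

/* Helper showing how a typed callback would be invoked. */
static int run_typed(req_func_model_t fn, struct req_model *req)
{
    fn(req);
    return req->result;
}
```

With the typed callback, the compiler checks the argument type at every call site, and the handler body no longer has to recover the request from an embedded member.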
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix stuck fallback reqs · 9011bf9a
      Pavel Begunkov authored
      
      
      When task_work_add() fails, we use ->exit_task_work to queue the work.
      That will be run only in the cancellation path, which happens either
      when the ctx is dying or one of the tasks with inflight requests is
      exiting or executing. There is a good chance that such a request would
      just get stuck in the list, potentially holding a file, all io_uring
      rsrc recycling or some other resources. Nothing terrible, it'll go away
      at some point, but we don't want to lock them up for longer than needed.
      
      Replace that hand made ->exit_task_work with delayed_work + llist
      inspired by fput_many().
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge tag 'for-5.14/io_uring-2021-06-30' of git://git.kernel.dk/linux-block · c288d9cd
      Linus Torvalds authored
      Pull io_uring updates from Jens Axboe:
      
       - Multi-queue iopoll improvement (Fam)
      
       - Allow configurable io-wq CPU masks (me)
      
       - renameat/linkat tightening (me)
      
       - poll re-arm improvement (Olivier)
      
       - SQPOLL race fix (Olivier)
      
       - Cancelation unification (Pavel)
      
       - SQPOLL cleanups (Pavel)
      
       - Enable file backed buffers for shmem/memfd (Pavel)
      
       - A ton of cleanups and performance improvements (Pavel)
      
       - Followup and misc fixes (Colin, Fam, Hao, Olivier)
      
      * tag 'for-5.14/io_uring-2021-06-30' of git://git.kernel.dk/linux-block: (83 commits)
        io_uring: code clean for kiocb_done()
        io_uring: spin in iopoll() only when reqs are in a single queue
        io_uring: pre-initialise some of req fields
        io_uring: refactor io_submit_flush_completions
        io_uring: optimise hot path restricted checks
        io_uring: remove not needed PF_EXITING check
        io_uring: mainstream sqpoll task_work running
        io_uring: refactor io_arm_poll_handler()
        io_uring: reduce latency by reissueing the operation
        io_uring: add IOPOLL and reserved field checks to IORING_OP_UNLINKAT
        io_uring: add IOPOLL and reserved field checks to IORING_OP_RENAMEAT
        io_uring: refactor io_openat2()
        io_uring: simplify struct io_uring_sqe layout
        io_uring: update sqe layout build checks
        io_uring: fix code style problems
        io_uring: refactor io_sq_thread()
        io_uring: don't change sqpoll creds if not needed
        io_uring: Create define to modify a SQPOLL parameter
        io_uring: Fix race condition when sqp thread goes to sleep
        io_uring: improve in tctx_task_work() resubmission
        ...
    • Merge tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 911a2997
      Linus Torvalds authored
      Pull misc fs updates from Jan Kara:
       "The new quotactl_fd() syscall (remake of quotactl_path() syscall that
        got introduced & disabled in 5.13 cycle), and a couple of udf, reiserfs,
        isofs, and writeback fixes and cleanups"
      
      * tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        writeback: fix obtain a reference to a freeing memcg css
        quota: remove unnecessary oom message
        isofs: remove redundant continue statement
        quota: Wire up quotactl_fd syscall
        quota: Change quotactl_path() systcall to an fd-based one
        reiserfs: Remove unneed check in reiserfs_write_full_page()
        udf: Fix NULL pointer dereference in udf_symlink function
        reiserfs: add check for invalid 1st journal block
  4. Jul 01, 2021
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · a6ecc2a4
      Linus Torvalds authored
      Pull ext4 updates from Ted Ts'o:
       "In addition to bug fixes and cleanups, there are two new features for
        ext4 in 5.14:
      
         - Allow applications to poll on changes to
           /sys/fs/ext4/*/errors_count
      
         - Add the ioctl EXT4_IOC_CHECKPOINT which allows the journal to be
           checkpointed, truncated and discarded or zero'ed"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
        jbd2: export jbd2_journal_[un]register_shrinker()
        ext4: notify sysfs on errors_count value change
        fs: remove bdev_try_to_free_page callback
        ext4: remove bdev_try_to_free_page() callback
        jbd2: simplify journal_clean_one_cp_list()
        jbd2,ext4: add a shrinker to release checkpointed buffers
        jbd2: remove redundant buffer io error checks
        jbd2: don't abort the journal when freeing buffers
        jbd2: ensure abort the journal if detect IO error when writing original buffer back
        jbd2: remove the out label in __jbd2_journal_remove_checkpoint()
        ext4: no need to verify new add extent block
        jbd2: clean up misleading comments for jbd2_fc_release_bufs
        ext4: add check to prevent attempting to resize an fs with sparse_super2
        ext4: consolidate checks for resize of bigalloc into ext4_resize_begin
        ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
        ext4: fsmap: fix the block/inode bitmap comment
        ext4: fix comment for s_hash_unsigned
        ext4: use local variable ei instead of EXT4_I() macro
        ext4: fix avefreec in find_group_orlov
        ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
        ...
    • Merge tag 'for-5.14/dm-changes' of... · 2cfa582b
      Linus Torvalds authored
      Merge tag 'for-5.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper updates from Mike Snitzer:
      
       - Various DM persistent-data library improvements and fixes that
         benefit both the DM thinp and cache targets.
      
       - A few small DM kcopyd efficiency improvements.
      
       - Significant zoned related block core, DM core and DM zoned target
         changes that culminate with adding zoned append emulation (which is
         required to properly fix DM crypt's zoned support).
      
       - Various DM writecache target changes that improve efficiency. Adds an
         optional "metadata_only" feature that only promotes bios flagged with
         REQ_META. But the most significant improvement is writecache's
         ability to pause writeback, for a configurable time, if/when the
         working set is larger than the cache (and the cache is full) -- this
         ensures performance is no worse than the slower origin device.
      
      * tag 'for-5.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits)
        dm writecache: make writeback pause configurable
        dm writecache: pause writeback if cache full and origin being written directly
        dm io tracker: factor out IO tracker
        dm btree remove: assign new_root only when removal succeeds
        dm zone: fix dm_revalidate_zones() memory allocation
        dm ps io affinity: remove redundant continue statement
        dm writecache: add optional "metadata_only" parameter
        dm writecache: add "cleaner" and "max_age" to Documentation
        dm writecache: write at least 4k when committing
        dm writecache: flush origin device when writing and cache is full
        dm writecache: have ssd writeback wait if the kcopyd workqueue is busy
        dm writecache: use list_move instead of list_del/list_add in writecache_writeback()
        dm writecache: commit just one block, not a full page
        dm writecache: remove unused gfp_t argument from wc_add_block()
        dm crypt: Fix zoned block device support
        dm: introduce zone append emulation
        dm: rearrange core declarations for extended use from dm-zone.c
        block: introduce BIO_ZONE_WRITE_LOCKED bio flag
        block: introduce bio zone helpers
        block: improve handling of all zones reset operation
        ...
    • Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · dbe69e43
      Linus Torvalds authored
      Pull networking updates from Jakub Kicinski:
       "Core:
      
         - BPF:
            - add syscall program type and libbpf support for generating
              instructions and bindings for in-kernel BPF loaders (BPF loaders
              for BPF), this is a stepping stone for signed BPF programs
            - infrastructure to migrate TCP child sockets from one listener to
              another in the same reuseport group/map to improve flexibility
              of service hand-off/restart
            - add broadcast support to XDP redirect
      
         - allow bypass of the lockless qdisc to improve performance (for
           pktgen: +23% with one thread, +44% with 2 threads)
      
         - add a simpler version of "DO_ONCE()" which does not require jump
           labels, intended for slow-path usage
      
         - virtio/vsock: introduce SOCK_SEQPACKET support
      
         - add getsockopt to retrieve netns cookie
      
         - ip: treat lowest address of an IPv4 subnet as ordinary unicast
           address allowing reclaiming of precious IPv4 addresses
      
         - ipv6: use prandom_u32() for ID generation
      
         - ip: add support for more flexible field selection for hashing
           across multi-path routes (w/ offload to mlxsw)
      
         - icmp: add support for extended RFC 8335 PROBE (ping)
      
         - seg6: add support for SRv6 End.DT46 behavior
      
         - mptcp:
            - DSS checksum support (RFC 8684) to detect middlebox meddling
            - support Connection-time 'C' flag
            - time stamping support
      
         - sctp: packetization Layer Path MTU Discovery (RFC 8899)
      
         - xfrm: speed up state addition with seq set
      
         - WiFi:
            - hidden AP discovery on 6 GHz and other HE 6 GHz improvements
            - aggregation handling improvements for some drivers
            - minstrel improvements for no-ack frames
            - deferred rate control for TXQs to improve reaction times
            - switch from round robin to virtual time-based airtime scheduler
      
         - add trace points:
            - tcp checksum errors
            - openvswitch - action execution, upcalls
            - socket errors via sk_error_report
      
        Device APIs:
      
         - devlink: add rate API for hierarchical control of max egress rate
           of virtual devices (VFs, SFs etc.)
      
         - don't require RCU read lock to be held around BPF hooks in NAPI
           context
      
         - page_pool: generic buffer recycling
      
        New hardware/drivers:
      
         - mobile:
            - iosm: PCIe Driver for Intel M.2 Modem
            - support for Qualcomm MSM8998 (ipa)
      
         - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
      
         - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
      
         - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
      
         - NXP SJA1110 Automotive Ethernet 10-port switch
      
         - Qualcomm QCA8327 switch support (qca8k)
      
         - Mikrotik 10/25G NIC (atl1c)
      
        Driver changes:
      
         - ACPI support for some MDIO, MAC and PHY devices from Marvell and
           NXP (our first foray into MAC/PHY description via ACPI)
      
         - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
      
         - Mellanox/Nvidia NIC (mlx5)
            - NIC VF offload of L2 bridging
            - support IRQ distribution to Sub-functions
      
         - Marvell (prestera):
            - add flower and match all
            - devlink trap
            - link aggregation
      
         - Netronome (nfp): connection tracking offload
      
         - Intel 1GE (igc): add AF_XDP support
      
         - Marvell DPU (octeontx2): ingress ratelimit offload
      
         - Google vNIC (gve): new ring/descriptor format support
      
         - Qualcomm mobile (rmnet & ipa): inline checksum offload support
      
         - MediaTek WiFi (mt76)
            - mt7915 MSI support
            - mt7915 Tx status reporting
            - mt7915 thermal sensors support
            - mt7921 decapsulation offload
            - mt7921 enable runtime pm and deep sleep
      
         - Realtek WiFi (rtw88)
            - beacon filter support
            - Tx antenna path diversity support
            - firmware crash information via devcoredump
      
         - Qualcomm WiFi (wcn36xx)
            - Wake-on-WLAN support with magic packets and GTK rekeying
      
         - Micrel PHY (ksz886x/ksz8081): add cable test support"
      
      * tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
        tcp: change ICSK_CA_PRIV_SIZE definition
        tcp_yeah: check struct yeah size at compile time
        gve: DQO: Fix off by one in gve_rx_dqo()
        stmmac: intel: set PCI_D3hot in suspend
        stmmac: intel: Enable PHY WOL option in EHL
        net: stmmac: option to enable PHY WOL with PMT enabled
        net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
        net: use netdev_info in ndo_dflt_fdb_{add,del}
        ptp: Set lookup cookie when creating a PTP PPS source.
        net: sock: add trace for socket errors
        net: sock: introduce sk_error_report
        net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
        net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
        net: dsa: include fdb entries pointing to bridge in the host fdb list
        net: dsa: include bridge addresses which are local in the host fdb list
        net: dsa: sync static FDB entries on foreign interfaces to hardware
        net: dsa: install the host MDB and FDB entries in the master's RX filter
        net: dsa: reference count the FDB addresses at the cross-chip notifier level
        net: dsa: introduce a separate cross-chip notifier type for host FDBs
        net: dsa: reference count the MDB entries at the cross-chip notifier level
        ...
    • Merge tag 'sched-urgent-2021-06-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a6eaf385
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
      
       - Fix a small inconsistency (bug) in load tracking, caught by a new
         warning that several people reported.
      
       - Flip CONFIG_SCHED_CORE to default-disabled, and update the Kconfig
         help text.
      
      * tag 'sched-urgent-2021-06-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Disable CONFIG_SCHED_CORE by default
        sched/fair: Ensure _sum and _avg values stay consistent
    • Merge tag 'microblaze-v5.14' of git://git.monstr.eu/linux-2.6-microblaze · f4cc74c9
      Linus Torvalds authored
      Pull microblaze updates from Michal Simek:
      
       - Remove unused PAGE_UP/DOWN macros
      
       - Fix trivial spelling mistake
      
      * tag 'microblaze-v5.14' of git://git.monstr.eu/linux-2.6-microblaze:
        arch: microblaze: Fix spelling mistake "vesion" -> "version"
        microblaze: Cleanup unused functions
    • Merge tag 'safesetid-5.14' of git://github.com/micah-morton/linux · 92183137
      Linus Torvalds authored
      Pull SafeSetID update from Micah Morton:
       "One very minor code cleanup change that marks a variable as
        __initdata"
      
      * tag 'safesetid-5.14' of git://github.com/micah-morton/linux:
        LSM: SafeSetID: Mark safesetid_initialized as __initdata
    • Merge tag 'Smack-for-5.14' of git://github.com/cschaufler/smack-next · 5c874a5b
      Linus Torvalds authored
      Pull smack updates from Casey Schaufler:
       "There is nothing more significant than an improvement to a byte count
        check in smackfs.
      
        All changes have been in next for weeks"
      
      * tag 'Smack-for-5.14' of git://github.com/cschaufler/smack-next:
        Smack: fix doc warning
        Revert "Smack: Handle io_uring kernel thread privileges"
        smackfs: restrict bytes count in smk_set_cipso()
        security/smack/: fix misspellings using codespell tool
    • Merge tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · 290fe0fa
      Linus Torvalds authored
      Pull audit updates from Paul Moore:
       "Another merge window, another small audit pull request.
      
        Four patches in total: one is cosmetic, one removes an unnecessary
        initialization, one renames some enum values to prevent name
        collisions, and one converts list_del()/list_add() to list_move().
      
        None of these are earth shattering and all pass the audit-testsuite
        tests while merging cleanly on top of your tree from earlier today"
      
      * tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: remove unnecessary 'ret' initialization
        audit: remove trailing spaces and tabs
        audit: Use list_move instead of list_del/list_add
        audit: Rename enum audit_state constants to avoid AUDIT_DISABLED redefinition
        audit: add blank line after variable declarations
    • Merge tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 6bd344e5
      Linus Torvalds authored
      Pull SELinux updates from Paul Moore:
      
       - The slow_avc_audit() function is now non-blocking so we can remove
         the AVC_NONBLOCKING tricks; this also includes the 'flags' variant of
         avc_has_perm().
      
       - Use kmemdup() instead of kcalloc()+copy when copying parts of the
         SELinux policydb.
      
       - The InfiniBand device name is now passed by reference when possible
         in the SELinux code, removing a strncpy().
      
       - Minor cleanups including: constification of avtab function args,
         removal of useless LSM/XFRM function args, SELinux kdoc fixes, and
         removal of redundant assignments.
      
      * tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit()
        selinux: slow_avc_audit has become non-blocking
        selinux: Fix kernel-doc
        selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC
        lsm_audit,selinux: pass IB device name by reference
        selinux: Remove redundant assignment to rc
        selinux: Corrected comment to match kernel-doc comment
        selinux: delete selinux_xfrm_policy_lookup() useless argument
        selinux: constify some avtab function arguments
        selinux: simplify duplicate_policydb_cond_list() by using kmemdup()
    • Merge tag 'clang-features-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 44b6ed4c
      Linus Torvalds authored
      Pull clang feature updates from Kees Cook:
      
       - Add CC_HAS_NO_PROFILE_FN_ATTR in preparation for PGO support in the
         face of the noinstr attribute, paving the way for PGO and fixing
         GCOV. (Nick Desaulniers)
      
       - x86_64 LTO coverage is expanded to 32-bit x86. (Nathan Chancellor)
      
       - Small fixes to CFI. (Mark Rutland, Nathan Chancellor)
      
      * tag 'clang-features-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute
        Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR
        compiler_attributes.h: cleanups for GCC 4.9+
        compiler_attributes.h: define __no_profile, add to noinstr
        x86, lto: Enable Clang LTO for 32-bit as well
        CFI: Move function_nocfi() into compiler.h
        MAINTAINERS: Add Clang CFI section
    • io_uring: code clean for kiocb_done() · e149bd74
      Hao Xu authored
      
      
      A simple code cleanup for kiocb_done().
      
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: spin in iopoll() only when reqs are in a single queue · 915b3dde
      Hao Xu authored
      
      
      We currently spin in iopoll() when the requests to be iopolled are for
      the same file (device), but one device may have multiple hardware
      queues. For example:
      
      hw_queue_0     |    hw_queue_1
      req(30us)           req(10us)
      
      If we first spin on iopolling for hw_queue_0, the avg latency would
      be (30us + 30us) / 2 = 30us. If we do round robin instead, the avg
      latency would be (30us + 10us) / 2 = 20us, since we reap the request in
      hw_queue_1 in time. So it's better to spin only when the requests
      are in the same hardware queue.
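
The arithmetic in the example can be checked with a small user-space sketch. It is an idealised model, not the kernel's polling code: the function names are hypothetical, each hardware queue is assumed to hold exactly one request, and reaping is assumed to cost nothing.

```c
#include <assert.h>

/*
 * done_us[i] is the completion time of the single request sitting in
 * hardware queue i (hypothetical numbers, as in the example above).
 */

/* Spin on queue 0 first: nothing else is reaped before queue 0 completes. */
static double avg_latency_spin_first(const double *done_us, int n)
{
    double sum = 0.0;

    for (int i = 0; i < n; i++)
        sum += done_us[i] > done_us[0] ? done_us[i] : done_us[0];
    return sum / n;
}

/* Round robin: each request is reaped as soon as it completes (idealised). */
static double avg_latency_round_robin(const double *done_us, int n)
{
    double sum = 0.0;

    for (int i = 0; i < n; i++)
        sum += done_us[i];
    return sum / n;
}
```

Fed the completion times 30us and 10us, the first function gives the 30us average from the commit message and the second gives 20us.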
      
      Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: pre-initialise some of req fields · 99ebe4ef
      Pavel Begunkov authored
      
      
      Most requests are allocated from an internal cache, so it's a waste of
      time fully initialising them every time. Instead, let's pre-init some of
      the fields we can during initial allocation (e.g. kmalloc(), see
      io_alloc_req()) and keep them valid on request recycling. There are four
      of them in this patch:
      
      ->ctx always stays the same
      ->link is NULL on free, it's an invariant
      ->result is not even needed to init, just a precaution
      ->async_data we now clean in io_dismantle_req() as it's likely to
         never be allocated.
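
The pattern described above, initialising some fields once at real allocation time and keeping them valid across recycling, might look roughly like the following user-space sketch. `struct req_model`, `struct req_cache`, `req_alloc()` and `req_recycle()` are hypothetical names; the actual kernel code differs, and malloc() merely stands in for the slab allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request and cache; field names follow the list above. */
struct req_model {
    void *ctx;              /* never changes for a given ring */
    struct req_model *link; /* invariant: NULL while the request is free */
    long result;
    void *async_data;       /* cleaned on teardown, rarely allocated */
    struct req_model *next_free;
};

struct req_cache {
    void *ctx;
    struct req_model *free_list;
};

/* Pre-initialise only on a real allocation; recycled requests skip it. */
static struct req_model *req_alloc(struct req_cache *c)
{
    struct req_model *req = c->free_list;

    if (req) {
        c->free_list = req->next_free;
        return req; /* pre-initialised fields are still valid */
    }
    req = malloc(sizeof(*req));
    if (!req)
        return NULL;
    req->ctx = c->ctx;
    req->link = NULL;
    req->result = 0;
    req->async_data = NULL;
    req->next_free = NULL;
    return req;
}

/* Restore the invariants on teardown, so the next user need not. */
static void req_recycle(struct req_cache *c, struct req_model *req)
{
    assert(req->link == NULL); /* the caller must have unlinked it already */
    free(req->async_data);
    req->async_data = NULL;
    req->next_free = c->free_list;
    c->free_list = req;
}
```

The cost of restoring the invariants moves to the teardown path, which is cheap when the field was never used, while the allocation fast path touches none of the pre-initialised fields.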
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/892ba0e71309bba9fe9e0142472330bbf9d8f05d.1624739600.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: refactor io_submit_flush_completions · 5182ed2e
      Pavel Begunkov authored
      
      
      Don't init req_batch before we actually need it. Also, add a small clean
      up for req declaration.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/ad85512e12bd3a20d521e9782750300970e5afc8.1624739600.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: optimise hot path restricted checks · 4cfb25bf
      Pavel Begunkov authored
      
      
      Move likely/unlikely from io_check_restriction() to specifically the
      ctx->restricted check, because it doesn't do what it's supposed to and
      makes the common path take an extra jump.
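
A minimal sketch of putting the branch hint on the cheap flag test itself, assuming GCC or Clang for `__builtin_expect`. `struct ctx_model` and `check_restriction()` are hypothetical simplifications, and the bitmask stands in for whatever the real restriction checks do.

```c
#include <assert.h>
#include <stdbool.h>

/* GCC/Clang branch hints, in the style of the kernel's macros. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical ring state: a restricted flag plus a bitmask of allowed ops. */
struct ctx_model {
    bool restricted;
    unsigned long allowed_ops;
};

/* The hint sits on the flag test, so the common case returns at once. */
static inline bool check_restriction(const struct ctx_model *ctx, unsigned op)
{
    if (likely(!ctx->restricted))
        return true;
    /* slow path: only reached for restricted rings */
    return (ctx->allowed_ops >> op) & 1UL;
}
```

An unrestricted ring accepts every op through the hinted fast path; a restricted ring falls through to the mask lookup.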
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/22bf70d0a543dfc935d7276bdc73081784e30698.1624739600.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: remove not needed PF_EXITING check · e5dc480d
      Pavel Begunkov authored
      
      
      Since cancellation got moved before exit_signals(), there is no one left
      who can call io_run_task_work() with PF_EXITING set, so remove the check.
      Note that __io_req_task_submit() still needs a similar check.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/f7f305ececb1e6044ea649fb983ca754805bb884.1624739600.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: mainstream sqpoll task_work running · dd432ea5
      Pavel Begunkov authored
      
      
      task_works are widely used, so place io_run_task_work() directly into
      the main path of io_sq_thread(), and remove it from other places where
      it's not needed anymore.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/24eb5e35d519c590d3dffbd694b4c61a5fe49029.1624739600.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: refactor io_arm_poll_handler() · b2d9c3da
      Pavel Begunkov authored
      
      
      gcc 11 takes a weird path and duplicates most of io_arm_poll_handler()
      for READ and WRITE cases. Help it and move all pollin vs pollout
      specific bits under a single if-else, so there is no temptation for this
      kind of unfolding.
      
      before vs after:
         text    data     bss     dec     hex filename
        85362   12650       8   98020   17ee4 ./fs/io_uring.o
        85186   12650       8   97844   17e34 ./fs/io_uring.o
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/1deea0037293a922a0358e2958384b2e42437885.1624739600.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: reduce latency by reissueing the operation · 59b735ae
      Olivier Langlois authored
      
      
      It is quite frequent that when an operation fails and returns EAGAIN,
      the data becomes available between that failure and the call to
      vfs_poll() done by io_arm_poll_handler().
      
      Detecting the situation and reissuing the operation is much faster
      than going ahead and pushing the operation to the io-wq.
      
      Performance improvement testing has been performed with:
      Single thread, 1 TCP connection receiving a 5 Mbps stream, no sqpoll.
      
      4 measurements have been taken:
      1. The time it takes to process a read request when data is already available
      2. The time it takes to process a read request by calling io_issue_sqe() twice after vfs_poll() indicated that data was available
      3. The time it takes to execute io_queue_async_work()
      4. The time it takes to complete a read request asynchronously
      
      2.25% of all the read operations used the new path.
      
                               avg               min     max      stddev
      ready data (baseline)    3657.94182918628  580     20098    1213.15975908162
      reissue completion       7882.67567567568  2316    28811    1982.79172973284
      insert io-wq time        8983.82276995305  3324    87816    2551.60056552038
      async time completion    24670.4758861127  10758   102612   3483.92416873804
      
      Conclusion:
      On average, reissuing the sqe with the patch code is 1.1uSec faster and,
      in the worst case scenario, 59uSec faster than placing the request on
      io-wq.
      
      On average, completion time by reissuing the sqe with the patch code is
      16.79uSec faster and, in the worst case scenario, 73.8uSec faster than
      async completion.
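
The figures in the conclusion can be reproduced from the averages and maxima listed above. The sketch below does so under one assumption that the commit message does not state: that the raw measurements are in nanoseconds, which is the unit that makes the quoted microsecond differences come out. The constant and function names are hypothetical.

```c
#include <assert.h>

/* Figures copied from the measurements above; the unit (ns) is an assumption. */
static const double reissue_avg = 7882.67567567568, reissue_max = 28811;
static const double iowq_avg = 8983.82276995305, iowq_max = 87816;
static const double async_avg = 24670.4758861127, async_max = 102612;

/* Difference between two measurements, converted from ns to microseconds. */
static double saved_us(double slower_ns, double faster_ns)
{
    return (slower_ns - faster_ns) / 1000.0;
}
```

Under that assumption, `saved_us(iowq_avg, reissue_avg)` is about 1.1 and `saved_us(iowq_max, reissue_max)` about 59, while the async comparison gives about 16.79 and 73.8, matching the text.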
      
      Signed-off-by: Olivier Langlois <olivier@trillion01.com>
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/9e8441419bb1b8f3c3fcc607b2713efecdef2136.1624364038.git.olivier@trillion01.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      59b735ae
    • io_uring: add IOPOLL and reserved field checks to IORING_OP_UNLINKAT · 22634bc5
      Jens Axboe authored
      We can't support IOPOLL with non-pollable request types, and we should
      check for unused/reserved fields like we do for other request types.
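
      A minimal sketch of what such a prep-time check can look like, using
      stand-in types rather than the kernel's own. The names DEMO_SETUP_IOPOLL,
      struct demo_sqe and demo_unlinkat_prep() are hypothetical, and the set
      of fields treated as unused here is an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define DEMO_SETUP_IOPOLL (1U << 0)

struct demo_sqe {
	uint16_t ioprio;
	uint64_t off;
	uint32_t len;
	uint16_t buf_index;
};

/* Reject the request up front instead of silently ignoring bad input. */
static int demo_unlinkat_prep(unsigned int ring_flags,
			      const struct demo_sqe *sqe)
{
	/* A non-pollable request type cannot run on an IOPOLL ring. */
	if (ring_flags & DEMO_SETUP_IOPOLL)
		return -EINVAL;
	/* Fields this opcode does not use must be zero. */
	if (sqe->ioprio || sqe->off || sqe->len || sqe->buf_index)
		return -EINVAL;
	return 0;
}
```

      Failing with -EINVAL on non-zero unused fields keeps those fields
      available for future extensions of the opcode.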
      
      Fixes: 14a1143b ("io_uring: add support for IORING_OP_UNLINKAT")
      Cc: stable@vger.kernel.org
      Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      22634bc5
    • io_uring: add IOPOLL and reserved field checks to IORING_OP_RENAMEAT · ed7eb259
      Jens Axboe authored
      We can't support IOPOLL with non-pollable request types, and we should
      check for unused/reserved fields like we do for other request types.
      
      Fixes: 80a261fd ("io_uring: add support for IORING_OP_RENAMEAT")
      Cc: stable@vger.kernel.org
      Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ed7eb259
    • io_uring: refactor io_openat2() · 12dcb58a
      Pavel Begunkov authored
      
      
      Put the do_filp_open() failure path of io_openat2() under a single if,
      which deduplicates put_unused_fd(), reads better, and helps the hot
      path.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/f4c84d25c049d0af2adc19c703bbfef607200209.1624543113.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      12dcb58a
    • io_uring: simplify struct io_uring_sqe layout · 9ba6a1c0
      Pavel Begunkov authored
      
      
      Flatten struct io_uring_sqe: the last union is exactly 64B, so move its
      members out of union { struct { ... }} and decrease the __pad2 size.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/2e21ef7aed136293d654450bc3088973a8adc730.1624543113.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9ba6a1c0
    • io_uring: update sqe layout build checks · 16340eab
      Pavel Begunkov authored
      
      
      Add missing BUILD_BUG_SQE_ELEM() for ->buf_group verifying that SQE
      layout doesn't change.
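
      The general technique behind such build checks can be sketched in
      plain C11 with offsetof() and _Static_assert. This is a userspace
      analogue, not the kernel macro: struct demo_sqe, its simplified field
      list, and DEMO_CHECK_ELEM() are hypothetical and only illustrate
      pinning each field's offset and size at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified 64-byte submission entry for illustration. */
struct demo_sqe {
	uint8_t opcode;
	uint8_t flags;
	uint16_t ioprio;
	int32_t fd;
	uint64_t off;
	uint64_t addr;
	uint32_t len;
	uint32_t op_flags;
	uint64_t user_data;
	uint16_t buf_group;
	uint16_t personality;
	uint32_t pad1;
	uint64_t pad2[2];
};

/* Fail the build if a field moves or changes size. */
#define DEMO_CHECK_ELEM(eoffset, etype, ename)				\
	_Static_assert(offsetof(struct demo_sqe, ename) == (eoffset) &&	\
		       sizeof(((struct demo_sqe *)0)->ename) == sizeof(etype), \
		       "demo_sqe layout changed: " #ename)

DEMO_CHECK_ELEM(0, uint8_t, opcode);
DEMO_CHECK_ELEM(8, uint64_t, off);
DEMO_CHECK_ELEM(32, uint64_t, user_data);
DEMO_CHECK_ELEM(40, uint16_t, buf_group);
DEMO_CHECK_ELEM(42, uint16_t, personality);
_Static_assert(sizeof(struct demo_sqe) == 64, "demo_sqe must stay 64 bytes");
```

      Because the checks run at compile time, a field without its own check
      can drift unnoticed, which is why adding a missing entry is worthwhile.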
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/1f9d21bd74599b856b3a632be4c23ffa184a3ef0.1624543113.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      16340eab
    • io_uring: fix code style problems · fe7e3257
      Pavel Begunkov authored
      
      
      Fix a bunch of problems mostly found by checkpatch.pl
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/cfaf9a2f27b43934144fe9422a916bd327099f44.1624543113.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fe7e3257
    • io_uring: refactor io_sq_thread() · 1a924a80
      Pavel Begunkov authored
      
      
      Move needs_sched declaration into the block where it's used, so it's
      harder to misuse/wrongfully reuse.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/e4a07db1353ee38b924dd1b45394cf8e746130b4.1624543113.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1a924a80
    • io_uring: don't change sqpoll creds if not needed · 948e1947
      Pavel Begunkov authored
      
      
      SQPOLL doesn't need to change creds if it's not submitting requests.
      Move creds overriding into __io_sq_thread() after checking if there are
      SQEs pending.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/c54368da2357ac539e0a333f7cfff70d5fb045b2.1624543113.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      948e1947
    • Merge tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block · 44046219
      Linus Torvalds authored
      Pull block driver updates from Jens Axboe:
       "Pretty calm round, mostly just NVMe and a bit of MD:
      
         - NVMe updates (via Christoph)
              - improve the APST configuration algorithm (Alexey Bogoslavsky)
              - look for StorageD3Enable on companion ACPI device
                (Mario Limonciello)
              - allow selecting the network interface for TCP connections
                (Martin Belanger)
              - misc cleanups (Amit Engel, Chaitanya Kulkarni, Colin Ian King,
                Christoph)
              - move the ACPI StorageD3 code to drivers/acpi/ and add quirks
                for certain AMD CPUs (Mario Limonciello)
              - zoned device support for nvmet (Chaitanya Kulkarni)
              - fix the rules for changing the serial number in nvmet
                (Noam Gottlieb)
              - various small fixes and cleanups (Dan Carpenter, JK Kim,
                Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert
                Uytterhoeven, Daniel Wagner)
      
         - MD updates (via Song)
              - iostats rewrite (Guoqing Jiang)
              - raid5 lock contention optimization (Gal Ofri)
      
         - Fall through warning fix (Gustavo)
      
         - Misc fixes (Gustavo, Jiapeng)"
      
      * tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block: (78 commits)
        nvmet: use NVMET_MAX_NAMESPACES to set nn value
        loop: Fix missing discard support when using LOOP_CONFIGURE
        nvme.h: add missing nvme_lba_range_type endianness annotations
        nvme: remove zeroout memset call for struct
        nvme-pci: remove zeroout memset call for struct
        nvmet: remove zeroout memset call for struct
        nvmet: add ZBD over ZNS backend support
        nvmet: add Command Set Identifier support
        nvmet: add nvmet_req_bio put helper for backends
        nvmet: add req cns error complete helper
        block: export blk_next_bio()
        nvmet: remove local variable
        nvmet: use nvme status value directly
        nvmet: use u32 type for the local variable nsid
        nvmet: use u32 for nvmet_subsys max_nsid
        nvmet: use req->cmd directly in file-ns fast path
        nvmet: use req->cmd directly in bdev-ns fast path
        nvmet: make ver stable once connection established
        nvmet: allow mn change if subsys not discovered
        nvmet: make sn stable once connection was established
        ...
      44046219
    • Merge tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block · df668a5f
      Linus Torvalds authored
      Pull core block updates from Jens Axboe:
      
       - disk events cleanup (Christoph)
      
       - gendisk and request queue allocation simplifications (Christoph)
      
       - bdev_disk_changed cleanups (Christoph)
      
       - IO priority improvements (Bart)
      
       - Chained bio completion trace fix (Edward)
      
       - blk-wbt fixes (Jan)
      
       - blk-wbt enable/disable fix (Zhang)
      
       - Scheduler dispatch improvements (Jan, Ming)
      
       - Shared tagset scheduler improvements (John)
      
       - BFQ updates (Paolo, Luca, Pietro)
      
       - BFQ lock inversion fix (Jan)
      
       - Documentation improvements (Kir)
      
       - CLONE_IO block cgroup fix (Tejun)
      
       - Removal of ancient and deprecated block dump feature (zhangyi)
      
       - Discard merge fix (Ming)
      
       - Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas,
         Yang)
      
      * tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits)
        block: fix discard request merge
        block/mq-deadline: Remove a WARN_ON_ONCE() call
        blk-mq: update hctx->dispatch_busy in case of real scheduler
        blk: Fix lock inversion between ioc lock and bfqd lock
        bfq: Remove merged request already in bfq_requests_merged()
        block: pass a gendisk to bdev_disk_changed
        block: move bdev_disk_changed
        block: add the events* attributes to disk_attrs
        block: move the disk events code to a separate file
        block: fix trace completion for chained bio
        block/partitions/msdos: Fix typo inidicator -> indicator
        block, bfq: reset waker pointer with shared queues
        block, bfq: check waker only for queues with no in-flight I/O
        block, bfq: avoid delayed merge of async queues
        block, bfq: boost throughput by extending queue-merging times
        block, bfq: consider also creation time in delayed stable merge
        block, bfq: fix delayed stable merge check
        block, bfq: let also stably merged queues enjoy weight raising
        blk-wbt: make sure throttle is enabled properly
        blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()
        ...
      df668a5f
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · df04fbe8
      Linus Torvalds authored
      Pull HID updates from Jiri Kosina:
      
       - patch series that ensures that hid-multitouch driver disables touch
         and button-press reporting on hid-mt devices during suspend when the
         device is not configured as a wakeup-source, from Hans de Goede
      
       - support for ISH DMA on Intel EHL platform, from Even Xu
      
       - support for Renoir and Cezanne SoCs, Ambient Light Sensor and Human
         Presence Detection sensor for amd-sfh driver, from Basavaraj Natikar
      
       - other assorted code cleanups and device-specific fixes/quirks
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (45 commits)
        HID: thrustmaster: Switch to kmemdup() when allocate change_request
        HID: multitouch: Disable event reporting on suspend when the device is not a wakeup-source
        HID: logitech-dj: Implement may_wakeup ll-driver callback
        HID: usbhid: Implement may_wakeup ll-driver callback
        HID: core: Add hid_hw_may_wakeup() function
        HID: input: Add support for Programmable Buttons
        HID: wacom: Correct base usage for capacitive ExpressKey status bits
        HID: amd_sfh: Add initial support for HPD sensor
        HID: amd_sfh: Extend ALS support for newer AMD platform
        HID: amd_sfh: Extend driver capabilities for multi-generation support
        HID: surface-hid: Fix get-report request
        HID: sony: fix freeze when inserting ghlive ps3/wii dongles
        HID: usbkbd: Avoid GFP_ATOMIC when GFP_KERNEL is possible
        HID: amd_sfh: change in maintainer
        HID: intel-ish-hid: ipc: Specify that EHL no cache snooping
        HID: intel-ish-hid: ishtp: Add dma_no_cache_snooping() callback
        HID: intel-ish-hid: Set ISH driver depends on x86
        HID: hid-input: add Surface Go battery quirk
        HID: intel-ish-hid: Fix minor typos in comments
        HID: usbmouse: Avoid GFP_ATOMIC when GFP_KERNEL is possible
        ...
      df04fbe8
    • Merge tag 'edac_updates_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 4b5e35ce
      Linus Torvalds authored
      Pull EDAC updates from Tony Luck:
       "Various fixes and support for new CPUs:
      
         - Clean up error messages from thunderx_edac
      
         - Add MODULE_DEVICE_TABLE to ti_edac so it will autoload
      
         - Use %pR to print resources in aspeed_edac
      
         - Add Yazen Ghannam as MAINTAINER for AMD edac drivers
      
         - Fix Ice Lake and Sapphire Rapids drivers to report correct "near"
           or "far" device for errors in 2LM configurations
      
         - Add support for on-package high bandwidth memory in Sapphire Rapids
      
         - New CPU support for three CPUs supporting in-band ECC (IOT SKUs for
           ICL-NNPI, Tiger Lake and Alder Lake)
      
         - Don't even try to load Intel EDAC drivers when running as a guest
      
         - Fix Kconfig dependency on X86_MCE_INTEL for EDAC_IGEN6"
      
      * tag 'edac_updates_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/igen6: fix core dependency
        EDAC/Intel: Do not load EDAC driver when running as a guest
        EDAC/igen6: Add Intel Alder Lake SoC support
        EDAC/igen6: Add Intel Tiger Lake SoC support
        EDAC/igen6: Add Intel ICL-NNPI SoC support
        EDAC/i10nm: Add support for high bandwidth memory
        EDAC/i10nm: Add detection of memory levels for ICX/SPR servers
        EDAC/skx_common: Add new ADXL components for 2-level memory
        MAINTAINERS: Make Yazen Ghannam maintainer for EDAC-AMD64
        EDAC/aspeed: Use proper format string for printing resource
        EDAC/ti: Add missing MODULE_DEVICE_TABLE
        EDAC/thunderx: Remove irrelevant variable from error messages
      4b5e35ce
    • Merge tag 'tpmdd-next-v5.14-rc1' of... · e60d726f
      Linus Torvalds authored
      Merge tag 'tpmdd-next-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
      
      Pull tpm driver updates from Jarkko Sakkinen:
       "Bug fixes for TPM"
      
      [ This isn't actually the whole contents of the tag and thus doesn't
        contain Jarkko's signature - I dropped the two top commits that added
        support for signing modules using elliptic curve keys because there's
        a new series for that that fixes a few confusing things - Linus ]
      
      * tag 'tpmdd-next-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
        tpm: Replace WARN_ONCE() with dev_err_once() in tpm_tis_status()
        tpm_tis: Use DEFINE_RES_MEM() to simplify code
        tpm: fix some doc warnings in tpm1-cmd.c
        tpm_tis_spi: add missing SPI device ID entries
        tpm: add longer timeout for TPM2_CC_VERIFY_SIGNATURE
        char: tpm: move to use request_irq by IRQF_NO_AUTOEN flag
        tpm_tis_spi: set default probe function if device id not match
        tpm_crb: Use IOMEM_ERR_PTR when function returns iomem
      e60d726f