  1. Mar 06, 2021
    • Merge tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block · f292e873
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A bit of a mix between fallout from the worker change, cleanups and
        reductions now possible from that change, and fixes in general. In
        detail:
      
         - Fully serialize manager and worker creation, fixing races due to
           that.
      
         - Clean up some naming that had gone stale.
      
         - SQPOLL fixes.
      
         - Fix race condition around task_work rework that went into this
           merge window.
      
         - Implement unshare. When the original task does unshare(2) or
           setuid/seteuid and friends, drop the original workers and fork
           new ones.
      
         - Drop the only remaining piece of state shuffling we had left, which
           was cred. Move it into issue instead, and we can drop all of that
           code too.
      
         - Kill f_op->flush() usage. That was a nasty hack that we had out
           of necessity; we no longer need it.
      
         - Following from ->flush() removal, we can also drop various bits of
           ctx state related to SQPOLL and cancelations.
      
         - Fix an issue with IOPOLL retry, which originally was fallout from a
           filemap change (removing iov_iter_revert()), but uncovered an issue
           with iovec re-import too late.
      
         - Fix an issue with system suspend.
      
         - Use xchg() for fallback work, instead of cmpxchg().
      
         - Properly destroy io-wq on exec.
      
         - Add create_io_thread() core helper, and use that in io-wq and
           io_uring. This allows us to remove various silly completion events
           related to thread setup.
      
         - A few error handling fixes.
      
        This should be the bulk of the fixes necessary for the new workers;
        next week should be quieter. We've got a pending series from Pavel on
        cancelations, and how tasks and rings are indexed. Outside of that,
        should just be minor fixes. Even with these fixes, we're still killing
        a net ~80 lines"
      
      * tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block: (41 commits)
        io_uring: don't restrict issue_flags for io_openat
        io_uring: make SQPOLL thread parking saner
        io-wq: kill hashed waitqueue before manager exits
        io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
        io_uring: don't keep looping for more events if we can't flush overflow
        io_uring: move to using create_io_thread()
        kernel: provide create_io_thread() helper
        io_uring: reliably cancel linked timeouts
        io_uring: cancel-match based on flags
        io-wq: ensure all pending work is canceled on exit
        io_uring: ensure that threads freeze on suspend
        io_uring: remove extra in_idle wake up
        io_uring: inline __io_queue_async_work()
        io_uring: inline io_req_clean_work()
        io_uring: choose right tctx->io_wq for try cancel
        io_uring: fix -EAGAIN retry with IOPOLL
        io-wq: fix error path leak of buffered write hash map
        io_uring: remove sqo_task
        io_uring: kill sqo_dead and sqo submission halting
        io_uring: ignore double poll add on the same waitqueue head
        ...
    • Merge tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 6d47254c
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix the usage of device links in the runtime PM core code and
        update the DTPM (Dynamic Thermal Power Management) feature added
        recently.
      
        Specifics:
      
         - Make the runtime PM core code avoid attempting to suspend supplier
           devices before updating the PM-runtime status of a consumer to
           'suspended' (Rafael Wysocki).
      
         - Fix DTPM (Dynamic Thermal Power Management) root node
           initialization and label that feature as EXPERIMENTAL in Kconfig
           (Daniel Lezcano)"
      
      * tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        powercap/drivers/dtpm: Add the experimental label to the option description
        powercap/drivers/dtpm: Fix root node initialization
        PM: runtime: Update device status before letting suppliers suspend
    • Merge tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ea6be461
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Make the empty stubs of some helper functions used when CONFIG_ACPI is
        not set actually match those functions (Andy Shevchenko)"
      
      * tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: bus: Constify is_acpi_node() and friends (part 2)
    • Merge tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · fc2c8d0a
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Fix a sleeping-while-atomic issue in the AMD IOMMU code
      
       - Disable lazy IOTLB flush for untrusted devices in the Intel VT-d
         driver
      
       - Fix status code definitions for Intel VT-d
      
       - Fix IO Page Fault issue in Tegra IOMMU driver
      
      * tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Fix status code for Allocate/Free PASID command
        iommu: Don't use lazy flush for untrusted device
        iommu/tegra-smmu: Fix mc errors on tegra124-nyan
        iommu/amd: Fix sleeping in atomic in increase_address_space()
    • Merge tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · f09b04cc
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "More regression fixes and stabilization.
      
        Regressions:
      
         - zoned mode
            - count zone sizes in wider int types
            - fix space accounting for read-only block groups
      
         - subpage: fix page tail zeroing
      
        Fixes:
      
         - fix spurious warning when remounting with free space tree
      
         - fix warning when creating a directory with smack enabled
      
         - ioctl checks for qgroup inheritance when creating a snapshot
      
         - qgroup
            - fix missing unlock on error path in zero range
            - fix amount of released reservation on error
            - fix flushing from unsafe context with open transaction,
              potentially deadlocking
      
         - minor build warning fixes"
      
      * tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: zoned: do not account freed region of read-only block group as zone_unusable
        btrfs: zoned: use sector_t for zone sectors
        btrfs: subpage: fix the false data csum mismatch error
        btrfs: fix warning when creating a directory with smack enabled
        btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
        btrfs: export and rename qgroup_reserve_meta
        btrfs: free correct amount of space in btrfs_delayed_inode_reserve_metadata
        btrfs: fix spurious free_space_tree remount warning
        btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl
        btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors
        btrfs: ref-verify: use 'inline void' keyword ordering
    • Merge tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 6bf331d5
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
      
       - Another batch of graph and video-interfaces schema conversions
      
       - Drop DT header symlink for dropped C6X arch
      
       - Fix bcm2711-hdmi schema error
      
      * tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: media: Use graph and video-interfaces schemas, round 2
        dts: drop dangling c6x symlink
        dt-bindings: bcm2711-hdmi: Fix broken schema
    • Merge tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 54663cf3
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Functional fixes:
      
         - Fix big endian conversion for arm64 in recordmcount processing
      
         - Fix timestamp corruption in ring buffer on discarding events
      
         - Fix memory leak in __create_synth_event()
      
         - Skip selftests if tracing is disabled as it will cause them to
           fail.
      
        Non-functional fixes:
      
         - Fix help text in Kconfig
      
         - Remove duplicate prototype for trace_empty()
      
         - Fix stale comment about the trace_event_call flags.
      
        Self test update:
      
         - Add more information to the validation output of when a corrupt
           timestamp is found in the ring buffer, and also trigger a warning
           to make sure that tests catch it"
      
      * tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Fix comment about the trace_event_call flags
        tracing: Skip selftests if tracing is disabled
        tracing: Fix memory leak in __create_synth_event()
        ring-buffer: Add a little more information and a WARN when time stamp going backwards is detected
        ring-buffer: Force before_stamp and write_stamp to be different on discard
        tracing: Fix help text of TRACEPOINT_BENCHMARK in Kconfig
        tracing: Remove duplicate declaration from trace.h
        ftrace: Have recordmcount use w8 to read relp->r_info in arm64_is_fake_mcount
    • io_uring: don't restrict issue_flags for io_openat · e45cff58
      Pavel Begunkov authored
      Commit 45d189c6 ("io_uring: replace force_nonblock with flags") did
      something strange for io_openat(): it masked off every issue_flags bit
      except IO_URING_F_NONBLOCK. Not a bug for now, but better to just
      forward the flags.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
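
The difference between slicing and forwarding the flags can be sketched in plain C. This is only an illustration, not the kernel code: the flag values and the `openat2_stub()` helper are assumptions made up for the example, and only the flag names come from the commit text.

```c
#include <assert.h>

/* Illustrative flag bits. The names follow the commit text; the numeric
 * values are assumptions for this sketch. */
#define IO_URING_F_NONBLOCK       1u
#define IO_URING_F_COMPLETE_DEFER 2u

/* Hypothetical stand-in for the helper that io_openat() calls into.
 * It just reports the flags it was handed. */
static unsigned int openat2_stub(unsigned int issue_flags)
{
    return issue_flags;
}

/* Old behaviour: keep only the NONBLOCK bit and drop everything else. */
static unsigned int openat_sliced(unsigned int issue_flags)
{
    return openat2_stub(issue_flags & IO_URING_F_NONBLOCK);
}

/* New behaviour: forward the caller's flags unchanged. */
static unsigned int openat_forwarded(unsigned int issue_flags)
{
    return openat2_stub(issue_flags);
}

/* Small self-check showing which bits survive in each variant. */
static void issue_flags_demo(void)
{
    unsigned int flags = IO_URING_F_NONBLOCK | IO_URING_F_COMPLETE_DEFER;

    assert(openat_sliced(flags) == IO_URING_F_NONBLOCK);
    assert(openat_forwarded(flags) == flags);
}
```

Slicing is harmless only while the callee looks at nothing but the NONBLOCK bit; forwarding keeps the two call paths consistent if more bits start to matter.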
  2. Mar 05, 2021
  3. Mar 04, 2021
    • btrfs: zoned: do not account freed region of read-only block group as zone_unusable · badae9c8
      Naohiro Aota authored
      We migrate zone unusable bytes to read-only bytes when a block group is
      set to read-only, and account all the free region as bytes_readonly.
      Thus, we should not increase block_group->zone_unusable when the block
      group is read-only.
      
      Fixes: 169e0da9 ("btrfs: zoned: track unusable bytes for zones")
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: zoned: use sector_t for zone sectors · d734492a
      Naohiro Aota authored
      We need to use sector_t for zone_sectors, or it would set the zone size
      to zero when the size >= 4GiB (= 2^23 512-byte sectors) by shifting the
      zone_sectors value by SECTOR_SHIFT. We're assuming zone sizes up to
      8GiB.
      
      Fixes: 5b316468 ("btrfs: get zone information of zoned block devices")
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
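
The overflow described in this commit can be reproduced with ordinary fixed-width integers. The sketch below assumes the narrower type was 32 bits wide and uses `uint64_t` as a stand-in for `sector_t`; the function names are hypothetical and exist only for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9  /* 512-byte sectors */

/* Narrow type: the shift is carried out in 32 bits and wraps around. */
static uint32_t zone_bytes_u32(uint32_t zone_sectors)
{
    return zone_sectors << SECTOR_SHIFT;
}

/* Wide type: uint64_t stands in for sector_t in this sketch. */
static uint64_t zone_bytes_u64(uint64_t zone_sectors)
{
    return zone_sectors << SECTOR_SHIFT;
}

/* Small self-check: a 4GiB zone is 2^23 sectors of 512 bytes. */
static void zone_size_demo(void)
{
    uint32_t four_gib_sectors = UINT32_C(1) << 23;

    /* 2^23 << 9 == 2^32, which does not fit in 32 bits and wraps to 0. */
    assert(zone_bytes_u32(four_gib_sectors) == 0);
    /* The 64-bit computation keeps the full byte count. */
    assert(zone_bytes_u64(four_gib_sectors) == (UINT64_C(1) << 32));
}
```

Zones smaller than 4GiB are unaffected, which is why the problem only shows up on devices with large zones.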
    • tracing: Fix comment about the trace_event_call flags · f9f34447
      Steven Rostedt (VMware) authored
      In the declaration of struct trace_event_call, the bits of the flags
      field are described in the comment above it. But these bits are also
      defined by the TRACE_EVENT_FL_* enums just above the declaration of the
      struct. As the comment about the flags in the struct has become stale and
      incorrect, just replace it with a reference to the TRACE_EVENT_FL_* enum
      above.
      
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Skip selftests if tracing is disabled · ee666a18
      Steven Rostedt (VMware) authored
      If tracing is disabled for some reason (traceoff_on_warning, command line,
      etc), the ftrace selftests are guaranteed to fail, as their results are
      defined by trace data in the ring buffers. If the ring buffers are turned
      off, the tests will fail, due to lack of data.
      
      Because tracing being disabled is for a specific reason (warning, user
      decided to, etc), it does not make sense to enable tracing to run the self
      tests, as the test output may corrupt the reason for the tracing to be
      disabled.
      
      Instead, simply skip the self tests and report that they are being skipped
      due to tracing being disabled.
      
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Fix memory leak in __create_synth_event() · f40fc799
      Vamshi K Sthambamkadi authored
      kmemleak report:
      unreferenced object 0xc5a6f708 (size 8):
        comm "ftracetest", pid 1209, jiffies 4294911500 (age 6.816s)
        hex dump (first 8 bytes):
          00 c1 3d 60 14 83 1f 8a                          ..=`....
        backtrace:
          [<f0aa4ac4>] __kmalloc_track_caller+0x2a6/0x460
          [<7d3d60a6>] kstrndup+0x37/0x70
          [<45a0e739>] argv_split+0x1c/0x120
          [<c17982f8>] __create_synth_event+0x192/0xb00
          [<0708b8a3>] create_synth_event+0xbb/0x150
          [<3d1941e1>] create_dyn_event+0x5c/0xb0
          [<5cf8b9e3>] trace_parse_run_command+0xa7/0x140
          [<04deb2ef>] dyn_event_write+0x10/0x20
          [<8779ac95>] vfs_write+0xa9/0x3c0
          [<ed93722a>] ksys_write+0x89/0xc0
          [<b9ca0507>] __ia32_sys_write+0x15/0x20
          [<7ce02d85>] __do_fast_syscall_32+0x45/0x80
          [<cb0ecb35>] do_fast_syscall_32+0x29/0x60
          [<2467454a>] do_SYSENTER_32+0x15/0x20
          [<9beaa61d>] entry_SYSENTER_32+0xa9/0xfc
      unreferenced object 0xc5a6f078 (size 8):
        comm "ftracetest", pid 1209, jiffies 4294911500 (age 6.816s)
        hex dump (first 8 bytes):
          08 f7 a6 c5 00 00 00 00                          ........
        backtrace:
          [<bbac096a>] __kmalloc+0x2b6/0x470
          [<aa2624b4>] argv_split+0x82/0x120
          [<c17982f8>] __create_synth_event+0x192/0xb00
          [<0708b8a3>] create_synth_event+0xbb/0x150
          [<3d1941e1>] create_dyn_event+0x5c/0xb0
          [<5cf8b9e3>] trace_parse_run_command+0xa7/0x140
          [<04deb2ef>] dyn_event_write+0x10/0x20
          [<8779ac95>] vfs_write+0xa9/0x3c0
          [<ed93722a>] ksys_write+0x89/0xc0
          [<b9ca0507>] __ia32_sys_write+0x15/0x20
          [<7ce02d85>] __do_fast_syscall_32+0x45/0x80
          [<cb0ecb35>] do_fast_syscall_32+0x29/0x60
          [<2467454a>] do_SYSENTER_32+0x15/0x20
          [<9beaa61d>] entry_SYSENTER_32+0xa9/0xfc
      
      In __create_synth_event(), while iterating over field/type arguments,
      argv_split() returns an array of at least 2 elements even when zero
      arguments (argc=0) are passed, e.g. when there is a double delimiter
      or the string ends with a delimiter.
      
      To fix this, call argv_free() even when argc=0.
      
      Link: https://lkml.kernel.org/r/20210304094521.GA1826@cosmos

      Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
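
The leak pattern can be illustrated in userspace. The sketch below is not the kernel's argv_split(); `split_words()`, `free_words()`, the two `parse_field_*()` callers and the `live_allocs` counter are hypothetical names invented for the example. What it shares with the commit is the key property: the splitter allocates even when it finds zero words, so the caller must free unconditionally.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs;  /* outstanding allocations, for illustration only */

/* Hypothetical analogue of argv_split(): always returns an allocated,
 * NULL-terminated vector, even when the input holds no words. */
static char **split_words(const char *s, int *argcp)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    /* Worst case is ceil(len / 2) words, plus a slot for the copy
     * pointer and one for the terminating NULL. */
    char **vec = malloc((len / 2 + 3) * sizeof(*vec));
    int argc = 0;

    if (!copy || !vec) {
        free(copy);
        free(vec);
        return NULL;
    }
    memcpy(copy, s, len + 1);
    live_allocs += 2;

    vec[0] = copy;  /* remembered so free_words() can release it */
    for (char *p = copy; *p; ) {
        while (isspace((unsigned char)*p))
            p++;
        if (!*p)
            break;
        vec[1 + argc++] = p;
        while (*p && !isspace((unsigned char)*p))
            p++;
        if (*p)
            *p++ = '\0';
    }
    vec[1 + argc] = NULL;
    *argcp = argc;
    return vec + 1;
}

static void free_words(char **argv)
{
    free(argv[-1]);   /* the string copy */
    free(argv - 1);   /* the vector itself */
    live_allocs -= 2;
}

/* Buggy caller: bails out on argc == 0 without freeing. */
static int parse_field_buggy(const char *field)
{
    int argc;
    char **argv = split_words(field, &argc);

    if (!argv)
        return -1;
    if (argc == 0)
        return 0;     /* leak: argv was allocated anyway */
    free_words(argv);
    return argc;
}

/* Fixed caller: frees on every path, including argc == 0. */
static int parse_field_fixed(const char *field)
{
    int argc;
    char **argv = split_words(field, &argc);

    if (!argv)
        return -1;
    free_words(argv);
    return argc;
}
```

An empty field, as produced by a doubled or trailing delimiter, is exactly the input on which the two callers differ.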
    • ring-buffer: Add a little more information and a WARN when time stamp going backwards is detected · 6549de1f
      Steven Rostedt (VMware) authored
      When CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is enabled and the time
      stamps are detected as not being valid, it reports information about the
      write stamp, but does not show the before_stamp, which is still useful
      information. Also, it should give a warning once, so that tests detect
      this happening.
      
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Force before_stamp and write_stamp to be different on discard · 6f6be606
      Steven Rostedt (VMware) authored
      Part of the logic of the new time stamp code depends on the before_stamp and
      the write_stamp to be different if the write_stamp does not match the last
      event on the buffer, as it will be used to calculate the delta of the next
      event written on the buffer.
      
      The discard logic depends on this, as the next event to come in needs to
      inject a full timestamp as it can not rely on the last event timestamp in
      the buffer because it is unknown due to events after it being discarded. But
      by changing the write_stamp back to the time before it, it forces the next
      event to use a full time stamp, instead of relying on it.
      
      The issue came when a full time stamp was used for the event, and
      rb_time_delta() returns zero in that case. The update to the write_stamp
      (which subtracts delta) made it not change. Then when the event is removed
      from the buffer, because the before_stamp and write_stamp still match, the
      next event written would calculate its delta from the write_stamp, but that
      would be wrong as the write_stamp is of the time of the event that was
      discarded.
      
      In the case that the delta change being made to write_stamp is zero, set the
      before_stamp to zero as well, and this will force the next event to inject a
      full timestamp and not use the current write_stamp.
      
      Cc: stable@vger.kernel.org
      Fixes: a389d86f ("ring-buffer: Have nested events still record running time stamp")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • Rolf Eike Beer's avatar
      69268094
    • tracing: Remove duplicate declaration from trace.h · 70d443d8
      Yordan Karadzhov (VMware) authored
      A declaration of function "int trace_empty(struct trace_iterator *iter)"
      shows up twice in the header file kernel/trace/trace.h
      
      Link: https://lkml.kernel.org/r/20210304092348.208033-1-y.karadz@gmail.com

      Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • io-wq: ensure all pending work is canceled on exit · f0127254
      Jens Axboe authored
      If we race on shutting down the io-wq, then we should ensure that any
      work that was queued after workers shutdown is canceled. Harden the
      add work check a bit too, checking for IO_WQ_BIT_EXIT and cancel if
      it's set.
      
      Add a WARN_ON() for having any work before we kill the io-wq context.
      
      Reported-by: <syzbot+91b4b56ead187d35c9d3@syzkaller.appspotmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
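
A minimal userspace sketch of the hardening described above, assuming a single-threaded pending list and a C11 atomic flag standing in for IO_WQ_BIT_EXIT. The `struct wq`, `struct work` and the `wq_*()` functions are hypothetical and do not mirror the io-wq data structures; real code would also need locking around the list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct work {
    struct work *next;
    bool canceled;
};

struct wq {
    atomic_bool exiting;   /* stands in for IO_WQ_BIT_EXIT */
    struct work *head;     /* pending work; no locking in this sketch */
};

static void wq_init(struct wq *wq)
{
    atomic_init(&wq->exiting, false);
    wq->head = NULL;
}

static void cancel_work(struct work *w)
{
    w->canceled = true;
}

/* Add work, but cancel it right away if the queue is shutting down.
 * Returns true if the work was queued. */
static bool wq_enqueue(struct wq *wq, struct work *w)
{
    if (atomic_load(&wq->exiting)) {
        cancel_work(w);
        return false;
    }
    w->next = wq->head;
    wq->head = w;
    return true;
}

/* Shutdown: set the exit flag first, then cancel whatever is pending.
 * Returns the number of canceled items. */
static size_t wq_exit(struct wq *wq)
{
    size_t n = 0;

    atomic_store(&wq->exiting, true);
    while (wq->head) {
        struct work *w = wq->head;

        wq->head = w->next;
        cancel_work(w);
        n++;
    }
    /* Analogue of the WARN_ON(): nothing may be left behind. */
    assert(wq->head == NULL);
    return n;
}
```

Setting the flag before draining means that a late enqueue either lands on the list before the drain and is canceled there, or sees the flag and cancels itself.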
    • io_uring: ensure that threads freeze on suspend · e4b4a13f
      Jens Axboe authored
      Alex reports that his system fails to suspend using 5.12-rc1, with the
      following dump:
      
      [  240.650300] PM: suspend entry (deep)
      [  240.650748] Filesystems sync: 0.000 seconds
      [  240.725605] Freezing user space processes ...
      [  260.739483] Freezing of tasks failed after 20.013 seconds (3 tasks refusing to freeze, wq_busy=0):
      [  260.739497] task:iou-mgr-446     state:S stack:    0 pid:  516 ppid:   439 flags:0x00004224
      [  260.739504] Call Trace:
      [  260.739507]  ? sysvec_apic_timer_interrupt+0xb/0x81
      [  260.739515]  ? pick_next_task_fair+0x197/0x1cde
      [  260.739519]  ? sysvec_reschedule_ipi+0x2f/0x6a
      [  260.739522]  ? asm_sysvec_reschedule_ipi+0x12/0x20
      [  260.739525]  ? __schedule+0x57/0x6d6
      [  260.739529]  ? del_timer_sync+0xb9/0x115
      [  260.739533]  ? schedule+0x63/0xd5
      [  260.739536]  ? schedule_timeout+0x219/0x356
      [  260.739540]  ? __next_timer_interrupt+0xf1/0xf1
      [  260.739544]  ? io_wq_manager+0x73/0xb1
      [  260.739549]  ? io_wq_create+0x262/0x262
      [  260.739553]  ? ret_from_fork+0x22/0x30
      [  260.739557] task:iou-mgr-517     state:S stack:    0 pid:  522 ppid:   439 flags:0x00004224
      [  260.739561] Call Trace:
      [  260.739563]  ? sysvec_apic_timer_interrupt+0xb/0x81
      [  260.739566]  ? pick_next_task_fair+0x16f/0x1cde
      [  260.739569]  ? sysvec_apic_timer_interrupt+0xb/0x81
      [  260.739571]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      [  260.739574]  ? __schedule+0x5b7/0x6d6
      [  260.739578]  ? del_timer_sync+0x70/0x115
      [  260.739581]  ? schedule_timeout+0x211/0x356
      [  260.739585]  ? __next_timer_interrupt+0xf1/0xf1
      [  260.739588]  ? io_wq_check_workers+0x15/0x11f
      [  260.739592]  ? io_wq_manager+0x69/0xb1
      [  260.739596]  ? io_wq_create+0x262/0x262
      [  260.739600]  ? ret_from_fork+0x22/0x30
      [  260.739603] task:iou-wrk-517     state:S stack:    0 pid:  523 ppid:   439 flags:0x00004224
      [  260.739607] Call Trace:
      [  260.739609]  ? __schedule+0x5b7/0x6d6
      [  260.739614]  ? schedule+0x63/0xd5
      [  260.739617]  ? schedule_timeout+0x219/0x356
      [  260.739621]  ? __next_timer_interrupt+0xf1/0xf1
      [  260.739624]  ? task_thread.isra.0+0x148/0x3af
      [  260.739628]  ? task_thread_unbound+0xa/0xa
      [  260.739632]  ? task_thread_bound+0x7/0x7
      [  260.739636]  ? ret_from_fork+0x22/0x30
      [  260.739647] OOM killer enabled.
      [  260.739648] Restarting tasks ... done.
      [  260.740077] PM: suspend exit
      
      Play nice and ensure that any thread we create will call try_to_freeze()
      at an opportune time so that memory suspend can proceed. For the io-wq
      worker threads, mark them as PF_NOFREEZE. They could potentially be
      blocked for a long time.
      
      Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
      Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: remove extra in_idle wake up · b23fcf47
      Pavel Begunkov authored
      io_dismantle_req() is always followed by io_put_task(), which already does
      proper in_idle wake ups, so we can skip waking the owner task in
      io_dismantle_req(). The rules are simpler now: do io_put_task() shortly
      after ending a request, and it will be fine.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>