  1. Oct 07, 2021
  2. Oct 01, 2021
    • rtmutex: Wake up the waiters lockless while dropping the read lock. · 9321f815
      Thomas Gleixner authored

      The rw_semaphore and rwlock_t implementation both wake the waiter while
      holding the rt_mutex_base::wait_lock acquired.
      This can be optimized by waking the waiter lockless outside of the
      locked section to avoid a needless contention on the
      rt_mutex_base::wait_lock lock.
      
      Extend rt_mutex_wake_q_add() to also accept task and state and use it in
      __rwbase_read_unlock().
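
      A toy, single-threaded model of the deferred wake-up described above
      might look as follows. All names (toy_lock, toy_wake_q, toy_read_unlock)
      are hypothetical stand-ins, not the kernel's rt_mutex_base or
      rt_mutex_wake_q_add() interfaces; the sketch only illustrates the
      ordering: the waiter is recorded while the lock is held and woken
      after the lock has been dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAKE 4

/* Toy stand-ins; these are NOT the kernel's types. */
struct toy_task { bool awake; };
struct toy_lock { bool held; };
struct toy_wake_q { struct toy_task *tasks[MAX_WAKE]; size_t n; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Only records the task; cheap enough to do inside the locked section. */
static void toy_wake_q_add(struct toy_wake_q *q, struct toy_task *t)
{
	if (q->n < MAX_WAKE)
		q->tasks[q->n++] = t;
}

/* Performs the recorded wake-ups; must run with the lock dropped. */
static void toy_wake_up_q(struct toy_wake_q *q, const struct toy_lock *l)
{
	assert(!l->held);
	for (size_t i = 0; i < q->n; i++)
		q->tasks[i]->awake = true;
	q->n = 0;
}

/* Read-unlock path: queue the waiter under the lock, wake it afterwards. */
static void toy_read_unlock(struct toy_lock *wait_lock, struct toy_task *waiter)
{
	struct toy_wake_q q = { .n = 0 };

	toy_lock_acquire(wait_lock);
	if (waiter)
		toy_wake_q_add(&q, waiter);
	toy_lock_release(wait_lock);

	toy_wake_up_q(&q, wait_lock);
}
```

      The woken task never finds the wait lock still held by the waker,
      which is the contention the commit message calls needless.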
      
      Suggested-by: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210928150006.597310-3-bigeasy@linutronix.de
    • rtmutex: Check explicit for TASK_RTLOCK_WAIT. · 8fe46535
      Sebastian Andrzej Siewior authored

      rt_mutex_wake_q_add() needs to distinguish between sleeping locks
      (TASK_RTLOCK_WAIT) and normal locks, which use TASK_NORMAL, in order to
      use the proper wake mechanism.
      
      Instead of checking for != TASK_NORMAL, make it more robust and check
      explicitly for TASK_RTLOCK_WAIT, which is the reason why a different wake
      mechanism is used.
      
      No functional change.
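
      The difference between the two checks can be sketched as below. The
      TOY_* constants and both helper functions are illustrative
      assumptions, not the kernel's definitions; the point is only that the
      explicit comparison selects the special wake path for exactly one
      state, while the negated comparison selects it for every state other
      than TASK_NORMAL.

```c
#include <stdbool.h>

/* Illustrative values only; the real ones live in the kernel headers. */
#define TOY_TASK_INTERRUPTIBLE   0x0001u
#define TOY_TASK_UNINTERRUPTIBLE 0x0002u
#define TOY_TASK_NORMAL          (TOY_TASK_INTERRUPTIBLE | TOY_TASK_UNINTERRUPTIBLE)
#define TOY_TASK_RTLOCK_WAIT     0x1000u

/* Old style: anything that is not TASK_NORMAL takes the RT-lock wake path. */
static bool uses_rtlock_wake_old(unsigned int wake_state)
{
	return wake_state != TOY_TASK_NORMAL;
}

/* New style: only the state that actually needs it takes that path. */
static bool uses_rtlock_wake_new(unsigned int wake_state)
{
	return wake_state == TOY_TASK_RTLOCK_WAIT;
}
```

      For the two states the commit message names, both helpers agree,
      which matches the "no functional change" claim; they differ only for
      a state that is neither of the two.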
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210928150006.597310-2-bigeasy@linutronix.de
    • locking/rt: Take RCU nesting into account for __might_resched() · ef1f4804
      Thomas Gleixner authored

      The general rule that rcu_read_lock() held sections cannot voluntarily
      sleep does apply even on RT kernels. Though the substitution of spin/rw
      locks on RT enabled kernels has to be exempt from that rule. On !RT a
      spin_lock() can obviously nest inside a RCU read side critical section as
      the lock acquisition is not going to block, but on RT this is no longer
      the case due to the 'sleeping' spinlock substitution.
      
      The RT patches contained a cheap hack to ignore the RCU nesting depth in
      might_sleep() checks, which was a pragmatic but incorrect workaround.
      
      Instead of generally ignoring the RCU nesting depth in __might_sleep() and
      __might_resched() checks, pass the rcu_preempt_depth() via the offsets
      argument to __might_resched() from spin/read/write_lock() which makes the
      checks work correctly even in RCU read side critical sections.
      
      The actual blocking on such a substituted lock within a RCU read side
      critical section is already handled correctly in __schedule() by treating
      it as a "preemption" of the RCU read side critical section.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210923165358.368305497@linutronix.de
    • sched: Make cond_resched_lock() variants RT aware · 3e9cc688
      Thomas Gleixner authored

      The __might_resched() checks in the cond_resched_lock() variants use
      PREEMPT_LOCK_OFFSET for preempt count offset checking which takes the
      preemption disable by the spin_lock() which is still held at that point
      into account.
      
      On PREEMPT_RT enabled kernels spin/rw_lock held sections stay preemptible
      which means PREEMPT_LOCK_OFFSET is 0, but that still triggers the
      __might_resched() check because that takes RCU read side nesting into
      account.
      
      On RT enabled kernels spin/read/write_lock() issue rcu_read_lock() to
      resemble the !RT semantics, which means in cond_resched_lock() the might
      resched check will see preempt_count() == 0 and rcu_preempt_depth() == 1.
      
      Introduce PREEMPT_LOCK_SCHED_OFFSET for those might resched checks and map
      them depending on CONFIG_PREEMPT_RT.
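
      A rough sketch of that mapping, assuming the 'offsets' layout from the
      "sched: Make RCU nest depth distinct in __might_resched()" entry
      (expected preempt count in the lower 8 bits, expected RCU nest depth
      above them) and a kernel with preempt counting enabled. The toy_*
      names and the runtime preempt_rt flag are hypothetical; the kernel
      selects the value at build time.

```c
#include <stdbool.h>

#define TOY_RCU_SHIFT 8	/* assumed: RCU depth sits above the low 8 bits */

/* Expected preempt count contributed by a held spinlock. */
static unsigned int toy_preempt_lock_offset(bool preempt_rt)
{
	/* On RT the lock-held section stays preemptible, so nothing is added. */
	return preempt_rt ? 0u : 1u;
}

/* Expected values for the might-resched check in cond_resched_lock(). */
static unsigned int toy_lock_resched_offsets(bool preempt_rt)
{
	unsigned int offsets = toy_preempt_lock_offset(preempt_rt);

	/* On RT the lock acquisition also issued rcu_read_lock(): depth 1. */
	if (preempt_rt)
		offsets += 1u << TOY_RCU_SHIFT;

	return offsets;
}
```

      With these assumptions the !RT case expects preempt count 1 and RCU
      depth 0, while the RT case expects preempt count 0 and RCU depth 1.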
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210923165358.305969211@linutronix.de
    • sched: Make RCU nest depth distinct in __might_resched() · 50e081b9
      Thomas Gleixner authored

      For !RT kernels RCU nest depth in __might_resched() is always expected to
      be 0, but on RT kernels it can be non zero while the preempt count is
      expected to be always 0.
      
      Instead of playing magic games in interpreting the 'preempt_offset'
      argument, rename it to 'offsets', use the lower 8 bits for the expected
      preempt count, allow handing in the expected RCU nest depth in the upper
      bits, and adapt the __might_resched() code and related checks and printks.
      
      The affected call sites are updated in subsequent steps.
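
      The encoding described above can be sketched as follows. The helper
      names and the TOY_* macros are assumptions made for illustration; only
      the split (lower 8 bits for the expected preempt count, upper bits for
      the expected RCU nest depth) comes from the commit message.

```c
#include <stdbool.h>

#define TOY_RCU_SHIFT    8
#define TOY_PREEMPT_MASK ((1u << TOY_RCU_SHIFT) - 1)

/* Combine both expected values into one 'offsets' argument. */
static unsigned int toy_pack_offsets(unsigned int preempt, unsigned int rcu_depth)
{
	return (preempt & TOY_PREEMPT_MASK) | (rcu_depth << TOY_RCU_SHIFT);
}

/* Lower 8 bits: expected preempt count. */
static unsigned int toy_expected_preempt(unsigned int offsets)
{
	return offsets & TOY_PREEMPT_MASK;
}

/* Upper bits: expected RCU nest depth. */
static unsigned int toy_expected_rcu(unsigned int offsets)
{
	return offsets >> TOY_RCU_SHIFT;
}

/* The check passes only when both current values match the expectation. */
static bool toy_resched_allowed(unsigned int preempt_count,
				unsigned int rcu_depth,
				unsigned int offsets)
{
	return preempt_count == toy_expected_preempt(offsets) &&
	       rcu_depth == toy_expected_rcu(offsets);
}
```

      Keeping the two expectations in separate bit ranges lets a caller on
      an RT kernel state "preempt count 0, RCU depth 1" without the checker
      having to guess what a single combined number means.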
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210923165358.243232823@linutronix.de
    • sched: Make might_sleep() output less confusing · 8d713b69
      Thomas Gleixner authored

      might_sleep() output is pretty informative, but can be confusing at times
      especially with PREEMPT_RCU when the check triggers due to a voluntary
      sleep inside a RCU read side critical section:
      
       BUG: sleeping function called from invalid context at kernel/test.c:110
       in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 415, name: kworker/u112:52
       Preemption disabled at: migrate_disable+0x33/0xa0
      
      in_atomic() is 0, but it still tells that preemption was disabled at
      migrate_disable(), which is completely useless because preemption is not
      disabled. But the interesting information to decode the above, i.e. the RCU
      nesting depth, is not printed.
      
      That becomes even more confusing when might_sleep() is invoked from
      cond_resched_lock() within a RCU read side critical section. Here the
      expected preemption count is 1 and not 0.
      
       BUG: sleeping function called from invalid context at kernel/test.c:131
       in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 415, name: kworker/u112:52
       Preemption disabled at: test_cond_lock+0xf3/0x1c0
      
      So in_atomic() is set, which is expected as the caller holds a spinlock,
      but it's unclear why this is broken and the preempt disable IP is just
      pointing at the correct place, i.e. spin_lock(), which is obviously not
      helpful either.
      
      Make that more useful in general:
      
       - Print preempt_count() and the expected value
      
      and for the CONFIG_PREEMPT_RCU case:
      
       - Print the RCU read side critical section nesting depth
      
       - Print the preempt disable IP only when preempt count
         does not have the expected value.
      
      So the might_sleep() dump from within a preemptible RCU read side
      critical section becomes:
      
       BUG: sleeping function called from invalid context at kernel/test.c:110
       in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 415, name: kworker/u112:52
       preempt_count: 0, expected: 0
       RCU nest depth: 1, expected: 0
      
      and the cond_resched_lock() case becomes:
      
       BUG: sleeping function called from invalid context at kernel/test.c:141
       in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 415, name: kworker/u112:52
       preempt_count: 1, expected: 1
       RCU nest depth: 1, expected: 0
      
      which makes it pretty obvious what's going on. For all other cases the
      preempt disable IP is still printed as before:
      
       BUG: sleeping function called from invalid context at kernel/test.c: 156
       in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
       preempt_count: 1, expected: 0
       RCU nest depth: 0, expected: 0
       Preemption disabled at:
       [<ffffffff82b48326>] test_might_sleep+0xbe/0xf8
      
       BUG: sleeping function called from invalid context at kernel/test.c: 163
       in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
       preempt_count: 1, expected: 0
       RCU nest depth: 1, expected: 0
       Preemption disabled at:
       [<ffffffff82b48326>] test_might_sleep+0x1e4/0x280
      
      This also prepares to provide a better debugging output for RT enabled
      kernels and their spinlock substitutions.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210923165358.181022656@linutronix.de
    • sched: Cleanup might_sleep() printks · a45ed302
      Thomas Gleixner authored

      Convert them to pr_*(). No functional change.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210923165358.117496067@linutronix.de
    • sched: Remove preempt_offset argument from __might_sleep() · 42a38756
      Thomas Gleixner authored

      All callers hand in 0 and never will hand in anything else.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210923165358.054321586@linutronix.de
    • sched: Make cond_resched_*lock() variants consistent vs. might_sleep() · 7b5ff4bb
      Thomas Gleixner authored

      Commit 3427445a ("sched: Exclude cond_resched() from nested sleep
      test") removed the task state check of __might_sleep() for
      cond_resched_lock() because cond_resched_lock() is not a voluntary
      scheduling point which blocks. It's a preemption point which requires the
      lock holder to release the spin lock.
      
      The same rationale applies to cond_resched_rwlock_read/write(), but those
      were not touched.
      
      Make it consistent and use the non-state checking __might_resched() there
      as well.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210923165357.991262778@linutronix.de
    • sched: Clean up the might_sleep() underscore zoo · 874f670e
      Thomas Gleixner authored

      __might_sleep() vs. ___might_sleep() is hard to distinguish. Aside of that
      the three underscore variant is exposed to provide a checkpoint for
      rescheduling points which are distinct from blocking points.
      
      They are semantically a preemption point which means that scheduling is
      state preserving. A real blocking operation, e.g. mutex_lock() or wait*(),
      cannot preserve a task state which is not equal to RUNNING.
      
      While technically blocking on a "sleeping" spinlock in RT enabled kernels
      falls into the voluntary scheduling category because it has to wait until
      the contended spin/rw lock becomes available, the RT lock substitution code
      can semantically be mapped to a voluntary preemption because the RT lock
      substitution code and the scheduler are providing mechanisms to preserve
      the task state and to take regular non-lock related wakeups into account.
      
      Rename ___might_sleep() to __might_resched() to make the distinction of
      these functions clear.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210923165357.928693482@linutronix.de
    • locking/ww-mutex: Fix uninitialized use of ret in test_aa() · 1415b49b
      Nathan Chancellor authored

      Clang warns:
      
      kernel/locking/test-ww_mutex.c:138:7: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
                      if (!ww_mutex_trylock(&mutex, &ctx)) {
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/locking/test-ww_mutex.c:172:9: note: uninitialized use occurs here
              return ret;
                     ^~~
      kernel/locking/test-ww_mutex.c:138:3: note: remove the 'if' if its condition is always false
                      if (!ww_mutex_trylock(&mutex, &ctx)) {
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/locking/test-ww_mutex.c:125:9: note: initialize the variable 'ret' to silence this warning
              int ret;
                     ^
                      = 0
      1 error generated.
      
      Assign !ww_mutex_trylock(...) to ret so that it is always initialized.
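
      The shape of the fix can be sketched in isolation as below.
      toy_trylock() and toy_test_aa() are hypothetical stand-ins for
      ww_mutex_trylock() and test_aa(); the real test does more work between
      the trylock and the return.

```c
#include <stdbool.h>

/* Hypothetical stand-in for ww_mutex_trylock(): returns true on success. */
static bool toy_trylock(bool available)
{
	return available;
}

static int toy_test_aa(bool lock_available)
{
	int ret;

	/* Assigning the negated result sets ret on every path to 'out'. */
	ret = !toy_trylock(lock_available);
	if (ret)
		goto out;	/* initial trylock failed, ret is 1 */

	/* ... further checks would set ret on failure ... */
out:
	return ret;
}
```

      Because the condition and the assignment are the same expression,
      there is no branch left on which ret reaches the return uninitialized.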
      
      Fixes: 12235da8 ("kernel/locking: Add context to ww_mutex_trylock()")
      Reported-by: "kernelci.org bot" <bot@kernelci.org>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Waiman Long <longman@redhat.com>
      Link: https://lore.kernel.org/r/20210922145822.3935141-1-nathan@kernel.org
  3. Sep 17, 2021
    • locking/lockdep: Cleanup the repeated declaration · f7427ba5
      Shaokun Zhang authored

      'struct task_struct' has been declared twice, so keep the top one and
      clean up the repeated one.
      
      Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1629875224-32751-1-git-send-email-zhangshaokun@hisilicon.com
    • lockdep: Improve comments in wait-type checks · a2e05ddd
      Zhouyi Zhou authored

      Improve the comments in the wait-type checks by mentioning the
      PREEMPT_RT kernel configuration option.
      
      Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>
      Link: https://lkml.kernel.org/r/20210811025920.20751-1-zhouzhouyi@gmail.com
    • lockdep: Let lock_is_held_type() detect recursive read as read · 2507003a
      Sebastian Andrzej Siewior authored

      lock_is_held_type(, 1) detects acquired read locks. It only recognized
      locks acquired with lock_acquire_shared(). Read locks acquired with
      lock_acquire_shared_recursive() are not recognized because a `2' is
      stored as the read value.
      
      Rework the check so that a lock's read value of either one or two is
      recognised as a read-held lock.
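
      The change in the comparison can be sketched as below. Both helpers
      are hypothetical and reduce the check to its core: the stored read
      value is 0 for an exclusive acquisition, 1 for a shared one, and 2 for
      a recursive shared one, as described above.

```c
#include <stdbool.h>

/* Old check: the stored value must equal the requested type exactly. */
static bool toy_held_type_old(int stored_read, int wanted)
{
	return stored_read == wanted;
}

/* New check: any non-zero stored value counts as "read" (wanted == 1). */
static bool toy_held_type_new(int stored_read, int wanted)
{
	return (stored_read != 0) == wanted;
}
```

      With the old comparison a recursive read lock (stored value 2) is
      missed when asking for read locks; the new comparison folds 1 and 2
      into the same answer and leaves the exclusive case unchanged.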
      
      Fixes: e9181886 ("locking: More accurate annotations for read_lock()")
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Boqun Feng <boqun.feng@gmail.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Link: https://lkml.kernel.org/r/20210903084001.lblecrvz4esl4mrr@linutronix.de
    • kernel/locking: Add context to ww_mutex_trylock() · 12235da8
      Maarten Lankhorst authored

      i915 will soon gain an eviction path that trylocks a whole lot of locks
      for eviction, producing dmesg failures like the one below:
      
        BUG: MAX_LOCK_DEPTH too low!
        turning off the locking correctness validator.
        depth: 48  max: 48!
        48 locks held by i915_selftest/5776:
         #0: ffff888101a79240 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x88/0x160
         #1: ffffc900009778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
         #2: ffff88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
         #3: ffff88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c4/0x9d0 [i915]
         #4: ffff88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
         #5: ffff88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
        ...
         #46: ffff88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
         #47: ffff88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
        INFO: lockdep is turned off.
      
      Fixing eviction to nest into ww_class_acquire is a high priority, but
      it requires a rework of the entire driver, which can only be done one
      step at a time.
      
      As an intermediate solution, add an acquire context to
      ww_mutex_trylock, which allows us to do proper nesting annotations on
      the trylocks, making the above lockdep splat disappear.
      
      This is also useful in regulator_lock_nested, which may avoid dropping
      regulator_nesting_mutex in the uncontended path, so use it there.
      
      TTM may be another user for this, where we could lock a buffer in a
      fastpath with list locks held, without dropping all locks we hold.
      
      [peterz: rework actual ww_mutex_trylock() implementations]
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/YUBGPdDDjKlxAuXJ@hirez.programming.kicks-ass.net
  4. Sep 15, 2021
  5. Sep 13, 2021
    • Linux 5.15-rc1 · 6880fa6c
      Linus Torvalds authored
    • Merge tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux · b5b65f13
      Linus Torvalds authored
      
      Pull more perf tools updates from Arnaldo Carvalho de Melo:
      
       - Add missing fields and remove some duplicate fields when printing a
         perf_event_attr.
      
       - Fix hybrid config terms list corruption.
      
       - Update kernel header copies, some resulted in new kernel features
         being automagically added to 'perf trace' syscall/tracepoint argument
         id->string translators.
      
       - Add a file generated during the documentation build to .gitignore.
      
       - Add an option to build without libbfd, as some distros, like Debian
         consider its ABI unstable.
      
       - Add support to print a textual representation of IBS raw sample data
         in 'perf report'.
      
       - Fix bpf 'perf test' sample mismatch reporting
      
       - Fix passing arguments to stackcollapse report in a 'perf script'
         python script.
      
       - Allow build-id with trailing zeros.
      
       - Look for ImageBase in PE file to compute .text offset.
      
      * tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (25 commits)
        tools headers UAPI: Update tools's copy of drm.h headers
        tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
        tools headers UAPI: Sync linux/fs.h with the kernel sources
        tools headers UAPI: Sync linux/in.h copy with the kernel sources
        perf tools: Add an option to build without libbfd
        perf tools: Allow build-id with trailing zeros
        perf tools: Fix hybrid config terms list corruption
        perf tools: Factor out copy_config_terms() and free_config_terms()
        perf tools: Fix perf_event_attr__fprintf() missing/dupl. fields
        perf tools: Ignore Documentation dependency file
        perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions
        tools include UAPI: Update linux/mount.h copy
        perf beauty: Cover more flags in the  move_mount syscall argument beautifier
        tools headers UAPI: Sync linux/prctl.h with the kernel sources
        tools include UAPI: Sync sound/asound.h copy with the kernel sources
        tools headers UAPI: Sync linux/kvm.h with the kernel sources
        tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
        perf report: Add support to print a textual representation of IBS raw sample data
        perf report: Add tools/arch/x86/include/asm/amd-ibs.h
        perf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings
        ...
    • Merge tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux · c3e46874
      Linus Torvalds authored

      Pull compiler attributes updates from Miguel Ojeda:
      
       - Fix __has_attribute(__no_sanitize_coverage__) for GCC 4 (Marco Elver)
      
       - Add Nick as Reviewer for compiler_attributes.h (Nick Desaulniers)
      
       - Move __compiletime_{error|warning} (Nick Desaulniers)
      
      * tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux:
        compiler_attributes.h: move __compiletime_{error|warning}
        MAINTAINERS: add Nick as Reviewer for compiler_attributes.h
        Compiler Attributes: fix __has_attribute(__no_sanitize_coverage__) for GCC 4
    • Merge tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux · d41adc4e
      Linus Torvalds authored

      Pull auxdisplay updates from Miguel Ojeda:
       "An assortment of improvements for auxdisplay:
      
         - Replace symbolic permissions with octal permissions (Jinchao Wang)
      
         - ks0108: Switch to use module_parport_driver() (Andy Shevchenko)
      
         - charlcd: Drop unneeded initializers and switch to C99 style (Andy
           Shevchenko)
      
         - hd44780: Fix oops on module unloading (Lars Poeschel)
      
         - Add I2C gpio expander example (Ralf Schlatterbeck)"
      
      * tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux:
        auxdisplay: Replace symbolic permissions with octal permissions
        auxdisplay: ks0108: Switch to use module_parport_driver()
        auxdisplay: charlcd: Drop unneeded initializers and switch to C99 style
        auxdisplay: hd44780: Fix oops on module unloading
        auxdisplay: Add I2C gpio expander example