  1. Dec 13, 2023
      ring-buffer: Do not update before stamp when switching sub-buffers · 9e45e39d
      Steven Rostedt (Google) authored
      The ring buffer timestamps are synchronized by two timestamp placeholders.
      One is the "before_stamp" and the other is the "write_stamp" (sometimes
      referred to as the "after stamp", but only in the comments). These two
      stamps are key to knowing how to handle nested events coming in with a
      lockless system.
      
      When moving across sub-buffers, the before stamp is updated but the write
      stamp is not. There's an effort to put the before stamp back to something
      that seems logical in case there are nested events. But as the current event
      is about to cross sub-buffers, and so will any new nested event that happens,
      updating the before stamp is useless, and could even introduce new race
      conditions.
      
      The first event on a sub-buffer simply uses the sub-buffer's timestamp
      and keeps a "delta" of zero. The "before_stamp" and "write_stamp" are not
      used in the algorithm in this case. There's no reason to try to fix the
      before_stamp when this happens.
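
      A minimal, single-threaded C sketch of the rule above, assuming a
      simplified model in which an event carries either a delta from
      write_stamp or an absolute timestamp. The struct, enum and function
      names are hypothetical, not the kernel's API, and the real code does
      this locklessly with atomic updates rather than plain assignments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-CPU timestamp state. */
struct rb_stamps_sketch {
	uint64_t before_stamp; /* set before an event reserves space */
	uint64_t write_stamp;  /* set after the event has been written */
	uint64_t subbuf_ts;    /* absolute timestamp in the sub-buffer header */
};

enum ts_kind { TS_DELTA, TS_ABSOLUTE };

struct ts_choice {
	enum ts_kind kind;
	uint64_t value; /* delta or absolute timestamp */
};

/*
 * Decide how an event at time @ts is stamped. The first event on a new
 * sub-buffer uses the sub-buffer's timestamp with a delta of zero, so
 * before_stamp and write_stamp are neither consulted nor "fixed up".
 */
struct ts_choice reserve_ts(struct rb_stamps_sketch *s, uint64_t ts,
			    bool first_on_subbuf)
{
	struct ts_choice c;

	if (first_on_subbuf) {
		s->subbuf_ts = ts;
		c.kind = TS_DELTA;
		c.value = 0;
		return c;
	}

	if (s->before_stamp == s->write_stamp) {
		/* No nested writer interfered: a delta is safe. */
		assert(ts >= s->write_stamp);
		c.kind = TS_DELTA;
		c.value = ts - s->write_stamp;
	} else {
		/* Stamps disagree: fall back to an absolute timestamp. */
		c.kind = TS_ABSOLUTE;
		c.value = ts;
	}
	return c;
}

/* Record a completed, non-nested write at time @ts. */
void commit_ts(struct rb_stamps_sketch *s, uint64_t ts)
{
	s->before_stamp = ts;
	s->write_stamp = ts;
}
```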
      
      As a bonus, it removes a cmpxchg() when crossing sub-buffers!
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231211114420.36dde01b@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: a389d86f ("ring-buffer: Have nested events still record running time stamp")
      Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      tracing: Update snapshot buffer on resize if it is allocated · d06aff1c
      Steven Rostedt (Google) authored
      The snapshot buffer is meant to mimic the main buffer so that when a
      snapshot is needed, the snapshot and main buffers are swapped. When the
      snapshot buffer is not in use, it is kept at the minimal size at which
      the ring buffer can still function. When it is allocated, it becomes the
      same size as the main ring buffer, and whenever the main ring buffer
      changes in size, the snapshot buffer should follow.
      
      Currently, the resize only updates the snapshot buffer if it's used by the
      current tracer (i.e. the preemptirqsoff tracer). But it needs to be updated
      any time it is allocated.
      
      When changing the size of the main buffer, instead of looking to see if
      the current tracer is utilizing the snapshot buffer, just check if it is
      allocated to know if it should be updated or not.
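
      A sketch of the new condition, assuming hypothetical stand-in structs for
      a trace instance (these are not the kernel's struct trace_array): the
      snapshot buffer is resized whenever it is allocated, regardless of which
      tracer is running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ring buffer: only its size matters here. */
struct buf_sketch {
	size_t size_kb;
};

struct trace_array_sketch {
	struct buf_sketch main_buf;
	struct buf_sketch snapshot_buf;
	bool allocated_snapshot;   /* snapshot buffer is allocated */
	bool tracer_uses_snapshot; /* what the old code checked instead */
};

/*
 * Resize the main buffer. Whether the snapshot follows depends only on
 * it being allocated, not on the current tracer using it.
 */
void resize_buffers(struct trace_array_sketch *tr, size_t size_kb)
{
	assert(size_kb > 0);
	tr->main_buf.size_kb = size_kb;
	if (tr->allocated_snapshot)
		tr->snapshot_buf.size_kb = size_kb;
}
```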
      
      Also fix a typo in the comment just above the code change.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231210225447.48476a6a@rorschach.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: ad909e21 ("tracing: Add internal tracing_snapshot() functions")
      Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      ring-buffer: Fix memory leak of free page · 17d80175
      Steven Rostedt (Google) authored
      Reading the ring buffer does a swap of a sub-buffer within the ring buffer
      with an empty sub-buffer. This allows the reader to have full access to the
      content of the sub-buffer that was swapped out without having to worry
      about contention with the writer.
      
      The readers call ring_buffer_alloc_read_page() to allocate a page that
      will be used to swap with the ring buffer. When the code is finished with
      the reader page, it calls ring_buffer_free_read_page(). Instead of freeing
      the page, it stores it as a spare. The next call to
      ring_buffer_alloc_read_page() will return this spare instead of calling
      into the memory management system to allocate a new page.
      
      Unfortunately, when the ring buffer is freed, this spare page is not
      freed, which causes a memory leak.
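
      A minimal userspace model of the spare-page cache and the missing
      teardown step, assuming malloc()/free() in place of the kernel page
      allocator and a 4096-byte page; struct rb_sketch and the helper names
      are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size */

/* Hypothetical ring buffer that caches one spare reader page. */
struct rb_sketch {
	void *free_page; /* spare kept for the next reader */
	int pages_live;  /* pages currently allocated, for leak checking */
};

void *alloc_read_page(struct rb_sketch *rb)
{
	void *page = rb->free_page;

	if (page) {
		/* Reuse the cached spare instead of allocating. */
		rb->free_page = NULL;
		return page;
	}
	page = malloc(SKETCH_PAGE_SIZE);
	if (page)
		rb->pages_live++;
	return page;
}

void free_read_page(struct rb_sketch *rb, void *page)
{
	if (!rb->free_page) {
		/* Keep it as the spare rather than freeing it. */
		rb->free_page = page;
		return;
	}
	free(page);
	rb->pages_live--;
}

/* Tearing down the buffer must also release the cached spare. */
void rb_free(struct rb_sketch *rb)
{
	if (rb->free_page) {
		free(rb->free_page);
		rb->pages_live--;
		rb->free_page = NULL;
	}
	assert(rb->pages_live >= 0);
}
```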
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231210221250.7b9cc83c@rorschach.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      eventfs: Fix events beyond NAME_MAX blocking tasks · 5eaf7f05
      Beau Belgrave authored
      Eventfs uses simple_lookup(); however, it will fail if the name of the
      entry is longer than NAME_MAX. When this error is encountered, eventfs
      still tries to create dentries instead of skipping the dentry creation.
      When eventfs attempts to create the dentry in this state, d_wait_lookup()
      will loop forever, waiting for the lookup to be removed.
      
      Fix eventfs to return the error from simple_lookup() to the caller
      instead of continuing to try to create the dentry.
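
      A sketch of the error propagation, assuming a hypothetical
      lookup_sketch() that stands in for simple_lookup() and rejects names
      longer than NAME_MAX (255 on Linux) with -ENAMETOOLONG; the caller
      returns that error instead of going on to create a dentry.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Mirrors Linux's NAME_MAX (255); defined here to keep the sketch portable. */
#define SKETCH_NAME_MAX 255

int dentries_created; /* hypothetical counter for illustration */

/* Hypothetical stand-in for simple_lookup(): rejects over-long names. */
int lookup_sketch(const char *name)
{
	if (strlen(name) > SKETCH_NAME_MAX)
		return -ENAMETOOLONG;
	return 0;
}

/* Hypothetical caller: stop on a lookup error instead of carrying on. */
int eventfs_lookup_sketch(const char *name)
{
	int err = lookup_sketch(name);

	if (err)
		return err; /* propagate; never try to create the dentry */
	dentries_created++;
	return 0;
}
```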
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231210213534.497-1-beaub@linux.microsoft.com
      
      Fixes: 63940449 ("eventfs: Implement eventfs lookup, read, open functions")
      Link: https://lore.kernel.org/linux-trace-kernel/20231208183601.GA46-beaub@linux.microsoft.com/
      Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      tracing: Have large events show up as '[LINE TOO BIG]' instead of nothing · b55b0a0d
      Steven Rostedt (Google) authored
      If an event larger than what the trace_seq can handle is added to the
      ring buffer, its output is simply dropped:
      
       ~# cat /sys/kernel/tracing/trace
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 2/2   #P:8
       #
       #                                _-----=> irqs-off/BH-disabled
       #                               / _----=> need-resched
       #                              | / _---=> hardirq/softirq
       #                              || / _--=> preempt-depth
       #                              ||| / _-=> migrate-disable
       #                              |||| /     delay
       #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
       #              | |         |   |||||     |         |
                  <...>-859     [001] .....   141.118951: tracing_mark_write           <...>-859     [001] .....   141.148201: tracing_mark_write: 78901234
      
      Instead, catch this case and add some context:
      
       ~# cat /sys/kernel/tracing/trace
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 2/2   #P:8
       #
       #                                _-----=> irqs-off/BH-disabled
       #                               / _----=> need-resched
       #                              | / _---=> hardirq/softirq
       #                              || / _--=> preempt-depth
       #                              ||| / _-=> migrate-disable
       #                              |||| /     delay
       #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
       #              | |         |   |||||     |         |
                  <...>-852     [001] .....   121.550551: tracing_mark_write[LINE TOO BIG]
                  <...>-852     [001] .....   121.550581: tracing_mark_write: 78901234
      
      This now emulates the same output as trace_pipe.
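
      A userspace sketch of the idea, assuming a hypothetical fixed-size buffer
      with all-or-nothing writes, loosely modelled on trace_seq: when the event
      body does not fit, the partial line is kept and a marker is appended
      instead of printing nothing. Buffer size and names are illustrative only.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size line buffer. */
struct seq_sketch {
	char buf[64];
	size_t len;
	bool full;
};

/* All-or-nothing append: if the output does not fit, nothing is kept. */
bool seq_printf_sketch(struct seq_sketch *s, const char *fmt, ...)
{
	size_t room = sizeof(s->buf) - s->len;
	va_list ap;
	int n;

	if (s->full)
		return false;
	va_start(ap, fmt);
	n = vsnprintf(s->buf + s->len, room, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= room) {
		/* Did not fit: undo the truncated write and mark full. */
		s->buf[s->len] = '\0';
		s->full = true;
		return false;
	}
	s->len += (size_t)n;
	return true;
}

/* Print one event line, replacing an oversized body with a marker. */
void show_line(struct seq_sketch *s, const char *name, const char *payload)
{
	s->len = 0;
	s->full = false;
	s->buf[0] = '\0';
	seq_printf_sketch(s, "%s", name);
	if (!seq_printf_sketch(s, ": %s\n", payload)) {
		/* Clear the overflow and say why the line is incomplete. */
		s->full = false;
		seq_printf_sketch(s, "[LINE TOO BIG]\n");
	}
}
```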
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231209171058.78c1a026@gandalf.local.home
      
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      ring-buffer: Fix writing to the buffer with max_data_size · b3ae7b67
      Steven Rostedt (Google) authored
      The maximum ring buffer data size is the maximum size of data that can be
      recorded on the ring buffer. Events must be smaller than the sub buffer
      data size minus any meta data. This size is checked before trying to
      allocate from the ring buffer because the allocation assumes that the size
      will fit on the sub buffer.
      
      The maximum size was calculated as the size of a sub buffer page (which is
      currently PAGE_SIZE minus the sub buffer header) minus the size of the
      meta data of an individual event. But it did not account for the time
      stamp that may be added to events that are far enough apart that the
      event meta data can't hold the time delta.
      
      When an event is added that is greater than the current BUF_MAX_DATA_SIZE
      minus the size of a time stamp, but still less than or equal to
      BUF_MAX_DATA_SIZE, the ring buffer would go into an infinite loop, looking
      for a page that can hold the event. Luckily, there's a check for this loop:
      after 1000 iterations, a warning is emitted and the ring buffer is
      disabled. But this should never happen.
      
      This can happen when a large event is added first, or after a long period
      where an absolute timestamp is prefixed to the event, increasing its size
      by 8 bytes. This passes the check and then goes into the algorithm that
      causes the infinite loop.
      
      For events that are the first event on the sub-buffer, it does not need to
      add a timestamp, because the sub-buffer itself contains an absolute
      timestamp, and adding one is redundant.
      
      The fix is to check if the event is to be the first event on the
      sub-buffer, and if it is, then do not add a timestamp.
      
      This also fixes the 32-bit case, where a timestamp is added when a read of
      before_stamp or write_stamp is interrupted. There's still no need to add
      that timestamp if the event is going to be the first event on the sub
      buffer.
      
      Also, if the buffer has "time_stamp_abs" set, then also check if the
      length plus the timestamp is greater than the BUF_MAX_DATA_SIZE.
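
      The size arithmetic can be sketched as follows, assuming a hypothetical
      max_data in place of BUF_MAX_DATA_SIZE and the 8-byte timestamp mentioned
      above. Any length that passes the up-front check must fit on an empty
      sub-buffer, which only holds if the first event on a sub-buffer never
      gets an extra timestamp. The sketch also assumes that in
      "time_stamp_abs" mode every event carries the timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Size of a prefixed absolute timestamp (8 bytes, per the text). */
#define TS_BYTES 8

/* Up-front length check, done before reserving space. */
bool length_ok(size_t len, size_t max_data, bool abs_mode)
{
	if (abs_mode)
		return len + TS_BYTES <= max_data; /* timestamp always added */
	return len <= max_data;
}

/*
 * Bytes the event occupies once the timestamp decision is made. The
 * first event on a sub-buffer skips the extra timestamp: the sub-buffer
 * header already holds an absolute one.
 */
size_t bytes_needed(size_t len, bool needs_ts, bool first_on_subbuf,
		    bool abs_mode)
{
	if (abs_mode)
		return len + TS_BYTES;
	if (needs_ts && !first_on_subbuf)
		return len + TS_BYTES;
	return len;
}

/* Invariant the fix restores: a length that passes always fits somewhere. */
bool fits_on_empty_subbuf(size_t len, size_t max_data, bool needs_ts,
			  bool abs_mode)
{
	return bytes_needed(len, needs_ts, true, abs_mode) <= max_data;
}
```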
      
      Link: https://lore.kernel.org/all/20231212104549.58863438@gandalf.local.home/
      Link: https://lore.kernel.org/linux-trace-kernel/20231212071837.5fdd6c13@gandalf.local.home
      Link: https://lore.kernel.org/linux-trace-kernel/20231212111617.39e02849@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: a4543a2f ("ring-buffer: Get timestamp after event is allocated")
      Fixes: 58fbc3c6 ("ring-buffer: Consolidate add_timestamp to remove some branches")
      Reported-by: Kent Overstreet <kent.overstreet@linux.dev> # (on IRC)
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  2. Dec 07, 2023
      ring-buffer: Test last update in 32bit version of __rb_time_read() · f458a145
      Steven Rostedt (Google) authored
      Since 64 bit cmpxchg() is very expensive on 32bit architectures, the
      timestamp used by the ring buffer does some interesting tricks to be able
      to still have an atomic 64 bit number. It originally just used 60 bits and
      broke it up into two 32 bit words where the extra 2 bits were used for
      synchronization. But this was not enough for all use cases, and all 64
      bits were required.
      
      The 32bit version of the ring buffer timestamp was then broken up into 3
      32bit words using the same counter trick. But one update was not done. The
      check to see if the read operation was done without interruption only
      checked the first two words and not the last one (as it had before this
      update). Fix it by making sure all three updates happen without
      interruption by comparing the initial counter with the last updated
      counter.
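
      A simplified model of the three-word trick, assuming each 32-bit word
      stores 30 value bits with a 2-bit update counter in its top bits (bits
      0-29, 30-59 and 60-63 of the value). The layout and names are
      illustrative rather than the kernel's rb_time_t; the point is that a
      read is only trusted when the counters in all three words agree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TIME_SHIFT 30
#define VAL_MASK ((UINT32_C(1) << TIME_SHIFT) - 1)

/* Hypothetical 3-word representation of a 64-bit timestamp. */
struct time32_sketch {
	uint32_t bottom; /* value bits 0-29 */
	uint32_t top;    /* value bits 30-59 */
	uint32_t msb;    /* value bits 60-63 */
	uint32_t cnt;    /* writer's running counter */
};

static uint32_t word_cnt(uint32_t w) { return w >> TIME_SHIFT; }
static uint32_t word_val(uint32_t w) { return w & VAL_MASK; }

static uint32_t make_word(uint32_t cnt, uint32_t val)
{
	return ((cnt & 3) << TIME_SHIFT) | (val & VAL_MASK);
}

/* Writer: tag every word with the same new counter value. */
void time_set(struct time32_sketch *t, uint64_t val)
{
	uint32_t c = ++t->cnt & 3;

	t->top = make_word(c, (uint32_t)(val >> 30));
	t->bottom = make_word(c, (uint32_t)val);
	t->msb = make_word(c, (uint32_t)(val >> 60));
}

/*
 * Reader: a counter mismatch in ANY word means the read raced with a
 * writer. Checking only top and bottom would miss a torn msb word.
 */
bool time_read(const struct time32_sketch *t, uint64_t *val)
{
	uint32_t top = t->top, bottom = t->bottom, msb = t->msb;

	if (word_cnt(top) != word_cnt(bottom) || word_cnt(top) != word_cnt(msb))
		return false;
	*val = ((uint64_t)word_val(msb) << 60) |
	       ((uint64_t)word_val(top) << 30) | word_val(bottom);
	return true;
}
```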
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231206100050.3100b7bb@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: f03f2abc ("ring-buffer: Have 32 bit time stamps use all 64 bits")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      ring-buffer: Force absolute timestamp on discard of event · b2dd7975
      Steven Rostedt (Google) authored
      There's a race where, if an event is discarded from the ring buffer and
      an interrupt happens at that moment and inserts an event, the time stamp
      of the discarded event is still used as an offset. This can screw up the
      timings.
      
      If the event is going to be discarded, set the "before_stamp" to zero.
      When a new event comes in, it compares the "before_stamp" with the
      "write_stamp" and if they are not equal, it will insert an absolute
      timestamp. This will prevent the timings from getting out of sync due to
      the discarded event.
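
      A minimal single-threaded sketch of the discard rule, assuming
      hypothetical names and that a real timestamp is never zero (otherwise a
      zeroed before_stamp could still match write_stamp).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU stamps. */
struct stamps_sketch {
	uint64_t before_stamp;
	uint64_t write_stamp;
};

/* True if the next event must carry an absolute timestamp. */
bool needs_absolute(const struct stamps_sketch *s)
{
	return s->before_stamp != s->write_stamp;
}

void event_committed(struct stamps_sketch *s, uint64_t ts)
{
	assert(ts != 0); /* assumption: real timestamps are never zero */
	s->before_stamp = ts;
	s->write_stamp = ts;
}

/*
 * Discarding an event zeroes before_stamp so it no longer matches
 * write_stamp; the next event then uses an absolute timestamp instead
 * of a delta computed from the discarded event's time.
 */
void event_discarded(struct stamps_sketch *s)
{
	s->before_stamp = 0;
}
```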
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231206100244.5130f9b3@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: 6f6be606 ("ring-buffer: Force before_stamp and write_stamp to be different on discard")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  3. Dec 06, 2023
      tracing: Fix a possible race when disabling buffered events · c0591b1c
      Petr Pavlu authored
      Function trace_buffered_event_disable() is responsible for freeing pages
      backing buffered events and this process can run concurrently with
      trace_event_buffer_lock_reserve().
      
      The following race is currently possible:
      
      * Function trace_buffered_event_disable() is called on CPU 0. It
        increments trace_buffered_event_cnt on each CPU and waits via
        synchronize_rcu() for each user of trace_buffered_event to complete.
      
      * After synchronize_rcu() is finished, function
        trace_buffered_event_disable() has exclusive access to
        trace_buffered_event. All counters trace_buffered_event_cnt are at 1
        and all pointers trace_buffered_event are still valid.
      
      * At this point, on a different CPU 1, the execution reaches
        trace_event_buffer_lock_reserve(). The function calls
        preempt_disable_notrace() and only now enters an RCU read-side
        critical section. The function proceeds and reads a still valid
        pointer from trace_buffered_event[CPU1] into the local variable
        "entry". However, it doesn't yet read trace_buffered_event_cnt[CPU1]
        which happens later.
      
      * Function trace_buffered_event_disable() continues. It frees
        trace_buffered_event[CPU1] and decrements
        trace_buffered_event_cnt[CPU1] back to 0.
      
      * Function trace_event_buffer_lock_reserve() continues. It reads and
        increments trace_buffered_event_cnt[CPU1] from 0 to 1. This makes it
        believe that it can use the "entry" that it already obtained but the
        pointer is now invalid and any access results in a use-after-free.
      
      Fix the problem by making a second synchronize_rcu() call after all
      trace_buffered_event values are set to NULL. This waits on all potential
      users in trace_event_buffer_lock_reserve() that still read a previous
      pointer from trace_buffered_event.
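
      The resulting order of operations in the disable path can be sketched as
      below. Each step() call is a hypothetical stand-in that only logs the
      kernel operation it names; nothing here performs real RCU
      synchronization.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical log of the steps taken, for illustration only. */
#define MAX_STEPS 8
const char *steps[MAX_STEPS];
int nsteps;

static void step(const char *what)
{
	assert(nsteps < MAX_STEPS);
	steps[nsteps++] = what;
}

/* Order of operations in the disable path after the fix. */
void buffered_event_disable_sketch(void)
{
	/* Block new users: bump trace_buffered_event_cnt on every CPU. */
	step("cnt++");
	/* Wait for users that were already using the buffer. */
	step("synchronize_rcu");
	/* Free each per-CPU page and set trace_buffered_event to NULL. */
	step("free+NULL");
	/*
	 * New: a reader that loaded the old pointer after the first grace
	 * period still sees a non-zero cnt and backs off; wait for all such
	 * readers to finish before cnt can drop back to 0.
	 */
	step("synchronize_rcu");
	/* Release: decrement trace_buffered_event_cnt on every CPU. */
	step("cnt--");
}
```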
      
      Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/
      Link: https://lkml.kernel.org/r/20231205161736.19663-4-petr.pavlu@suse.com
      
      Cc: stable@vger.kernel.org
      Fixes: 0fc1b09f ("tracing: Use temp buffer when filtering events")
      Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      tracing: Fix a warning when allocating buffered events fails · 34209fe8
      Petr Pavlu authored
      Function trace_buffered_event_disable() produces an unexpected warning
      when the previous call to trace_buffered_event_enable() fails to
      allocate pages for buffered events.
      
      The situation can occur as follows:
      
      * The counter trace_buffered_event_ref is at 0.
      
      * The soft mode gets enabled for some event and
        trace_buffered_event_enable() is called. The function increments
        trace_buffered_event_ref to 1 and starts allocating event pages.
      
      * The allocation fails for some page and trace_buffered_event_disable()
        is called for cleanup.
      
      * Function trace_buffered_event_disable() decrements
        trace_buffered_event_ref back to 0, recognizes that it was the last
        use of buffered events and frees all allocated pages.
      
      * The control goes back to trace_buffered_event_enable() which returns.
        The caller of trace_buffered_event_enable() has no information that
        the function actually failed.
      
      * Some time later, the soft mode is disabled for the same event.
        Function trace_buffered_event_disable() is called. It warns on
        "WARN_ON_ONCE(!trace_buffered_event_ref)" and returns.
      
      Buffered events are just an optimization and can handle failures. Make
      trace_buffered_event_enable() exit on the first failure and leave any
      cleanup to when trace_buffered_event_disable() is called later.
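
      A userspace model of the fixed enable/disable pairing, assuming a
      hypothetical allocation-failure hook, four CPUs and malloc() for pages.
      Enable stops at the first failed allocation and leaves cleanup to the
      matching disable, so the reference count stays balanced and the warning
      cannot trigger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NCPUS 4

/* Hypothetical model of the refcount and per-CPU pages. */
int buffered_ref;
void *buffered_page[NCPUS];
int fail_at_cpu = -1; /* test hook: simulate an allocation failure */
int warnings;         /* stands in for WARN_ON_ONCE() firing */

static void *alloc_page_sketch(int cpu)
{
	if (cpu == fail_at_cpu)
		return NULL;
	return malloc(4096);
}

void buffered_enable(void)
{
	if (buffered_ref++)
		return; /* already enabled */
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		buffered_page[cpu] = alloc_page_sketch(cpu);
		/* Just an optimization: stop here, cleanup happens in disable. */
		if (!buffered_page[cpu])
			break;
	}
}

void buffered_disable(void)
{
	if (buffered_ref == 0) {
		warnings++; /* WARN_ON_ONCE(!trace_buffered_event_ref) */
		return;
	}
	if (--buffered_ref)
		return;
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		free(buffered_page[cpu]); /* free(NULL) is a no-op */
		buffered_page[cpu] = NULL;
	}
}
```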
      
      Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/
      Link: https://lkml.kernel.org/r/20231205161736.19663-3-petr.pavlu@suse.com
      
      Fixes: 0fc1b09f ("tracing: Use temp buffer when filtering events")
      Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      tracing: Fix incomplete locking when disabling buffered events · 7fed14f7
      Petr Pavlu authored
      The following warning appears when using buffered events:
      
      [  203.556451] WARNING: CPU: 53 PID: 10220 at kernel/trace/ring_buffer.c:3912 ring_buffer_discard_commit+0x2eb/0x420
      [...]
      [  203.670690] CPU: 53 PID: 10220 Comm: stress-ng-sysin Tainted: G            E      6.7.0-rc2-default #4 56e6d0fcf5581e6e51eaaecbdaec2a2338c80f3a
      [  203.670704] Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017
      [  203.670709] RIP: 0010:ring_buffer_discard_commit+0x2eb/0x420
      [  203.735721] Code: 4c 8b 4a 50 48 8b 42 48 49 39 c1 0f 84 b3 00 00 00 49 83 e8 01 75 b1 48 8b 42 10 f0 ff 40 08 0f 0b e9 fc fe ff ff f0 ff 47 08 <0f> 0b e9 77 fd ff ff 48 8b 42 10 f0 ff 40 08 0f 0b e9 f5 fe ff ff
      [  203.735734] RSP: 0018:ffffb4ae4f7b7d80 EFLAGS: 00010202
      [  203.735745] RAX: 0000000000000000 RBX: ffffb4ae4f7b7de0 RCX: ffff8ac10662c000
      [  203.735754] RDX: ffff8ac0c750be00 RSI: ffff8ac10662c000 RDI: ffff8ac0c004d400
      [  203.781832] RBP: ffff8ac0c039cea0 R08: 0000000000000000 R09: 0000000000000000
      [  203.781839] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [  203.781842] R13: ffff8ac10662c000 R14: ffff8ac0c004d400 R15: ffff8ac10662c008
      [  203.781846] FS:  00007f4cd8a67740(0000) GS:ffff8ad798880000(0000) knlGS:0000000000000000
      [  203.781851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  203.781855] CR2: 0000559766a74028 CR3: 00000001804c4000 CR4: 00000000001506f0
      [  203.781862] Call Trace:
      [  203.781870]  <TASK>
      [  203.851949]  trace_event_buffer_commit+0x1ea/0x250
      [  203.851967]  trace_event_raw_event_sys_enter+0x83/0xe0
      [  203.851983]  syscall_trace_enter.isra.0+0x182/0x1a0
      [  203.851990]  do_syscall_64+0x3a/0xe0
      [  203.852075]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [  203.852090] RIP: 0033:0x7f4cd870fa77
      [  203.982920] Code: 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 43 0e 00 f7 d8 64 89 01 48
      [  203.982932] RSP: 002b:00007fff99717dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089
      [  203.982942] RAX: ffffffffffffffda RBX: 0000558ea1d7b6f0 RCX: 00007f4cd870fa77
      [  203.982948] RDX: 0000000000000000 RSI: 00007fff99717de0 RDI: 0000558ea1d7b6f0
      [  203.982957] RBP: 00007fff99717de0 R08: 00007fff997180e0 R09: 00007fff997180e0
      [  203.982962] R10: 00007fff997180e0 R11: 0000000000000246 R12: 00007fff99717f40
      [  204.049239] R13: 00007fff99718590 R14: 0000558e9f2127a8 R15: 00007fff997180b0
      [  204.049256]  </TASK>
      
      For instance, it can be triggered by running these two commands in
      parallel:
      
       $ while true; do
          echo hist:key=id.syscall:val=hitcount > \
            /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger;
        done
       $ stress-ng --sysinfo $(nproc)
      
      The warning indicates that the current ring_buffer_per_cpu is not in the
      committing state. It happens because the active ring_buffer_event
      doesn't actually come from the ring_buffer_per_cpu but is allocated from
      trace_buffered_event.
      
      The bug is in function trace_buffered_event_disable() where the
      following normally happens:
      
      * The code invokes disable_trace_buffered_event() via
        smp_call_function_many() and follows it by synchronize_rcu(). This
        increments the per-CPU variable trace_buffered_event_cnt on each
        target CPU and grants trace_buffered_event_disable() exclusive
        access to the per-CPU variable trace_buffered_event.
      
      * Maintenance is performed on trace_buffered_event, all per-CPU event
        buffers get freed.
      
      * The code invokes enable_trace_buffered_event() via
        smp_call_function_many(). This decrements trace_buffered_event_cnt and
        releases the access to trace_buffered_event.
      
      A problem is that smp_call_function_many() runs a given function on all
      target CPUs except on the current one. The following can then occur:
      
      * Task X executing trace_buffered_event_disable() runs on CPU 0.
      
      * The control reaches synchronize_rcu() and the task gets rescheduled on
        another CPU 1.
      
      * The RCU synchronization finishes. At this point,
        trace_buffered_event_disable() has exclusive access to all
        trace_buffered_event variables except trace_buffered_event[CPU0]
        because trace_buffered_event_cnt[CPU0] is never incremented and if the
        buffer is currently unused, remains set to 0.
      
      * A different task Y is scheduled on CPU 0 and hits a trace event. The
        code in trace_event_buffer_lock_reserve() sees that
        trace_buffered_event_cnt[CPU0] is set to 0 and decides to use the
        buffer provided by trace_buffered_event[CPU0].
      
      * Task X continues its execution in trace_buffered_event_disable(). The
        code incorrectly frees the event buffer pointed to by
        trace_buffered_event[CPU0] and resets the variable to NULL.
      
      * Task Y writes event data to the now freed buffer and later detects the
        created inconsistency.
      
      The issue is observable since commit dea49978 ("tracing: Fix warning
      in trace_buffered_event_disable()") which moved the call of
      trace_buffered_event_disable() in __ftrace_event_enable_disable()
      earlier, prior to invoking call->class->reg(.. TRACE_REG_UNREGISTER ..).
      The underlying problem in trace_buffered_event_disable() is however
      present since the original implementation in commit 0fc1b09f
      ("tracing: Use temp buffer when filtering events").
      
      Fix the problem by replacing the two smp_call_function_many() calls with
      on_each_cpu_mask() which invokes a given callback on all CPUs.
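
      A toy model of the difference, assuming four CPUs and a hypothetical
      per-CPU counter: the first helper mirrors how smp_call_function_many()
      skips the calling CPU, the second mirrors on_each_cpu_mask() with a mask
      covering every CPU. Neither is the kernel's implementation.

```c
#include <assert.h>

#define NCPUS 4

/* Hypothetical per-CPU counter, like trace_buffered_event_cnt. */
int event_cnt[NCPUS];

void disable_on(int cpu)
{
	event_cnt[cpu]++;
}

/* Like smp_call_function_many(): every target CPU except @self. */
void call_many_sketch(int self, void (*fn)(int))
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (cpu != self)
			fn(cpu);
}

/* Like on_each_cpu_mask() with a full mask: the caller is included. */
void on_each_cpu_sketch(void (*fn)(int))
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		fn(cpu);
	assert(NCPUS > 0);
}
```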
      
      Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/
      Link: https://lkml.kernel.org/r/20231205161736.19663-2-petr.pavlu@suse.com
      
      Cc: stable@vger.kernel.org
      Fixes: 0fc1b09f ("tracing: Use temp buffer when filtering events")
      Fixes: dea49978 ("tracing: Fix warning in trace_buffered_event_disable()")
      Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      tracing: Disable snapshot buffer when stopping instance tracers · b538bf7d
      Steven Rostedt (Google) authored
      It used to be that only the top level instance had a snapshot buffer (for
      latency tracers like wakeup and irqsoff). Stopping a tracer in an
      instance would not disable the snapshot buffer. This could have some
      unintended consequences if the irqsoff tracer is enabled.
      
      Consolidate the tracing_start/stop() with tracing_start/stop_tr() so that
      all instances behave the same. The tracing_start/stop() functions will
      just call their respective tracing_start/stop_tr() with the global_array
      passed in.
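
      A sketch of the consolidation, assuming hypothetical structs: one
      stop/start pair handles any instance and always covers the snapshot
      buffer, and the global variants simply pass in the top-level instance.
      Nesting is tracked with a stop count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instance with a main and a snapshot buffer. */
struct tr_sketch {
	int stop_count;
	bool main_recording;
	bool snapshot_recording;
};

struct tr_sketch global_array_sketch = { 0, true, true };

/* One implementation for every instance: disables both buffers. */
void tracing_stop_tr_sketch(struct tr_sketch *tr)
{
	if (tr->stop_count++)
		return; /* already stopped */
	tr->main_recording = false;
	tr->snapshot_recording = false;
}

void tracing_start_tr_sketch(struct tr_sketch *tr)
{
	assert(tr->stop_count >= 0);
	if (tr->stop_count == 0 || --tr->stop_count)
		return; /* unbalanced, or still nested */
	tr->main_recording = true;
	tr->snapshot_recording = true;
}

/* The global wrappers just pass in the top-level instance. */
void tracing_stop_sketch(void) { tracing_stop_tr_sketch(&global_array_sketch); }
void tracing_start_sketch(void) { tracing_start_tr_sketch(&global_array_sketch); }
```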
      
      Link: https://lkml.kernel.org/r/20231205220011.041220035@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 6d9b3fa5 ("tracing: Move tracing_max_latency into trace_array")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      tracing: Stop current tracer when resizing buffer · d78ab792
      Steven Rostedt (Google) authored
      When the ring buffer is being resized, it can cause side effects to the
      running tracer. For instance, there's a race with irqsoff tracer that
      swaps individual per cpu buffers between the main buffer and the snapshot
      buffer. The resize operation modifies the main buffer and then the
      snapshot buffer. If a swap happens in between those two operations it will
      break the tracer.
      
      Simply stop the running tracer before resizing the buffers and enable it
      again when finished.
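
      A single-threaded sketch of the fix, assuming hypothetical names; the
      between callback simulates a latency-tracer swap landing between the
      main and snapshot resize steps. Because the tracer is stopped for the
      whole resize, that swap is refused instead of running against buffers
      of different sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instance: page counts for the main and snapshot buffers. */
struct resize_tr_sketch {
	bool stopped;
	int main_pages;
	int snap_pages;
	int swaps_done;
	bool broken; /* a swap ran across mismatched buffers */
};

/* A latency tracer swapping buffers; does nothing while stopped. */
void try_swap(struct resize_tr_sketch *tr)
{
	if (tr->stopped)
		return;
	if (tr->main_pages != tr->snap_pages)
		tr->broken = true;
	else
		tr->swaps_done++;
}

/* Stop the tracer around both resize steps, then start it again. */
void resize_sketch(struct resize_tr_sketch *tr, int pages,
		   void (*between)(struct resize_tr_sketch *))
{
	tr->stopped = true;
	tr->main_pages = pages;
	if (between != NULL)
		between(tr); /* simulated swap attempt mid-resize */
	tr->snap_pages = pages;
	tr->stopped = false;
}
```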
      
      Link: https://lkml.kernel.org/r/20231205220010.748996423@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 3928a8a2 ("ftrace: make work with new ring buffer")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      tracing: Always update snapshot buffer size · 7be76461
      Steven Rostedt (Google) authored
      It used to be that only the top level instance had a snapshot buffer (for
      latency tracers like wakeup and irqsoff). The update of the ring buffer
      size would check if the instance was the top level and if so, it would
      also update the snapshot buffer as it needs to be the same as the main
      buffer.
      
      Now that lower level instances also have a snapshot buffer, they too need
      to update their snapshot buffer sizes when the main buffer is changed,
      otherwise the following can be triggered:
      
       # cd /sys/kernel/tracing
       # echo 1500 > buffer_size_kb
       # mkdir instances/foo
       # echo irqsoff > instances/foo/current_tracer
       # echo 1000 > instances/foo/buffer_size_kb
      
      Produces:
      
       WARNING: CPU: 2 PID: 856 at kernel/trace/trace.c:1938 update_max_tr_single.part.0+0x27d/0x320
      
      Which is:
      
      	ret = ring_buffer_swap_cpu(tr->max_buffer.buffer, tr->array_buffer.buffer, cpu);
      
      	if (ret == -EBUSY) {
      		[..]
      	}
      
      	WARN_ON_ONCE(ret && ret != -EAGAIN && ret != -EBUSY);  <== here
      
      That's because ring_buffer_swap_cpu() has:
      
      	int ret = -EINVAL;
      
      	[..]
      
      	/* At least make sure the two buffers are somewhat the same */
      	if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
      		goto out;
      
      	[..]
       out:
      	return ret;
       }
      
      Instead, update all instances' snapshot buffer sizes when their main
      buffer size is updated.
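
      A sketch of the failure and the fix, assuming hypothetical per-CPU
      buffers that only record their page count: the swap refuses buffers of
      different sizes with -EINVAL, as in the excerpt above, and resizing an
      instance updates both its main and snapshot buffers.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-CPU buffer holding only its page count. */
struct cpu_buf_sketch {
	int nr_pages;
};

struct instance_sketch {
	struct cpu_buf_sketch main_buf;
	struct cpu_buf_sketch max_buf; /* snapshot buffer */
};

/* Refuse to swap buffers that are not the same size. */
int swap_cpu_sketch(struct cpu_buf_sketch *a, struct cpu_buf_sketch *b)
{
	struct cpu_buf_sketch tmp;

	if (a->nr_pages != b->nr_pages)
		return -EINVAL;
	tmp = *a;
	*a = *b;
	*b = tmp;
	return 0;
}

/* Every instance keeps its snapshot in step, not just the top level. */
void resize_instance(struct instance_sketch *inst, int nr_pages)
{
	assert(nr_pages > 0);
	inst->main_buf.nr_pages = nr_pages;
	inst->max_buf.nr_pages = nr_pages;
}
```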
      
      Link: https://lkml.kernel.org/r/20231205220010.454662151@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 6d9b3fa5 ("tracing: Move tracing_max_latency into trace_array")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  4. Nov 23, 2023
  5. Nov 21, 2023
  6. Nov 20, 2023
  7. Nov 19, 2023
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 037266a5
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Seven small fixes, six in drivers and one in sd.
      
        The sd fix is so large because it changes a struct pointer to a struct
        but otherwise is fairly simple"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufs: qcom-ufs: dt-bindings: Document the SM8650 UFS Controller
        scsi: sd: Fix sshdr use in sd_suspend_common()
        scsi: scsi_debug: Delete some bogus error checking
        scsi: scsi_debug: Fix some bugs in sdebug_error_write()
        scsi: ufs: core: Fix racing issue between ufshcd_mcq_abort() and ISR
        scsi: ufs: core: Expand MCQ queue slot to DeviceQueueDepth + 1
        scsi: qla2xxx: Fix system crash due to bad pointer access
      Merge tag 'parisc-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 2254005e
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
       "On parisc we still sometimes need writeable stacks, e.g. if programs
        aren't compiled with gcc-14. To avoid issues with the upcoming
        systemd-254 we therefore have to disable prctl(PR_SET_MDWE) for now
        (for parisc only).
      
        The other two patches are minor: a bugfix for the soft power-off on
        qemu with a 64-bit kernel, and a switch from strlcpy() to strscpy():
      
         - Fix power soft-off on qemu
      
         - Disable prctl(PR_SET_MDWE) since parisc sometimes still needs
           writeable stacks
      
         - Use strscpy instead of strlcpy in show_cpuinfo()"
      
      * tag 'parisc-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        prctl: Disable prctl(PR_SET_MDWE) on parisc
        parisc/power: Fix power soft-off when running on qemu
        parisc: Replace strlcpy() with strscpy()
      2254005e
    • Linus Torvalds
      Merge tag 'xfs-6.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · b8f1fa24
      Linus Torvalds authored
      Pull xfs fixes from Chandan Babu:
      
       - Fix deadlock arising due to intent items in AIL not being cleared
         when log recovery fails
      
       - Fix stale data exposure bug when remapping COW fork extents to data
         fork
      
       - Fix deadlock when data device flush fails
      
       - Fix AGFL minimum size calculation
      
       - Select DEBUG_FS instead of XFS_DEBUG when XFS_ONLINE_SCRUB_STATS is
         selected
      
       - Fix corruption of log inode's extent count field when NREXT64 feature
         is enabled
      
      * tag 'xfs-6.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: recovery should not clear di_flushiter unconditionally
        xfs: inode recovery does not validate the recovered inode
        xfs: fix again select in kconfig XFS_ONLINE_SCRUB_STATS
        xfs: fix internal error from AGFL exhaustion
        xfs: up(ic_sema) if flushing data device fails
        xfs: only remap the written blocks in xfs_reflink_end_cow_extent
        XFS: Update MAINTAINERS to catch all XFS documentation
        xfs: abort intent items when recovery intents fail
        xfs: factor out xfs_defer_pending_abort
      b8f1fa24
    • Linus Torvalds
      Merge tag 'nfsd-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · bb28378a
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Fix several long-standing bugs in the duplicate reply cache
      
       - Fix a memory leak
      
      * tag 'nfsd-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        NFSD: Fix checksum mismatches in the duplicate reply cache
        NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update()
        NFSD: Update nfsd_cache_append() to use xdr_stream
        nfsd: fix file memleak on client_opens_release
      bb28378a
    • Linus Torvalds
      Merge tag '6.7-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 33b63f15
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - multichannel fixes (including a lock ordering fix and an important
         refcounting fix)
      
       - spnego fix
      
      * tag '6.7-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: fix lock ordering while disabling multichannel
        cifs: fix leak of iface for primary channel
        cifs: fix check of rc in function generate_smb3signingkey
        cifs: spnego: add ';' in HOST_KEY_LEN
      33b63f15
    • Helge Deller
      prctl: Disable prctl(PR_SET_MDWE) on parisc · 79383813
      Helge Deller authored
      systemd-254 tries to use prctl(PR_SET_MDWE) for its MemoryDenyWriteExecute
      functionality, but fails on parisc which still needs executable stacks in
      certain combinations of gcc/glibc/kernel.
      
      Disable prctl(PR_SET_MDWE) by returning -EINVAL for now on parisc, until
      userspace has caught up.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Sam James <sam@gentoo.org>
      Closes: https://github.com/systemd/systemd/issues/29775
      Tested-by: Sam James <sam@gentoo.org>
      Link: https://lore.kernel.org/all/875y2jro9a.fsf@gentoo.org/
      Cc: <stable@vger.kernel.org> # v6.3+
      79383813
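With this change, a prctl(PR_SET_MDWE) call on parisc fails with errno set to EINVAL, which is also what kernels older than 6.3 return for the unknown option. The following is a minimal userspace sketch of a caller that treats EINVAL as "MDWE not available" rather than as a hard failure. The helper try_enable_mdwe() and the enum are hypothetical (this is not systemd's code), and the fallback constants assume the values from include/uapi/linux/prctl.h in Linux 6.3 for builds against older headers.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallback values (assumed from include/uapi/linux/prctl.h, Linux 6.3+)
 * for systems whose headers predate PR_SET_MDWE. */
#ifndef PR_SET_MDWE
#define PR_SET_MDWE 65
#endif
#ifndef PR_MDWE_REFUSE_EXEC_GAIN
#define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#endif

/* Hypothetical result type for the sketch. */
enum mdwe_result { MDWE_ENABLED, MDWE_UNSUPPORTED, MDWE_FAILED };

/* Try to enable Memory-Deny-Write-Execute for the calling process.
 * EINVAL is treated as "not available here" (e.g. parisc after this
 * patch, or a pre-6.3 kernel); any other error is a real failure. */
static enum mdwe_result try_enable_mdwe(void)
{
    if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L) == 0)
        return MDWE_ENABLED;
    if (errno == EINVAL)
        return MDWE_UNSUPPORTED;
    return MDWE_FAILED;
}
```

A caller that gets MDWE_UNSUPPORTED can then continue without the protection, or fall back to another mechanism, instead of aborting.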
    • Linus Torvalds
      Merge tag 'for-6.7/dm-fixes' of... · 05aa69b0
      Linus Torvalds authored
      Merge tag 'for-6.7/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - Various fixes for the DM delay target to address regressions
         introduced during the 6.7 merge window
      
       - Fixes to both DM bufio and the verity target for no-sleep mode,
         to address sleeping while atomic issues
      
       - Update DM crypt target in response to the treewide change that
         made MAX_ORDER inclusive
      
      * tag 'for-6.7/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm-crypt: start allocating with MAX_ORDER
        dm-verity: don't use blocking calls from tasklets
        dm-bufio: fix no-sleep mode
        dm-delay: avoid duplicate logic
        dm-delay: fix bugs introduced by kthread mode
        dm-delay: fix a race between delay_presuspend and delay_bio
      05aa69b0
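The dm-crypt item refers to MAX_ORDER becoming inclusive. Order MAX_ORDER itself (a block of 1 << MAX_ORDER pages) is now a valid allocation order, so code that still starts at MAX_ORDER - 1, as it had to under the old exclusive meaning, never uses the largest block size. The userspace simulation below illustrates how the starting order changes the number of chunks needed. It assumes a hypothetical SKETCH_MAX_ORDER of 10, models only the sizing logic, and is not dm-crypt's actual allocation code. In particular, it does not model allocations failing under memory pressure.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's MAX_ORDER. Under the inclusive
 * definition, order SKETCH_MAX_ORDER itself is a valid order. */
#define SKETCH_MAX_ORDER 10U

/* Largest order <= max_order whose block (1 << order pages) fits in
 * 'remaining' pages; never goes below order 0. */
static unsigned int pick_order(size_t remaining, unsigned int max_order)
{
    unsigned int order = max_order;   /* inclusive bound: start at max_order */

    while (order > 0 && ((size_t)1 << order) > remaining)
        order--;
    return order;
}

/* Number of power-of-two chunks needed to cover 'pages' pages when the
 * search starts at 'max_order'. */
static size_t chunks_needed(size_t pages, unsigned int max_order)
{
    size_t chunks = 0;

    while (pages > 0) {
        pages -= (size_t)1 << pick_order(pages, max_order);
        chunks++;
    }
    return chunks;
}
```

With 2048 pages to cover, starting at order 10 needs 2 chunks, while starting one order lower (the old exclusive bound) needs 4.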
    • Helge Deller
      parisc/power: Fix power soft-off when running on qemu · 6ad6e15a
      Helge Deller authored
      Firmware returns the physical address of the power switch,
      so we need to use gsc_writel() instead of direct memory access.
      
      Fixes: d0c21947 ("parisc/power: Add power soft-off when running on qemu")
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # v6.0+
      6ad6e15a
    • Kees Cook
      parisc: Replace strlcpy() with strscpy() · 721d28f3
      Kees Cook authored
      strlcpy() reads the entire source buffer first. This read may exceed
      the destination size limit. This is both inefficient and can lead
      to linear read overflows if a source string is not NUL-terminated[1].
      Additionally, it returns the size of the source string, not the
      resulting size of the destination string. In an effort to remove strlcpy()
      completely[2], replace strlcpy() here with strscpy().
      
      Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [1]
      Link: https://github.com/KSPP/linux/issues/89 [2]
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Azeem Shaikh <azeemshaikh38@gmail.com>
      Cc: linux-parisc@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Helge Deller <deller@gmx.de>
      721d28f3
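The return-value difference described in Kees Cook's commit can be shown in userspace. The sketch_strscpy() function below is a hypothetical, byte-at-a-time stand-in that follows the documented kernel strscpy() contract: it never reads more than count bytes of src, always NUL-terminates when count is nonzero, and returns the copied length or -E2BIG on truncation. The real kernel implementation is optimized differently. By contrast, strlcpy() returns strlen(src), which forces it to read the whole source.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical userspace sketch of strscpy() semantics.
 * Returns the number of bytes copied (excluding the NUL), or -E2BIG if
 * count is 0 or src did not fit. Reads at most 'count' bytes of src,
 * so a non-NUL-terminated source cannot cause an over-read. */
static ssize_t sketch_strscpy(char *dest, const char *src, size_t count)
{
    size_t i;

    if (count == 0)
        return -E2BIG;
    for (i = 0; i < count; i++) {
        dest[i] = src[i];
        if (src[i] == '\0')
            return (ssize_t)i;        /* length of the copied string */
    }
    dest[count - 1] = '\0';           /* truncated: terminate and report */
    return -E2BIG;
}
```

Copying "hello" into a 4-byte buffer yields "hel" and -E2BIG, so a caller can detect truncation with a simple `< 0` check. With strlcpy(), the caller has to compare the return value against the buffer size instead.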
    • Linus Torvalds
      Merge tag 'i2c-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 23dfa043
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Revert a not-working conversion to generic recovery for PXA,
        use proper IO accessors for designware, and use proper PM level
        for ocores to allow accessing interrupt providers late"
      
      * tag 'i2c-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: ocores: Move system PM hooks to the NOIRQ phase
        i2c: designware: Fix corrupted memory seen in the ISR
        Revert "i2c: pxa: move to generic GPIO recovery"
      23dfa043
    • Linus Torvalds
      Merge tag 'turbostat-2023.11.07' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux · 9ea991a5
      Linus Torvalds authored
      Pull turbostat updates from Len Brown:
      
       - Turbostat features are now table-driven (Rui Zhang)
      
       - Add support for some new platforms (Sumeet Pawnikar, Rui Zhang)
      
       - Gracefully run in configs when CPUs are limited (Rui Zhang, Srinivas
         Pandruvada)
      
       - misc minor fixes
      
      [ This came in during the merge window, but sorting out the signed tag
        took a while, hence the late merge.  - Linus ]
      
      * tag 'turbostat-2023.11.07' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (86 commits)
        tools/power turbostat: version 2023.11.07
        tools/power/turbostat: bugfix "--show IPC"
        tools/power/turbostat: Add initial support for LunarLake
        tools/power/turbostat: Add initial support for ArrowLake
        tools/power/turbostat: Add initial support for GrandRidge
        tools/power/turbostat: Add initial support for SierraForest
        tools/power/turbostat: Add initial support for GraniteRapids
        tools/power/turbostat: Add MSR_CORE_C1_RES support for spr_features
        tools/power/turbostat: Move process to root cgroup
        tools/power/turbostat: Handle cgroup v2 cpu limitation
        tools/power/turbostat: Abstrct function for parsing cpu string
        tools/power/turbostat: Handle offlined CPUs in cpu_subset
        tools/power/turbostat: Obey allowed CPUs for system summary
        tools/power/turbostat: Obey allowed CPUs for primary thread/core detection
        tools/power/turbostat: Abstract several functions
        tools/power/turbostat: Obey allowed CPUs during startup
        tools/power/turbostat: Obey allowed CPUs when accessing CPU counters
        tools/power/turbostat: Introduce cpu_allowed_set
        tools/power/turbostat: Remove PC7/PC9 support on ADL/RPL
        tools/power/turbostat: Enable MSR_CORE_C1_RES on recent Intel client platforms
        ...
      9ea991a5
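Several of the turbostat changes ("Obey allowed CPUs ...", "Introduce cpu_allowed_set") restrict work to the CPUs the process may actually run on. One common way for a Linux userspace program to obtain that set is sched_getaffinity(2), whose result reflects both taskset and cpuset restrictions. The sketch below assumes that approach. It is not turbostat's code, whose cpu_allowed_set may be derived differently, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Number of CPUs the calling process is allowed to run on, or -1 on error.
 * A fixed-size cpu_set_t covers CPU_SETSIZE (1024) CPUs; on machines with
 * a larger kernel CPU mask, sched_getaffinity() fails with EINVAL. */
static int allowed_cpu_count(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

/* Nonzero if 'cpu' is in the calling process's allowed set. */
static int cpu_is_allowed(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return 0;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_ISSET(cpu, &set) != 0;
}
```

A tool that iterates over CPUs can skip any CPU for which cpu_is_allowed() returns 0, instead of failing when it tries to bind to or read counters on a CPU outside its cpuset.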