  1. Mar 20, 2023
  2. Mar 15, 2023
    • tracing: Make tracepoint lockdep check actually test something · c2679254
      Steven Rostedt (Google) authored
      
      
      A while ago, the trace events had the following:
      
         rcu_read_lock_sched_notrace();
         rcu_dereference_sched(...);
         rcu_read_unlock_sched_notrace();
      
      If the tracepoint was enabled, this sequence could trigger RCU issues
      when called in the wrong place, but the warning was only produced if
      lockdep was enabled. If the tracepoint was never enabled while lockdep
      was on, the bug would not be caught. To handle this, the above sequence
      was executed when lockdep was enabled regardless of whether the
      tracepoint was enabled or not (and although the always-enabled code
      didn't really do anything, it would still trigger the warning).
      
      But a lot has changed since that lockdep code was added. For one, that
      sequence no longer triggers any warning. For another, an enabled
      tracepoint no longer even executes that sequence.
      
      The main check we care about today is whether RCU is "watching" or not.
      So if lockdep is enabled, always call rcu_is_watching(), which will
      trigger a warning if RCU is not watching (tracepoints require RCU to be
      watching).
      
      Note, the old sequence did add a bit of overhead when lockdep was
      enabled, and with the latest kernel updates it would slow the system
      down enough to trigger kernel "stalled" warnings.
      
      Link: http://lore.kernel.org/lkml/20140806181801.GA4605@redhat.com
      Link: http://lore.kernel.org/lkml/20140807175204.C257CAC5@viggo.jf.intel.com
      Link: https://lore.kernel.org/lkml/20230307184645.521db5c9@gandalf.local.home/
      Link: https://lore.kernel.org/linux-trace-kernel/20230310172856.77406446@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>
      Fixes: e6753f23 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  3. Mar 10, 2023
    • ftrace,kcfi: Define ftrace_stub_graph conditionally · aa69f814
      Arnd Bergmann authored
      When CONFIG_FUNCTION_GRAPH_TRACER is disabled, __kcfi_typeid_ftrace_stub_graph
      is missing, causing a link failure:
      
       ld.lld: error: undefined symbol: __kcfi_typeid_ftrace_stub_graph
       referenced by arch/x86/kernel/ftrace_64.o:(__cfi_ftrace_stub_graph) in archive vmlinux.a
      
      Mark the reference to it as conditional on the same symbol, as
      is done on arm64.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230131093643.3850272-1-arnd@kernel.org
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@kernel.org>
      Fixes: 883bbbff ("ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()")
      See-also: 2598ac6e ("arm64: ftrace: Define ftrace_stub_graph only with FUNCTION_GRAPH_TRACER")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ftrace: Fix invalid address access in lookup_rec() when index is 0 · ee92fa44
      Chen Zhongjin authored
      KASAN reported the following problem:
      
       BUG: KASAN: use-after-free in lookup_rec
       Read of size 8 at addr ffff000199270ff0 by task modprobe
       CPU: 2 Comm: modprobe
       Call trace:
        kasan_report
        __asan_load8
        lookup_rec
        ftrace_location
        arch_check_ftrace_location
        check_kprobe_address_safe
        register_kprobe
      
      When checking pg->records[pg->index - 1].ip in lookup_rec(), it can get a
      pg which is newly added to ftrace_pages_start in ftrace_process_locs().
      Before the first pg->index++, index is 0 and accessing pg->records[-1].ip
      will cause this problem.
      
      Don't check the ip when pg->index is 0.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230309080230.36064-1-chenzhongjin@huawei.com
      
      Cc: stable@vger.kernel.org
      Fixes: 9644302e ("ftrace: Speed up search by skipping pages by address")
      Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Check field value in hist_field_name() · 9f116f76
      Steven Rostedt (Google) authored
      
      
      The function hist_field_name() cannot handle being passed a NULL field
      parameter. It should never be NULL, but due to a previous bug, NULL was
      passed to the function and the kernel crashed due to a NULL dereference.
      Mark Rutland reported this to me on IRC.
      
      The bug was fixed, but to prevent future bugs from crashing the kernel,
      check the field and add a WARN_ON() if it is NULL.
      
      Link: https://lkml.kernel.org/r/20230302020810.762384440@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Fixes: c6afad49 ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers")
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Do not let histogram values have some modifiers · e0213434
      Steven Rostedt (Google) authored
      Histogram values cannot be strings, stacktraces, graphs, symbols, or
      syscalls, nor can they be grouped in buckets or log2 values. Return an
      error if a value is set to one of these.
      
      Note, the histogram code was not prepared to handle these modifiers
      for histogram values, which caused a bug.
      
      Mark Rutland reported:
      
       # echo 'p:copy_to_user __arch_copy_to_user n=$arg2' >> /sys/kernel/tracing/kprobe_events
       # echo 'hist:keys=n:vals=hitcount.buckets=8:sort=hitcount' > /sys/kernel/tracing/events/kprobes/copy_to_user/trigger
       # cat /sys/kernel/tracing/events/kprobes/copy_to_user/hist
      [  143.694628] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  143.695190] Mem abort info:
      [  143.695362]   ESR = 0x0000000096000004
      [  143.695604]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  143.695889]   SET = 0, FnV = 0
      [  143.696077]   EA = 0, S1PTW = 0
      [  143.696302]   FSC = 0x04: level 0 translation fault
      [  143.702381] Data abort info:
      [  143.702614]   ISV = 0, ISS = 0x00000004
      [  143.702832]   CM = 0, WnR = 0
      [  143.703087] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000448f9000
      [  143.703407] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
      [  143.704137] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
      [  143.704714] Modules linked in:
      [  143.705273] CPU: 0 PID: 133 Comm: cat Not tainted 6.2.0-00003-g6fc512c10a7c #3
      [  143.706138] Hardware name: linux,dummy-virt (DT)
      [  143.706723] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [  143.707120] pc : hist_field_name.part.0+0x14/0x140
      [  143.707504] lr : hist_field_name.part.0+0x104/0x140
      [  143.707774] sp : ffff800008333a30
      [  143.707952] x29: ffff800008333a30 x28: 0000000000000001 x27: 0000000000400cc0
      [  143.708429] x26: ffffd7a653b20260 x25: 0000000000000000 x24: ffff10d303ee5800
      [  143.708776] x23: ffffd7a6539b27b0 x22: ffff10d303fb8c00 x21: 0000000000000001
      [  143.709127] x20: ffff10d303ec2000 x19: 0000000000000000 x18: 0000000000000000
      [  143.709478] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
      [  143.709824] x14: 0000000000000000 x13: 203a6f666e692072 x12: 6567676972742023
      [  143.710179] x11: 0a230a6d6172676f x10: 000000000000002c x9 : ffffd7a6521e018c
      [  143.710584] x8 : 000000000000002c x7 : 7f7f7f7f7f7f7f7f x6 : 000000000000002c
      [  143.710915] x5 : ffff10d303b0103e x4 : ffffd7a653b20261 x3 : 000000000000003d
      [  143.711239] x2 : 0000000000020001 x1 : 0000000000000001 x0 : 0000000000000000
      [  143.711746] Call trace:
      [  143.712115]  hist_field_name.part.0+0x14/0x140
      [  143.712642]  hist_field_name.part.0+0x104/0x140
      [  143.712925]  hist_field_print+0x28/0x140
      [  143.713125]  event_hist_trigger_print+0x174/0x4d0
      [  143.713348]  hist_show+0xf8/0x980
      [  143.713521]  seq_read_iter+0x1bc/0x4b0
      [  143.713711]  seq_read+0x8c/0xc4
      [  143.713876]  vfs_read+0xc8/0x2a4
      [  143.714043]  ksys_read+0x70/0xfc
      [  143.714218]  __arm64_sys_read+0x24/0x30
      [  143.714400]  invoke_syscall+0x50/0x120
      [  143.714587]  el0_svc_common.constprop.0+0x4c/0x100
      [  143.714807]  do_el0_svc+0x44/0xd0
      [  143.714970]  el0_svc+0x2c/0x84
      [  143.715134]  el0t_64_sync_handler+0xbc/0x140
      [  143.715334]  el0t_64_sync+0x190/0x194
      [  143.715742] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (f9400000)
      [  143.716510] ---[ end trace 0000000000000000 ]---
      Segmentation fault
      
      Link: https://lkml.kernel.org/r/20230302020810.559462599@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: c6afad49 ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers")
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  4. Feb 19, 2023
  5. Feb 17, 2023
    • tracing/histogram: Fix stacktrace histogram Documentation · d8f0ae3e
      Tom Zanussi authored
      
      
      Fix a small problem with the histogram specification in the
      Documentation, and change the example to show output using a
      stacktrace field rather than the global stacktrace.
      
      Link: https://lkml.kernel.org/r/f75f807dd4998249e513515f703a2ff7407605f4.1676063532.git.zanussi@kernel.org
      
      Signed-off-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/histogram: Fix stacktrace key · f5914b30
      Tom Zanussi authored
      
      
      The current code will always use the current stacktrace as a key even
      if a stacktrace contained in a specific event field was specified.
      
      For example, we expect to use the 'unsigned long[] stack' field in the
      below event in the histogram:
      
        # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > /sys/kernel/debug/tracing/dynamic_events
        # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
      
      But in fact, when we type out the trigger, we see that it's using the
      plain old global 'stacktrace' as the key, which is just the stacktrace
      when the event was hit and not the stacktrace contained in the event,
      which is what we want:
      
        # cat /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
        hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
      
      And in fact, there's no code to actually retrieve it from the event,
      so we need to add HIST_FIELD_FN_STACK and hist_field_stack() to get it
      and hook it into the trigger code.  For now, since the stack is just
      using dynamic strings, this could simply reuse the dynamic string
      function, but it seems cleaner to have a dedicated function that can
      be tweaked independently as necessary.
      
      Link: https://lkml.kernel.org/r/11aa614c82976adbfa4ea763dbe885b5fb01d59c.1676063532.git.zanussi@kernel.org
      
      Signed-off-by: Tom Zanussi <zanussi@kernel.org>
      [ Fixed 32bit build warning reported by kernel test robot <lkp@intel.com> ]
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/histogram: Fix a few problems with stacktrace variable printing · 2bacfd9f
      Tom Zanussi authored
      
      
      Currently, there are a few problems when printing hist triggers and
      trace output when using stacktrace variables.  This fixes the problems
      seen below:
      
        # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
        # cat /sys/kernel/debug/tracing/events/synthetic/block_lat/trigger
        hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
      
        # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 2' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
        # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
        hist:keys=next_pid:vals=hitcount:ts=common_timestamp.usecs,st=stacktrace.stacktrace:sort=hitcount:size=2048:clock=global if prev_state == 2 [active]
      
      and also in the trace output (should be stack.stacktrace):
      
        {  delta: ~ 100-199, stacktrace         __schedule+0xa19/0x1520
      
      Link: https://lkml.kernel.org/r/60bebd4e546728e012a7a2bcbf58716d48ba6edb.1676063532.git.zanussi@kernel.org
      
      Signed-off-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  6. Feb 16, 2023
    • tracing: Add BUILD_BUG() to make sure stacktrace fits in strings · 8261ef2e
      Steven Rostedt (Google) authored
      
      
      The max string length for a histogram variable is 256 bytes. The max
      depth of a stacktrace is 16. With 8-byte words, that's 16 * 8 = 128
      bytes, which easily fits in the string variable. The histogram
      stacktrace is stored in the string value (with the given max length)
      on the assumption that it fits. To make sure this is always the case
      (should the stack trace depth ever increase), add a BUILD_BUG_ON() to
      test it.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230214002418.0103b9e765d3e5c374d2aa7d@kernel.org/
      
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/histogram: Don't use strlen to find length of stacktrace variables · fc1a9dc1
      Tom Zanussi authored
      
      
      Because stacktraces are saved in dynamic strings,
      trace_event_raw_event_synth() uses strlen to determine the length of
      the stack.  Stacktraces may contain 0-bytes, though, in the saved
      addresses, so the length found and passed to reserve() will be too
      small.
      
      Fix this by using the first unsigned long in the stack variables to
      store the actual number of elements in the stack and have
      trace_event_raw_event_synth() use that to determine the length of the
      stack.
      
      Link: https://lkml.kernel.org/r/1ed6906cd9d6477ef2bd8e63c61de20a9ffe64d7.1676063532.git.zanussi@kernel.org
      
      Signed-off-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  7. Feb 08, 2023
  8. Jan 25, 2023
    • bpf/tracing: Use stage6 of tracing to not duplicate macros · dc513fd5
      Steven Rostedt (Google) authored
      
      
      The bpf events are created by the same macro magic as tracefs trace
      events are. But to hook into bpf, it has its own code. It duplicates many
      of the same macros as the tracefs macros and this is an issue because it
      misses bug fixes as well as any new enhancements that come with the other
      trace macros.
      
      As the trace macros have been put into their own staging files, have bpf
      take advantage of this and use the tracefs stage 6 macros that the "fast
      assign" portion of the trace event macro uses.
      
      Link: https://lkml.kernel.org/r/20230124202515.873075730@goodmis.org
      Link: https://lore.kernel.org/lkml/1671181385-5719-1-git-send-email-quic_linyyuan@quicinc.com/
      
      Cc: bpf@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Reported-by: Linyu Yuan <quic_linyyuan@quicinc.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • perf/tracing: Use stage6 of tracing to not duplicate macros · 92a22cea
      Steven Rostedt (Google) authored
      
      
      The perf events are created by the same macro magic as tracefs trace
      events are. But to hook into perf, it has its own code. It duplicates many
      of the same macros as the tracefs macros and this is an issue because it
      misses bug fixes as well as any new enhancements that come with the other
      trace macros.
      
      As the trace macros have been put into their own staging files, have perf
      take advantage of this and use the tracefs stage 6 macros that the "fast
      assign" portion of the trace event macro uses.
      
      Link: https://lkml.kernel.org/r/20230124202515.716458410@goodmis.org
      Link: https://lore.kernel.org/lkml/1671181385-5719-1-git-send-email-quic_linyyuan@quicinc.com/
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reported-by: Linyu Yuan <quic_linyyuan@quicinc.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/histogram: Add simple tests for stacktrace usage of synthetic events · b81a3a10
      Steven Rostedt (Google) authored
      
      
      Update the selftests to include a test of passing a stacktrace between the
      events of a synthetic event.
      
      Link: https://lkml.kernel.org/r/20230117152236.475439286@goodmis.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Ross Zwisler <zwisler@google.com>
      Cc: Ching-lin Yu <chinglinyu@google.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/histogram: Document variable stacktrace · 88238513
      Steven Rostedt (Google) authored
      
      
      Add a little documentation (and a useful example) of how a stacktrace can
      be used within a histogram variable and synthetic event.
      
      Link: https://lkml.kernel.org/r/20230117152236.320181354@goodmis.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Ross Zwisler <zwisler@google.com>
      Cc: Ching-lin Yu <chinglinyu@google.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/histogram: Add stacktrace type · cc5fc8bf
      Steven Rostedt (Google) authored
      
      
      Now that stacktraces can be part of synthetic events, allow a key to be
      typed as a stacktrace.
      
        # cd /sys/kernel/tracing
        # echo 's:block_lat u64 delta; unsigned long stack[];' >> dynamic_events
        # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger
        # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,st2=$st:onmatch(sched.sched_switch).trace(block_lat,$delta,$st2)' >> events/sched/sched_switch/trigger
        # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > events/synthetic/block_lat/trigger
      
        # cat events/synthetic/block_lat/hist
      
        # event histogram
        #
        # trigger info: hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
        #
      
        { delta: ~ 0-99, stacktrace:
                 event_hist_trigger+0x464/0x480
                 event_triggers_call+0x52/0xe0
                 trace_event_buffer_commit+0x193/0x250
                 trace_event_raw_event_sched_switch+0xfc/0x150
                 __traceiter_sched_switch+0x41/0x60
                 __schedule+0x448/0x7b0
                 schedule_idle+0x26/0x40
                 cpu_startup_entry+0x19/0x20
                 start_secondary+0xed/0xf0
                 secondary_startup_64_no_verify+0xe0/0xeb
        } hitcount:          6
        { delta: ~ 0-99, stacktrace:
                 event_hist_trigger+0x464/0x480
                 event_triggers_call+0x52/0xe0
                 trace_event_buffer_commit+0x193/0x250
                 trace_event_raw_event_sched_switch+0xfc/0x150
                 __traceiter_sched_switch+0x41/0x60
                 __schedule+0x448/0x7b0
                 schedule_idle+0x26/0x40
                 cpu_startup_entry+0x19/0x20
                 __pfx_kernel_init+0x0/0x10
                 arch_call_rest_init+0xa/0x24
                 start_kernel+0x964/0x98d
                 secondary_startup_64_no_verify+0xe0/0xeb
        } hitcount:          3
        { delta: ~ 0-99, stacktrace:
                 event_hist_trigger+0x464/0x480
                 event_triggers_call+0x52/0xe0
                 trace_event_buffer_commit+0x193/0x250
                 trace_event_raw_event_sched_switch+0xfc/0x150
                 __traceiter_sched_switch+0x41/0x60
                 __schedule+0x448/0x7b0
                 schedule+0x5a/0xb0
                 worker_thread+0xaf/0x380
                 kthread+0xe9/0x110
                 ret_from_fork+0x2c/0x50
        } hitcount:          1
        { delta: ~ 100-199, stacktrace:
                 event_hist_trigger+0x464/0x480
                 event_triggers_call+0x52/0xe0
                 trace_event_buffer_commit+0x193/0x250
                 trace_event_raw_event_sched_switch+0xfc/0x150
                 __traceiter_sched_switch+0x41/0x60
                 __schedule+0x448/0x7b0
                 schedule_idle+0x26/0x40
                 cpu_startup_entry+0x19/0x20
                 start_secondary+0xed/0xf0
                 secondary_startup_64_no_verify+0xe0/0xeb
        } hitcount:         15
        [..]
        { delta: ~ 8500-8599, stacktrace:
                 event_hist_trigger+0x464/0x480
                 event_triggers_call+0x52/0xe0
                 trace_event_buffer_commit+0x193/0x250
                 trace_event_raw_event_sched_switch+0xfc/0x150
                 __traceiter_sched_switch+0x41/0x60
                 __schedule+0x448/0x7b0
                 schedule_idle+0x26/0x40
                 cpu_startup_entry+0x19/0x20
                 start_secondary+0xed/0xf0
                 secondary_startup_64_no_verify+0xe0/0xeb
        } hitcount:          1
      
        Totals:
            Hits: 89
            Entries: 11
            Dropped: 0
      
      Link: https://lkml.kernel.org/r/20230117152236.167046397@goodmis.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Ross Zwisler <zwisler@google.com>
      Cc: Ching-lin Yu <chinglinyu@google.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Allow synthetic events to pass around stacktraces · 00cf3d67
      Steven Rostedt (Google) authored
      
      
      Allow a stacktrace from one event to be displayed by the end event of a
      synthetic event. This is very useful when looking for the longest latency
      of a sleep or something blocked on I/O.
      
       # cd /sys/kernel/tracing/
       # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
       # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 1||prev_state == 2' > events/sched/sched_switch/trigger
       # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
      
      The above creates a "block_lat" synthetic event that takes the
      stacktrace of when a task schedules out in either the interruptible or
      uninterruptible state, and on a new per-process max $delta (the time it
      was scheduled out), prints the process id and the stacktrace.
      
        # echo 1 > events/synthetic/block_lat/enable
        # cat trace
       #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
       #              | |         |   |||||     |         |
          kworker/u16:0-767     [006] d..4.   560.645045: block_lat: pid=767 delta=66 stack=STACK:
       => __schedule
       => schedule
       => pipe_read
       => vfs_read
       => ksys_read
       => do_syscall_64
       => 0x966000aa
      
                 <idle>-0       [003] d..4.   561.132117: block_lat: pid=0 delta=413787 stack=STACK:
       => __schedule
       => schedule
       => schedule_hrtimeout_range_clock
       => do_sys_poll
       => __x64_sys_poll
       => do_syscall_64
       => 0x966000aa
      
                  <...>-153     [006] d..4.   562.068407: block_lat: pid=153 delta=54 stack=STACK:
       => __schedule
       => schedule
       => io_schedule
       => rq_qos_wait
       => wbt_wait
       => __rq_qos_throttle
       => blk_mq_submit_bio
       => submit_bio_noacct_nocheck
       => ext4_bio_write_page
       => mpage_submit_page
       => mpage_process_page_bufs
       => mpage_prepare_extent_to_map
       => ext4_do_writepages
       => ext4_writepages
       => do_writepages
       => __writeback_single_inode
      
      Link: https://lkml.kernel.org/r/20230117152236.010941267@goodmis.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Ross Zwisler <zwisler@google.com>
      Cc: Ching-lin Yu <chinglinyu@google.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Allow stacktraces to be saved as histogram variables · 288709c9
      Steven Rostedt (Google) authored
      
      
      Allow saving stacktraces into a histogram variable. This will be used by
      synthetic events to allow a stacktrace from one event to be passed and
      displayed by another event.
      
      The special keyword "stacktrace" is to be used to trigger a stack
      trace for the event that the histogram trigger is attached to.
      
        echo 'hist:keys=pid:st=stacktrace' > events/sched/sched_waking/trigger
      
      Currently nothing can get access to the "$st" variable above that contains
      the stack trace, but that will soon change.
      
      Link: https://lkml.kernel.org/r/20230117152235.856323729@goodmis.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Ross Zwisler <zwisler@google.com>
      Cc: Ching-lin Yu <chinglinyu@google.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Simplify calculating entry size using struct_size() · 19ff8049
      Steven Rostedt (Google) authored
      
      
      When tracing a dynamic string field for a synthetic event, the offset
      calculation for where to write the next event can use struct_size() to
      find what the current size of the structure is.
      
      This simplifies the code and makes it less error prone.
      
      Link: https://lkml.kernel.org/r/20230117152235.698632147@goodmis.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Ross Zwisler <zwisler@google.com>
      Cc: Ching-lin Yu <chinglinyu@google.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Add NULL checks for buffer in ring_buffer_free_read_page() · 3e4272b9
      Jia-Ju Bai authored
      In a previous commit 7433632c, buffer, buffer->buffers and
      buffer->buffers[cpu] in ring_buffer_wake_waiters() can be NULL,
      and thus the related checks were added.
      
      However, in the same call stack, these variables are also used in
      ring_buffer_free_read_page():
      
      tracing_buffers_release()
        ring_buffer_wake_waiters(iter->array_buffer->buffer)
          cpu_buffer = buffer->buffers[cpu] -> Add checks by previous commit
        ring_buffer_free_read_page(iter->array_buffer->buffer)
          cpu_buffer = buffer->buffers[cpu] -> No check
      
      Thus, to avoid possible null-pointer dereferences, the related checks
      should be added.
      
      These results were reported by a static analysis tool that I designed.
      
      Link: https://lkml.kernel.org/r/20230113125501.760324-1-baijiaju1990@gmail.com
      
      Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ftrace: Add sample with custom ops · b56c68f7
      Mark Rutland authored
      
      
      When reworking core ftrace code or architectural ftrace code, it's often
      necessary to test/analyse/benchmark a number of ftrace_ops
      configurations. This patch adds a module which can be used to explore
      some of those configurations.
      
      I'm using this to benchmark various options for changing the way
      trampolines and handling of ftrace_ops work on arm64, and ensuring other
      architectures aren't adversely affected.
      
      For example, in a QEMU+KVM VM running on a 2GHz Xeon E5-2660
      workstation, loading the module in various configurations produces:
      
      | # insmod ftrace-ops.ko
      | ftrace_ops: registering:
      |   relevant ops: 1
      |     tracee: tracee_relevant [ftrace_ops]
      |     tracer: ops_func_nop [ftrace_ops]
      |   irrelevant ops: 0
      |     tracee: tracee_irrelevant [ftrace_ops]
      |     tracer: ops_func_nop [ftrace_ops]
      |   saving registers: NO
      |   assist recursion: NO
      |   assist RCU: NO
      | ftrace_ops: Attempted 100000 calls to tracee_relevant [ftrace_ops] in 1681558ns (16ns / call)
      
      | # insmod ftrace-ops.ko nr_ops_irrelevant=5
      | ftrace_ops: registering:
      |   relevant ops: 1
      |     tracee: tracee_relevant [ftrace_ops]
      |     tracer: ops_func_nop [ftrace_ops]
      |   irrelevant ops: 5
      |     tracee: tracee_irrelevant [ftrace_ops]
      |     tracer: ops_func_nop [ftrace_ops]
      |   saving registers: NO
      |   assist recursion: NO
      |   assist RCU: NO
      | ftrace_ops: Attempted 100000 calls to tracee_relevant [ftrace_ops] in 1693042ns (16ns / call)
      
      | # insmod ftrace-ops.ko nr_ops_relevant=2
      | ftrace_ops: registering:
      |   relevant ops: 2
      |     tracee: tracee_relevant [ftrace_ops]
      |     tracer: ops_func_nop [ftrace_ops]
      |   irrelevant ops: 0
      |     tracee: tracee_irrelevant [ftrace_ops]
      |     tracer: ops_func_nop [ftrace_ops]
      |   saving registers: NO
      |   assist recursion: NO
      |   assist RCU: NO
      | ftrace_ops: Attempted 100000 calls to tracee_relevant [ftrace_ops] in 11965582ns (119ns / call)
      
      | # insmod ftrace-ops.ko save_regs=true
      | ftrace_ops: registering:
      |   relevant ops: 1
      |     tracee: tracee_relevant [ftrace_ops]
      |     tracer: ops_func_nop [ftrace_ops]
      |   irrelevant ops: 0
      |     tracee: tracee_irrelevant [ftrace_ops]
      |     tracer: ops_func_nop [ftrace_ops]
      |   saving registers: YES
      |   assist recursion: NO
      |   assist RCU: NO
      | ftrace_ops: Attempted 100000 calls to tracee_relevant [ftrace_ops] in 4459624ns (44ns / call)
      
      Link: https://lkml.kernel.org/r/20230103124912.2948963-4-mark.rutland@arm.com
      
      Cc: Florent Revest <revest@chromium.org>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/selftests: Add test for event filtering on function name · 7f09d639
      Steven Rostedt (Google) authored
      
      
      With the new filter logic of passing in the name of a function to match an
      instruction pointer (or the address of the function), add a test to make
      sure that it is functional.
      
      This is also the first test to exercise plain event filtering. The
      filtering code has been tested via the trigger logic, which uses the
      same code, but nothing tested just the event filter itself, so this is
      the first test to add such a case.
      
      Link: https://lkml.kernel.org/r/20221219183214.075559302@goodmis.org
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Zheng Yejian <zhengyejian1@huawei.com>
      Cc: linux-kselftest@vger.kernel.org
      Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Reviewed-by: Ross Zwisler <zwisler@google.com>
      Acked-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>