  1. Aug 21, 2021
    • tracing: Add a probe that attaches to trace events · 7491e2c4
      Tzvetomir Stoyanov (VMware) authored
      
      
      A new dynamic event is introduced: the event probe. The event is attached
      to an existing tracepoint and uses its fields as arguments. The user
      can specify a custom format string for the new event, select which
      tracepoint arguments will be printed, and choose how to print them.
      An event probe is created by writing a configuration string to the
      'dynamic_events' ftrace file:
       e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS]	- Set an event probe
       -:SNAME/ENAME					- Delete an event probe
      
      Where:
       SNAME	- System name, if omitted 'eprobes' is used.
       ENAME	- Name of the new event in SNAME, if omitted the SYSTEM_EVENT is used.
       SYSTEM	- Name of the system, where the tracepoint is defined, mandatory.
       EVENT	- Name of the tracepoint event in SYSTEM, mandatory.
       FETCHARGS - Arguments:
        <name>=$<field>[:TYPE] - Fetch the given field of the tracepoint and print
                                 it as the given TYPE with the given name. Supported
                                 types are:
                                  (u8/u16/u32/u64/s8/s16/s32/s64), basic types
                                  (x8/x16/x32/x64), hexadecimal types
                                  "string", "ustring" and bitfield.
      
      For example, to attach an event probe to the openat system call and print
      the name of the file that will be opened:
       echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events
      A new dynamic event is created in the events/esys/eopen/ directory. It
      can be deleted with:
       echo "-:esys/eopen" >> dynamic_events
      
      Filters, triggers and histograms can be attached to the new event, and it
      can be matched in synthetic events. There is one limitation: an event probe
      cannot be attached to a kprobe, a uprobe or another event probe.
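
      The definition syntax above can be sketched in user space. This is only
      an illustration, not the kernel parser: parse_eprobe() and
      EventProbeSpec are hypothetical names, the defaults ('eprobes' and
      SYSTEM_EVENT) are taken literally from the text above, and TYPE names
      are not validated.

```python
from typing import List, NamedTuple, Optional, Tuple

DEFAULT_GROUP = "eprobes"  # default SNAME, as stated in the commit text


class EventProbeSpec(NamedTuple):
    group: str    # SNAME
    name: str     # ENAME
    system: str   # SYSTEM that defines the tracepoint
    event: str    # EVENT, the tracepoint name
    fetchargs: List[Tuple[str, str, Optional[str]]]  # (name, field, TYPE)


def parse_eprobe(definition: str) -> EventProbeSpec:
    """Hypothetical helper: split an event probe definition into its parts."""
    tokens = definition.split()
    if len(tokens) < 2:
        raise ValueError("expected: e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS]")
    head, target, args = tokens[0], tokens[1], tokens[2:]
    if head != "e" and not head.startswith("e:"):
        raise ValueError("an event probe definition starts with 'e'")

    system, sep, event = target.partition("/")
    if not sep or not system or not event:
        raise ValueError("SYSTEM/EVENT is mandatory")

    # Defaults, read literally from the commit text (an assumption).
    group, name = DEFAULT_GROUP, f"{system}_{event}"
    label = head[2:]  # whatever follows "e:", possibly empty
    if label:
        first, sep, rest = label.partition("/")
        group, name = (first, rest) if sep else (group, first)

    fetchargs = []
    for arg in args:
        arg_name, _, source = arg.partition("=")
        if not arg_name or not source.startswith("$"):
            raise ValueError(f"bad fetch argument: {arg!r}")
        field_name, _, type_name = source[1:].partition(":")
        fetchargs.append((arg_name, field_name, type_name or None))
    return EventProbeSpec(group, name, system, event, fetchargs)
```

      Feeding it the openat example yields group 'esys', name 'eopen' and a
      single fetch argument that maps 'file' to the 'filename' field as a
      string.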
      
      Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com
      Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org
      
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      7491e2c4
  2. Aug 19, 2021
    • tracing/probes: Reject events which have the same name of existing one · 8e242060
      Masami Hiramatsu authored
      
      
      Since kprobe_events and uprobe_events only check whether another
      probe event of the same type has the same name, if the user gives
      the name of an existing tracepoint event (or of a probe event of
      another type), the kernel silently fails to create the tracefs
      entry (although the event is still registered), as shown below.
      
      /sys/kernel/tracing # ls events/task/task_rename
      enable   filter   format   hist     id       trigger
      /sys/kernel/tracing # echo p:task/task_rename vfs_read >> kprobe_events
      [  113.048508] Could not create tracefs 'task_rename' directory
      /sys/kernel/tracing # cat kprobe_events
      p:task/task_rename vfs_read
      
      To fix this issue, check whether any existing event has the
      same name in trace_probe_register_event_call(). If one exists,
      refuse to register the new event.
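
      The idea of the fix can be modelled with a single namespace shared by
      every event type. This is a toy sketch: EventRegistry is a hypothetical
      class, and raising FileExistsError is an assumption, since the message
      does not say which error the kernel returns.

```python
class EventRegistry:
    """Toy model: one (system, name) namespace for all event types."""

    def __init__(self):
        self._events = {}  # (system, name) -> kind of event

    def register(self, system, name, kind):
        key = (system, name)
        if key in self._events:
            # Refuse up front instead of registering an event that has no
            # tracefs directory.
            raise FileExistsError(
                f"{system}/{name} already exists as a {self._events[key]} event"
            )
        self._events[key] = kind

    def kind_of(self, system, name):
        return self._events.get((system, name))
```

      With the tracepoint task/task_rename already present, registering a
      kprobe event under the same name is rejected and the existing entry is
      left untouched.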
      
      Link: https://lkml.kernel.org/r/162936876189.187130.17558311387542061930.stgit@devnote2
      
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      8e242060
    • tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs · 8565a45d
      Steven Rostedt (VMware) authored
      
      
      In preparation for allowing event probes to use the process_fetch_insn()
      callback in trace_probe_tmpl.h, change the data passed to it from a
      pointer to pt_regs to a void pointer, as the event probe will not be
      using regs.
      
      Update the process_fetch_insn() callers for kprobe and uprobe events to
      have the regs defined in the function and just typecast the void pointer
      parameter.
      
      Link: https://lkml.kernel.org/r/20210819041842.291622924@goodmis.org
      
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      8565a45d
    • tracing/probe: Change traceprobe_set_print_fmt() to take a type · 007517a0
      Steven Rostedt (VMware) authored
      
      
      Instead of a boolean "is_return", have traceprobe_set_print_fmt() take a
      type (currently just PROBE_PRINT_NORMAL and PROBE_PRINT_RETURN). This will
      simplify adding different types. For example, the upcoming event_probe
      will need its own type, as it prints an event and not an IP.
      
      Link: https://lkml.kernel.org/r/20210819041842.104626301@goodmis.org
      
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      007517a0
    • tracing/probes: Use struct_size() instead of defining custom macros · 845cbf3e
      Steven Rostedt (VMware) authored
      
      
      Remove SIZEOF_TRACE_KPROBE() and SIZEOF_TRACE_UPROBE() and use
      struct_size() as that's what it is made for. No need to have custom
      macros. Especially since struct_size() has some extra memory checks for
      correctness.
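
      As a rough model of what struct_size() computes for a structure that
      ends in a flexible array: the header size plus count elements, saturating
      instead of wrapping on overflow. The sizes used below are made up for
      illustration, and a 64-bit size_t is assumed.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def struct_size(header_size, elem_size, count):
    """Toy model: header plus `count` trailing elements, saturating."""
    if min(header_size, elem_size, count) < 0:
        raise ValueError("sizes and counts are unsigned")
    total = header_size + elem_size * count
    # Saturate at SIZE_MAX rather than wrapping around to a small value.
    return min(total, SIZE_MAX)
```

      A wrapped size would turn an oversized request into a small allocation,
      while a saturated one is expected to make the allocation fail, which is
      the kind of extra check an open-coded macro does not provide.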
      
      Link: https://lkml.kernel.org/r/20210817035027.795000217@goodmis.org
      
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      845cbf3e
    • tracing/probes: Allow for dot delimiter as well as slash for system names · bc1b9734
      Steven Rostedt (VMware) authored
      
      
      Kprobe and uprobe events can add a "system" to the events that are created
      via the kprobe_events and uprobe_events files respectively. If they do not
      include a "system" in the name, then the default "kprobes" or "uprobes" is
      used. The current notation to specify a system for one of these probe
      events is to add a '/' delimiter in the name, where the content before the
      '/' will be the system to use, and the content after will be the event
      name.
      
       echo 'p:my_system/my_event' > kprobe_events
      
      But this is inconsistent with the way histogram triggers separate their
      system / event names. The histogram triggers use a '.' delimiter, which
      can be confusing.
      
      To allow this to be more consistent, as well as keep backward
      compatibility, allow the kprobe and uprobe events to denote a system name
      with either a '/' or a '.'.
      
      That is:
      
        echo 'p:my_system/my_event' > kprobe_events
      
      is equivalent to:
      
        echo 'p:my_system.my_event' > kprobe_events
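
      A small user-space sketch of the naming rule: split_event_name() is a
      hypothetical helper, and trying '/' before '.' when both appear is an
      assumption that the message above does not spell out.

```python
def split_event_name(label, default_system="kprobes"):
    """Split 'SYSTEM/EVENT' or 'SYSTEM.EVENT'; fall back to the default system."""
    for delimiter in ("/", "."):  # '/' first is an assumption
        system, sep, event = label.partition(delimiter)
        if sep:
            return system, event
    return default_system, label
```

      Both spellings from the example resolve to the same pair, and a bare
      event name lands in the default system.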
      
      Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/
      Link: https://lkml.kernel.org/r/20210817035027.580493202@goodmis.org
      
      Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      bc1b9734
    • tracing/probe: Have traceprobe_parse_probe_arg() take a const arg · fcd9db51
      Steven Rostedt (VMware) authored
      
      
      The two places that call traceprobe_parse_probe_arg() allocate a temporary
      buffer to copy the argv[i] into, because argv[i] is constant and the
      traceprobe_parse_probe_arg() will modify it to do the parsing. These two
      places allocate this buffer and then free it right after calling this
      function, leaving the onus of this allocation to the caller.
      
      As there's about to be a third user of this function that will have to do
      the same thing, instead of having the caller allocate the temporary
      buffer, simply move that allocation into the traceprobe_parse_probe_arg()
      itself, which will simplify the code of the callers.
      
      Link: https://lkml.kernel.org/r/20210817035027.385422828@goodmis.org
      
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      fcd9db51
    • tracing: Have dynamic events have a ref counter · 1d18538e
      Steven Rostedt (VMware) authored
      
      
      As dynamic events are not created by modules, if something is attached to
      one, calling "try_module_get()" on its "mod" field is not going to keep
      the dynamic event from going away.

      Since dynamic events do not need the "mod" pointer of the event structure,
      make a union out of it in order to save memory (there's one structure for
      each of the thousand+ events in the kernel), and have any event with the
      DYNAMIC flag set use a ref counter instead.
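
      The intent can be pictured with a toy model in which whatever attaches
      to a dynamic event takes a reference and drops it on detach.
      DynamicEvent and remove_event() are hypothetical names, and refusing
      removal with EBUSY while references remain is an assumption about how
      the counter is used.

```python
import errno


class DynamicEvent:
    def __init__(self, name):
        self.name = name
        self.refcount = 0  # stands in for the counter that replaces "mod"

    def get(self):
        """Called when something attaches to the event."""
        self.refcount += 1

    def put(self):
        """Called when the attachment goes away."""
        if self.refcount == 0:
            raise RuntimeError("unbalanced put()")
        self.refcount -= 1


def remove_event(events, name):
    event = events[name]
    if event.refcount:
        # Something still depends on the event, so it must not disappear.
        raise OSError(errno.EBUSY, f"{name} is still in use")
    del events[name]
```

      Removal only succeeds once every get() has been matched by a put().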
      
      Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/
      Link: https://lkml.kernel.org/r/20210817035027.174869074@goodmis.org
      
      Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      1d18538e
    • tracing: Add DYNAMIC flag for dynamic events · 8b0e6c74
      Steven Rostedt (VMware) authored
      
      
      To differentiate between static and dynamic events, add a new DYNAMIC
      flag to the event flags, set on all dynamic events. This makes it
      possible to tell whether something is being attached to a dynamic event
      or to a static event.
      
      Static events have a mod pointer that references the module they were
      created in (or NULL for core kernel). This can be incremented when the
      event has something attached to it. But there exists no such mechanism for
      dynamic events. This is dangerous as the dynamic events may now disappear
      without the "attachment" knowing that it no longer exists.
      
      To enforce the dynamic flag, change dyn_event_add() to pass the event that
      is being created such that it can set the DYNAMIC flag of the event. This
      helps make sure that no location that creates a dynamic event misses
      setting this flag.
      
      Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/
      Link: https://lkml.kernel.org/r/20210817035026.936958254@goodmis.org
      
      Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      8b0e6c74
  3. Aug 18, 2021
  4. Aug 17, 2021
  5. Aug 16, 2021
  6. Aug 13, 2021
    • tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name · 5acce0bf
      Steven Rostedt (VMware) authored
      The following commands:
      
       # echo 'read_max u64 size;' > synthetic_events
       # echo 'hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)' > events/syscalls/sys_enter_read/trigger
      
      cause:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP
       CPU: 4 PID: 1763 Comm: bash Not tainted 5.14.0-rc2-test+ #155
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
      v03.03 07/14/2016
       RIP: 0010:strcmp+0xc/0x20
       Code: 75 f7 31 c0 0f b6 0c 06 88 0c 02 48 83 c0 01 84 c9 75 f1 4c 89 c0
      c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07
      3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89
       RSP: 0018:ffffb5fdc0963ca8 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffffffffb3a4e040 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: ffff9714c0d0b640 RDI: 0000000000000000
       RBP: 0000000000000000 R08: 00000022986b7cde R09: ffffffffb3a4dff8
       R10: 0000000000000000 R11: 0000000000000000 R12: ffff9714c50603c8
       R13: 0000000000000000 R14: ffff97143fdf9e48 R15: ffff9714c01a2210
       FS:  00007f1fa6785740(0000) GS:ffff9714da400000(0000)
      knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 000000002d863004 CR4: 00000000001706e0
       Call Trace:
        __find_event_file+0x4e/0x80
        action_create+0x6b7/0xeb0
        ? kstrdup+0x44/0x60
        event_hist_trigger_func+0x1a07/0x2130
        trigger_process_regex+0xbd/0x110
        event_trigger_write+0x71/0xd0
        vfs_write+0xe9/0x310
        ksys_write+0x68/0xe0
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7f1fa6879e87
      
      The problem was the "trace(read_max,count)" where the "count" should be
      "$count" as "onmax()" only handles variables (although it really should be
      able to figure out that "count" is a field of sys_enter_read). But there's
      a path that does not find the variable and ends up passing a NULL for the
      event, which ends up getting passed to "strcmp()".
      
      Add a check for NULL that returns an error for the command, reported as:
      
       # cat error_log
        hist:syscalls:sys_enter_read: error: Couldn't create or find variable
        Command: hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)
                                      ^
      Link: https://lkml.kernel.org/r/20210808003011.4037f8d0@oasis.local.home
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 50450603 ("tracing: Add 'onmax' hist trigger action support")
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      5acce0bf
    • init: Suppress wrong warning for bootconfig cmdline parameter · d0ac5fba
      Masami Hiramatsu authored
      Since the 'bootconfig' command line parameter is handled before
      parsing the command line, it doesn't use early_param(). But in
      this case, the kernel shows a spurious warning message about it.
      
      [    0.013714] Kernel command line: ro console=ttyS0  bootconfig console=tty0
      [    0.013741] Unknown command line parameters: bootconfig
      
      To suppress this message, add a dummy handler for 'bootconfig'.
      
      Link: https://lkml.kernel.org/r/162812945097.77369.1849780946468010448.stgit@devnote2
      
      Fixes: 86d1919a ("init: print out unknown kernel parameters")
      Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      d0ac5fba
    • tracing: define needed config DYNAMIC_FTRACE_WITH_ARGS · 12f9951d
      Lukas Bulwahn authored
      Commit 2860cd8a ("livepatch: Use the default ftrace_ops instead of
      REGS when ARGS is available") intends to enable config LIVEPATCH when
      ftrace with ARGS is available. However, the chain of configs to enable
      LIVEPATCH is incomplete, as HAVE_DYNAMIC_FTRACE_WITH_ARGS is available,
      but the definition of DYNAMIC_FTRACE_WITH_ARGS, combining DYNAMIC_FTRACE
      and HAVE_DYNAMIC_FTRACE_WITH_ARGS, needed to enable LIVEPATCH, is missing
      in the commit.
      
      Fortunately, ./scripts/checkkconfigsymbols.py detects this and warns:
      
      DYNAMIC_FTRACE_WITH_ARGS
      Referencing files: kernel/livepatch/Kconfig
      
      So, define the config DYNAMIC_FTRACE_WITH_ARGS analogously to the already
      existing similar configs, DYNAMIC_FTRACE_WITH_REGS and
      DYNAMIC_FTRACE_WITH_DIRECT_CALLS, in ./kernel/trace/Kconfig to connect the
      chain of configs.
      
      Link: https://lore.kernel.org/kernel-janitors/CAKXUXMwT2zS9fgyQHKUUiqo8ynZBdx2UEUu1WnV_q0OCmknqhw@mail.gmail.com/
      Link: https://lkml.kernel.org/r/20210806195027.16808-1-lukas.bulwahn@gmail.com
      
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: stable@vger.kernel.org
      Fixes: 2860cd8a ("livepatch: Use the default ftrace_ops instead of REGS when ARGS is available")
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      12f9951d
    • trace/osnoise: Print a stop tracing message · 0e05ba49
      Daniel Bristot de Oliveira authored
      
      
      When using osnoise/timerlat with stop tracing, sometimes it is
      not clear on which CPU the stop condition was hit, mainly
      when using some extra events.

      Print a message reporting on which CPU the trace stopped, as
      in the example below:
      
                <idle>-0       [006] d.h.  2932.676616: #1672599 context    irq timer_latency     34689 ns
                <idle>-0       [006] dNh.  2932.676618: irq_noise: local_timer:236 start 2932.676615639 duration 2391 ns
                <idle>-0       [006] dNh.  2932.676620: irq_noise: virtio0-output.0:47 start 2932.676620180 duration 86 ns
                <idle>-0       [003] d.h.  2932.676621: #1673374 context    irq timer_latency      1200 ns
                <idle>-0       [006] d...  2932.676623: thread_noise: swapper/6:0 start 2932.676615964 duration 4339 ns
                <idle>-0       [003] dNh.  2932.676623: irq_noise: local_timer:236 start 2932.676620597 duration 1881 ns
                <idle>-0       [006] d...  2932.676623: sched_switch: prev_comm=swapper/6 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=timerlat/6 next_pid=852 next_prio=4
            timerlat/6-852     [006] ....  2932.676623: #1672599 context thread timer_latency     41931 ns
                <idle>-0       [003] d...  2932.676623: thread_noise: swapper/3:0 start 2932.676620854 duration 880 ns
                <idle>-0       [003] d...  2932.676624: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=timerlat/3 next_pid=849 next_prio=4
            timerlat/6-852     [006] ....  2932.676624: timerlat_main: stop tracing hit on cpu 6
            timerlat/3-849     [003] ....  2932.676624: #1673374 context thread timer_latency      4310 ns
      
      Link: https://lkml.kernel.org/r/b30a0d7542adba019185f44ee648e60e14923b11.1626598844.git.bristot@kernel.org
      
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      0e05ba49
    • trace/timerlat: Add a header with PREEMPT_RT additional fields · e1c4ad4a
      Daniel Bristot de Oliveira authored
      
      
      Some extra flags are printed to the trace header when using the
      PREEMPT_RT config. The extra flags are: need-resched-lazy,
      preempt-lazy-depth, and migrate-disable.
      
      Without printing these fields, the timerlat specific fields are
      shifted by three positions, for example:
      
       # tracer: timerlat
       #
       #                                _-----=> irqs-off
       #                               / _----=> need-resched
       #                              | / _---=> hardirq/softirq
       #                              || / _--=> preempt-depth
       #                              || /
       #                              ||||             ACTIVATION
       #           TASK-PID      CPU# ||||   TIMESTAMP    ID            CONTEXT                LATENCY
       #              | |         |   ||||      |         |                  |                       |
                 <idle>-0       [000] d..h...  3279.798871: #1     context    irq timer_latency       830 ns
                  <...>-807     [000] .......  3279.798881: #1     context thread timer_latency     11301 ns
      
      Add a new header for timerlat with the missing fields, to be used
      when PREEMPT_RT is enabled.
      
      Link: https://lkml.kernel.org/r/babb83529a3211bd0805be0b8c21608230202c55.1626598844.git.bristot@kernel.org
      
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      e1c4ad4a
    • trace/osnoise: Add a header with PREEMPT_RT additional fields · d03721a6
      Daniel Bristot de Oliveira authored
      
      
      Some extra flags are printed to the trace header when using the
      PREEMPT_RT config. The extra flags are: need-resched-lazy,
      preempt-lazy-depth, and migrate-disable.
      
      Without printing these fields, the osnoise specific fields are
      shifted by three positions, for example:
      
       # tracer: osnoise
       #
       #                                _-----=> irqs-off
       #                               / _----=> need-resched
       #                              | / _---=> hardirq/softirq
       #                              || / _--=> preempt-depth                            MAX
       #                              || /                                             SINGLE      Interference counters:
       #                              ||||               RUNTIME      NOISE  %% OF CPU  NOISE    +-----------------------------+
       #           TASK-PID      CPU# ||||   TIMESTAMP    IN US       IN US  AVAILABLE  IN US     HW    NMI    IRQ   SIRQ THREAD
       #              | |         |   ||||      |           |             |    |            |      |      |      |      |      |
                  <...>-741     [000] .......  1105.690909: 1000000        234  99.97660      36     21      0   1001     22      3
                  <...>-742     [001] .......  1105.691923: 1000000        281  99.97190     197      7      0   1012     35     14
                  <...>-743     [002] .......  1105.691958: 1000000       1324  99.86760     118     11      0   1016    155    143
                  <...>-744     [003] .......  1105.691998: 1000000        109  99.98910      21      4      0   1004     33      7
                  <...>-745     [004] .......  1105.692015: 1000000       2023  99.79770      97     37      0   1023     52     18
      
      Add a new header for osnoise with the missing fields, to be used
      when PREEMPT_RT is enabled.
      
      Link: https://lkml.kernel.org/r/1f03289d2a51fde5a58c2e7def063dc630820ad1.1626598844.git.bristot@kernel.org
      
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      d03721a6
  7. Aug 06, 2021
    • tracepoint: Use rcu get state and cond sync for static call updates · 7b40066c
      Mathieu Desnoyers authored
      State transitions from 1->0->1 and N->2->1 callbacks require RCU
      synchronization. Rather than performing the RCU synchronization every
      time the state change occurs, which is quite slow when many tracepoints
      are registered in batch, instead keep a snapshot of the RCU state on the
      most recent transitions which belong to a chain, and conditionally wait
      for a grace period on the last transition of the chain if one g.p. has
      not elapsed since the last snapshot.
      
      This applies to both RCU and SRCU.
      
      This brings the performance back to what it was before the regression
      caused by commit 231264d6 ("Fix: tracepoint: static call function vs
      data state mismatch").
      
      Before this commit:
      
        # trace-cmd start -e all
        # time trace-cmd start -p nop
      
        real	0m10.593s
        user	0m0.017s
        sys	0m0.259s
      
      After this commit:
      
        # trace-cmd start -e all
        # time trace-cmd start -p nop
      
        real	0m0.878s
        user	0m0.000s
        sys	0m0.103s
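
      Why the snapshot scheme collapses many waits into one can be seen in a
      toy model. ToyRCU and its method names are hypothetical stand-ins for
      the get-state and conditional-sync primitives named in the subject; a
      "grace period" here is just a counter.

```python
class ToyRCU:
    def __init__(self):
        self.completed = 0    # grace periods completed so far
        self.full_syncs = 0   # how many times we really had to wait

    def get_state(self):
        # Cookie naming the grace period that must complete before readers
        # that are running now are guaranteed to be done.
        return self.completed + 1

    def synchronize(self):
        self.completed += 1
        self.full_syncs += 1

    def cond_synchronize(self, cookie):
        # Wait only if no grace period has elapsed since the snapshot.
        if self.completed < cookie:
            self.synchronize()


def batch_transitions(rcu, count, conditional):
    # First half of each chain (e.g. 1->0): take a snapshot, do not wait.
    cookies = [rcu.get_state() for _ in range(count)]
    # Last transition of each chain (e.g. 0->1): this is where a wait may be needed.
    for cookie in cookies:
        if conditional:
            rcu.cond_synchronize(cookie)
        else:
            rcu.synchronize()
    return rcu.full_syncs
```

      For a batch of 1000 tracepoints the unconditional variant waits 1000
      times, while the conditional one waits once, because the first wait
      already covers every snapshot taken before it.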
      
      Link: https://lkml.kernel.org/r/20210805192954.30688-1-mathieu.desnoyers@efficios.com
      Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Stefan Metzmacher <metze@samba.org>
      Fixes: 231264d6 ("Fix: tracepoint: static call function vs data state mismatch")
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      7b40066c
    • tracepoint: Fix static call function vs data state mismatch · 231264d6
      Mathieu Desnoyers authored
      On a 1->0->1 callbacks transition, there is an issue with the new
      callback using the old callback's data.
      
      Considering __DO_TRACE_CALL:
      
              do {                                                            \
                      struct tracepoint_func *it_func_ptr;                    \
                      void *__data;                                           \
                      it_func_ptr =                                           \
                              rcu_dereference_raw((&__tracepoint_##name)->funcs); \
                      if (it_func_ptr) {                                      \
                              __data = (it_func_ptr)->data;                   \
      
      ----> [ delayed here on one CPU (e.g. vcpu preempted by the host) ]
      
                              static_call(tp_func_##name)(__data, args);      \
                      }                                                       \
              } while (0)
      
      It has loaded the tp->funcs of the old callback, so it will try to use the
      old data. This can be fixed by adding an RCU sync anywhere in the 1->0->1
      transition chain.
      
      On a N->2->1 transition, we need an rcu-sync because you may have a
      sequence of 3->2->1 (or 1->2->1) where the element 0 data is unchanged
      between 2->1, but was changed from 3->2 (or from 1->2), which may be
      observed by the static call. This can be fixed by adding an
      unconditional RCU sync in transition 2->1.
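
      The 1->0->1 mismatch can be replayed deterministically in a toy model.
      ToyTracepoint and fire() are hypothetical; the during_delay hook stands
      in for the window marked "delayed here" in the macro above.

```python
class ToyTracepoint:
    def __init__(self, func, data):
        self.funcs = [(func, data)]   # models tp->funcs
        self.static_call = func       # models the static call target

    def replace(self, func, data):
        # 1->0->1 with no RCU sync in between: both halves change at once.
        self.funcs = [(func, data)]
        self.static_call = func


def fire(tp, during_delay=None):
    funcs = tp.funcs                  # rcu_dereference_raw(...->funcs)
    if not funcs:
        return None
    data = funcs[0][1]                # __data = it_func_ptr->data
    if during_delay:
        during_delay()                # the "delayed here" window
    return tp.static_call(data)       # new callee, but previously loaded data
```

      If the callback is replaced inside the window, the new callback runs
      with the old callback's data, which is exactly the mismatch that an RCU
      sync in the chain is meant to rule out.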
      
      Note, this fixes a correctness issue at the cost of adding a tremendous
      performance regression to the disabling of tracepoints.
      
      Before this commit:
      
        # trace-cmd start -e all
        # time trace-cmd start -p nop
      
        real	0m0.778s
        user	0m0.000s
        sys	0m0.061s
      
      After this commit:
      
        # trace-cmd start -e all
        # time trace-cmd start -p nop
      
        real	0m10.593s
        user	0m0.017s
        sys	0m0.259s
      
      A follow up fix will introduce a more lightweight scheme based on RCU
      get_state and cond_sync, that will return the performance back to what it
      was. As both this change and the lightweight versions are complex on their
      own, for bisecting any issues that this may cause, they are kept as two
      separate changes.
      
      Link: https://lkml.kernel.org/r/20210805132717.23813-3-mathieu.desnoyers@efficios.com
      Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Stefan Metzmacher <metze@samba.org>
      Fixes: d25e37d8 ("tracepoint: Optimize using static_call()")
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      231264d6
    • tracepoint: static call: Compare data on transition from 2->1 callees · f7ec4121
      Mathieu Desnoyers authored
      On transition from 2->1 callees, we should be comparing .data rather
      than .func, because the same callback can be registered twice with
      different data, and what we care about here is that the data of array
      element 0 is unchanged to skip rcu sync.
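
      A minimal illustration of why the comparison has to look at .data,
      using (func, data) pairs as a stand-in for the callback array;
      same_func() and same_data() are hypothetical helpers.

```python
def same_func(old_funcs, new_funcs):
    """The check that is not sufficient: compare element 0's callback."""
    return old_funcs[0][0] is new_funcs[0][0]


def same_data(old_funcs, new_funcs):
    """The check described above: compare element 0's data."""
    return old_funcs[0][1] is new_funcs[0][1]
```

      With one callback registered twice with different data, dropping the
      first entry leaves element 0's function unchanged but its data
      different, so only the data comparison notices that the sync cannot be
      skipped.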
      
      Link: https://lkml.kernel.org/r/20210805132717.23813-2-mathieu.desnoyers@efficios.com
      Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Stefan Metzmacher <metze@samba.org>
      Fixes: 547305a6 ("tracepoint: Fix out of sync data passing by static caller")
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      f7ec4121
  8. Aug 05, 2021
    • tracing: Quiet smp_processor_id() use in preemptable warning in hwlat · 51397dc6
      Steven Rostedt (VMware) authored
      
      
      The hardware latency detector (hwlat) has a mode in which it runs one
      thread across CPUs. The logic to move from the currently running CPU to
      the next one in the list calls smp_processor_id() to find where it
      currently is. Unfortunately, this is done with preemption enabled, which
      triggers a warning for using smp_processor_id() in a preempt enabled
      section.

      As it is only using smp_processor_id() to get information on where it
      currently is in order to simply move it to the next CPU, it doesn't really
      care if it got moved in the meantime. It will simply balance out later if
      such a case arises.
      
      Switch smp_processor_id() to raw_smp_processor_id() to quiet that warning.
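
      Why a stale CPU number is harmless here can be seen from the shape of
      the "move to the next CPU" step. This is a user-space sketch with a
      hypothetical next_cpu() helper, assuming a non-empty set of allowed
      CPUs, and is not the hwlat code itself.

```python
def next_cpu(current, allowed):
    """Pick the next allowed CPU after `current`, wrapping to the first one."""
    later = [cpu for cpu in sorted(allowed) if cpu > current]
    return later[0] if later else min(allowed)
```

      Whatever value `current` holds, even one that is out of date or not in
      the set, the result is always a valid CPU from the allowed set, so the
      rotation merely shifts by a step.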
      
      Link: https://lkml.kernel.org/r/20210804141848.79edadc0@oasis.local.home
      
      Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Fixes: 8fa826b7 ("trace/hwlat: Implement the mode config option")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      51397dc6