Commit b72b5fec authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull tracing updates from Steven Rostedt:

 - Add function names as a way to filter function addresses

 - Add sample module to test ftrace ops and dynamic trampolines

 - Allow stack traces to be passed from beginning event to end event for
   synthetic events. This will allow seeing the stack trace of when a
   task is scheduled out and recorded when it gets scheduled back in.

 - Add trace event helper __get_buf() to use as a temporary buffer when
   printing out trace event output.

 - Add kernel command line to create trace instances on boot up.

 - Add enabling of events to instances created at boot up.

 - Add trace_array_puts() to write into instances.

 - Allow boot instances to take a snapshot at the end of boot up.

 - Allow live patch modules to include trace events

 - Minor fixes and clean ups

* tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
  tracing: Remove unnecessary NULL assignment
  tracepoint: Allow livepatch module add trace event
  tracing: Always use canonical ftrace path
  tracing/histogram: Fix stacktrace histogram Documententation
  tracing/histogram: Fix stacktrace key
  tracing/histogram: Fix a few problems with stacktrace variable printing
  tracing: Add BUILD_BUG() to make sure stacktrace fits in strings
  tracing/histogram: Don't use strlen to find length of stacktrace variables
  tracing: Allow boot instances to have snapshot buffers
  tracing: Add trace_array_puts() to write into instance
  tracing: Add enabling of events to boot instances
  tracing: Add creation of instances at boot command line
  tracing: Fix trace_event_raw_event_synth() if else statement
  samples: ftrace: Make some global variables static
  ftrace: sample: avoid open-coded 64-bit division
  samples: ftrace: Include the nospec-branch.h only for x86
  tracing: Acquire buffer from temparary trace sequence
  tracing/histogram: Wrap remaining shell snippets in code blocks
  tracing/osnoise: No need for schedule_hrtimeout range
  bpf/tracing: Use stage6 of tracing to not duplicate macros
  ...
parents 91914238 7568a21e
Loading
Loading
Loading
Loading
+29 −0
Original line number Diff line number Diff line
@@ -1509,6 +1509,15 @@
			boot up that is likely to be overridden by user space
			start up functionality.

			Optionally, the snapshot can also be defined for a tracing
			instance that was created by the trace_instance= command
			line parameter.

			trace_instance=foo,sched_switch ftrace_boot_snapshot=foo

			The above will cause the "foo" tracing instance to trigger
			a snapshot at the end of boot up.

	ftrace_dump_on_oops[=orig_cpu]
			[FTRACE] will dump the trace buffers on oops.
			If no parameter is passed, ftrace will dump
@@ -6283,6 +6292,26 @@
			comma-separated list of trace events to enable. See
			also Documentation/trace/events.rst

	trace_instance=[instance-info]
			[FTRACE] Create a ring buffer instance early in boot up.
			This will be listed in:

				/sys/kernel/tracing/instances

			Events can be enabled at the time the instance is created
			via:

				trace_instance=<name>,<system1>:<event1>,<system2>:<event2>

			Note, the "<system*>:" portion is optional if the event is
			unique.

				trace_instance=foo,sched:sched_switch,irq_handler_entry,initcall

			will enable the "sched_switch" event (note, the "sched:" is optional, and
			the same thing would happen if it was left off). The irq_handler_entry
			event, and all events under the "initcall" system.

	trace_options=[option-list]
			[FTRACE] Enable or disable tracer options at boot.
			The option-list is a comma delimited list of options
+12 −0
Original line number Diff line number Diff line
@@ -207,6 +207,18 @@ field name::
As the kernel will have to know how to retrieve the memory that the pointer
is at from user space.

You can convert any long type to a function address and search by function name::

  call_site.function == security_prepare_creds

The above will filter when the field "call_site" falls on the address within
"security_prepare_creds". That is, it will compare the value of "call_site" and
the filter will return true if it is greater than or equal to the start of
the function "security_prepare_creds" and less than the end of that function.

The ".function" postfix can only be attached to values of size long, and can only
be compared with "==" or "!=".

5.2 Setting filters
-------------------

+198 −44
Original line number Diff line number Diff line
@@ -81,6 +81,7 @@ Documentation written by Tom Zanussi
	.usecs         display a common_timestamp in microseconds
        .percent       display a number of percentage value
        .graph         display a bar-graph of a value
	.stacktrace    display as a stacktrace (must by a long[] type)
	=============  =================================================

  Note that in general the semantics of a given field aren't
@@ -1786,6 +1787,8 @@ or assigned to a variable and referenced in a subsequent expression::
  # echo 'hist:keys=next_pid:us_per_sec=1000000 ...' >> event/trigger
  # echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/$us_per_sec ...' >> event/trigger

Variables can even hold stacktraces, which are useful with synthetic events.

2.2.2 Synthetic Events
----------------------

@@ -1861,7 +1864,7 @@ A histogram can now be defined for the new synthetic event::
The above shows the latency "lat" in a power of 2 grouping.

Like any other event, once a histogram is enabled for the event, the
output can be displayed by reading the event's 'hist' file.
output can be displayed by reading the event's 'hist' file::

  # cat /sys/kernel/tracing/events/synthetic/wakeup_latency/hist

@@ -1908,7 +1911,7 @@ output can be displayed by reading the event's 'hist' file.


The latency values can also be grouped linearly by a given size with
the ".buckets" modifier and specify a size (in this case groups of 10).
the ".buckets" modifier and specify a size (in this case groups of 10)::

  # echo 'hist:keys=pid,prio,lat.buckets=10:sort=lat' >> \
        /sys/kernel/tracing/events/synthetic/wakeup_latency/trigger
@@ -1940,6 +1943,157 @@ the ".buckets" modifier and specify a size (in this case groups of 10).
      Entries: 16
      Dropped: 0

To save stacktraces, create a synthetic event with a field of type "unsigned long[]"
or even just "long[]". For example, to see how long a task is blocked in an
uninterruptible state::

  # cd /sys/kernel/tracing
  # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
  # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 2' >> events/sched/sched_switch/trigger
  # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
  # echo 1 > events/synthetic/block_lat/enable
  # cat trace

  # tracer: nop
  #
  # entries-in-buffer/entries-written: 2/2   #P:8
  #
  #                                _-----=> irqs-off/BH-disabled
  #                               / _----=> need-resched
  #                              | / _---=> hardirq/softirq
  #                              || / _--=> preempt-depth
  #                              ||| / _-=> migrate-disable
  #                              |||| /     delay
  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
  #              | |         |   |||||     |         |
            <idle>-0       [005] d..4.   521.164922: block_lat: pid=0 delta=8322 stack=STACK:
  => __schedule+0x448/0x7b0
  => schedule+0x5a/0xb0
  => io_schedule+0x42/0x70
  => bit_wait_io+0xd/0x60
  => __wait_on_bit+0x4b/0x140
  => out_of_line_wait_on_bit+0x91/0xb0
  => jbd2_journal_commit_transaction+0x1679/0x1a70
  => kjournald2+0xa9/0x280
  => kthread+0xe9/0x110
  => ret_from_fork+0x2c/0x50

             <...>-2       [004] d..4.   525.184257: block_lat: pid=2 delta=76 stack=STACK:
  => __schedule+0x448/0x7b0
  => schedule+0x5a/0xb0
  => schedule_timeout+0x11a/0x150
  => wait_for_completion_killable+0x144/0x1f0
  => __kthread_create_on_node+0xe7/0x1e0
  => kthread_create_on_node+0x51/0x70
  => create_worker+0xcc/0x1a0
  => worker_thread+0x2ad/0x380
  => kthread+0xe9/0x110
  => ret_from_fork+0x2c/0x50

A synthetic event that has a stacktrace field may use it as a key in
histogram::

  # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > events/synthetic/block_lat/trigger
  # cat events/synthetic/block_lat/hist

  # event histogram
  #
  # trigger info: hist:keys=delta.buckets=100,stack.stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
  #
  { delta: ~ 0-99, stack.stacktrace         __schedule+0xa19/0x1520
         schedule+0x6b/0x110
         io_schedule+0x46/0x80
         bit_wait_io+0x11/0x80
         __wait_on_bit+0x4e/0x120
         out_of_line_wait_on_bit+0x8d/0xb0
         __wait_on_buffer+0x33/0x40
         jbd2_journal_commit_transaction+0x155a/0x19b0
         kjournald2+0xab/0x270
         kthread+0xfa/0x130
         ret_from_fork+0x29/0x50
  } hitcount:          1
  { delta: ~ 0-99, stack.stacktrace         __schedule+0xa19/0x1520
         schedule+0x6b/0x110
         io_schedule+0x46/0x80
         rq_qos_wait+0xd0/0x170
         wbt_wait+0x9e/0xf0
         __rq_qos_throttle+0x25/0x40
         blk_mq_submit_bio+0x2c3/0x5b0
         __submit_bio+0xff/0x190
         submit_bio_noacct_nocheck+0x25b/0x2b0
         submit_bio_noacct+0x20b/0x600
         submit_bio+0x28/0x90
         ext4_bio_write_page+0x1e0/0x8c0
         mpage_submit_page+0x60/0x80
         mpage_process_page_bufs+0x16c/0x180
         mpage_prepare_extent_to_map+0x23f/0x530
  } hitcount:          1
  { delta: ~ 0-99, stack.stacktrace         __schedule+0xa19/0x1520
         schedule+0x6b/0x110
         schedule_hrtimeout_range_clock+0x97/0x110
         schedule_hrtimeout_range+0x13/0x20
         usleep_range_state+0x65/0x90
         __intel_wait_for_register+0x1c1/0x230 [i915]
         intel_psr_wait_for_idle_locked+0x171/0x2a0 [i915]
         intel_pipe_update_start+0x169/0x360 [i915]
         intel_update_crtc+0x112/0x490 [i915]
         skl_commit_modeset_enables+0x199/0x600 [i915]
         intel_atomic_commit_tail+0x7c4/0x1080 [i915]
         intel_atomic_commit_work+0x12/0x20 [i915]
         process_one_work+0x21c/0x3f0
         worker_thread+0x50/0x3e0
         kthread+0xfa/0x130
  } hitcount:          3
  { delta: ~ 0-99, stack.stacktrace         __schedule+0xa19/0x1520
         schedule+0x6b/0x110
         schedule_timeout+0x11e/0x160
         __wait_for_common+0x8f/0x190
         wait_for_completion+0x24/0x30
         __flush_work.isra.0+0x1cc/0x360
         flush_work+0xe/0x20
         drm_mode_rmfb+0x18b/0x1d0 [drm]
         drm_mode_rmfb_ioctl+0x10/0x20 [drm]
         drm_ioctl_kernel+0xb8/0x150 [drm]
         drm_ioctl+0x243/0x560 [drm]
         __x64_sys_ioctl+0x92/0xd0
         do_syscall_64+0x59/0x90
         entry_SYSCALL_64_after_hwframe+0x72/0xdc
  } hitcount:          1
  { delta: ~ 0-99, stack.stacktrace         __schedule+0xa19/0x1520
         schedule+0x6b/0x110
         schedule_timeout+0x87/0x160
         __wait_for_common+0x8f/0x190
         wait_for_completion_timeout+0x1d/0x30
         drm_atomic_helper_wait_for_flip_done+0x57/0x90 [drm_kms_helper]
         intel_atomic_commit_tail+0x8ce/0x1080 [i915]
         intel_atomic_commit_work+0x12/0x20 [i915]
         process_one_work+0x21c/0x3f0
         worker_thread+0x50/0x3e0
         kthread+0xfa/0x130
         ret_from_fork+0x29/0x50
  } hitcount:          1
  { delta: ~ 100-199, stack.stacktrace         __schedule+0xa19/0x1520
         schedule+0x6b/0x110
         schedule_hrtimeout_range_clock+0x97/0x110
         schedule_hrtimeout_range+0x13/0x20
         usleep_range_state+0x65/0x90
         pci_set_low_power_state+0x17f/0x1f0
         pci_set_power_state+0x49/0x250
         pci_finish_runtime_suspend+0x4a/0x90
         pci_pm_runtime_suspend+0xcb/0x1b0
         __rpm_callback+0x48/0x120
         rpm_callback+0x67/0x70
         rpm_suspend+0x167/0x780
         rpm_idle+0x25a/0x380
         pm_runtime_work+0x93/0xc0
         process_one_work+0x21c/0x3f0
  } hitcount:          1

  Totals:
    Hits: 10
    Entries: 7
    Dropped: 0

2.2.3 Hist trigger 'handlers' and 'actions'
-------------------------------------------

@@ -2054,7 +2208,7 @@ The following commonly-used handler.action pairs are available:
              wakeup_new_test($testpid) if comm=="cyclictest"' >> \
              /sys/kernel/tracing/events/sched/sched_wakeup_new/trigger

    Or, equivalently, using the 'trace' keyword syntax:
    Or, equivalently, using the 'trace' keyword syntax::

      # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\
              trace(wakeup_new_test,$testpid) if comm=="cyclictest"' >> \
@@ -2191,7 +2345,7 @@ The following commonly-used handler.action pairs are available:
    resulting latency, stored in wakeup_lat, exceeds the current
    maximum latency, a snapshot is taken.  As part of the setup, all
    the scheduler events are also enabled, which are the events that
    will show up in the snapshot when it is taken at some point:
    will show up in the snapshot when it is taken at some point::

      # echo 1 > /sys/kernel/tracing/events/sched/enable

@@ -2210,7 +2364,7 @@ The following commonly-used handler.action pairs are available:
    following the rest of the fields.

    If a snapshot was taken, there is also a message indicating that,
    along with the value and event that triggered the global maximum:
    along with the value and event that triggered the global maximum::

      # cat /sys/kernel/tracing/events/sched/sched_switch/hist
        { next_pid:       2101 } hitcount:        200
@@ -2310,7 +2464,7 @@ The following commonly-used handler.action pairs are available:
    $cwnd variable.  If the value has changed, a snapshot is taken.
    As part of the setup, all the scheduler and tcp events are also
    enabled, which are the events that will show up in the snapshot
    when it is taken at some point:
    when it is taken at some point::

      # echo 1 > /sys/kernel/tracing/events/sched/enable
      # echo 1 > /sys/kernel/tracing/events/tcp/enable
@@ -2341,7 +2495,7 @@ The following commonly-used handler.action pairs are available:
      { dport:        443 } hitcount:        211
	changed:         10  snd_wnd:      26960  srtt:      17379  rcv_wnd:      28800

    Snapshot taken (see tracing/snapshot).  Details::
      Snapshot taken (see tracing/snapshot).  Details:

          triggering value { onchange($cwnd) }:         10
          triggered by event with key: { dport:         80 }
+1 −1
Original line number Diff line number Diff line
@@ -297,7 +297,7 @@ bool mac_pton(const char *s, u8 *mac);
 *
 * Use tracing_on/tracing_off when you want to quickly turn on or off
 * tracing. It simply enables or disables the recording of the trace events.
 * This also corresponds to the user space /sys/kernel/debug/tracing/tracing_on
 * This also corresponds to the user space /sys/kernel/tracing/tracing_on
 * file, which gives a means for the kernel and userspace to interact.
 * Place a tracing_off() in the kernel where you want tracing to end.
 * From user space, examine the trace, and then echo 1 > tracing_on
+12 −0
Original line number Diff line number Diff line
@@ -33,6 +33,18 @@ struct trace_array;
int register_ftrace_export(struct trace_export *export);
int unregister_ftrace_export(struct trace_export *export);

/**
 * trace_array_puts - write a constant string into the trace buffer.
 * @tr:    The trace array to write to
 * @str:   The constant string to write
 */
#define trace_array_puts(tr, str)					\
	({								\
		str ? __trace_array_puts(tr, _THIS_IP_, str, strlen(str)) : -1;	\
	})
int __trace_array_puts(struct trace_array *tr, unsigned long ip,
		       const char *str, int size);

void trace_printk_init_buffers(void);
__printf(3, 4)
int trace_array_printk(struct trace_array *tr, unsigned long ip,
Loading