Commit b70100f2 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull probes updates from Masami Hiramatsu:

 - kprobes: use struct_size() for variable size kretprobe_instance data
   structure.

 - eprobe: Simplify trace_eprobe list iteration.

 - probe events: Data structure field access support on BTF argument.

     - Update BTF argument support on the functions in the kernel
       loadable modules (only loaded modules are supported).

     - Move generic BTF access function (search function prototype and
       get function parameters) to a separated file.

     - Add a function to search a member of data structure in BTF.

     - Support accessing BTF data structure member from probe args by
       C-like arrow('->') and dot('.') operators. e.g.
          't sched_switch next=next->pid vruntime=next->se.vruntime'

     - Support accessing BTF data structure member from $retval. e.g.
          'f getname_flags%return +0($retval->name):string'

     - Add string type checking if BTF type info is available. This will
       reject if user specify ":string" type for non "char pointer"
       type.

     - Automatically assume the fprobe event as a function return event
       if $retval is used.

 - selftests/ftrace: Add BTF data field access test cases.

 - Documentation: Update fprobe event example with BTF data field.

* tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  Documentation: tracing: Update fprobe event example with BTF field
  selftests/ftrace: Add BTF fields access testcases
  tracing/fprobe-event: Assume fprobe is a return event by $retval
  tracing/probes: Add string type check with BTF
  tracing/probes: Support BTF field access from $retval
  tracing/probes: Support BTF based data structure field access
  tracing/probes: Add a function to search a member of a struct/union
  tracing/probes: Move finding func-proto API and getting func-param API to trace_btf
  tracing/probes: Support BTF argument on module functions
  tracing/eprobe: Iterate trace_eprobe directly
  kernel: kprobes: Use struct_size()
parents e021c5f1 a2439a4c
Loading
Loading
Loading
Loading
+46 −18
Original line number Diff line number Diff line
@@ -79,9 +79,9 @@ automatically set by the given name. ::
 f:fprobes/myprobe vfs_read count=count pos=pos

It also chooses the fetch type from BTF information. For example, in the above
example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus, both
are converted to 64bit unsigned long, but only ``pos`` has "%Lx" print-format as
below ::
example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus,
both are converted to 64bit unsigned long, but only ``pos`` has "%Lx"
print-format as below ::

 # cat events/fprobes/myprobe/format
 name: myprobe
@@ -105,9 +105,47 @@ is expanded to all function arguments of the function or the tracepoint. ::
 # cat dynamic_events
 f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos

BTF also affects the ``$retval``. If user doesn't set any type, the retval type is
automatically picked from the BTF. If the function returns ``void``, ``$retval``
is rejected.
BTF also affects the ``$retval``. If user doesn't set any type, the retval
type is automatically picked from the BTF. If the function returns ``void``,
``$retval`` is rejected.

You can access the data fields of a data structure using allow operator ``->``
(for pointer type) and dot operator ``.`` (for data structure type.)::

# echo 't sched_switch preempt prev_pid=prev->pid next_pid=next->pid' >> dynamic_events

The field access operators, ``->`` and ``.`` can be combined for accessing deeper
members and other structure members pointed by the member. e.g. ``foo->bar.baz->qux``
If there is non-name union member, you can directly access it as the C code does.
For example::

 struct {
	union {
	int a;
	int b;
	};
 } *foo;

To access ``a`` and ``b``, use ``foo->a`` and ``foo->b`` in this case.

This data field access is available for the return value via ``$retval``,
e.g. ``$retval->name``.

For these BTF arguments and fields, ``:string`` and ``:ustring`` change the
behavior. If these are used for BTF argument or field, it checks whether
the BTF type of the argument or the data field is ``char *`` or ``char []``,
or not.  If not, it rejects applying the string types. Also, with the BTF
support, you don't need a memory dereference operator (``+0(PTR)``) for
accessing the string pointed by a ``PTR``. It automatically adds the memory
dereference operator according to the BTF type. e.g. ::

# echo 't sched_switch prev->comm:string' >> dynamic_events
# echo 'f getname_flags%return $retval->name:string' >> dynamic_events

The ``prev->comm`` is an embedded char array in the data structure, and
``$retval->name`` is a char pointer in the data structure. But in both
cases, you can use ``:string`` type to get the string.


Usage examples
--------------
@@ -161,10 +199,10 @@ parameters. This means you can access any field values in the task
structure pointed by the ``prev`` and ``next`` arguments.

For example, usually ``task_struct::start_time`` is not traced, but with this
traceprobe event, you can trace it as below.
traceprobe event, you can trace that field as below.
::

  # echo 't sched_switch comm=+1896(next):string start_time=+1728(next):u64' > dynamic_events
  # echo 't sched_switch comm=next->comm:string next->start_time' > dynamic_events
  # head -n 20 trace | tail
 #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
 #              | |         |   |||||     |         |
@@ -176,13 +214,3 @@ traceprobe event, you can trace it as below.
           <idle>-0       [000] d..3.  5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
      kworker/0:1-14      [000] d..3.  5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
           <idle>-0       [000] d..3.  5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000

Currently, to find the offset of a specific field in the data structure,
you need to build kernel with debuginfo and run `perf probe` command with
`-D` option. e.g.
::

 # perf probe -D "__probestub_sched_switch next->comm:string next->start_time"
 p:probe/__probestub_sched_switch __probestub_sched_switch+0 comm=+1896(%cx):string start_time=+1728(%cx):u64

And replace the ``%cx`` with the ``next``.
+1 −0
Original line number Diff line number Diff line
@@ -209,6 +209,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec);
bool btf_type_is_void(const struct btf_type *t);
s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p);
const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
					       u32 id, u32 *res_id);
const struct btf_type *btf_type_resolve_ptr(const struct btf *btf,
+1 −1
Original line number Diff line number Diff line
@@ -553,7 +553,7 @@ s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
	return -ENOENT;
}

static s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p)
s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p)
{
	struct btf *btf;
	s32 ret;
+2 −4
Original line number Diff line number Diff line
@@ -2232,8 +2232,7 @@ int register_kretprobe(struct kretprobe *rp)
		return -ENOMEM;

	for (i = 0; i < rp->maxactive; i++) {
		inst = kzalloc(sizeof(struct kretprobe_instance) +
			       rp->data_size, GFP_KERNEL);
		inst = kzalloc(struct_size(inst, data, rp->data_size), GFP_KERNEL);
		if (inst == NULL) {
			rethook_free(rp->rh);
			rp->rh = NULL;
@@ -2256,8 +2255,7 @@ int register_kretprobe(struct kretprobe *rp)

	rp->rph->rp = rp;
	for (i = 0; i < rp->maxactive; i++) {
		inst = kzalloc(sizeof(struct kretprobe_instance) +
			       rp->data_size, GFP_KERNEL);
		inst = kzalloc(struct_size(inst, data, rp->data_size), GFP_KERNEL);
		if (inst == NULL) {
			refcount_set(&rp->rph->ref, i);
			free_rp_inst(rp);
+1 −0
Original line number Diff line number Diff line
@@ -99,6 +99,7 @@ obj-$(CONFIG_KGDB_KDB) += trace_kdb.o
endif
obj-$(CONFIG_DYNAMIC_EVENTS) += trace_dynevent.o
obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
obj-$(CONFIG_PROBE_EVENTS_BTF_ARGS) += trace_btf.o
obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
Loading