Commit cdf072ac authored by Linus Torvalds
Pull tracing updates from Steven Rostedt:
 "Major changes:

   - Changed location of tracing repo from personal git repo to:
     git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git

   - Added Masami Hiramatsu as co-maintainer

   - Updated MAINTAINERS file to separate out FTRACE as it is more than
     just TRACING.

  Minor changes:

   - Added Mark Rutland as FTRACE reviewer

   - Updated user_events to move it toward removing the BROKEN tag.
     The changes should now be acceptable, but they will be run through
     a cycle first; hopefully the BROKEN tag can be removed next release.

   - Added filtering to eprobes

   - Added a delta time to the benchmark trace event

   - Have the histogram and filter callbacks called via a switch
     statement instead of indirect functions. This speeds them up by
     avoiding retpolines.

   - Add a way to wake up ring buffer waiters waiting for the ring
     buffer to fill up to its watermark.

   - New ioctl() on the trace_pipe_raw file to wake up ring buffer
     waiters.

   - Wake up waiters when the ring buffer is disabled. A reader may
     start blocking while the ring buffer is already disabled, but a
     reader that is blocked at the moment the ring buffer becomes
     disabled should be woken up.

  Fixes:

   - Allow splice to read partially read ring buffer pages. This fixes
     splice never moving forward.

   - Fix inverted compare that made the "shortest" ring buffer wait
     queue actually the longest.

   - Fix a race in the ring buffer between resetting a page when a
     writer goes to another page, and the reader.

   - Fix ftrace accounting bug when function hooks are added at boot up
     before the weak functions are set to "disabled".

   - Fix bug that freed a user allocated snapshot buffer when enabling a
     tracer.

   - Fix possible recursive locks in osnoise tracer

   - Fix recursive locking in direct functions

   - Other minor clean ups and fixes"

* tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
  ftrace: Create separate entry in MAINTAINERS for function hooks
  tracing: Update MAINTAINERS to reflect new tracing git repo
  tracing: Do not free snapshot if tracer is on cmdline
  ftrace: Still disable enabled records marked as disabled
  tracing/user_events: Move pages/locks into groups to prepare for namespaces
  tracing: Add Masami Hiramatsu as co-maintainer
  tracing: Remove unused variable 'dups'
  MAINTAINERS: add myself as a tracing reviewer
  ring-buffer: Fix race between reset page and reading page
  tracing/user_events: Update ABI documentation to align to bits vs bytes
  tracing/user_events: Use bits vs bytes for enabled status page data
  tracing/user_events: Use refcount instead of atomic for ref tracking
  tracing/user_events: Ensure user provided strings are safely formatted
  tracing/user_events: Use WRITE instead of READ for io vector import
  tracing/user_events: Use NULL for strstr checks
  tracing: Fix spelling mistake "preapre" -> "prepare"
  tracing: Wake up waiters when tracing is disabled
  tracing: Add ioctl() to force ring buffer waiters to wake up
  tracing: Wake up ring buffer waiters on closing of the file
  ring-buffer: Add ring_buffer_wake_waiters()
  ...
parents dc553428 4f881a69
+58 −28
@@ -20,14 +20,14 @@ dynamic_events is the same as the ioctl with the u: prefix applied.

Typically programs will register a set of events that they wish to expose to
tools that can read trace_events (such as ftrace and perf). The registration
process gives back two ints to the program for each event. The first int is the
status index. This index describes which byte in the
process gives back two ints to the program for each event. The first int is
the status bit. This describes which bit in little-endian format in the
/sys/kernel/debug/tracing/user_events_status file represents this event. The
second int is the write index. This index describes the data when a write() or
second int is the write index which describes the data when a write() or
writev() is called on the /sys/kernel/debug/tracing/user_events_data file.

The structures referenced in this document are contained with the
/include/uap/linux/user_events.h file in the source tree.
The structures referenced in this document are contained within the
/include/uapi/linux/user_events.h file in the source tree.

**NOTE:** *Both user_events_status and user_events_data are under the tracefs
filesystem and may be mounted at different paths than above.*
@@ -38,18 +38,18 @@ Registering within a user process is done via ioctl() out to the
/sys/kernel/debug/tracing/user_events_data file. The command to issue is
DIAG_IOCSREG.

This command takes a struct user_reg as an argument::
This command takes a packed struct user_reg as an argument::

  struct user_reg {
        u32 size;
        u64 name_args;
        u32 status_index;
        u32 status_bit;
        u32 write_index;
  };

The struct user_reg requires two inputs, the first is the size of the structure
to ensure forward and backward compatibility. The second is the command string
to issue for registering. Upon success two outputs are set, the status index
to issue for registering. Upon success two outputs are set, the status bit
and the write index.
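
As an illustration of why the argument is described as packed, the sketch below mirrors the layout shown above in a locally defined struct and fills in the two inputs. The names user_reg_sketch and make_user_reg() are hypothetical and not part of the kernel ABI. It assumes stdint types stand in for u32/u64, a GCC- or Clang-compatible compiler for the packed attribute, and that the command string is passed as a pointer stored in name_args. Real programs should use the definition from the uapi header rather than this copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local mirror of the struct user_reg layout shown above, for
 * illustration only (hypothetical name, not the kernel's definition).
 */
struct user_reg_sketch {
	uint32_t size;        /* input: size of this structure */
	uint64_t name_args;   /* input: command string (assumed: pointer value) */
	uint32_t status_bit;  /* output: set on successful registration */
	uint32_t write_index; /* output: set on successful registration */
} __attribute__((__packed__));

/* Hypothetical helper: fill in the two inputs, leave the outputs zeroed. */
static struct user_reg_sketch make_user_reg(const char *name_args)
{
	struct user_reg_sketch reg = {0};

	assert(name_args != NULL);
	reg.size = sizeof(reg);
	reg.name_args = (uint64_t)(uintptr_t)name_args;
	return reg;
}
```

With packing, the structure occupies 20 bytes (4 + 8 + 4 + 4) and name_args sits at offset 4. Without it, many 64-bit ABIs would align name_args to 8 bytes and pad the structure to 24 bytes, which would change the value written to size.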

User based events show up under tracefs like any other event under the
@@ -111,15 +111,56 @@ in realtime. This allows user programs to only incur the cost of the write() or
writev() calls when something is actively attached to the event.

User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to
check the status for each event that is registered. The byte to check in the
file is given back after the register ioctl() via user_reg.status_index.
check the status for each event that is registered. The bit to check in the
file is given back after the register ioctl() via user_reg.status_bit. The bit
is always in little-endian format. Programs can check if the bit is set either
using a byte-wise index with a mask or a long-wise index with a little-endian
mask.

Currently the size of user_events_status is a single page, however, custom
kernel configurations can change this size to allow more user based events. In
all cases the size of the file is a multiple of a page size.

For example, if the register ioctl() gives back a status_index of 3 you would
check byte 3 of the returned mmap data to see if anything is attached to that
event.
For example, if the register ioctl() gives back a status_bit of 3 you would
check byte 0 (3 / 8) of the returned mmap data and then AND the result with 8
(1 << (3 % 8)) to see if anything is attached to that event.

A byte-wise index check is performed as follows::

  int index, mask;
  char *status_page;

  index = status_bit / 8;
  mask = 1 << (status_bit % 8);

  ...

  if (status_page[index] & mask) {
        /* Enabled */
  }

A long-wise index check is performed as follows::

  #include <asm/bitsperlong.h>
  #include <endian.h>

  #if __BITS_PER_LONG == 64
  #define endian_swap(x) htole64(x)
  #else
  #define endian_swap(x) htole32(x)
  #endif

  long index, mask, *status_page;

  index = status_bit / __BITS_PER_LONG;
  mask = 1L << (status_bit % __BITS_PER_LONG);
  mask = endian_swap(mask);

  ...

  if (status_page[index] & mask) {
        /* Enabled */
  }
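
The two checks above can also be compared against a simulated status page without touching tracefs. The sketch below is only an illustration: the helper names are hypothetical, the page is an ordinary byte buffer rather than the mmap()ed file, and to_le_ulong() is a portable stand-in for htole64()/htole32() so the example does not depend on endian.h.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_ULONG (CHAR_BIT * sizeof(unsigned long))

/* Byte-wise check: pick the byte, then test the bit inside it. */
static int enabled_bytewise(const unsigned char *status_page,
			    unsigned int status_bit)
{
	assert(status_page != NULL);
	return (status_page[status_bit / 8] & (1u << (status_bit % 8))) != 0;
}

/*
 * Portable stand-in for htole64()/htole32(): returns a value whose
 * in-memory bytes are the little-endian encoding of v.
 */
static unsigned long to_le_ulong(unsigned long v)
{
	unsigned char bytes[sizeof(v)];
	unsigned long out;
	size_t i;

	for (i = 0; i < sizeof(v); i++)
		bytes[i] = (unsigned char)(v >> (CHAR_BIT * i));
	memcpy(&out, bytes, sizeof(out));
	return out;
}

/* Long-wise check: load one long from the page, test a little-endian mask. */
static int enabled_longwise(const unsigned char *status_page,
			    unsigned int status_bit)
{
	unsigned long word, mask;
	size_t index = status_bit / BITS_PER_ULONG;

	assert(status_page != NULL);
	memcpy(&word, status_page + index * sizeof(word), sizeof(word));
	mask = to_le_ulong(1UL << (status_bit % BITS_PER_ULONG));
	return (word & mask) != 0;
}
```

For a status_bit of 3, both helpers look at byte 0 with a mask of 8, matching the worked example above. Assuming a 4096-byte page, one bit per event gives 4096 * 8 = 32768 status bits, which is consistent with the Max value shown in the example output in this diff.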

Administrators can easily check the status of all registered events by reading
the user_events_status file directly via a terminal. The output is as follows::
@@ -137,7 +178,7 @@ For example, on a system that has a single event the output looks like this::

  Active: 1
  Busy: 0
  Max: 4096
  Max: 32768

If a user enables the user event via ftrace, the output would change to this::

@@ -145,21 +186,10 @@ If a user enables the user event via ftrace, the output would change to this::

  Active: 1
  Busy: 1
  Max: 4096

**NOTE:** *A status index of 0 will never be returned. This allows user
programs to have an index that can be used on error cases.*

Status Bits
^^^^^^^^^^^
The byte being checked will be non-zero if anything is attached. Programs can
check specific bits in the byte to see what mechanism has been attached.

The following values are defined to aid in checking what has been attached:

**EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0).
  Max: 32768

**EVENT_STATUS_PERF** - Bit set if perf has been attached (Bit 1).
**NOTE:** *A status bit of 0 will never be returned. This allows user programs
to have a bit that can be used on error cases.*

Writing Data
------------
+18 −8
@@ -8433,6 +8433,19 @@ L: platform-driver-x86@vger.kernel.org
S:	Maintained
F:	drivers/platform/x86/fujitsu-tablet.c
FUNCTION HOOKS (FTRACE)
M:	Steven Rostedt <rostedt@goodmis.org>
M:	Masami Hiramatsu <mhiramat@kernel.org>
R:	Mark Rutland <mark.rutland@arm.com>
S:	Maintained
T:	git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
F:	Documentation/trace/ftrace*
F:	kernel/trace/ftrace*
F:	kernel/trace/fgraph.c
F:	arch/*/*/*/*ftrace*
F:	arch/*/*/*ftrace*
F:	include/*/ftrace.h
FUNGIBLE ETHERNET DRIVERS
M:	Dimitris Michailidis <dmichail@fungible.com>
L:	netdev@vger.kernel.org
@@ -11422,7 +11435,7 @@ M: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
M:	"David S. Miller" <davem@davemloft.net>
M:	Masami Hiramatsu <mhiramat@kernel.org>
S:	Maintained
T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
T:	git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
F:	Documentation/trace/kprobes.rst
F:	include/asm-generic/kprobes.h
F:	include/linux/kprobes.h
@@ -20771,14 +20784,11 @@ F: drivers/hwmon/pmbus/tps546d24.c
TRACING
M:	Steven Rostedt <rostedt@goodmis.org>
M:	Ingo Molnar <mingo@redhat.com>
M:	Masami Hiramatsu <mhiramat@kernel.org>
S:	Maintained
T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
F:	Documentation/trace/ftrace.rst
F:	arch/*/*/*/*ftrace*
F:	arch/*/*/*ftrace*
T:	git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
F:	Documentation/trace/*
F:	fs/tracefs/
F:	include/*/ftrace.h
F:	include/linux/trace*.h
F:	include/trace/
F:	kernel/trace/
@@ -20787,7 +20797,7 @@ F: tools/testing/selftests/ftrace/
TRACING MMIO ACCESSES (MMIOTRACE)
M:	Steven Rostedt <rostedt@goodmis.org>
M:	Ingo Molnar <mingo@kernel.org>
M:	Masami Hiramatsu <mhiramat@kernel.org>
R:	Karol Herbst <karolherbst@gmail.com>
R:	Pekka Paalanen <ppaalanen@gmail.com>
L:	linux-kernel@vger.kernel.org
+0 −1
@@ -23,7 +23,6 @@
#define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR

#ifndef __ASSEMBLY__
extern atomic_t modifying_ftrace_code;
extern void __fentry__(void);

static inline unsigned long ftrace_call_adjust(unsigned long addr)
+0 −2
@@ -50,8 +50,6 @@ extern const int kretprobe_blacklist_size;

void arch_remove_kprobe(struct kprobe *p);

extern void arch_kprobe_override_function(struct pt_regs *regs);

/* Architecture specific copy of original instruction*/
struct arch_specific_insn {
	/* copy of the original instruction */
+0 −2
@@ -59,8 +59,6 @@
DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);

#define stack_addr(regs) ((unsigned long *)regs->sp)

#define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
	  (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) |   \