  3. Oct 06, 2022
    • ftrace: Create separate entry in MAINTAINERS for function hooks · 4f881a69
      Steven Rostedt (Google) authored
      
      
      The function hooks infrastructure (ftrace) is a subsystem separate from
      general tracing. It manages how callbacks are attached to most functions
      in the kernel, and it is also used by live kernel patching. It really is
      not part of tracing, although tracing uses it.
      
      Create a separate entry for FUNCTION HOOKS (FTRACE) to be separate from
      tracing itself in the MAINTAINERS file.
      
      Perhaps it should be moved out of the kernel/trace directory, but that's
      for another time.
      
      Link: https://lkml.kernel.org/r/20221006144439.459272364@goodmis.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Update MAINTAINERS to reflect new tracing git repo · fb17b268
      Steven Rostedt (Google) authored
      
      
      The tracing git repo will no longer be housed in my personal git repo,
      but instead live in trace/linux-trace.git.
      
      Update the MAINTAINERS file appropriately.
      
      Link: https://lkml.kernel.org/r/20221006144439.282193367@goodmis.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Do not free snapshot if tracer is on cmdline · a541a955
      Steven Rostedt (Google) authored
      The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the
      snapshot buffer at boot up for use later. The ftrace_boot_snapshot option
      in particular requires the snapshot to be allocated because it takes a
      snapshot at the end of boot up, preserving the traces that happened
      during boot so that they are not lost when user space takes over.
      
      When a tracer is registered (started) there's a path that checks if it
      requires the snapshot buffer or not, and if it does not and it was
      allocated it will do a synchronization and free the snapshot buffer.
      
      This is only required if the previous tracer was using it for "max
      latency" snapshots (like the irqsoff tracer and friends), as it needs to
      make sure all max snapshots are complete before freeing. It makes no
      sense to free the snapshot if the previous tracer was not using it and
      the snapshot was allocated by the cmdline parameters. That defeats the
      point of allocating it in the first place!
      
      Note, the allocated snapshot worked fine for just trace events, but fails
      when a tracer is enabled on the cmdline.
      
      On further investigation, this goes back even further and does not
      require a tracer on the cmdline to fail. Simply enable snapshots and then
      enable a tracer, and it will remove the snapshot.
      
      Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Fixes: 45ad21ca ("tracing: Have trace_array keep track if snapshot buffer is allocated")
      Reported-by: Ross Zwisler <zwisler@kernel.org>
      Tested-by: Ross Zwisler <zwisler@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ftrace: Still disable enabled records marked as disabled · cf04f2d5
      Steven Rostedt (Google) authored
      Weak functions started causing havoc as they showed up in the
      "available_filter_functions" and this confused people as to why some
      functions marked as "notrace" were listed, but when enabled they did
      nothing. This was because weak functions can still have fentry calls, and
      these addresses get added to the "available_filter_functions" file.
      kallsyms is what converts those addresses to names, and since the weak
      functions are not listed in kallsyms, it would just pick the function
      before that.
      
      To solve this, there was a trick to detect weak functions listed, and
      these records would be marked as DISABLED so that they do not get enabled
      and are mostly ignored. As the processing of the list of all functions to
      figure out what is weak or not can take a long time, this process is put
      off into a kernel thread and run in parallel with the rest of start up.
      
      Now the issue happens when function tracing is enabled via the kernel
      command line. As it starts very early in boot up, it can be enabled
      before the weak records are marked to be disabled. This causes an issue
      in the accounting, as the weak records are enabled by the command line
      function tracing, but after boot up, they are not disabled.
      
      The ftrace records have several accounting flags and a ref count. The
      DISABLED flag is just one. If the record is enabled before it is marked
      DISABLED it will get an ENABLED flag and also have its ref counter
      incremented. After it is marked for DISABLED, neither the ENABLED flag nor
      the ref counter is cleared. There are sanity checks on the records,
      performed after an ftrace function is registered or unregistered, and
      these detected records marked as ENABLED with ref counts that should
      not have been set.
      
      Note, the module loading code uses the DISABLED flag as well, to keep
      its functions from being modified while the module is being loaded, and
      some of these flags may get set in this process. So changing the
      verification code to ignore DISABLED records is a no-go, as it still
      needs to verify that the module records are working too.
      
      Also, the weak functions still are calling a trampoline. Even though they
      should never be called, it is dangerous to leave these weak functions
      calling a trampoline that is freed, so they should still be set back to
      nops.
      
      There are two places that must not skip records that have both the
      ENABLED and the DISABLED flags set: where the ftrace_ops is processed
      and sets the records' ref counts, and later when the function itself is
      updated and the ENABLED flag gets removed. Add a helper function
      "skip_record()" that returns true if the record has the DISABLED flag
      set but not the ENABLED flag.
      
      Link: https://lkml.kernel.org/r/20221005003809.27d2b97b@gandalf.local.home
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Fixes: b39181f7 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  4. Oct 04, 2022
    • tracing/user_events: Move pages/locks into groups to prepare for namespaces · e5d27181
      Beau Belgrave authored
      
      
      In order to enable namespaces or any sort of isolation within
      user_events the register lock and pages need to be broken up into
      groups. Each event and file now has a group pointer which stores the
      actual pages to map, lookup data and synchronization objects.
      
      This only enables a single group that maps to init_user_ns, as the IMA
      namespace work has done. This enables user_events to start the work of
      supporting namespaces by walking the namespaces up to init_user_ns.
      Future patches will address other user namespaces and will align with
      the approaches the IMA namespace uses.
      
      Link: https://lore.kernel.org/linux-kernel/20220915193221.1728029-15-stefanb@linux.ibm.com/#t
      Link: https://lkml.kernel.org/r/20221001001016.2832-2-beaub@linux.microsoft.com
      
      Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Add Masami Hiramatsu as co-maintainer · 5d8d2bb9
      Steven Rostedt (Google) authored
      
      
      Masami has been maintaining kprobes for a while now, and that code has
      been an integral part of tracing. He has also been an excellent reviewer
      of and contributor to all the tracing code.
      
      The tracing subsystem needs another active maintainer to keep it running
      smoothly, and I do not know anyone more qualified for the job than Masami.
      
      Ingo has also told me that he has not been active in the tracing code for
      some time and said he could be removed from the TRACING portion of the
      MAINTAINERS file.
      
      Link: https://lkml.kernel.org/r/20220930124131.7b6432dd@gandalf.local.home
      
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Remove unused variable 'dups' · ed87277f
      Chen Zhongjin authored
      Reported by Clang [-Wunused-but-set-variable]:

      Commit c193707d ("tracing: Remove code which merges duplicates")
      removed the code that merges duplicates in detect_dups(), but forgot to
      delete the variable 'dups' that was used for the merging in the loop.

      Now only 'total_dups' is needed; remove 'dups' for cleaner code.
      
      Link: https://lkml.kernel.org/r/20220930103236.253985-1-chenzhongjin@huawei.com
      
      Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • MAINTAINERS: add myself as a tracing reviewer · 0e0f0b74
      Mark Rutland authored
      
      
      Since I'm actively involved in a number of arch bits that intersect
      ftrace (e.g. the actual arch implementation on arm64, stacktracing,
      entry management, and general instrumentation safety), add myself as a
      reviewer of the core ftrace code so that I have the chance to catch any
      potential problems early.
      
      I spoke with Steven about this at LPC, and it seemed to make sense to
      add me as a reviewer.
      
      Link: https://lkml.kernel.org/r/20220928114621.248038-1-mark.rutland@arm.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  6. Sep 28, 2022
    • ring-buffer: Check pending waiters when doing wake ups as well · ec0bbc5e
      Steven Rostedt (Google) authored
      The wake-up of waiters only checks the "wakeup_full" variable and not
      "full_waiters_pending". The full_waiters_pending flag is set when a
      waiter is added to the wait queue. The wakeup_full flag is only set when
      an event is triggered, and it clears full_waiters_pending to avoid
      multiple calls to irq_work_queue().
      
      The irq_work callback really needs to check both wakeup_full as well as
      full_waiters_pending such that this code can be used to wake up waiters
      when a file is closed that represents the ring buffer and the waiters need
      to be woken up.
      
      Link: https://lkml.kernel.org/r/20220927231824.209460321@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 15693458 ("tracing/ring-buffer: Move poll wake ups into ring buffer code")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ring-buffer: Have the shortest_full queue be the shortest not longest · 3b19d614
      Steven Rostedt (Google) authored
      The logic that decides whether the shortest waiters on the ring buffer
      should be woken up uses a less-than instead of a greater-than compare,
      which causes shortest_full to actually track the longest request.
      
      Link: https://lkml.kernel.org/r/20220927231823.718039222@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 2c2b0a78 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ring-buffer: Allow splice to read previous partially read pages · fa8f4a89
      Steven Rostedt (Google) authored
      If a page is partially read, and then the splice system call is run
      against the ring buffer, it will always fail to read, no matter how much
      is in the ring buffer. That's because the code path for a partial read
      of the page fails if the "full" flag is set.
      
      The splice system call wants full pages, so if the read of the ring buffer
      is not yet full, it should return zero, and the splice will block. But if
      a previous read was done, where the beginning has been consumed, it should
      still be given to the splice caller if the rest of the page has been
      written to.
      
      This caused the splice command to never consume data in this scenario, and
      let the ring buffer just fill up and lose events.
      
      Link: https://lkml.kernel.org/r/20220927144317.46be6b80@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: 8789a9e7 ("ring-buffer: read page interface")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ftrace: Fix recursive locking direct_mutex in ftrace_modify_direct_caller · 9d2ce78d
      Song Liu authored
      Naveen reported recursive locking of direct_mutex with sample
      ftrace-direct-modify.ko:
      
      [   74.762406] WARNING: possible recursive locking detected
      [   74.762887] 6.0.0-rc6+ #33 Not tainted
      [   74.763216] --------------------------------------------
      [   74.763672] event-sample-fn/1084 is trying to acquire lock:
      [   74.764152] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \
          register_ftrace_function+0x1f/0x180
      [   74.764922]
      [   74.764922] but task is already holding lock:
      [   74.765421] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \
          modify_ftrace_direct+0x34/0x1f0
      [   74.766142]
      [   74.766142] other info that might help us debug this:
      [   74.766701]  Possible unsafe locking scenario:
      [   74.766701]
      [   74.767216]        CPU0
      [   74.767437]        ----
      [   74.767656]   lock(direct_mutex);
      [   74.767952]   lock(direct_mutex);
      [   74.768245]
      [   74.768245]  *** DEADLOCK ***
      [   74.768245]
      [   74.768750]  May be due to missing lock nesting notation
      [   74.768750]
      [   74.769332] 1 lock held by event-sample-fn/1084:
      [   74.769731]  #0: ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \
          modify_ftrace_direct+0x34/0x1f0
      [   74.770496]
      [   74.770496] stack backtrace:
      [   74.770884] CPU: 4 PID: 1084 Comm: event-sample-fn Not tainted ...
      [   74.771498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
      [   74.772474] Call Trace:
      [   74.772696]  <TASK>
      [   74.772896]  dump_stack_lvl+0x44/0x5b
      [   74.773223]  __lock_acquire.cold.74+0xac/0x2b7
      [   74.773616]  lock_acquire+0xd2/0x310
      [   74.773936]  ? register_ftrace_function+0x1f/0x180
      [   74.774357]  ? lock_is_held_type+0xd8/0x130
      [   74.774744]  ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
      [   74.775213]  __mutex_lock+0x99/0x1010
      [   74.775536]  ? register_ftrace_function+0x1f/0x180
      [   74.775954]  ? slab_free_freelist_hook.isra.43+0x115/0x160
      [   74.776424]  ? ftrace_set_hash+0x195/0x220
      [   74.776779]  ? register_ftrace_function+0x1f/0x180
      [   74.777194]  ? kfree+0x3e1/0x440
      [   74.777482]  ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
      [   74.777941]  ? __schedule+0xb40/0xb40
      [   74.778258]  ? register_ftrace_function+0x1f/0x180
      [   74.778672]  ? my_tramp1+0xf/0xf [ftrace_direct_modify]
      [   74.779128]  register_ftrace_function+0x1f/0x180
      [   74.779527]  ? ftrace_set_filter_ip+0x33/0x70
      [   74.779910]  ? __schedule+0xb40/0xb40
      [   74.780231]  ? my_tramp1+0xf/0xf [ftrace_direct_modify]
      [   74.780678]  ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
      [   74.781147]  ftrace_modify_direct_caller+0x5b/0x90
      [   74.781563]  ? 0xffffffffa0201000
      [   74.781859]  ? my_tramp1+0xf/0xf [ftrace_direct_modify]
      [   74.782309]  modify_ftrace_direct+0x1b2/0x1f0
      [   74.782690]  ? __schedule+0xb40/0xb40
      [   74.783014]  ? simple_thread+0x2a/0xb0 [ftrace_direct_modify]
      [   74.783508]  ? __schedule+0xb40/0xb40
      [   74.783832]  ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
      [   74.784294]  simple_thread+0x76/0xb0 [ftrace_direct_modify]
      [   74.784766]  kthread+0xf5/0x120
      [   74.785052]  ? kthread_complete_and_exit+0x20/0x20
      [   74.785464]  ret_from_fork+0x22/0x30
      [   74.785781]  </TASK>
      
      Fix this by using register_ftrace_function_nolock in
      ftrace_modify_direct_caller.
      
      Link: https://lkml.kernel.org/r/20220927004146.1215303-1-song@kernel.org
      
      Fixes: 53cd885b ("ftrace: Allow IPMODIFY and DIRECT ops on the same function")
      Reported-and-tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ftrace: Properly unset FTRACE_HASH_FL_MOD · 0ce0638e
      Zheng Yejian authored
      When executing the following commands as the documentation describes,
      the log "#### all functions enabled ####" was not shown as expected:
        1. Set a 'mod' filter:
          $ echo 'write*:mod:ext3' > /sys/kernel/tracing/set_ftrace_filter
        2. Invert the above filter:
          $ echo '!write*:mod:ext3' >> /sys/kernel/tracing/set_ftrace_filter
        3. Read the file:
          $ cat /sys/kernel/tracing/set_ftrace_filter

      Some debugging showed that the FTRACE_HASH_FL_MOD flag was not unset
      after the inversion in step 2, so the result of ftrace_hash_empty()
      was incorrect.
      
      Link: https://lkml.kernel.org/r/20220926152008.2239274-1-zhengyejian1@huawei.com
      
      Cc: <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 8c08f0d5 ("ftrace: Have cached module filters be an active filter")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/eprobe: Fix alloc event dir failed when event name no set · dc399ade
      Tao Chen authored
      Allocating the event dir fails when no event name is set, using the
      command:
      "echo "e:esys/ syscalls/sys_enter_openat file=\$filename:string"
      >> dynamic_events"
      It seems that the dir name "syscalls/sys_enter_openat" is not allowed
      in debugfs, so just use "sys_enter_openat" as the event name.
      
      Link: https://lkml.kernel.org/r/1664028814-45923-1-git-send-email-chentao.kernel@linux.alibaba.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Linyu Yuan <quic_linyyuan@quicinc.com>
      Cc: Tao Chen <chentao.kernel@linux.alibaba.com>
      Cc: stable@vger.kernel.org
      Fixes: 95c104c3 ("tracing: Auto generate event name when creating a group of events")
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Tao Chen <chentao.kernel@linux.alibaba.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • x86: kprobes: Remove unused macro stack_addr · ae398ad8
      Chen Zhongjin authored
      An unused macro, reported by [-Wunused-macros].

      This macro was used to access sp in pt_regs because at that time x86_32
      could only get sp via kernel_stack_pointer(regs).

      Commit 3c88c692 ("x86/stackframe/32: Provide consistent pt_regs")
      unified pt_regs, and since then sp can be read directly with regs->sp.
      Nothing uses this macro anymore.

      Referencing pt_regs directly is clearer. Remove the macro to clean up
      the code.
      
      Link: https://lkml.kernel.org/r/20220924072629.104759-1-chenzhongjin@huawei.com
      
      Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ftrace: Remove obsoleted code from ftrace and task_struct · 3008119a
      Gaosheng Cui authored
      The trace field of "struct task_struct" has been unused since commit
      345ddcc8 ("ftrace: Have set_ftrace_pid use the bitmap like events do"),
      and the functions handling flags for current->trace are useless, so
      remove them.
      
      Link: https://lkml.kernel.org/r/20220923090012.505990-1-cuigaosheng1@huawei.com
      
      Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Disable interrupt or preemption before acquiring arch_spinlock_t · c0a581d7
      Waiman Long authored
      It was found that some tracing functions in kernel/trace/trace.c acquire
      an arch_spinlock_t with preemption and irqs enabled. An example is the
      tracing_saved_cmdlines_size_read() function which intermittently causes
      a "BUG: using smp_processor_id() in preemptible" warning when the LTP
      read_all_proc test is run.
      
      That can be problematic if preemption happens after acquiring the lock.
      Add the necessary preemption or interrupt disabling code in the
      appropriate places before acquiring an arch_spinlock_t.

      The convention here is to disable preemption for trace_cmdline_lock and
      interrupts for max_lock.
      
      Link: https://lkml.kernel.org/r/20220922145622.1744826-1-longman@redhat.com
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: stable@vger.kernel.org
      Fixes: a35873a0 ("tracing: Add conditional snapshot")
      Fixes: 939c7a4f ("tracing: Introduce saved_cmdlines_size file")
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>