  1. Dec 13, 2014
    • tracing/sched: Check preempt_count() for current when reading task->state · aee4e5f3
      Steven Rostedt (Red Hat) authored
      When recording the state of a task for the sched_switch tracepoint a check of
      task_preempt_count() is performed to see if PREEMPT_ACTIVE is set. This is
      because, technically, a task being preempted is really in the TASK_RUNNING
      state, and that is what should be recorded when tracing a sched_switch,
      even if the task put itself into another state (it hasn't scheduled out
      in that state yet).
      
      But with the change to use per_cpu preempt counts, the
      task_thread_info(p)->preempt_count is no longer used, and instead
      task_preempt_count(p) is used.
      
      The problem is that this does not use the current preempt count but a stale
      one from a previous sched_switch. The task_preempt_count(p) uses
      saved_preempt_count and not preempt_count(). But for tracing sched_switch,
      if p is current, we really want preempt_count().
      
      I hit this bug when I was tracing sleep and the call from do_nanosleep()
      scheduled out in the "RUNNING" state.
      
                 sleep-4290  [000] 537272.259992: sched_switch:         sleep:4290 [120] R ==> swapper/0:0 [120]
                 sleep-4290  [000] 537272.260015: kernel_stack:         <stack trace>
      => __schedule (ffffffff8150864a)
      => schedule (ffffffff815089f8)
      => do_nanosleep (ffffffff8150b76c)
      => hrtimer_nanosleep (ffffffff8108d66b)
      => SyS_nanosleep (ffffffff8108d750)
      => return_to_handler (ffffffff8150e8e5)
      => tracesys_phase2 (ffffffff8150c844)
      
      After a bit of hair pulling, I found that the state was really
      TASK_INTERRUPTIBLE, but the saved_preempt_count had an old PREEMPT_ACTIVE
      set and caused the sched_switch tracepoint to show it as RUNNING.
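
       As a minimal userspace sketch of the failure mode (this is NOT the
       kernel source; PREEMPT_ACTIVE's value and the field names here are
       illustrative stand-ins for task_preempt_count() vs. preempt_count()):

       	#include <stdio.h>

       	#define PREEMPT_ACTIVE      0x10000000
       	#define TASK_RUNNING        0
       	#define TASK_INTERRUPTIBLE  1

       	struct task {
       		long state;
       		int saved_preempt_count;  /* snapshot from the last switch */
       	};

       	static int cpu_preempt_count; /* stand-in for the per-CPU counter */

       	/* Buggy: uses the stale snapshot, as task_preempt_count(p) did. */
       	static long traced_state_stale(struct task *p)
       	{
       		if (p->saved_preempt_count & PREEMPT_ACTIVE)
       			return TASK_RUNNING;
       		return p->state;
       	}

       	/* Fixed: for p == current, read the live preempt count. */
       	static long traced_state_live(struct task *p)
       	{
       		if (cpu_preempt_count & PREEMPT_ACTIVE)
       			return TASK_RUNNING;
       		return p->state;
       	}

       	int main(void)
       	{
       		/* Preempted earlier (PREEMPT_ACTIVE was saved), but now
       		 * voluntarily sleeping, as in do_nanosleep(). */
       		struct task t = {
       			.state = TASK_INTERRUPTIBLE,
       			.saved_preempt_count = PREEMPT_ACTIVE,
       		};
       		cpu_preempt_count = 0; /* no preemption in progress now */

       		printf("stale: %ld (wrongly RUNNING)\n", traced_state_stale(&t));
       		printf("live:  %ld (TASK_INTERRUPTIBLE)\n", traced_state_live(&t));
       		return 0;
       	}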
      
      Link: http://lkml.kernel.org/r/20141210174428.3cb7542a@gandalf.local.home
       Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: stable@vger.kernel.org # 3.13+
      Cc: Peter Zijlstra <peterz@infradead.org>
       Fixes: 01028747 ("sched: Create more preempt_count accessors")
       Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. Dec 04, 2014
    • tracing: Truncated output is better than nothing · 3558a5ac
      Dan Carpenter authored
      The initial reason for this patch is that I noticed that:
      
      	if (len > TRACE_BUF_SIZE)
      
      is off by one.  In this code, if len == TRACE_BUF_SIZE, then it means we
      have truncated the last character off the output string.  If we truncate
      two or more characters then we exit without printing.
      
       After some discussion, we decided that printing truncated data is better
       than not printing at all, so we should just use vscnprintf() and remove
       the test entirely.  I have also updated the memcpy() to copy the NUL
       character instead of setting it in a separate step.
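
       A self-contained userspace illustration of the off-by-one, using C's
       snprintf(), which has the same return convention as the kernel's
       vsnprintf(); vscnprintf() is kernel-only and instead returns the number
       of characters actually stored:

       	#include <stdio.h>

       	#define TRACE_BUF_SIZE 8	/* tiny buffer to force truncation */

       	int main(void)
       	{
       		char buf[TRACE_BUF_SIZE];

       		/* 8 characters into an 8-byte buffer: the terminating NUL
       		 * displaces the last one, and snprintf() returns the length
       		 * the output would have had (8), not what was stored (7). */
       		int len = snprintf(buf, sizeof(buf), "%s", "12345678");

       		printf("requested len = %d, stored = \"%s\"\n", len, buf);

       		/* So len == TRACE_BUF_SIZE already means one character was
       		 * cut, which the old "if (len > TRACE_BUF_SIZE)" test missed;
       		 * and when len > TRACE_BUF_SIZE the old code printed nothing
       		 * at all. */
       		if (len >= TRACE_BUF_SIZE)
       			printf("truncated by %d character(s)\n",
       			       len - (TRACE_BUF_SIZE - 1));
       		return 0;
       	}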
      
      Link: http://lkml.kernel.org/r/20141127155752.GA21914@mwanda
       Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
       Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add additional marks to signal very large time deltas · 8e1e1df2
      Byungchul Park authored
       Currently, the function graph tracer prints "!" or "+" just before the
       function execution time to signal function overhead, depending on the
       time. Some latency tracers also print "!" or "+" just after the time
       to signal overhead, depending on the interval between events. Even
       though this is usually enough, we sometimes need to signal execution
       times much larger than 100 microseconds.
      
       For example, I used the function graph tracer to detect whether there
       is any case where exit_mm() takes too much time. I did the following
       steps in /sys/kernel/debug/tracing. It was easier to detect very large
       execution times with the patched kernel than with the original kernel.
      
      $ echo exit_mm > set_graph_function
      $ echo function_graph > current_tracer
      $ echo > trace
      $ cat trace_pipe > $LOGFILE
       ... (do something and terminate logging)
      $ grep "\\$" $LOGFILE
       3) $ 22082032 us |                      } /* kernel_map_pages */
       3) $ 22082040 us |                    } /* free_pages_prepare */
       3) $ 22082113 us |                  } /* free_hot_cold_page */
       3) $ 22083455 us |                } /* free_hot_cold_page_list */
       3) $ 22083895 us |              } /* release_pages */
       3) $ 22177873 us |            } /* free_pages_and_swap_cache */
       3) $ 22178929 us |          } /* unmap_single_vma */
       3) $ 22198885 us |        } /* unmap_vmas */
       3) $ 22206949 us |      } /* exit_mmap */
       3) $ 22207659 us |    } /* mmput */
       3) $ 22207793 us |  } /* exit_mm */
      
       And then it was easy to find out that a schedule-out occurred via
       sub_preempt_count() within kernel_map_pages().
      
       This patch can be useful for detecting very large function execution
       times caused by either a problematic function implementation or
       scheduling issues.
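
       An illustrative sketch of the marking scheme (not the kernel's
       trace_output.c, whose thresholds are authoritative; the values below
       are assumptions modeled on the examples above, where '$' flags the
       multi-second durations):

       	#include <stdio.h>

       	struct mark {
       		unsigned long long threshold_ns;
       		char sym;
       	};

       	/* Largest threshold first so the first match wins. */
       	static const struct mark marks[] = {
       		{ 1000000000ULL, '$' },	/* over 1 second */
       		{    1000000ULL, '#' },	/* over 1000 usecs */
       		{     100000ULL, '!' },	/* over 100 usecs */
       		{      10000ULL, '+' },	/* over 10 usecs */
       	};

       	static char duration_mark(unsigned long long duration_ns)
       	{
       		unsigned int i;

       		for (i = 0; i < sizeof(marks) / sizeof(marks[0]); i++)
       			if (duration_ns > marks[i].threshold_ns)
       				return marks[i].sym;
       		return ' ';	/* nothing worth flagging */
       	}

       	int main(void)
       	{
       		/* 22082032 us, from the kernel_map_pages line above: '$' */
       		printf("%c\n", duration_mark(22082032ULL * 1000));
       		/* 150 us: '!' */
       		printf("%c\n", duration_mark(150ULL * 1000));
       		return 0;
       	}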
      
      Link: http://lkml.kernel.org/r/1416789259-24038-1-git-send-email-byungchul.park@lge.com
       Signed-off-by: Byungchul Park <byungchul.park@lge.com>
       Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • Documentation: describe trace_buf_size parameter more accurately · 3e6fb8e9
      Joonsoo Kim authored
       I ran into a panic where too little free memory was left when I booted
       with the trace_buf_size parameter. After digging into the problem, I
       found that trace_buf_size is the size of the trace buffer on each CPU
       rather than the total size of the trace buffer. To keep others from
       becoming victims like me, describe the trace_buf_size parameter more
       accurately.
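
       A quick worked example of why the distinction matters (the CPU count
       here is hypothetical): booting a 16-CPU machine with

       	trace_buf_size=10485760

       reserves about 10 MB per CPU, so roughly 160 MB in total, not the
       10 MB a reader of the old description might expect.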
      
      Link: http://lkml.kernel.org/r/1417570760-10620-1-git-send-email-iamjoonsoo.kim@lge.com
       Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
       Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. Nov 25, 2014
    • ftrace/x86: Have static function tracing always test for function graph · 62a207d7
      Steven Rostedt (Red Hat) authored
       New updates to the ftrace generic code had ftrace_stub not always being
       called when ftrace is off. This causes the static tracer to always save
       and restore functions. But it also showed that when the function tracer
       is running, the function graph tracer cannot run. We should always check
       whether function graph tracing is running, even if the function tracer
       is running too. The function tracer is not the only user of the mcount
       function hook.
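
       The real check lives in the x86 mcount assembly; the following is only
       a C rendering of the dispatch-order point, with illustrative stand-in
       names for the hook variables:

       	#include <stdio.h>

       	static void ftrace_stub(void) { }
       	static void func_tracer_hook(void)  { printf("function tracer\n"); }
       	static void graph_tracer_hook(void) { printf("graph tracer\n"); }

       	static void (*trace_function)(void) = func_tracer_hook;
       	static void (*graph_entry)(void)    = graph_tracer_hook;

       	static void mcount_dispatch(void)
       	{
       		if (trace_function != ftrace_stub)
       			trace_function();

       		/* The fix: an active function tracer is no reason to skip
       		 * this test; the graph hook is checked unconditionally. */
       		if (graph_entry != ftrace_stub)
       			graph_entry();
       	}

       	int main(void)
       	{
       		mcount_dispatch();	/* both hooks fire */
       		return 0;
       	}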
      
      Cc: Markos Chandras <Markos.Chandras@imgtec.com>
       Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  4. Nov 19, 2014
    • tracing: Fix race of function probes counting · a9ce7c36
      Steven Rostedt (Red Hat) authored
      The function probe counting for traceon and traceoff suffered a race
      condition where if the probe was executing on two or more CPUs at the
      same time, it could decrement the counter by more than one when
      disabling (or enabling) the tracer only once.
      
       The way the traceon and traceoff probes are supposed to work is that
       they disable (or enable) tracing once per count. If a user were to
       echo 'schedule:traceoff:3' into set_ftrace_filter, then when the
       schedule function was called, it would disable tracing. But the count
       should only be decremented once (to 2). Then if the user enabled tracing
       again (via the tracing_on file), the next call to schedule would disable
       tracing again and the count would be decremented to 1.
      
       But if multiple CPUs called schedule at the same time, it is possible
       that the count would be decremented more than once because of the
       simple "count--" used.
      
       By reading the count into a local variable and using memory barriers,
       we can guarantee that the count is decremented only once per disable
       (or enable).
      
       The stack trace probe had a similar race, but here the stack trace will
       decrement the count each time it is called. It had the same read-modify-
       write race, where it could stack trace more than the number of times
       specified. In this case we use a cmpxchg to stack trace only the number
       of times specified.
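
       A userspace sketch of the fix, with C11 atomics standing in for the
       kernel's cmpxchg(): read the count into a local, compute the new value,
       and store it only if no other CPU changed the count in between:

       	#include <stdatomic.h>
       	#include <stdbool.h>
       	#include <stdio.h>

       	static _Atomic long count = 3;	/* e.g. 'schedule:traceoff:3' */

       	/* The racy version, two CPUs doing "count--" at once, can consume
       	 * two counts for a single traceoff event. This version decrements
       	 * at most once per observed value; a failed compare means another
       	 * CPU already consumed the event. */
       	static bool try_decrement(void)
       	{
       		long old = atomic_load(&count);

       		if (old <= 0)
       			return false;	/* budget exhausted */

       		/* Succeeds only if count is still 'old'. */
       		return atomic_compare_exchange_strong(&count, &old, old - 1);
       	}

       	int main(void)
       	{
       		while (try_decrement())
       			printf("count now %ld\n", (long)atomic_load(&count));
       		return 0;
       	}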
      
      The dump probes can still use the old "update_count()" function as
      they only run once, and that is controlled by the dump logic
      itself.
      
      Link: http://lkml.kernel.org/r/20141118134643.4b550ee4@gandalf.local.home
       Signed-off-by: Steven Rostedt <rostedt@goodmis.org>