Skip to content
  1. Jul 03, 2011
    • Frederic Weisbecker's avatar
      x86: Don't use frame pointer to save old stack on irq entry · a2bbe750
      Frederic Weisbecker authored
      
      
      rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
      in order to restore it later in ret_from_intr.
      
      It is convenient because we save its value in the irq regs
      and it's easily restored using the leave instruction.
      
      However this is a kind of abuse of the frame pointer which
      role is to help unwinding the kernel by chaining frames
      together, each node following the return address to the
      previous frame.
      
      But although we are breaking the frame by changing the stack
      pointer, there is no preceding return address before the new
      frame. Hence using the frame pointer to link the two stacks
      breaks the stack unwinders that find a random value instead of
      a return address here.
      
      There is no workaround that can work in every case. We are using
      the fixup_bp_irq_link() function to dereference that abused frame
      pointer in the case of non nesting interrupt (which means stack
      changed).
      But that doesn't fix the case of interrupts that don't change the
      stack (but we still have the unconditional frame link), which is
      the case of hardirq interrupting softirq. We have no way to detect
      this transition so the frame irq link is considered as a real frame
      pointer and the return address is dereferenced but it is still a
      spurious one.
      
      There are two possible results of this: either the spurious return
      address, a random stack value, luckily belongs to the kernel text
      and then the unwinding can continue and we just have a weird entry
      in the stack trace. Or it doesn't belong to the kernel text and
      unwinding stops there.
      
      This is the reason why stacktraces (including perf callchains) on
      irqs that interrupted softirqs don't work very well.
      
      To solve this, we don't save the old stack pointer on rbp anymore
      but we save it to a scratch register that we push on the new
      stack and that we pop back later on irq return.
      
      This preserves the whole frame chain without spurious return addresses
      in the middle and drops the need for the horrid fixup_bp_irq_link()
      workaround.
      
      And finally irqs that interrupt softirq are sanely unwinded.
      
      Before:
      
          99.81%         perf  [kernel.kallsyms]  [k] perf_pending_event
                         |
                         --- perf_pending_event
                             irq_work_run
                             smp_irq_work_interrupt
                             irq_work_interrupt
                            |
                            |--41.60%-- __read
                            |          |
                            |          |--99.90%-- create_worker
                            |          |          bench_sched_messaging
                            |          |          cmd_bench
                            |          |          run_builtin
                            |          |          main
                            |          |          __libc_start_main
                            |           --0.10%-- [...]
      
      After:
      
           1.64%  swapper  [kernel.kallsyms]  [k] perf_pending_event
                  |
                  --- perf_pending_event
                      irq_work_run
                      smp_irq_work_interrupt
                      irq_work_interrupt
                     |
                     |--95.00%-- arch_irq_work_raise
                     |          irq_work_queue
                     |          __perf_event_overflow
                     |          perf_swevent_overflow
                     |          perf_swevent_event
                     |          perf_tp_event
                     |          perf_trace_softirq
                     |          __do_softirq
                     |          call_softirq
                     |          do_softirq
                     |          irq_exit
                     |          |
                     |          |--73.68%-- smp_apic_timer_interrupt
                     |          |          apic_timer_interrupt
                     |          |          |
                     |          |          |--96.43%-- amd_e400_idle
                     |          |          |          cpu_idle
                     |          |          |          start_secondary
      
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      a2bbe750
    • Frederic Weisbecker's avatar
      x86: Remove useless unwinder backlink from irq regs saving · 48ffee7d
      Frederic Weisbecker authored
      
      
      The unwinder backlink in interrupt entry is very useless.
      It's actually not part of the stack frame chain and thus is
      never used.
      
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      48ffee7d
    • Frederic Weisbecker's avatar
      x86,64: Separate arg1 from rbp handling in SAVE_REGS_IRQ · 3b99a3ef
      Frederic Weisbecker authored
      
      
      Just for clarity in the code. Have a first block that handles
      the frame pointer and a separate one that handles pt_regs
      pointer and its use.
      
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      3b99a3ef
    • Frederic Weisbecker's avatar
      x86,64: Simplify save_regs() · 1871853f
      Frederic Weisbecker authored
      
      
      The save_regs function that saves the regs on low level
      irq entry is complicated because of the fact it changes
      its stack in the middle and also because it manipulates
      data allocated in the caller frame and accesses there
      are directly calculated from callee rsp value with the
      return address in the middle of the way.
      
      This complicates the static stack offsets calculation and
      require more dynamic ones. It also needs a save/restore
      of the function's return address.
      
      To simplify and optimize this, turn save_regs() into a
      macro.
      
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      1871853f
    • Frederic Weisbecker's avatar
      x86: Fetch stack from regs when possible in dump_trace() · 47ce11a2
      Frederic Weisbecker authored
      
      
      When regs are passed to dump_stack(), we fetch the frame
      pointer from the regs but the stack pointer is taken from
      the current frame.
      
      Thus the frame and stack pointers may not come from the same
      context. For example this can result in the unwinder to
      think the context is in irq, due to the current value of
      the stack, but the frame pointer coming from the regs points
      to a frame from another place. It then tries to fix up
      the irq link but ends up dereferencing a random frame
      pointer that doesn't belong to the irq stack:
      
      [ 9131.706906] ------------[ cut here ]------------
      [ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330()
      [ 9131.707003] Hardware name: AMD690VM-FMH
      [ 9131.707003] Perf: bad frame pointer = 0000000000000005 in callchain
      [ 9131.707003] Modules linked in:
      [ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181
      [ 9131.707003] Call Trace:
      [ 9131.707003]  <IRQ>  [<ffffffff8104bd4a>] warn_slowpath_common+0x7a/0xb0
      [ 9131.707003]  [<ffffffff8104be21>] warn_slowpath_fmt+0x41/0x50
      [ 9131.707003]  [<ffffffff8178b873>] ? bad_to_user+0x6d/0x10be
      [ 9131.707003]  [<ffffffff8100c2da>] dump_trace+0x2aa/0x330
      [ 9131.707003]  [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50
      [ 9131.707003]  [<ffffffff8101b164>] perf_callchain_kernel+0x54/0x70
      [ 9131.707003]  [<ffffffff810d391f>] perf_prepare_sample+0x19f/0x2a0
      [ 9131.707003]  [<ffffffff810d546c>] __perf_event_overflow+0x16c/0x290
      [ 9131.707003]  [<ffffffff810d5430>] ? __perf_event_overflow+0x130/0x290
      [ 9131.707003]  [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50
      [ 9131.707003]  [<ffffffff8100fbb9>] ? sched_clock+0x9/0x10
      [ 9131.707003]  [<ffffffff810752e5>] ? T.375+0x15/0x90
      [ 9131.707003]  [<ffffffff81084da4>] ? trace_hardirqs_on_caller+0x64/0x180
      [ 9131.707003]  [<ffffffff810817bd>] ? trace_hardirqs_off+0xd/0x10
      [ 9131.707003]  [<ffffffff810d5764>] perf_event_overflow+0x14/0x20
      [ 9131.707003]  [<ffffffff810d588c>] perf_swevent_hrtimer+0x11c/0x130
      [ 9131.707003]  [<ffffffff817821a1>] ? error_exit+0x51/0xb0
      [ 9131.707003]  [<ffffffff81072e93>] __run_hrtimer+0x83/0x1e0
      [ 9131.707003]  [<ffffffff810d5770>] ? perf_event_overflow+0x20/0x20
      [ 9131.707003]  [<ffffffff81073256>] hrtimer_interrupt+0x106/0x250
      [ 9131.707003]  [<ffffffff812a3bfd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
      [ 9131.707003]  [<ffffffff81024833>] smp_apic_timer_interrupt+0x53/0x90
      [ 9131.707003]  [<ffffffff81789053>] apic_timer_interrupt+0x13/0x20
      [ 9131.707003]  <EOI>  [<ffffffff817821a1>] ? error_exit+0x51/0xb0
      [ 9131.707003]  [<ffffffff8178219c>] ? error_exit+0x4c/0xb0
      [ 9131.707003] ---[ end trace b2560d4876709347 ]---
      
      Fix this by simply taking the stack pointer from regs->sp
      when regs are provided.
      
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      47ce11a2
    • Frederic Weisbecker's avatar
      x86: Save stack pointer in perf live regs savings · 9e46294d
      Frederic Weisbecker authored
      
      
      In order to prepare for fetching the stack pointer from the
      regs when possible in dump_trace() instead of taking the
      local one, save the current stack pointer in perf live regs saving.
      
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      9e46294d
  2. Jun 16, 2011
  3. Jun 14, 2011
  4. Jun 13, 2011
  5. Jun 12, 2011
  6. Jun 11, 2011