Skip to content
  1. Apr 02, 2015
    • Alexander Shishkin's avatar
      perf: Add AUX record · 68db7e98
      Alexander Shishkin authored
      
      
      When there's new data in the AUX space, output a record indicating its
      offset and size and a set of flags, such as PERF_AUX_FLAG_TRUNCATED, to
      mean the described data was truncated to fit in the ring buffer.
      
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-7-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      68db7e98
    • Alexander Shishkin's avatar
      perf: Add a pmu capability for "exclusive" events · bed5b25a
      Alexander Shishkin authored
      
      
      Usually, pmus that do, for example, instruction tracing, would only ever
      be able to have one event per task per cpu (or per perf_event_context). For
      such pmus it makes sense to disallow creating conflicting events early on,
      so as to provide consistent behavior for the user.
      
      This patch adds a pmu capability that indicates such constraint on event
      creation.
      
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1422613866-113186-1-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      bed5b25a
    • Alexander Shishkin's avatar
      perf: Add a capability for AUX_NO_SG pmus to do software double buffering · 6a279230
      Alexander Shishkin authored
      
      
      For pmus that don't support scatter-gather for AUX data in hardware, it
      might still make sense to implement software double buffering to avoid
      losing data while the user is reading data out. For this purpose, add
      a pmu capability that guarantees multiple high-order chunks for AUX buffer,
      so that the pmu driver can do switchover tricks.
      
      To make use of this feature, add PERF_PMU_CAP_AUX_SW_DOUBLEBUF to your
      pmu's capability mask. This will make the ring buffer AUX allocation code
      ensure that the biggest high order allocation for the aux buffer pages is
      no bigger than half of the total requested buffer size, thus making sure
      that the buffer has at least two high order allocations.
      
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-5-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6a279230
    • Alexander Shishkin's avatar
      perf: Support high-order allocations for AUX space · 0a4e38e6
      Alexander Shishkin authored
      
      
      Some pmus (such as BTS or Intel PT without multiple-entry ToPA capability)
      don't support scatter-gather and will prefer larger contiguous areas for
      their output regions.
      
      This patch adds a new pmu capability to request higher order allocations.
      
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-4-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0a4e38e6
    • Peter Zijlstra's avatar
      perf: Add AUX area to ring buffer for raw data streams · 45bfb2e5
      Peter Zijlstra authored
      
      
      This patch introduces "AUX space" in the perf mmap buffer, intended for
      exporting high bandwidth data streams to userspace, such as instruction
      flow traces.
      
      AUX space is a ring buffer, defined by aux_{offset,size} fields in the
      user_page structure, and read/write pointers aux_{head,tail}, which abide
      by the same rules as data_* counterparts of the main perf buffer.
      
      In order to allocate/mmap AUX, userspace needs to set up aux_offset to
      such an offset that will be greater than data_offset+data_size and
      aux_size to be the desired buffer size. Both need to be page aligned.
      Then, same aux_offset and aux_size should be passed to mmap() call and
      if everything adds up, you should have an AUX buffer as a result.
      
      Pages that are mapped into this buffer also come out of user's mlock
      rlimit plus perf_event_mlock_kb allowance.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-3-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      45bfb2e5
    • Alexander Shishkin's avatar
      perf: Add data_{offset,size} to user_page · e8c6deac
      Alexander Shishkin authored
      
      
      Currently, the actual perf ring buffer is one page into the mmap area,
      following the user page and the userspace follows this convention. This
      patch adds data_{offset,size} fields to user_page that can be used by
      userspace instead for locating perf data in the mmap area. This is also
      helpful when mapping existing or shared buffers if their size is not
      known in advance.
      
      Right now, it is made to follow the existing convention that
      
      	data_offset == PAGE_SIZE and
      	data_offset + data_size == mmap_size.
      
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-2-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e8c6deac
    • Ingo Molnar's avatar
      bpf: Fix the build on BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable · e1abf2cc
      Ingo Molnar authored
      
      
      So bpf_tracing.o depends on CONFIG_BPF_SYSCALL - but that's not its only
      dependency, it also depends on the tracing infrastructure and on kprobes,
      without which it will fail to build with:
      
        In file included from kernel/trace/bpf_trace.c:14:0:
        kernel/trace/trace.h: In function ‘trace_test_and_set_recursion’:
        kernel/trace/trace.h:491:28: error: ‘struct task_struct’ has no member named ‘trace_recursion’
          unsigned int val = current->trace_recursion;
        [...]
      
      It took quite some time to trigger this build failure, because right now
      BPF_SYSCALL is very obscure, depends on CONFIG_EXPERT. So also make BPF_SYSCALL
      more configurable, not just under CONFIG_EXPERT.
      
      If BPF_SYSCALL, tracing and kprobes are enabled then enable the bpf_tracing
      gateway as well.
      
      We might want to make this an interactive option later on, although
      I'd not complicate it unnecessarily: enabling BPF_SYSCALL is enough of
      an indicator that the user wants BPF support.
      
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e1abf2cc
    • Alexei Starovoitov's avatar
      samples/bpf: Add kmem_alloc()/free() tracker tool · 9811e353
      Alexei Starovoitov authored
      
      
      One BPF program attaches to kmem_cache_alloc_node() and
      remembers all allocated objects in the map.
      Another program attaches to kmem_cache_free() and deletes
      corresponding object from the map.
      
      User space walks the map every second and prints any objects
      which are older than 1 second.
      
      Usage:
      
      	$ sudo tracex4
      
      Then start few long living processes. The 'tracex4' will print
      something like this:
      
      	obj 0xffff880465928000 is 13sec old was allocated at ip ffffffff8105dc32
      	obj 0xffff88043181c280 is 13sec old was allocated at ip ffffffff8105dc32
      	obj 0xffff880465848000 is  8sec old was allocated at ip ffffffff8105dc32
      	obj 0xffff8804338bc280 is 15sec old was allocated at ip ffffffff8105dc32
      
      	$ addr2line -fispe vmlinux ffffffff8105dc32
      	do_fork at fork.c:1665
      
      As soon as processes exit the memory is reclaimed and 'tracex4'
      prints nothing.
      
      Similar experiment can be done with the __kmalloc()/kfree() pair.
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-10-git-send-email-ast@plumgrid.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9811e353
    • Alexei Starovoitov's avatar
      samples/bpf: Add IO latency analysis (iosnoop/heatmap) tool · 5c7fc2d2
      Alexei Starovoitov authored
      
      
      BPF C program attaches to
      blk_mq_start_request()/blk_update_request() kprobe events to
      calculate IO latency.
      
      For every completed block IO event it computes the time delta
      in nsec and records in a histogram map:
      
      	map[log10(delta)*10]++
      
      User space reads this histogram map every 2 seconds and prints
      it as a 'heatmap' using gray shades of text terminal. Black
      spaces have many events and white spaces have very few events.
      Left most space is the smallest latency, right most space is
      the largest latency in the range.
      
      Usage:
      
      	$ sudo ./tracex3
      	and do 'sudo dd if=/dev/sda of=/dev/null' in other terminal.
      
      Observe IO latencies and how different activity (like 'make
      kernel') affects it.
      
      Similar experiments can be done for network transmit latencies,
      syscalls, etc.
      
      '-t' flag prints the heatmap using normal ascii characters:
      
      $ sudo ./tracex3 -t
        heatmap of IO latency
        # - many events with this latency
          - few events
      	|1us      |10us     |100us    |1ms      |10ms     |100ms    |1s |10s
      				 *ooo. *O.#.                                    # 221
      			      .  *#     .                                       # 125
      				 ..   .o#*..                                    # 55
      			    .  . .  .  .#O                                      # 37
      				 .#                                             # 175
      				       .#*.                                     # 37
      				  #                                             # 199
      		      .              . *#*.                                     # 55
      				       *#..*                                    # 42
      				  #                                             # 266
      			      ...***Oo#*OO**o#* .                               # 629
      				  #                                             # 271
      				      . .#o* o.*o*                              # 221
      				. . o* *#O..                                    # 50
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-9-git-send-email-ast@plumgrid.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5c7fc2d2
    • Alexei Starovoitov's avatar
      samples/bpf: Add counting example for kfree_skb() function calls and the write() syscall · d822a192
      Alexei Starovoitov authored
      
      
      this example has two probes in one C file that attach to
      different kprove events and use two different maps.
      
      1st probe is x64 specific equivalent of dropmon. It attaches to
      kfree_skb, retrevies 'ip' address of kfree_skb() caller and
      counts number of packet drops at that 'ip' address. User space
      prints 'location - count' map every second.
      
      2nd probe attaches to kprobe:sys_write and computes a histogram
      of different write sizes
      
      Usage:
      	$ sudo tracex2
      	location 0xffffffff81695995 count 1
      	location 0xffffffff816d0da9 count 2
      
      	location 0xffffffff81695995 count 2
      	location 0xffffffff816d0da9 count 2
      
      	location 0xffffffff81695995 count 3
      	location 0xffffffff816d0da9 count 2
      
      	557145+0 records in
      	557145+0 records out
      	285258240 bytes (285 MB) copied, 1.02379 s, 279 MB/s
      		   syscall write() stats
      	     byte_size       : count     distribution
      	       1 -> 1        : 3        |                                      |
      	       2 -> 3        : 0        |                                      |
      	       4 -> 7        : 0        |                                      |
      	       8 -> 15       : 0        |                                      |
      	      16 -> 31       : 2        |                                      |
      	      32 -> 63       : 3        |                                      |
      	      64 -> 127      : 1        |                                      |
      	     128 -> 255      : 1        |                                      |
      	     256 -> 511      : 0        |                                      |
      	     512 -> 1023     : 1118968  |************************************* |
      
      Ctrl-C at any time. Kernel will auto cleanup maps and programs
      
      	$ addr2line -ape ./bld_x64/vmlinux 0xffffffff81695995
      	0xffffffff816d0da9 0xffffffff81695995:
      	./bld_x64/../net/ipv4/icmp.c:1038 0xffffffff816d0da9:
      	./bld_x64/../net/unix/af_unix.c:1231
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-8-git-send-email-ast@plumgrid.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d822a192
    • Alexei Starovoitov's avatar
      samples/bpf: Add simple non-portable kprobe filter example · b896c4f9
      Alexei Starovoitov authored
      
      
      tracex1_kern.c - C program compiled into BPF.
      
      It attaches to kprobe:netif_receive_skb()
      
      When skb->dev->name == "lo", it prints sample debug message into
      trace_pipe via bpf_trace_printk() helper function.
      
      tracex1_user.c - corresponding user space component that:
        - loads BPF program via bpf() syscall
        - opens kprobes:netif_receive_skb event via perf_event_open()
          syscall
        - attaches the program to event via ioctl(event_fd,
          PERF_EVENT_IOC_SET_BPF, prog_fd);
        - prints from trace_pipe
      
      Note, this BPF program is non-portable. It must be recompiled
      with current kernel headers. kprobe is not a stable ABI and
      BPF+kprobe scripts may no longer be meaningful when kernel
      internals change.
      
      No matter in what way the kernel changes, neither the kprobe,
      nor the BPF program can ever crash or corrupt the kernel,
      assuming the kprobes, perf and BPF subsystem has no bugs.
      
      The verifier will detect that the program is using
      bpf_trace_printk() and the kernel will print 'this is a DEBUG
      kernel' warning banner, which means that bpf_trace_printk()
      should be used for debugging of the BPF program only.
      
      Usage:
      $ sudo tracex1
                  ping-19826 [000] d.s2 63103.382648: : skb ffff880466b1ca00 len 84
                  ping-19826 [000] d.s2 63103.382684: : skb ffff880466b1d300 len 84
      
                  ping-19826 [000] d.s2 63104.382533: : skb ffff880466b1ca00 len 84
                  ping-19826 [000] d.s2 63104.382594: : skb ffff880466b1d300 len 84
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-7-git-send-email-ast@plumgrid.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b896c4f9
    • Alexei Starovoitov's avatar
      tracing: Allow BPF programs to call bpf_trace_printk() · 9c959c86
      Alexei Starovoitov authored
      
      
      Debugging of BPF programs needs some form of printk from the
      program, so let programs call limited trace_printk() with %d %u
      %x %p modifiers only.
      
      Similar to kernel modules, during program load verifier checks
      whether program is calling bpf_trace_printk() and if so, kernel
      allocates trace_printk buffers and emits big 'this is debug
      only' banner.
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-6-git-send-email-ast@plumgrid.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9c959c86
    • Alexei Starovoitov's avatar
      tracing: Allow BPF programs to call bpf_ktime_get_ns() · d9847d31
      Alexei Starovoitov authored
      
      
      bpf_ktime_get_ns() is used by programs to compute time delta
      between events or as a timestamp
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-5-git-send-email-ast@plumgrid.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d9847d31
    • Alexei Starovoitov's avatar
      tracing, perf: Implement BPF programs attached to kprobes · 2541517c
      Alexei Starovoitov authored
      
      
      BPF programs, attached to kprobes, provide a safe way to execute
      user-defined BPF byte-code programs without being able to crash or
      hang the kernel in any way. The BPF engine makes sure that such
      programs have a finite execution time and that they cannot break
      out of their sandbox.
      
      The user interface is to attach to a kprobe via the perf syscall:
      
      	struct perf_event_attr attr = {
      		.type	= PERF_TYPE_TRACEPOINT,
      		.config	= event_id,
      		...
      	};
      
      	event_fd = perf_event_open(&attr,...);
      	ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
      
      'prog_fd' is a file descriptor associated with BPF program
      previously loaded.
      
      'event_id' is an ID of the kprobe created.
      
      Closing 'event_fd':
      
      	close(event_fd);
      
      ... automatically detaches BPF program from it.
      
      BPF programs can call in-kernel helper functions to:
      
        - lookup/update/delete elements in maps
      
        - probe_read - wraper of probe_kernel_read() used to access any
          kernel data structures
      
      BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
      architecture dependent) and return 0 to ignore the event and 1 to store
      kprobe event into the ring buffer.
      
      Note, kprobes are a fundamentally _not_ a stable kernel ABI,
      so BPF programs attached to kprobes must be recompiled for
      every kernel version and user must supply correct LINUX_VERSION_CODE
      in attr.kern_version during bpf_prog_load() call.
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2541517c
    • Alexei Starovoitov's avatar
      tracing: Add kprobe flag · 72cbbc89
      Alexei Starovoitov authored
      
      
      add TRACE_EVENT_FL_KPROBE flag to differentiate kprobe type of
      tracepoints, since bpf programs can only be attached to kprobe
      type of PERF_TYPE_TRACEPOINT perf events.
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-3-git-send-email-ast@plumgrid.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      72cbbc89
    • Daniel Borkmann's avatar
      bpf: Make internal bpf API independent of CONFIG_BPF_SYSCALL #ifdefs · 4e537f7f
      Daniel Borkmann authored
      
      
      Socket filter code and other subsystems with upcoming eBPF
      support should not need to deal with the fact that we have
      CONFIG_BPF_SYSCALL defined or not.
      
      Having the bpf syscall as a config option is a nice thing and
      I'd expect it to stay that way for expert users (I presume one
      day the default setting of it might change, though), but code
      making use of it should not care if it's actually enabled or
      not.
      
      Instead, hide this via header files and let the rest deal with it.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-2-git-send-email-ast@plumgrid.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4e537f7f
    • Ingo Molnar's avatar
      Merge branch 'perf/timer' into perf/core · 223aa646
      Ingo Molnar authored
      
      
      This WIP branch is now ready to be merged.
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      223aa646
  2. Apr 01, 2015
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo' of... · aaa9fa38
      Ingo Molnar authored
      
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
       - Fix 'perf script' pipe mode segfault, by always initializing ordered_events in
         perf_session__new(). (Arnaldo Carvalho de Melo)
      
       - Fix ppid for synthesized fork events. (David Ahern)
      
       - Fix kernel symbol resolution of callchains in S/390 by remembering the
         cpumode. (David Hildenbrand)
      
      Infrastructure changes:
      
       - Disable libbabeltrace check by default in the build system. (Jiri Olsa)
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      aaa9fa38
    • Arnaldo Carvalho de Melo's avatar
      perf ordered_samples: Remove references to perf_{evlist,tool} and machines · 9870d780
      Arnaldo Carvalho de Melo authored
      As these can be obtained from the ordered_events pointer, via
      container_of, reducing the cross section of ordered_samples.
      
      These were added to ordered_samples in:
      
       commit b7b61cbe
      
      
       Author: Arnaldo Carvalho de Melo <acme@redhat.com>
       Date:   Tue Mar 3 11:58:45 2015 -0300
      
          perf ordered_events: Shorten function signatures
      
          By keeping pointers to machines, evlist and tool in ordered_events.
      
      But that was more a transitional patch while moving stuff out from
      perf_session.c to ordered_events.c and possibly not even needed by then,
      as we could use the container_of() method and instead of having the
      nr_unordered_samples stats in events_stats, we can have it in
      ordered_samples.
      
      Based-on-a-patch-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-4lk0t9js82g0tfc0x1onpkjt@git.kernel.org
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9870d780
    • Arnaldo Carvalho de Melo's avatar
      perf session: Always initialize ordered_events · aae59fab
      Arnaldo Carvalho de Melo authored
      
      
      Even when it is not used to actually reorder events, some of its fields
      are used, like session->ordered_events->tool, to shorten function
      signatures where tool, for instance, was being passed, as the tool is
      needed for the ordered_events code, we need it there and might as well
      use it for other perf_session needs.
      
      This fixes a problem where 'perf script' had some condition that made
      session->ordered_events not to be initialized even with its
      script->tool ordered_events related flags asking for it to be, which
      looks like another bug and needs to be investigated further.
      
      Always initializing session->ordered_events at least leaves the current
      assumptions in place, so do it now.
      
      Reported-by: default avatarDavid Ahern <dsahern@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Tested-by: default avatarDavid Ahern <dsahern@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-b1xxk0rwkz2a0gip1uufmjqg@git.kernel.org
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      aae59fab
    • David Ahern's avatar
      perf tools: Fix ppid for synthesized fork events · ca6c41c5
      David Ahern authored
      363b785f
      
       added synthesized fork events and set a thread's parent id to
      itself. Since we are already processing /proc/<pid>/status the ppid can
      be determined properly. Make it so.
      
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Acked-by: default avatarDon Zickus <dzickus@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Joe Mario <jmario@redhat.com>
      Link: http://lkml.kernel.org/r/1427747758-18510-2-git-send-email-dsahern@gmail.com
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ca6c41c5
    • David Ahern's avatar
      perf tools: Refactor comm/tgid lookup · 5aa0b030
      David Ahern authored
      
      
      Rather than parsing /proc/pid/status file one line at a time, read it
      into a buffer in one shot and search for all strings in one pass.
      
      tgid conversion also simplified -- removing the isspace walk. As noted
      by Arnaldo those are not needed for atoi == strtol calls.
      
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Acked-by: default avatarDon Zickus <dzickus@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Joe Mario <jmario@redhat.com>
      Link: http://lkml.kernel.org/r/1427747758-18510-1-git-send-email-dsahern@gmail.com
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      5aa0b030
    • David Hildenbrand's avatar
      perf callchain: Fix kernel symbol resolution by remembering the cpumode · 73dbcd65
      David Hildenbrand authored
      Commit 2e77784b
      
       ("perf callchain: Move cpumode resolve code to
      add_callchain_ip") promised "No change in behavior.".
      
      As this commit breaks callchains on s390x (symbols not getting resolved,
      observed when profiling the kernel), this statement is wrong. The cpumode
      must be kept when iterating over all ips, otherwise the default
      (PERF_RECORD_MISC_USER) will be used by error.
      
      Signed-off-by: default avatarDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1427703060-59883-1-git-send-email-dahi@linux.vnet.ibm.com
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      73dbcd65
  3. Mar 30, 2015
    • Jiri Olsa's avatar
      perf build: Disable libbabeltrace check by default · 6ab2b762
      Jiri Olsa authored
      
      
      Disabling libbabeltrace check by default and replacing the
      NO_LIBBABELTRACE make variable with LIBBABELTRACE.
      
      Users wanting the libbabeltrace feature need to build via:
      
        $ make LIBBABELTRACE=1
      
      The reason for this is that the libababeltrace interface we use (version
      1.3) hasn't been packaged/released yet, thus the failing feature check
      only slows down build and confuses other (non CTF) developers.
      
      Requested-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeremie Galarneau <jgalar@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20150328103030.GA8431@krava.redhat.com
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6ab2b762
  4. Mar 27, 2015
    • Peter Zijlstra's avatar
      perf: Add per event clockid support · 34f43927
      Peter Zijlstra authored
      
      
      While thinking on the whole clock discussion it occurred to me we have
      two distinct uses of time:
      
       1) the tracking of event/ctx/cgroup enabled/running/stopped times
          which includes the self-monitoring support in struct
          perf_event_mmap_page.
      
       2) the actual timestamps visible in the data records.
      
      And we've been conflating them.
      
      The first is all about tracking time deltas, nobody should really care
      in what time base that happens, its all relative information, as long
      as its internally consistent it works.
      
      The second however is what people are worried about when having to
      merge their data with external sources. And here we have the
      discussion on MONOTONIC vs MONOTONIC_RAW etc..
      
      Where MONOTONIC is good for correlating between machines (static
      offset), MONOTNIC_RAW is required for correlating against a fixed rate
      hardware clock.
      
      This means configurability; now 1) makes that hard because it needs to
      be internally consistent across groups of unrelated events; which is
      why we had to have a global perf_clock().
      
      However, for 2) it doesn't really matter, perf itself doesn't care
      what it writes into the buffer.
      
      The below patch makes the distinction between these two cases by
      adding perf_event_clock() which is used for the second case. It
      further makes this configurable on a per-event basis, but adds a few
      sanity checks such that we cannot combine events with different clocks
      in confusing ways.
      
      And since we then have per-event configurability we might as well
      retain the 'legacy' behaviour as a default.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      34f43927
    • Ingo Molnar's avatar
    • Ingo Molnar's avatar
      Merge branch 'timers/core' into perf/timer, to apply dependent patch · 4e6d7c2a
      Ingo Molnar authored
      
      
      An upcoming patch will depend on tai_ns() and NMI-safe ktime_get_raw_fast(),
      so merge timers/core here in a separate topic branch until it's all cooked
      and timers/core is merged upstream.
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4e6d7c2a
    • Peter Zijlstra's avatar
      perf: Fix racy group access · ccd41c86
      Peter Zijlstra authored
      
      
      While looking at some fuzzer output I noticed that we do not hold any
      locks on leader->ctx and therefore the sibling_list iteration is
      unsafe.
      
      Acquire the relevant ctx->mutex before calling into the pmu specific
      code.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Link: http://lkml.kernel.org/r/20150225151639.GL5029@twins.programming.kicks-ass.net
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ccd41c86
    • David Ahern's avatar
      perf/x86: Remove redundant calls to perf_pmu_{dis|en}able() · 9332d250
      David Ahern authored
      
      
      perf_pmu_disable() is called before pmu->add() and perf_pmu_enable() is called
      afterwards. No need to call these inside of x86_pmu_add() as well.
      
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424281543-67335-1-git-send-email-dsahern@gmail.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9332d250
    • Ingo Molnar's avatar
    • Ingo Molnar's avatar
    • Peter Zijlstra's avatar
      time: Add ktime_get_tai_ns() · fe5fba05
      Peter Zijlstra authored
      
      
      Because it was the only clock for which we didn't have a _ns()
      accessor yet.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fe5fba05
    • Peter Zijlstra's avatar
      time: Introduce tk_fast_raw · f09cb9a1
      Peter Zijlstra authored
      
      
      Add the NMI safe CLOCK_MONOTONIC_RAW accessor..
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150319093400.562746929@infradead.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f09cb9a1
    • Peter Zijlstra's avatar
      time: Parametrize all tk_fast_mono users · 4498e746
      Peter Zijlstra authored
      
      
      In preparation for more tk_fast instances, remove all hard-coded
      tk_fast_mono references.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150319093400.484279927@infradead.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4498e746
    • Peter Zijlstra's avatar
      time: Add timerkeeper::tkr_raw · 4a4ad80d
      Peter Zijlstra authored
      
      
      Introduce tkr_raw and make use of it.
      
        base_raw -> tkr_raw.base
        clock->{mult,shift} -> tkr_raw.{mult.shift}
      
      Kill timekeeping_get_ns_raw() in favour of
      timekeeping_get_ns(&tkr_raw), this removes all mono_raw special
      casing.
      
      Duplicate the updates to tkr_mono.cycle_last into tkr_raw.cycle_last,
      both need the same value.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150319093400.422589590@infradead.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4a4ad80d
    • Peter Zijlstra's avatar
      time: Rename timekeeper::tkr to timekeeper::tkr_mono · 876e7881
      Peter Zijlstra authored
      
      
      In preparation of adding another tkr field, rename this one to
      tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base,
      since the structure is not specific to CLOCK_MONOTONIC and the mono
      name got added to the tk_read_base instance.
      
      Lots of trivial churn.
      
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      876e7881
    • Andi Kleen's avatar
      perf/x86/intel: Add INST_RETIRED.ALL workarounds · 294fe0f5
      Andi Kleen authored
      
      
      On Broadwell INST_RETIRED.ALL cannot be used with any period
      that doesn't have the lowest 6 bits cleared. And the period
      should not be smaller than 128.
      
      This is erratum BDM11 and BDM55:
      
        http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf
      
      BDM11: When using a period < 100; we may get incorrect PEBS/PMI
      interrupts and/or an invalid counter state.
      BDM55: When bit0-5 of the period are !0 we may get redundant PEBS
      records on overflow.
      
      Add a new callback to enforce this, and set it for Broadwell.
      
      How does this handle the case when an app requests a specific
      period with some of the bottom bits set?
      
      Short answer:
      
      Any useful instruction sampling period needs to be 4-6 orders
      of magnitude larger than 128, as an PMI every 128 instructions
      would instantly overwhelm the system and be throttled.
      So the +-64 error from this is really small compared to the
      period, much smaller than normal system jitter.
      
      Long answer (by Peterz):
      
      IFF we guarantee perf_event_attr::sample_period >= 128.
      
      Suppose we start out with sample_period=192; then we'll set period_left
      to 192, we'll end up with left = 128 (we truncate the lower bits). We
      get an interrupt, find that period_left = 64 (>0 so we return 0 and
      don't get an overflow handler), up that to 128. Then we trigger again,
      at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
      an overflow). We increment with sample_period so we get left = 128. We
      fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
      overflow). And on and on.
      
      So while the individual interrupts are 'wrong' we get then with
      interval=256,128 in exactly the right ratio to average out at 192. And
      this works for everything >=128.
      
      So the num_samples*fixed_period thing is still entirely correct +- 127,
      which is good enough I'd say, as you already have that error anyhow.
      
      So no need to 'fix' the tools, al we need to do is refuse to create
      INST_RETIRED:ALL events with sample_period < 128.
      
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      [ Updated comments and changelog a bit. ]
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      294fe0f5
    • Andi Kleen's avatar
      perf/x86/intel: Add Broadwell core support · 91f1b705
      Andi Kleen authored
      
      
      Add Broadwell support for Broadwell to perf.
      
      The basic support is very similar to Haswell. We use the new cache
      event list added for Haswell earlier. The only differences
      are a few bits related to remote nodes. To avoid an extra,
      mostly identical, table these are patched up in the initialization code.
      
      The constraint list has one new event that needs to be handled over Haswell.
      
      Includes code and testing from Kan Liang.
      
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      91f1b705
    • Andi Kleen's avatar
      perf/x86/intel: Add new cache events table for Haswell · 0f1b5ca2
      Andi Kleen authored
      
      
      Haswell offcore events are quite different from Sandy Bridge.
      Add a new table to handle Haswell properly.
      
      Note that the offcore bits listed in the SDM are not quite correct
      (this is currently being fixed). An uptodate list of bits is
      in the patch.
      
      The basic setup is similar to Sandy Bridge. The prefetch columns
      have been removed, as prefetch counting is not very reliable
      on Haswell. One L1 event that is not in the event list anymore
      has been also removed.
      
      - data reads do not include code reads (comparable to earlier Sandy Bridge tables)
      - data counts include speculative execution (except L1 write, dtlb, bpu)
      - remote node access includes both remote memory, remote cache, remote mmio.
      - prefetches are not included in the counts for consistency
        (different from Sandy Bridge, which includes prefetches in the remote node)
      
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      [ Removed the HSM30 comments; we don't have them for SNB/IVB either. ]
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0f1b5ca2
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo' of... · 30fdaa6b
      Ingo Molnar authored
      
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
        - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)
      
        - Fix garbage output when intermixing syscalls from different threads in 'perf trace' (Arnaldo Carvalho de Melo)
      
        - Fix 'perf timechart' SIBGUS error on sparc64 (David Ahern)
      
      Infrastructure changes:
      
        - Set JOBS based on CPU or processor, making it work on SPARC, where
          /proc/cpuinfo has "CPU", not "processor" (David Ahern)
      
        - Zero should not be considered "not found" in libtraceevent's eval_flag() (Steven Rostedt)
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      30fdaa6b