  1. Dec 29, 2018
  2. Dec 21, 2018
    • perf trace: Do not hardcode the size of the tracepoint common_ fields · b9b6a2ea
      Arnaldo Carvalho de Melo authored
      We shouldn't hardcode the size of the tracepoint common_ fields, use the
      offset of the 'id'/'__syscallnr' field in the sys_enter event instead.
      
      This caused the augmented syscalls code to fail on a particular build of a
      PREEMPT_RT_FULL kernel where these extra 'common_migrate_disable' and
      'common_padding' fields were before the syscall id one:
      
        # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/format
        name: sys_enter
        ID: 22
        format:
      	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
      	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
      	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
      	field:int common_pid;	offset:4;	size:4;	signed:1;
      	field:unsigned short common_migrate_disable;	offset:8;	size:2;	signed:0;
      	field:unsigned short common_padding;	offset:10;	size:2;	signed:0;
      
      	field:long id;	offset:16;	size:8;	signed:1;
      	field:unsigned long args[6];	offset:24;	size:48;	signed:0;
      
        print fmt: "NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)", REC->id, REC->args[0], REC->args[1], REC->args[2], REC->args[3], REC->args[4], REC->args[5]
        #
      
      All those 'common_' prefixed fields are zeroed when they hit a BPF tracepoint
      hook, so it would be better to just discard them, i.e. somehow pass the BPF
      program an offset from the start of the ctx and have the 'perf trace' handlers
      adjust the syscall arg offsets obtained from tracefs accordingly.
      
      Until then, the quick way is to add this to augmented_raw_syscalls.c to get
      it to work on such kernels:
      
        diff --git a/tools/perf/examples/bpf/augmented_raw_syscalls.c b/tools/perf/examples/bpf/augmented_raw_syscalls.c
        index 53c233370fae..1f746f931e13 100644
        --- a/tools/perf/examples/bpf/augmented_raw_syscalls.c
        +++ b/tools/perf/examples/bpf/augmented_raw_syscalls.c
        @@ -38,12 +38,14 @@ struct bpf_map SEC("maps") syscalls = {
      
         struct syscall_enter_args {
                unsigned long long common_tp_fields;
        +       long               rt_common_tp_fields;
                long               syscall_nr;
                unsigned long      args[6];
         };
      
         struct syscall_exit_args {
                unsigned long long common_tp_fields;
        +       long               rt_common_tp_fields;
                long               syscall_nr;
                long               ret;
         };
      
      That was done just to check that this was the case. Fix it properly later; for
      now, remove the hardcoding of the offset on the 'perf trace' side and document
      the situation with this patch.
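
A minimal sketch of the idea, not the actual 'perf trace' code: instead of assuming the syscall id sits 8 bytes into the record, look up the "offset:" value of the 'id' field in the tracefs format text. The function name tp_field_offset() and its parsing approach are assumptions made for illustration; perf itself gets this information through libtraceevent.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the "offset:" value of the field called
 * 'name' in a tracefs 'format' description, or -1 if it is not there.
 * 'format' is the whole file contents as a NUL-terminated string.
 */
int tp_field_offset(const char *format, const char *name)
{
	char line[512];
	const char *p = format;

	while (*p) {
		size_t len = strcspn(p, "\n");
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
		char *decl, *semi, *tok, *off;

		/* Work on a private, NUL-terminated copy of this line. */
		memcpy(line, p, n);
		line[n] = '\0';
		p += len;
		if (*p == '\n')
			p++;

		decl = strstr(line, "field:");
		if (!decl)
			continue;	/* header, blank or "print fmt" line */
		decl += strlen("field:");
		semi = strchr(decl, ';');
		if (!semi)
			continue;
		off = strstr(semi, "offset:");
		if (!off)
			continue;

		*semi = '\0';			/* end of the C declaration */
		tok = strrchr(decl, ' ');	/* last word is the field name */
		tok = tok ? tok + 1 : decl;
		tok[strcspn(tok, "[")] = '\0';	/* drop an array suffix like [6] */

		if (strcmp(tok, name) == 0)
			return atoi(off + strlen("offset:"));
	}
	return -1;
}
```

Fed the PREEMPT_RT_FULL format shown above, a lookup of "id" yields 16 rather than the hardcoded 8, and "args" yields 24.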
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-2pqavrktqkliu5b9nzouio21@git.kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b9b6a2ea
    • perf build: Don't unconditionally link the libbfd feature test to -liberty and -lz · 14541b1e
      Stanislav Fomichev authored
      The current libbfd feature test unconditionally links against -liberty and -lz.
      While that is required on some systems (e.g. opensuse), it is completely
      unnecessary on others, where -lbfd alone is sufficient (e.g. debian).
      This patch streamlines (and renames) the following feature checks:
      
      feature-libbfd           - only link against -lbfd (debian),
                                 see commit 2cf90407 ("perf tools: Fix bfd
                                 dependency libraries detection")
      feature-libbfd-liberty   - link against -lbfd and -liberty
      feature-libbfd-liberty-z - link against -lbfd, -liberty and -lz (opensuse),
                                 see commit 280e7c48 ("perf tools: fix BFD
                                 detection on opensuse")

      (feature-liberty{,-z} were renamed to feature-libbfd-liberty{,-z}
      for clarity)
      
      The main motivation is to fix this feature test for bpftool which is
      currently broken on debian (libbfd feature shows OFF, but we still
      unconditionally link against -lbfd and it works).
      
      Tested on debian with only -lbfd installed (without -liberty); I'd
      appreciate it if somebody on other systems could test this new detection
      method.
      
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/4dfc634cfcfb236883971b5107cf3c28ec8a31be.1542328222.git.sdf@google.com
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      14541b1e
    • perf beauty mmap: PROT_WRITE should come before PROT_EXEC · 5ce29d52
      Arnaldo Carvalho de Melo authored
      
      
      To match strace output:
      
        # cat mmap.c
        #include <sys/mman.h>
      
        int main(void)
        {
      	  mmap(0, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      	  return 0;
        }
        # strace -e mmap ./mmap |& grep -v ^+++
        mmap(NULL, 103484, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5bae400000
        mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5bae3fe000
        mmap(NULL, 3889792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5bade40000
        mmap(0x7f5bae1ec000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ac000) = 0x7f5bae1ec000
        mmap(0x7f5bae1f2000, 14976, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5bae1f2000
        mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5bae419000
        # trace -e mmap ./mmap |& grep -v ^+++
        mmap(NULL, 103484, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6646c25000
        mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f6646c23000
        mmap(NULL, 3889792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6646665000
        mmap(0x7f6646a11000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ac000) = 0x7f6646a11000
        mmap(0x7f6646a17000, 14976, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f6646a17000
        mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f6646c3e000
        #
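
The ordering rule can be sketched as a table walked in a fixed order. This is an illustrative stand-alone formatter, not the perf beautifier itself; prot_to_str() and append_flag() are hypothetical names, and printing leftover bits in hex is an assumption of this sketch.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Append one flag name to buf, separated by '|', without overflowing it. */
static void append_flag(char *buf, size_t size, const char *name)
{
	size_t used = strlen(buf);

	if (used + 1 < size)
		snprintf(buf + used, size - used, "%s%s", used ? "|" : "", name);
}

/*
 * Format mmap 'prot' bits in the order strace uses:
 * PROT_READ, then PROT_WRITE, then PROT_EXEC.
 * Bits not in the table are appended in hex (an assumption of this sketch).
 */
char *prot_to_str(int prot, char *buf, size_t size)
{
	static const struct { int bit; const char *name; } order[] = {
		{ PROT_READ,  "PROT_READ"  },
		{ PROT_WRITE, "PROT_WRITE" },	/* WRITE is listed before EXEC */
		{ PROT_EXEC,  "PROT_EXEC"  },
	};
	char hex[32];
	size_t i;

	if (size == 0)
		return buf;
	buf[0] = '\0';

	if (prot == PROT_NONE) {
		append_flag(buf, size, "PROT_NONE");
		return buf;
	}
	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (prot & order[i].bit) {
			append_flag(buf, size, order[i].name);
			prot &= ~order[i].bit;
		}
	}
	if (prot) {
		snprintf(hex, sizeof(hex), "%#x", (unsigned int)prot);
		append_flag(buf, size, hex);
	}
	return buf;
}
```

Because the output order comes from the table rather than from the numeric value of each bit, matching another tool's output is a matter of reordering the table entries.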
      
      Reported-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-nt49d6iqle80cw8f529ovaqi@git.kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5ce29d52
    • perf trace: Check if the raw_syscalls:sys_{enter,exit} are setup before setting tp filter · f76214f9
      Arnaldo Carvalho de Melo authored
      While updating 'perf trace' on a machine with an old precompiled
      augmented_raw_syscalls.o that didn't set up the syscall map, the new 'perf
      trace' codebase notices the augmented_raw_syscalls.o eBPF event and decides
      to use it instead of the old raw_syscalls:sys_{enter,exit} method. Then,
      because there is no syscall map, it tries to set the tracepoint filter on
      the sys_{enter,exit} evsels, which are NULL, and segfaults.

      Make the code more robust by checking if those tracepoints have
      their respective evsels in place before trying to set the tp filter.
      
      With this we still get everything to work, just not setting up the
      syscall filters, which is better than a segfault. Now to update the
      precompiled augmented_raw_syscalls.o and continue development :-)
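
The guard described above can be sketched as follows. The struct and function names here are hypothetical stand-ins chosen for illustration, not perf's real evsel API; only the NULL check before touching each evsel reflects the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for perf's evsel; only what the sketch needs. */
struct evsel {
	const char *name;
	const char *filter;
};

/* Hypothetical holder for the two raw_syscalls tracepoint evsels. */
struct syscall_tps {
	struct evsel *sys_enter;	/* may be NULL when the BPF event is used */
	struct evsel *sys_exit;		/* may be NULL when the BPF event is used */
};

/* Set the filter on one evsel; return 1 if it was set, 0 if skipped. */
int evsel_set_filter(struct evsel *evsel, const char *filter)
{
	if (evsel == NULL)
		return 0;	/* evsel was never created: nothing to filter */
	evsel->filter = filter;
	return 1;
}

/* Returns how many evsels got the filter (0, 1 or 2); never dereferences NULL. */
int set_syscall_tp_filter(struct syscall_tps *tps, const char *filter)
{
	return evsel_set_filter(tps->sys_enter, filter) +
	       evsel_set_filter(tps->sys_exit, filter);
}
```

With both pointers NULL the call simply reports that no filter was installed, which mirrors the "everything still works, just without syscall filters" outcome described above.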
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-3ft5rjdl05wgz2pwpx2z8btu@git.kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f76214f9
    • Merge tag 'perf-core-for-mingo-4.21-20181218' of... · 883f4def
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-4.21-20181218' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      - Implement BPF based syscall filtering in 'perf trace', using BPF maps and
        the augmented_raw_syscalls.c BPF proggie (Arnaldo Carvalho de Melo)
      
      - Allow specifying in .perfconfig a set of events to use in 'perf trace' in
        addition to any others specified on the command line. This initially
        will be used to always use the augmented_raw_syscalls.o precompiled
        BPF program for getting pointer contents. (Arnaldo Carvalho de Melo)
      
      - Allow fine grained control over how the syscall output should be
        formatted. This will be used to allow producing the same output produced
        by the 'strace' tool, to then use in regression tests comparing the
        output of 'perf trace' with the one produced from 'strace' (Arnaldo Carvalho de Melo)
      
      - Beautify the renameat2 olddirfd, newdirfd and flags arguments (Arnaldo Carvalho de Melo)
      
      - Beautify arch_prctl 'code' syscall arg (Arnaldo Carvalho de Melo)
      
      - Beautify fadvise64 'advice' syscall arg (Arnaldo Carvalho de Melo)
      
      - Relax checks on perf-PID.map ownership, resulting in symbols in
        executable anonymous maps setup by JITs in things like node.js to
        be resolved in a 'perf top' session run by root without the need
        for --force to be used (Arnaldo Carvalho de Melo)
      
      - Update asm-generic/unistd.h copy (Arnaldo Carvalho de Melo)
      
      - Do not use the first and last symbols when setting up address filters in
        auxtrace, this fails when we don't have a symbol table, filter the entire
        area based on the dso size. (Adrian Hunter)
      
      - Do not use kernel headers to build libsubcmd, we shouldn't use
        anything from outside tools/, fixes the build with the Android NDK (Arnaldo Carvalho de Melo)
      
      - Add several prototypes for systems lacking those, such as open_memstream(),
        sigqueue(), fixing warnings building with Android's bionic libc that were
        preventing the use of -Werror there (Arnaldo Carvalho de Melo)
      
      - Use LDFLAGS in the libtraceevent build commands, allowing developers
        to override its values (Jiri Olsa)
      
      - Link libperf-jvmti.so with LDFLAGS variable, allowing distro
        packages to propagate its settings when building this library (Jiri Olsa)
      
      - cs-etm (ARM CoreSight) fixes: (Leo Yan)
      
        - Correct packets swapping in cs_etm__flush()
        - Avoid stale branch samples when flush packet
        - Remove unused 'trace_on' in cs_etm_decoder
        - Refactor enumeration cs_etm_sample_type
        - Rename CS_ETM_TRACE_ON to CS_ETM_DISCONTINUITY
        - Treat NO_SYNC element as trace discontinuity
        - Treat EO_TRACE element as trace discontinuity
        - Generate branch sample for exception packet
      
      - Use shebangs in the 'perf test' shell scripts, making them identifiable as
        shell scripts (Michael Petlan)
      
      - Avoid segfaults caused by negated options in 'perf stat' (Michael Petlan)
      
      - Fix processing of dereferenced args in bprintk events in libtraceevent (Steven Rostedt)
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      883f4def
  3. Dec 19, 2018
  4. Dec 18, 2018
    • perf trace: Allow suppressing the syscall argument names · 9d6dc178
      Arnaldo Carvalho de Melo authored
      To show just the values:
      
      Default:
      
        # trace -e open*,close,*sleep sleep 1
        openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC           ) = 3
        close(fd: 3                                                           ) = 0
        openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC           ) = 3
        close(fd: 3                                                           ) = 0
        openat(dfd: CWD, filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
        close(fd: 3                                                           ) = 0
        nanosleep(rqtp: 0x7ffc0c4ea0d0, rmtp: 0                               ) = 0
        close(fd: 1                                                           ) = 0
        close(fd: 2                                                           ) = 0
        #
      
      Remove it:
      
        # perf config trace.show_arg_names=no
        # trace -e open*,close,*sleep sleep 1
        openat(CWD, /etc/ld.so.cache, CLOEXEC                                 ) = 3
        close(3                                                               ) = 0
        openat(CWD, /lib64/libc.so.6, CLOEXEC                                 ) = 3
        close(3                                                               ) = 0
        openat(CWD, /usr/lib/locale/locale-archive, CLOEXEC                   ) = 3
        close(3                                                               ) = 0
        nanosleep(0x7ffced3a8c40, 0                                           ) = 0
        close(1                                                               ) = 0
        close(2                                                               ) = 0
        #
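
A rough sketch of what such a toggle amounts to in the formatter: the "name: " prefix is emitted only when the option is on. The struct syscall_arg type and format_syscall() are hypothetical and written for illustration; the column padding seen in the real output is left out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, already-beautified syscall argument. */
struct syscall_arg {
	const char *name;
	const char *value;
};

/* Bounded append: returns the new string length, never overflows buf. */
static size_t append_str(char *buf, size_t size, size_t len, const char *s)
{
	int n;

	if (len >= size)
		return len;
	n = snprintf(buf + len, size - len, "%s", s);
	if (n < 0)
		return len;
	if ((size_t)n >= size - len)
		return size - 1;	/* output was truncated */
	return len + (size_t)n;
}

/* Format "syscall(name: value, ...)" or "syscall(value, ...)". */
char *format_syscall(char *buf, size_t size, const char *syscall,
		     const struct syscall_arg *args, size_t nr_args,
		     bool show_arg_names)
{
	size_t len = 0, i;

	if (size == 0)
		return buf;
	buf[0] = '\0';
	len = append_str(buf, size, len, syscall);
	len = append_str(buf, size, len, "(");
	for (i = 0; i < nr_args; i++) {
		if (i)
			len = append_str(buf, size, len, ", ");
		if (show_arg_names) {	/* mirrors trace.show_arg_names */
			len = append_str(buf, size, len, args[i].name);
			len = append_str(buf, size, len, ": ");
		}
		len = append_str(buf, size, len, args[i].value);
	}
	append_str(buf, size, len, ")");
	return buf;
}
```

Keeping the switch as a single boolean passed to the formatter means the argument values are produced the same way in both modes, so only the labels differ between the two outputs.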
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ta9tbdwgodpw719sr2bjm8eb@git.kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9d6dc178
    • perf trace: Allow configuring if the syscall start timestamp should be printed · b036146f
      Arnaldo Carvalho de Melo authored
        # trace -e open*,close,*sleep sleep 1
           0.000 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC           ) = 3
           0.016 close(fd: 3                                                           ) = 0
           0.024 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC           ) = 3
           0.074 close(fd: 3                                                           ) = 0
           0.235 openat(dfd: CWD, filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
           0.251 close(fd: 3                                                           ) = 0
           0.285 nanosleep(rqtp: 0x7ffd68e6d620, rmtp: 0                               ) = 0
        1000.386 close(fd: 1                                                           ) = 0
        1000.395 close(fd: 2                                                           ) = 0
        #
      
        # perf config trace.show_timestamp=no
        # trace -e open*,close,*sleep sleep 1
        openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC           ) = 3
        close(fd: 3                                                           ) = 0
        openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC           ) = 3
        close(fd: 3                                                           ) = 0
        openat(dfd: CWD, filename: , flags: CLOEXEC                           ) = 3
        close(fd: 3                                                           ) = 0
        nanosleep(rqtp: 0x7fffa79c38e0, rmtp: 0                               ) = 0
        close(fd: 1                                                           ) = 0
        close(fd: 2                                                           ) = 0
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-mjjnicy48367jah6ls4k0nk8@git.kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b036146f
    • perf trace: Allow configuring default for perf_event_attr.inherit · d32de87e
      Arnaldo Carvalho de Melo authored
      I.e. whether children should inherit the parent's perf_event configuration,
      that is, whether we should trace children as well or just the parent.

      The default is to follow children. To disable this and get behaviour
      similar to strace, set this config option or use the --no-inherit 'perf
      trace' option.
      
      E.g.:
      
      Default:
      
        # perf config trace.no_inherit
        # trace -e clone,*sleep time sleep 1
           0.000 time/21107 clone(clone_flags: CHILD_CLEARTID|CHILD_SETTID|0x11, newsp: 0, child_tidptr: 0x7f7b8f9ae810) = 21108 (time)
               ? time/21108  ... [continued]: clone()
           0.691 sleep/21108 nanosleep(rqtp: 0x7ffed01d0540, rmtp: 0                               ) = 0
        0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 1988maxresident)k
        0inputs+0outputs (0major+76minor)pagefaults 0swaps
        #
      
      Disable it:
      
        # trace -e clone,*sleep time sleep 1
           0.000 clone(clone_flags: CHILD_CLEARTID|CHILD_SETTID|0x11, newsp: 0, child_tidptr: 0x7ff41e100810) = 21414 (time)
        0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 1964maxresident)k
        0inputs+0outputs (0major+76minor)pagefaults 0swaps
        #
      
      Notice that since there is just one thread, the "comm/TID" column is
      suppressed.
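
At the perf_event_open() level the choice comes down to one bit in struct perf_event_attr. The sketch below only shows that bit being derived from a no_inherit setting; trace_init_attr() is a hypothetical helper, and the other fields a real event needs (type, config, sampling options) are deliberately left unset.

```c
#include <stdbool.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: zero the attr and set only the inherit bit.
 * inherit = 1 asks the kernel to also count/trace child tasks.
 */
void trace_init_attr(struct perf_event_attr *attr, bool no_inherit)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->inherit = !no_inherit;	/* default: follow children */
}
```

Passing no_inherit as false gives the default behaviour of following children; passing true limits tracing to the parent, as in the second run above.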
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-thd8s16pagyza71ufi5vjlan@git.kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d32de87e