  1. Aug 16, 2023
    • Ian Rogers's avatar
      perf bpf: Remove support for embedding clang for compiling BPF events (-e foo.c) · 56b11a21
      Ian Rogers authored
      
      
      This was never in the default build for perf and is difficult to
      maintain as it uses clang/llvm internals, so ditch it. Keep, for now,
      the external compilation of .c BPF into .o bytecode and its subsequent
      loading. That is also going to be removed, but separately, to help
      bisection and to properly document what is being removed and why.
      
      Committer notes:
      
      Extracted from a larger patch and removed some leftovers, namely
      deleting these now unused feature tests:
      
          tools/build/feature/test-clang.cpp
          tools/build/feature/test-cxx.cpp
          tools/build/feature/test-llvm-version.cpp
          tools/build/feature/test-llvm.cpp
      
      Testing the use of BPF events after applying this patch:
      
      To use the external clang/llvm toolchain to compile a .c event and then
      use libbpf to load it, to get the syscalls:sys_enter_open* tracepoints
      and read the filename pointer, putting it into the ring buffer right
      after the usual tracepoint payload for 'perf trace' to then print it:
      
        [root@quaco ~]# perf trace -e /home/acme/git/perf-tools-next/tools/perf/examples/bpf/augmented_raw_syscalls.c,open* --max-events=10
           0.000 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12
           0.083 abrt-dump-jour/1453 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
           0.063 abrt-dump-jour/1454 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
           0.082 abrt-dump-jour/1455 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
         250.124 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12
         250.521 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.pressure", flags: RDONLY|CLOEXEC) = 12
         251.047 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.current", flags: RDONLY|CLOEXEC) = 12
         251.162 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.min", flags: RDONLY|CLOEXEC) = 12
         251.242 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.low", flags: RDONLY|CLOEXEC) = 12
         251.353 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.swap.current", flags: RDONLY|CLOEXEC) = 12
        [root@quaco ~]#
      
      Same thing, but with a prebuilt .o BPF bytecode:
      
        [root@quaco ~]# perf trace -e /home/acme/git/perf-tools-next/tools/perf/examples/bpf/augmented_raw_syscalls.o,open* --max-events=10
           0.000 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12
           0.083 abrt-dump-jour/1453 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
           0.083 abrt-dump-jour/1455 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
           0.062 abrt-dump-jour/1454 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4
         249.985 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12
         466.763 thermald/1234 openat(dfd: CWD, filename: "/sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/energy_uj") = 13
         467.145 thermald/1234 openat(dfd: CWD, filename: "/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj") = 13
         467.311 thermald/1234 openat(dfd: CWD, filename: "/sys/class/thermal/thermal_zone2/temp") = 13
         500.040 cgroupify/24006 openat(dfd: 4, filename: ".", flags: RDONLY|CLOEXEC|DIRECTORY|NONBLOCK) = 5
         500.295 cgroupify/24006 openat(dfd: 4, filename: "24616/cgroup.procs") = 5
        [root@quaco ~]#
      
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Carsten Haitzler <carsten.haitzler@arm.com>
      Cc: Eduard Zingerman <eddyz87@gmail.com>
      Cc: Fangrui Song <maskray@google.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Tom Rix <trix@redhat.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Link: https://lore.kernel.org/lkml/ZNZWsAXg2px1sm2h@kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      56b11a21
    • Arnaldo Carvalho de Melo's avatar
      perf tests trace+probe_vfs_getname.sh: Accept quotes surrounding the filename · 6f769c34
      Arnaldo Carvalho de Melo authored
      Transforming augmented_raw_syscalls into a BPF skel made the output have
      a " around the filenames, which is not what the old perf probe
      vfs_getname method of obtaining filenames did, so accept the augmented
      way, with the quotes.

      At this point removing all the logic for the vfs_getname method is
      probably in order, will do it at some point.

      For now let's accept with/without quotes and make that test pass.
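
As a rough illustration of accepting both forms, the sketch below matches a filename argument with or without surrounding double quotes using a POSIX extended regular expression. The helper name `line_has_filename`, the fixed path `/proc/meminfo` and the pattern itself are assumptions made for this example; the actual test is a shell script and its real pattern is not reproduced here.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the perf test script.
 * Returns true when 'line' contains a filename argument equal to
 * /proc/meminfo, either bare or wrapped in double quotes.
 */
bool line_has_filename(const char *line)
{
	/* "? makes each quote optional; [,)] requires the argument to end */
	static const char pattern[] = "filename: \"?/proc/meminfo\"?[,)]";
	regex_t re;
	bool found;

	assert(line != NULL);

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	found = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return found;
}
```

Both the old unquoted output and the new quoted output satisfy the same pattern, so a single check can cover either way of obtaining the filename.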
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6f769c34
    • Arnaldo Carvalho de Melo's avatar
      perf test trace+probe_vfs_getname.sh: Remove stray \ before / · 7777ac3d
      Arnaldo Carvalho de Melo authored
      Running on fedora:38 in verbose mode I noticed:
      
        # perf test -v 117
        grep: warning: stray \ before /
        117: Check open filename arg using perf trace + vfs_getname          :
      
      Remove that \ before /.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZNvTDsSMO3nw9Tnp@kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7777ac3d
  2. Aug 12, 2023
    • Ian Rogers's avatar
      perf script python: Add stub for PMU symbol to the python binding · 33d9c506
      Ian Rogers authored
      
      
      Fix missing symbol seen in:
      
        ```
         19: 'import perf' in python                                         :
        --- start ---
        test child forked, pid 2640936
        python usage test: "echo "import sys ; sys.path.insert(0, 'python'); import perf" | '/usr/bin/python3' "
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        ImportError: tools/perf/python/perf.cpython-311-x86_64-linux-gnu.so: undefined symbol: perf_pmus__supports_extended_type
        test child finished with -1
        ---- end ----
        'import perf' in python: FAILED!
        ```
      
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/r/20230810180944.2794188-1-irogers@google.com
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      33d9c506
  3. Aug 11, 2023
    • Athira Rajeev's avatar
      perf symbols: Fix DSO kernel load and symbol process to correctly map DSO to... · e59fea47
      Athira Rajeev authored
      
      perf symbols: Fix DSO kernel load and symbol process to correctly map DSO to its long_name, type and adjust_symbols
      
      Test "object code reading" fails sometimes for kernel address as below:
      
          Reading object code for memory address: 0xc000000000004c3c
          File is: [kernel.kallsyms]
          On file address is: 0x14c3c
          dso__data_read_offset failed
          test child finished with -1
          ---- end ----
          Object code reading: FAILED!
      
      Here dso__data_read_offset() fails for symbol address
      0xc000000000004c3c. This is because the DSO long_name here is
      "[kernel.kallsyms]" and hence open_dso() fails to open this file. There
      is an incorrect DSO to map handling here. The key points here are:
      
      - The DSO long_name is set to "[kernel.kallsyms]". This file is
        not present and hence returns error
      - The DSO binary type is set to DSO_BINARY_TYPE__NOT_FOUND
      - The DSO adjust_symbols member is set to zero
      
      In the end dso__data_read_offset() returns -1 and the address 0x14c3c
      can not be resolved. Hence the test fails. But the address actually maps
      to the kernel DSO
      
          # objdump -z -d --start-address=0xc000000000004c3c --stop-address=0xc000000000004cbc /home/athira/linux/vmlinux
      
          /home/athira/linux/vmlinux:     file format elf64-powerpcle
      
          Disassembly of section .head.text:
      
          c000000000004c3c <exc_virt_0x4c00_system_call+0x3c>:
          c000000000004c3c:	a6 02 9b 7d 	mfsrr1  r12
          c000000000004c40:	78 13 42 7c 	mr      r2,r2
          c000000000004c44:	18 00 4d e9 	ld      r10,24(r13)
          c000000000004c48:	60 c6 4a 61 	ori     r10,r10,50784
          c000000000004c4c:	a6 03 49 7d 	mtctr   r10
      
      Fix dso__process_kernel_symbol() to set the binary_type and
      adjust_symbols members. dso->adjust_symbols is used by
      map__rip_2objdump() which converts the symbol start address to the
      objdump address. Also set dso->long_name in dso__load_vmlinux().
      
      Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: https://lore.kernel.org/r/20230811051546.70039-1-atrajeev@linux.vnet.ibm.com
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e59fea47
    • Arnaldo Carvalho de Melo's avatar
      perf build: Remove -Wno-unused-but-set-variable from the flex flags when... · 878460e8
      Arnaldo Carvalho de Melo authored
      perf build: Remove -Wno-unused-but-set-variable from the flex flags when building with clang < 13.0.0
      
      clang < 13.0.0 doesn't grok -Wno-unused-but-set-variable, so just remove
      it to avoid:
      
        error: unknown warning option '-Wno-unused-but-set-variable'; did you mean '-Wno-unused-const-variable'? [-Werror,-Wunknown-warning-option]
        make[4]: *** [/git/perf-6.5.0-rc4/tools/build/Makefile.build:128: /tmp/build/perf/util/pmu-flex.o] Error 1
        make[4]: *** Waiting for unfinished jobs....
      
      Fixes: ddc8e4c9 ("perf build: Disable fewer bison warnings")
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZNUSWr52jUnVaaa%2F@kernel.org/
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      878460e8
  4. Aug 10, 2023
  5. Aug 09, 2023
  6. Aug 08, 2023
    • Ivan Babrou's avatar
      perf script: Print "cgroup" field on the same line as "comm" · 8c49c6e1
      Ivan Babrou authored
      Commit 3fd7a168 ("perf script: Add 'cgroup' field for output")
      added support for printing cgroup path in perf script output.
      
      It was okay if you didn't want any stacks:
      
          $ sudo perf script --comms jpegtran:23f4bf -F comm,tid,cpu,time,cgroup
          jpegtran:23f4bf 3321915 [013] 404718.587488:  /idle.slice/polish.service
          jpegtran:23f4bf 3321915 [031] 404718.592073:  /idle.slice/polish.service
      
      With stacks it gets messier as cgroup is printed after the stack:
      
          $ perf script --comms jpegtran:23f4bf -F comm,tid,cpu,time,cgroup,ip,sym
          jpegtran:23f4bf 3321915 [013] 404718.587488:
                          5c554 compress_output
                          570d9 jpeg_finish_compress
                          3476e jpegtran_main
                          330ee jpegtran::main
                          326e2 core::ops::function::FnOnce::call_once (inlined)
                          326e2 std::sys_common::backtrace::__rust_begin_short_backtrace
          /idle.slice/polish.service
          jpegtran:23f4bf 3321915 [031] 404718.592073:
                          8474d jsimd_encode_mcu_AC_first_prepare_sse2.PADDING
                      55af68e62fff [unknown]
          /idle.slice/polish.service
      
      Let's instead print cgroup on the same line as comm:
      
          $ perf script --comms jpegtran:23f4bf -F comm,tid,cpu,time,cgroup,ip,sym
          jpegtran:23f4bf 3321915 [013] 404718.587488:  /idle.slice/polish.service
                          5c554 compress_output
                          570d9 jpeg_finish_compress
                          3476e jpegtran_main
                          330ee jpegtran::main
                          326e2 core::ops::function::FnOnce::call_once (inlined)
                          326e2 std::sys_common::backtrace::__rust_begin_short_backtrace
      
          jpegtran:23f4bf 3321915 [031] 404718.592073:  /idle.slice/polish.service
                          8474d jsimd_encode_mcu_AC_first_prepare_sse2.PADDING
                      55af68e62fff [unknown]
      
      Fixes: 3fd7a168 ("perf script: Add 'cgroup' field for output")
      Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@cloudflare.com
      Link: https://lore.kernel.org/r/20230718000737.49077-1-ivan@cloudflare.com
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8c49c6e1
    • Arnaldo Carvalho de Melo's avatar
      tools arch x86: Sync the msr-index.h copy with the kernel sources · 8cdd4aef
      Arnaldo Carvalho de Melo authored
      To pick up the changes from these csets:
      
        522b1d69 ("x86/cpu/amd: Add a Zenbleed fix")
      
      That cause no changes to tooling:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        $
      
      Just silences this perf build warning:
      
        Warning: Kernel ABI header differences:
          diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZND17H7BI4ariERn@kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8cdd4aef
    • Arnaldo Carvalho de Melo's avatar
      Revert "perf report: Append inlines to non-DWARF callchains" · c0b06758
      Arnaldo Carvalho de Melo authored
      This reverts commit 46d21ec0.

      The tests were made with a specific workload; further tests on a
      recently updated fedora 38 system with a system wide perf.data file
      show 'perf report' taking excessive time resolving inlines in vmlinux,
      so let's revert this until a full investigation and improvement on the
      addr2line support code is made.
      
      Reported-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Acked-by: Artem Savkov <asavkov@redhat.com>
      Tested-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/ZMl8VyhdwhClTM5g@kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c0b06758
    • Linus Torvalds's avatar
      Merge tag 'xsa432-6.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · da703fe9
      Linus Torvalds authored
      Pull xen netback buffer overflow fix from Juergen Gross:
       "The fix for XSA-423 added logic to Linux'es netback driver to deal
        with a frontend splitting a packet in a way such that not all of the
        headers would come in one piece.
      
        Unfortunately the logic introduced there didn't account for the
        extreme case of the entire packet being split into as many pieces as
        permitted by the protocol, yet still being smaller than the area
        that's specially dealt with to keep all (possible) headers together.
      
        Such an unusual packet would therefore trigger a buffer overrun in the
        driver"
      
      * tag 'xsa432-6.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/netback: Fix buffer overrun triggered by unusual packet
      da703fe9
    • Linus Torvalds's avatar
      Merge tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 64094e7e
      Linus Torvalds authored
      Pull x86/gds fixes from Dave Hansen:
       "Mitigate Gather Data Sampling issue:
      
         - Add Base GDS mitigation
      
         - Support GDS_NO under KVM
      
         - Fix a documentation typo"
      
      * tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Documentation/x86: Fix backwards on/off logic about YMM support
        KVM: Add GDS_NO support to KVM
        x86/speculation: Add Kconfig option for GDS
        x86/speculation: Add force option to GDS mitigation
        x86/speculation: Add Gather Data Sampling mitigation
      64094e7e
    • Linus Torvalds's avatar
      Merge tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 138bcddb
      Linus Torvalds authored
      Pull x86/srso fixes from Borislav Petkov:
       "Add a mitigation for the speculative RAS (Return Address Stack)
        overflow vulnerability on AMD processors.
      
        In short, this is yet another issue where userspace poisons a
        microarchitectural structure which can then be used to leak privileged
        information through a side channel"
      
      * tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/srso: Tie SBPB bit setting to microcode patch detection
        x86/srso: Add a forgotten NOENDBR annotation
        x86/srso: Fix return thunks in generated code
        x86/srso: Add IBPB on VMEXIT
        x86/srso: Add IBPB
        x86/srso: Add SRSO_NO support
        x86/srso: Add IBPB_BRTYPE support
        x86/srso: Add a Speculative RAS Overflow mitigation
        x86/bugs: Increase the x86 bugs vector size to two u32s
      138bcddb
    • Linus Torvalds's avatar
      Merge tag 'wq-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 14f9643d
      Linus Torvalds authored
      Pull workqueue fixes from Tejun Heo:
      
       - The recently added cpu_intensive auto detection and warning mechanism
         was spuriously triggered on slow CPUs.
      
         While not causing serious issues, it's still a nuisance and can cause
         unintended concurrency management behaviors.
      
         Relax the threshold on machines with lower BogoMIPS. While BogoMIPS
         is not an accurate measure of performance by most measures, we don't
         have to be accurate and it has rough but strong enough correlation.
      
       - A correction in Kconfig help text
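
The threshold relaxation described above can be sketched as simple integer arithmetic. This is only an illustration: the 4000 BogoMIPS cutoff comes from the commit title listed below, while the 10 ms base threshold, the 1 s cap and the function name `scaled_thresh_us` are assumptions made for this example rather than a copy of the kernel code.

```c
#include <assert.h>

#define USEC_PER_MSEC	1000UL
#define USEC_PER_SEC	1000000UL

/*
 * Sketch only. Assumed values: a 10ms base threshold and a 1s upper
 * bound. Machines at or above 4000 BogoMIPS keep the base threshold;
 * slower machines get it scaled up in inverse proportion.
 */
unsigned long scaled_thresh_us(unsigned long bogomips)
{
	unsigned long thresh = 10 * USEC_PER_MSEC;

	assert(bogomips > 0);	/* avoid dividing by zero */

	if (bogomips < 4000) {
		thresh = thresh * 4000 / bogomips;
		if (thresh > USEC_PER_SEC)
			thresh = USEC_PER_SEC;
	}
	return thresh;
}
```

Under these assumptions a machine rated at 2000 BogoMIPS would get twice the base threshold, and very slow machines saturate at the cap instead of growing without bound.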
      
      * tag 'wq-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000
        workqueue: Fix cpu_intensive_thresh_us name in help text
      14f9643d
    • Linus Torvalds's avatar
      Merge tag 'tpmdd-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd · 8043e222
      Linus Torvalds authored
      Pull tpm fixes from Jarkko Sakkinen:
       "A few more bug fixes"
      
      * tag 'tpmdd-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
        tpm/tpm_tis: Disable interrupts for Lenovo P620 devices
        tpm: Disable RNG for all AMD fTPMs
        sysctl: set variable key_sysctls storage-class-specifier to static
        tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7
      8043e222
    • Arnaldo Carvalho de Melo's avatar
      perf probe: Make synthesize_perf_probe_point() private to probe-event.c · aeb50d3f
      Arnaldo Carvalho de Melo authored
      Not used in any other place, so just make it static.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZM0pjfOe6R4X%2Fcql@kernel.org/
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      aeb50d3f
    • Arnaldo Carvalho de Melo's avatar
      perf probe: Free string returned by synthesize_perf_probe_point() on failure... · a612bbf8
      Arnaldo Carvalho de Melo authored
      perf probe: Free string returned by synthesize_perf_probe_point() on failure in synthesize_perf_probe_command()
      
      Building perf with EXTRA_CFLAGS="-fsanitize=address" a leak was detected
      elsewhere and led to an audit, where we found that
      synthesize_perf_probe_command() may leak the string returned by
      synthesize_perf_probe_point() on failure. Fix it.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZM0mzpQktHnhXJXr@kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a612bbf8
    • Arnaldo Carvalho de Melo's avatar
      perf probe: Free string returned by synthesize_perf_probe_point() on failure to add a probe · 7bc0153c
      Arnaldo Carvalho de Melo authored
      
      
      Building perf with EXTRA_CFLAGS="-fsanitize=address" a leak is detected
      when trying to add a probe to a non-existent function:
      
        # perf probe -x ~/bin/perf dso__neW
        Probe point 'dso__neW' not found.
          Error: Failed to add events.
      
        =================================================================
        ==296634==ERROR: LeakSanitizer: detected memory leaks
      
        Direct leak of 128 byte(s) in 1 object(s) allocated from:
            #0 0x7f67642ba097 in calloc (/lib64/libasan.so.8+0xba097)
            #1 0x7f67641a76f1 in allocate_cfi (/lib64/libdw.so.1+0x3f6f1)
      
        Direct leak of 65 byte(s) in 1 object(s) allocated from:
            #0 0x7f67642b95b5 in __interceptor_realloc.part.0 (/lib64/libasan.so.8+0xb95b5)
            #1 0x6cac75 in strbuf_grow util/strbuf.c:64
            #2 0x6ca934 in strbuf_init util/strbuf.c:25
            #3 0x9337d2 in synthesize_perf_probe_point util/probe-event.c:2018
            #4 0x92be51 in try_to_find_probe_trace_events util/probe-event.c:964
            #5 0x93d5c6 in convert_to_probe_trace_events util/probe-event.c:3512
            #6 0x93d6d5 in convert_perf_probe_events util/probe-event.c:3529
            #7 0x56f37f in perf_add_probe_events /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:354
            #8 0x572fbc in __cmd_probe /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:738
            #9 0x5730f2 in cmd_probe /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:766
            #10 0x635d81 in run_builtin /var/home/acme/git/perf-tools-next/tools/perf/perf.c:323
            #11 0x6362c1 in handle_internal_command /var/home/acme/git/perf-tools-next/tools/perf/perf.c:377
            #12 0x63667a in run_argv /var/home/acme/git/perf-tools-next/tools/perf/perf.c:421
            #13 0x636b8d in main /var/home/acme/git/perf-tools-next/tools/perf/perf.c:537
            #14 0x7f676302950f in __libc_start_call_main (/lib64/libc.so.6+0x2950f)
      
        SUMMARY: AddressSanitizer: 193 byte(s) leaked in 2 allocation(s).
        #
      
      synthesize_perf_probe_point() returns a "detached" strbuf, i.e. a
      malloc'ed string that needs to be free'd.
      
      An audit will be performed to find other such cases.
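
The ownership rule can be sketched in plain C as follows. Every name ending in `_sketch` is hypothetical and stands in for the real perf functions, whose signatures are not reproduced here; the point is only that a detached, malloc'ed string must be freed on the failure path and handed to the caller on success.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a function returning a detached, malloc'ed string. */
char *synthesize_point_sketch(const char *function)
{
	size_t len;
	char *s;

	assert(function != NULL);

	len = strlen(function) + 1;
	s = malloc(len);
	if (s != NULL)
		memcpy(s, function, len);
	return s;
}

/* Hypothetical lookup: only "dso__new" is considered to exist. */
static int lookup_sketch(const char *point)
{
	return strcmp(point, "dso__new") == 0 ? 0 : -1;
}

/*
 * Returns 0 and stores the string in *out on success.
 * Returns -1 on failure, after freeing the string so it does not leak.
 */
int add_probe_sketch(const char *function, char **out)
{
	char *point = synthesize_point_sketch(function);

	if (point == NULL)
		return -1;

	if (lookup_sketch(point) < 0) {
		free(point);	/* the kind of error path that leaks if forgotten */
		return -1;
	}

	*out = point;		/* success: ownership moves to the caller */
	return 0;
}
```

A caller that receives 0 is then responsible for calling `free()` on the returned string once it is done with it.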
      
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZM0l1Oxamr4SVjfY@kernel.org
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7bc0153c
    • Jonathan McDowell's avatar
      tpm/tpm_tis: Disable interrupts for Lenovo P620 devices · e117e7ad
      Jonathan McDowell authored
      The Lenovo ThinkStation P620 suffers from an irq storm issue like various
      other Lenovo machines, so add an entry for it to tpm_tis_dmi_table and
      force polling.
      
      It is worth noting that 481c2d14 (tpm,tpm_tis: Disable interrupts after
      1000 unhandled IRQs) does not seem to fix the problem on this machine, but
      setting 'tpm_tis.interrupts=0' on the kernel command line does.
      
      [jarkko@kernel.org: truncated the commit ID in the description to 12
      characters]
      Cc: stable@vger.kernel.org # v6.4+
      Fixes: e644b2f4 ("tpm, tpm_tis: Enable interrupt test")
      Signed-off-by: Jonathan McDowell <noodles@meta.com>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      e117e7ad
    • Mario Limonciello's avatar
      tpm: Disable RNG for all AMD fTPMs · 554b841d
      Mario Limonciello authored
      The TPM RNG functionality is not necessary for entropy when the CPU
      already supports the RDRAND instruction. The TPM RNG functionality
      was previously disabled on a subset of AMD fTPM series, but reports
      continue to show problems on some systems causing stutter root caused
      to TPM RNG functionality.
      
      Expand disabling TPM RNG use for all AMD fTPMs whether they have versions
      that claim to have fixed or not. To accomplish this, move the detection
      into part of the TPM CRB registration and add a flag indicating that
      the TPM should opt-out of registration to hwrng.
      
      Cc: stable@vger.kernel.org # 6.1.y+
      Fixes: b006c439 ("hwrng: core - start hwrng kthread also for untrusted sources")
      Fixes: f1324bbc ("tpm: disable hwrng for fTPM on some AMD designs")
      Reported-by: <daniil.stas@posteo.net>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217719
      Reported-by: <bitlord0xff@gmail.com>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217212
      Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      554b841d
    • Tom Rix's avatar
      sysctl: set variable key_sysctls storage-class-specifier to static · 0de030b3
      Tom Rix authored
      
      
      smatch reports
      security/keys/sysctl.c:12:18: warning: symbol
        'key_sysctls' was not declared. Should it be static?
      
      This variable is only used in its defining file, so it should be static.
      
      Signed-off-by: Tom Rix <trix@redhat.com>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      0de030b3
    • Takashi Iwai's avatar
      tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7 · 0b15afc9
      Takashi Iwai authored
      TUXEDO InfinityBook S 15/17 Gen7 suffers from an IRQ problem on
      tpm_tis like a few other laptops.  Add an entry for the workaround.
      
      Cc: stable@vger.kernel.org
      Fixes: e644b2f4 ("tpm, tpm_tis: Enable interrupt test")
      Link: https://bugzilla.suse.com/show_bug.cgi?id=1213645
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a027b2ec
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "x86:
      
         - Fix SEV race condition
      
        ARM:
      
         - Fixes for the configuration of SVE/SME traps when hVHE mode is in
           use
      
         - Allow use of pKVM on systems with FF-A implementations that are
           v1.0 compatible
      
         - Request/release percpu IRQs (arch timer, vGIC maintenance)
           correctly when pKVM is in use
      
         - Fix function prototype after __kvm_host_psci_cpu_entry() rename
      
         - Skip to the next instruction when emulating writes to TCR_EL1 on
           AmpereOne systems
      
        Selftests:
      
         - Fix missing include"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        selftests/rseq: Fix build with undefined __weak
        KVM: SEV: remove ghcb variable declarations
        KVM: SEV: only access GHCB fields once
        KVM: SEV: snapshot the GHCB before accessing it
        KVM: arm64: Skip instruction after emulating write to TCR_EL1
        KVM: arm64: fix __kvm_host_psci_cpu_entry() prototype
        KVM: arm64: Fix resetting SME trap values on reset for (h)VHE
        KVM: arm64: Fix resetting SVE trap values on reset for hVHE
        KVM: arm64: Use the appropriate feature trap register when activating traps
        KVM: arm64: Helper to write to appropriate feature trap register based on mode
        KVM: arm64: Disable SME traps for (h)VHE at setup
        KVM: arm64: Use the appropriate feature trap register for SVE at EL2 setup
        KVM: arm64: Factor out code for checking (h)VHE mode into a macro
        KVM: arm64: Rephrase percpu enable/disable tracking in terms of hyp
        KVM: arm64: Fix hardware enable/disable flows for pKVM
        KVM: arm64: Allow pKVM on v1.0 compatible FF-A implementations
    • Merge tag 'mmc-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 016ce297
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
      
       - moxart: Fix big-endian conversion for SCR structure
      
       - sdhci-f-sdh30: Replace with sdhci_pltfm to fix PM support
      
      * tag 'mmc-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: sdhci-f-sdh30: Replace with sdhci_pltfm
        mmc: moxart: read scr register without changing byte order
    • gfs2: Don't use filemap_splice_read · 0be84321
      Bob Peterson authored
      Starting with patch 2cb1e089, gfs2 started using the new function
      filemap_splice_read rather than the old (and subsequently deleted)
      function generic_file_splice_read.
      
      filemap_splice_read works by taking references to a number of folios in
      the page cache and splicing those folios into a pipe.  The folios are
      then read from the pipe and the folio references are dropped.  This can
      take an arbitrary amount of time.  We cannot allow that in gfs2 because
      those folio references will pin the inode glock to the node and prevent
      it from being demoted, which can lead to cluster-wide deadlocks.
      
      Instead, use copy_splice_read.
      
      (In addition, the old generic_file_splice_read called into ->read_iter,
      which called gfs2_file_read_iter, which took the inode glock during the
      operation.  The new filemap_splice_read interface does not take the
      inode glock anymore.  This is fixable, but it still wouldn't prevent
      cluster-wide deadlocks.)
      
      Fixes: 2cb1e089 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()")
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Fix freeze consistency check in gfs2_trans_add_meta · 2cbd8064
      Andreas Gruenbacher authored
      
      
      Function gfs2_trans_add_meta() checks for the SDF_FROZEN flag to make
      sure that no buffers are added to a transaction while the filesystem is
      frozen.  With the recent freeze/thaw rework, the SDF_FROZEN flag is
      cleared after thaw_super() is called, which is sufficient for
      serializing freeze/thaw.
      
      However, other filesystem operations started after thaw_super() may now
      be calling gfs2_trans_add_meta() before the SDF_FROZEN flag is cleared,
      which will trigger the SDF_FROZEN check in gfs2_trans_add_meta().  Fix
      that by checking the s_writers.frozen state instead.
      
      In addition, make sure not to call gfs2_assert_withdraw() with the
      sd_log_lock spin lock held.  Check for a withdrawn filesystem before
      checking for a frozen filesystem, and don't pin/add buffers to the
      current transaction in case of a failure in either case.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  7. Aug 07, 2023
  8. Aug 06, 2023
    • fs: rely on ->iterate_shared to determine f_pos locking · 7d84d1b9
      Christian Brauner authored
      
      
      Now that we removed ->iterate we don't need to check for either
      ->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check
      for ->iterate_shared instead. This will tell us whether we need to
      unconditionally take the lock. Not only does this let us avoid
      checking f_inode's mode, it also makes clear that we're
      locking because of readdir.
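
      A rough sketch of the simplified check, using toy types. The
      atomic-position flag and the reference-count condition are
      assumptions added for context and are not stated in this message;
      all toy_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag marking files whose position updates must be atomic. */
#define TOY_FMODE_ATOMIC_POS (1u << 0)

struct toy_file;

/* Only the directory iterator matters for this check. */
struct toy_file_operations {
	int (*iterate_shared)(struct toy_file *file);
};

struct toy_file {
	unsigned int f_mode;
	long f_count; /* number of references to this open file */
	const struct toy_file_operations *f_op;
};

/* Example directory iterator used to populate f_op in a usage example. */
int toy_dir_iterate(struct toy_file *file)
{
	(void)file;
	return 0;
}

/*
 * A file that provides ->iterate_shared is a directory, so the position
 * lock is taken unconditionally. No inode mode check is needed.
 */
bool toy_file_needs_f_pos_lock(const struct toy_file *file)
{
	return (file->f_mode & TOY_FMODE_ATOMIC_POS) &&
	       (file->f_count > 1 || file->f_op->iterate_shared != NULL);
}
```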
      
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • vfs: get rid of old '->iterate' directory operation · 3e327154
      Linus Torvalds authored
      
      
      All users now just use '->iterate_shared()', which only takes the
      directory inode lock for reading.
      
      Filesystems that never got converted to shared mode now instead use a
      wrapper that drops the lock, re-takes it in write mode, calls the old
      function, and then downgrades the lock back to read mode.
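
      The lock dance performed by such a wrapper can be modelled in
      userspace. This is a single-threaded toy that only tracks lock state;
      it is not a real lock and not the kernel code, and every toy_* name is
      hypothetical.

```c
#include <stdbool.h>

/* Toy model of a reader/writer semaphore: state tracking only. */
struct toy_rwsem {
	int readers;
	bool writer;
};

struct toy_dir {
	struct toy_rwsem lock;
	bool saw_exclusive; /* recorded by the legacy iterator for inspection */
};

void toy_down_read(struct toy_rwsem *s)  { s->readers++; }
void toy_up_read(struct toy_rwsem *s)    { s->readers--; }
void toy_down_write(struct toy_rwsem *s) { s->writer = true; }

/* Convert a held write lock into a read lock without fully releasing it. */
void toy_downgrade_write(struct toy_rwsem *s)
{
	s->writer = false;
	s->readers++;
}

/* Old-style iterator: expects to run with the lock held exclusively. */
int toy_legacy_iterate(struct toy_dir *dir)
{
	dir->saw_exclusive = dir->lock.writer && dir->lock.readers == 0;
	return 0;
}

/* Entered with the read lock held; returns with the read lock held. */
int toy_wrap_iterator(struct toy_dir *dir, int (*iter)(struct toy_dir *))
{
	int ret;

	toy_up_read(&dir->lock);         /* drop the shared lock */
	toy_down_write(&dir->lock);      /* re-take it in write mode */
	ret = iter(dir);                 /* call the old function */
	toy_downgrade_write(&dir->lock); /* back to read mode for the caller */
	return ret;
}

/* Caller side: only ever deals with the shared lock. */
int toy_readdir(struct toy_dir *dir)
{
	int ret;

	toy_down_read(&dir->lock);
	ret = toy_wrap_iterator(dir, toy_legacy_iterate);
	toy_up_read(&dir->lock);
	return ret;
}
```

      The caller's locking is identical for converted and unconverted
      filesystems; the exclusive section is confined to the wrapper.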
      
      This way the VFS layer and other callers no longer need to care about
      filesystems that never got converted to the modern era.
      
      The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
      ntfs, ocfs2, overlayfs, and vboxsf.
      
      Honestly, several of them look like they really could just iterate their
      directories in shared mode and skip the wrapper entirely, but the point
      of this change is to not change semantics or fix filesystems that
      haven't been fixed in the last 7+ years, but to finally get rid of the
      dual iterators.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • proc: fix missing conversion to 'iterate_shared' · 0a2c2baa
      Linus Torvalds authored
      I'm looking at the directory handling due to the discussion about f_pos
      locking (see commit 79796425: "file: reinstate f_pos locking
      optimization for regular files"), and wanting to clean that up.
      
      And one source of ugliness is how we were supposed to move filesystems
      over to the '->iterate_shared()' function that only takes the inode lock
      for reading many many years ago, but several filesystems still use the
      bad old '->iterate()' that takes the inode lock for exclusive access.
      
      See commit 61922694 ("introduce a parallel variant of ->iterate()")
      that also added some documentation stating
      
            Old method is only used if the new one is absent; eventually it will
            be removed.  Switch while you still can; the old one won't stay.
      
      and that was back in April 2016.  Here we are, many years later, and the
      old version is still clearly sadly alive and well.
      
      Now, some of those old style iterators are probably just because the
      filesystem may end up having per-inode mutable data that it uses for
      iterating a directory, but at least one case is just a mistake.
      
      Al switched over most filesystems to use '->iterate_shared()' back when
      it was introduced.  In particular, the /proc filesystem was converted as
      one of the first ones in commit f50752ea ("switch all procfs
      directories ->iterate_shared()").
      
      But then later one new user of '->iterate()' was then re-introduced by
      commit 6d9c939d ("procfs: add smack subdir to attrs").
      
      And that's clearly not what we wanted, since that new case just uses the
      same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions
      that other /proc pident directories use, and they are most definitely
      safe to use with the inode lock held shared.
      
      So just fix it.
      
      This still leaves a fair number of oddball filesystems using the
      old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2,
      overlayfs, and vboxsf), but at least we don't have any remaining in the
      core filesystems.
      
      I'm going to add a wrapper function that just drops the read-lock and
      takes it as a write lock, so that we can clean up the core vfs layer and
      make all the ugly 'this filesystem needs exclusive inode locking' be
      just filesystem-internal warts.
      
      I just didn't want to make that conversion when we still had a core user
      left.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>