  1. Apr 27, 2024
  2. Apr 17, 2024
    • Linux 5.15.156 · c52b9710
      Greg Kroah-Hartman authored
      
      
      Link: https://lore.kernel.org/r/20240415141942.235939111@linuxfoundation.org
      Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Tested-by: Kelsey Steele <kelseysteele@linux.microsoft.com>
      Tested-by: Mark Brown <broonie@kernel.org>
      Tested-by: Ron Economos <re@w6rz.net>
      Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
      Tested-by: Jon Hunter <jonathanh@nvidia.com>
      Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/i915/cdclk: Fix CDCLK programming order when pipes are active · 88168b94
      Ville Syrjälä authored
      commit 7b1f6b5a upstream.
      
      Currently we always reprogram CDCLK from the
      intel_set_cdclk_pre_plane_update() when using squash/crawl.
      The code only works correctly for the cd2x update or full
      modeset cases, and it was simply never updated to deal with
      squash/crawl.
      
      If the CDCLK frequency is increasing we must reprogram it
      before we do anything else that might depend on the new
      higher frequency, and conversely we must not decrease
      the frequency until everything that might still depend
      on the old higher frequency has been dealt with.
      
      Since cdclk_state->pipe is only relevant when doing a cd2x
      update we can't use it to determine the correct sequence
      during squash/crawl. To that end, introduce cdclk_state->disable_pipes,
      which simply indicates that we must perform the update
      while the pipes are disabled (i.e. during
      intel_set_cdclk_pre_plane_update()). Otherwise we use the
      same old vs. new CDCLK frequency comparison as for cd2x
      updates.
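The ordering rule above can be modelled as a small decision function. This is a minimal sketch under simplifying assumptions: `struct cdclk_state_sketch` and `cdclk_update_pre_plane()` are hypothetical names, and the real i915 state and sequencing carry far more information (the cd2x pipe, voltage level, and so on).

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the CDCLK state; not the i915 struct. */
struct cdclk_state_sketch {
	unsigned int cdclk_khz;	/* CDCLK frequency */
	bool disable_pipes;	/* update must happen while pipes are disabled */
};

/*
 * Returns true if the CDCLK update belongs in the pre-plane-update step,
 * false if it should be deferred to the post-plane-update step.
 */
static bool cdclk_update_pre_plane(const struct cdclk_state_sketch *old_state,
				   const struct cdclk_state_sketch *new_state)
{
	/* Pipes must be off: program CDCLK before anything else happens. */
	if (new_state->disable_pipes)
		return true;
	/* Increasing: program it before anything depends on the higher clock. */
	if (new_state->cdclk_khz > old_state->cdclk_khz)
		return true;
	/* Decreasing or unchanged: wait until nothing needs the old clock. */
	return false;
}
```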
      
      The only remaining problem case is when the voltage_level
      needs to increase due to a DDI port, but the CDCLK frequency
      is decreasing (and not all pipes are being disabled). The
      current approach will not bump the voltage level up until
      after the port has already been enabled, which is too late.
      But we'll take care of that case separately.
      
      v2: Don't break the "must disable pipes" case
      v3: Keep the on stack 'pipe' for future use
      
      Cc: stable@vger.kernel.org
      Fixes: d62686ba ("drm/i915/adl_p: CDCLK crawl support for ADL")
      Reviewed-by: Uma Shankar <uma.shankar@intel.com>
      Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240402155016.13733-2-ville.syrjala@linux.intel.com
      (cherry picked from commit 3aecee90)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI · b2bf5858
      Josh Poimboeuf authored
      commit 4f511739 upstream.
      
      For consistency with the other CONFIG_MITIGATION_* options, replace the
      CONFIG_SPECTRE_BHI_{ON,OFF} options with a single
      CONFIG_MITIGATION_SPECTRE_BHI option.
      
      [ mingo: Fix ]
      
      Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nikolay Borisov <nik.borisov@suse.com>
      Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.1712813475.git.jpoimboe@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto · d315f5eb
      Josh Poimboeuf authored
      commit 36d4fe14 upstream.
      
      Unlike most other mitigations' "auto" options, spectre_bhi=auto only
      mitigates newer systems, which is confusing and not particularly useful.
      
      Remove it.
      
      Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Clarify that syscall hardening isn't a BHI mitigation · ebba2270
      Josh Poimboeuf authored
      commit 5f882f3b upstream.
      
      While syscall hardening helps prevent some BHI attacks, there's still
      other low-hanging fruit remaining.  Don't classify it as a mitigation
      and make it clear that the system may still be vulnerable if it doesn't
      have a HW or SW mitigation enabled.
      
      Fixes: ec9404e4 ("x86/bhi: Add BHI mitigation knob")
      Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Fix BHI handling of RRSBA · e47d1cbd
      Josh Poimboeuf authored
      commit 1cea8a28 upstream.
      
      The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been
      disabled by the Spectre v2 mitigation (or can otherwise be disabled by
      the BHI mitigation itself if needed).  In that case retpolines are fine.
      
      Fixes: ec9404e4 ("x86/bhi: Add BHI mitigation knob")
      Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/r/6f56f13da34a0834b69163467449be7f58f253dc.1712813475.git.jpoimboe@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr' · b4f2718f
      Ingo Molnar authored
      commit d0485730 upstream.
      
      So we are using the 'ia32_cap' value in a number of places,
      which got its name from the MSR_IA32_ARCH_CAPABILITIES register.
      
      But there's very little 'IA32' about it - this isn't 32-bit only
      code, nor does it originate from there, it's just a historic
      quirk that many Intel MSR names are prefixed with IA32_.
      
      This is already clear from the helper method around the MSR:
      x86_read_arch_cap_msr(), which doesn't have the IA32 prefix.
      
      So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with
      its role and with the naming of the helper function.
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nikolay Borisov <nik.borisov@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES · c768db14
      Josh Poimboeuf authored
      commit cb2db5bb upstream.
      
      There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and
      over.  It's even read in the BHI sysfs function which is a big no-no.
      Just read it once and cache it.
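A minimal sketch of the read-once pattern, assuming a hypothetical `read_arch_cap_msr_slow()` that stands in for the real MSR read (which needs ring 0). The point is only that later callers, such as a sysfs show routine, consult a cached copy instead of re-reading the MSR.

```c
#include <stdint.h>

/* Counts how often the (hypothetical) expensive read happens. */
static unsigned int msr_reads;

/* Hypothetical stand-in for the privileged MSR read. */
static uint64_t read_arch_cap_msr_slow(void)
{
	msr_reads++;
	return 0x1234;	/* placeholder capability bits */
}

/* Cached copy, filled once during early CPU setup. */
static uint64_t x86_arch_cap_msr_cached;

static void cpu_setup_sketch(void)
{
	x86_arch_cap_msr_cached = read_arch_cap_msr_slow();
}

/* Later users, e.g. a sysfs show() routine, only read the cached value. */
static uint64_t arch_cap(void)
{
	return x86_arch_cap_msr_cached;
}
```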
      
      Fixes: ec9404e4 ("x86/bhi: Add BHI mitigation knob")
      Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Fix BHI documentation · 145d9930
      Josh Poimboeuf authored
      commit dfe64890 upstream.
      
      Fix up some inaccuracies in the BHI documentation.
      
      Fixes: ec9404e4 ("x86/bhi: Add BHI mitigation knob")
      Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.1712813475.git.jpoimboe@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Fix return type of spectre_bhi_state() · 2c761457
      Daniel Sneddon authored
      commit 04f4230e upstream.
      
      The definition of spectre_bhi_state() incorrectly returns a
      const char * const. This causes a compiler warning when building
      with W=1:
      
       warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
       2812 | static const char * const spectre_bhi_state(void)
      
      Remove the const qualifier from the pointer.
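To illustrate the warning: a top-level `const` on a returned pointer qualifies the returned value itself, which the caller receives as an rvalue anyway, so GCC ignores it. The function below is a hypothetical stand-in for spectre_bhi_state(), and its strings are placeholders rather than the kernel's actual output.

```c
/*
 * Before: static const char * const f(void) -- the second const applies to
 * the returned pointer value and is ignored (-Wignored-qualifiers).
 * After: only the const on the pointed-to characters remains.
 */
static const char *bhi_state_sketch(int enabled)
{
	return enabled ? "enabled" : "disabled";	/* placeholder strings */
}
```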
      
      Fixes: ec9404e4 ("x86/bhi: Add BHI mitigation knob")
      Reported-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/r/20240409230806.1545822-1-daniel.sneddon@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • irqflags: Explicitly ignore lockdep_hrtimer_exit() argument · c6fd0e4f
      Arnd Bergmann authored
      commit c1d11fc2 upstream.
      
      When building with 'make W=1' but CONFIG_TRACE_IRQFLAGS=n, the
      unused argument to lockdep_hrtimer_exit() causes a warning:
      
      kernel/time/hrtimer.c:1655:14: error: variable 'expires_in_hardirq' set but not used [-Werror=unused-but-set-variable]
      
      This is intentional behavior, so add a cast to void to shut up the warning.
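A sketch of the idiom, assuming a hypothetical macro `lockdep_exit_sketch()` that mimics a stub compiled when the tracing option is off. Casting the argument to void counts as a use, so a variable that is only passed to the stub no longer triggers -Wunused-but-set-variable.

```c
#include <stdbool.h>

/* Hypothetical no-op stub: evaluates its argument and discards it. */
#define lockdep_exit_sketch(arg) do { (void)(arg); } while (0)

static int run_callback_sketch(void)
{
	bool expires_in_hardirq = true;	/* only consumed by the stub */

	lockdep_exit_sketch(expires_in_hardirq);
	return 0;
}
```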
      
      Fixes: 73d20564 ("hrtimer: Don't dereference the hrtimer pointer after the callback")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20240408074609.3170807-1-arnd@kernel.org
      Closes: https://lore.kernel.org/oe-kbuild-all/202311191229.55QXHVc6-lkp@intel.com/
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/apic: Force native_apic_mem_read() to use the MOV instruction · 69843741
      Adam Dunlap authored
      commit 5ce344be upstream.
      
      When done from a virtual machine, instructions that touch APIC memory
      must be emulated. By convention, MMIO accesses are typically performed
      via io.h helpers such as readl() or writeq() to simplify instruction
      emulation/decoding (e.g. in KVM hosts and SEV guests) [0].
      
      Currently, native_apic_mem_read() does not follow this convention,
      allowing the compiler to emit instructions other than the MOV
      instruction generated by readl(). In particular, when the kernel is
      compiled with clang and run as a SEV-ES or SEV-SNP guest, the compiler
      would emit a TESTL instruction which is not supported by the SEV-ES
      emulator, causing a boot failure in that environment. It is likely the
      same problem would happen in a TDX guest as that uses the same
      instruction emulator as SEV-ES.
      
      To make sure all emulators can emulate APIC memory reads via MOV, use
      the readl() function in native_apic_mem_read(). It is expected that any
      emulator would support MOV in any addressing mode as it is the most
      generic and is what is usually emitted currently.
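The idea behind readl() can be sketched as an accessor whose load is written in inline assembly, so the compiler must emit a plain MOV rather than folding the read into another instruction such as TESTL. This is a simplified userspace sketch, not the kernel's io.h definition; `mmio_read32_sketch()` is a hypothetical name, and on non-x86 builds it falls back to a volatile load, which guarantees a single access but not a specific instruction.

```c
#include <stdint.h>

static inline uint32_t mmio_read32_sketch(const volatile void *addr)
{
	uint32_t val;
#if defined(__x86_64__) || defined(__i386__)
	/* Exactly one 32-bit MOV from memory into a register. */
	__asm__ __volatile__("movl %1, %0"
			     : "=r"(val)
			     : "m"(*(const volatile uint32_t *)addr)
			     : "memory");
#else
	/* Fallback: a single volatile access, instruction not guaranteed. */
	val = *(const volatile uint32_t *)addr;
#endif
	return val;
}
```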
      
      The TESTL instruction is emitted when native_apic_mem_read() is inlined
      into apic_mem_wait_icr_idle(). The emulator comes from
      insn_decode_mmio() in arch/x86/lib/insn-eval.c. It's not worth it to
      extend insn_decode_mmio() to support more instructions since, in theory,
      the compiler could choose to output nearly any instruction for such
      reads which would bloat the emulator beyond reason.
      
        [0] https://lore.kernel.org/all/20220405232939.73860-12-kirill.shutemov@linux.intel.com/
      
        [ bp: Massage commit message, fix typos. ]
      
      Signed-off-by: Adam Dunlap <acdunlap@google.com>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Kevin Loughlin <kevinloughlin@google.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20240318230927.2191933-1-acdunlap@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • selftests: timers: Fix abs() warning in posix_timers test · c2981e32
      John Stultz authored
      commit ed366de8 upstream.
      
      Building with clang results in the following warning:
      
        posix_timers.c:69:6: warning: absolute value function 'abs' given an
            argument of type 'long long' but has parameter of type 'int' which may
            cause truncation of value [-Wabsolute-value]
              if (abs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) {
                  ^

      So switch to using llabs() instead.
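A minimal sketch of the corrected check. The DELAY and USECS_PER_SEC values here are assumptions for illustration, and `interval_out_of_range()` is a hypothetical helper; the relevant part is that llabs() takes and returns long long, so the difference is not truncated to int first.

```c
#include <stdlib.h>

#define USECS_PER_SEC 1000000LL	/* assumed value */
#define DELAY 2			/* assumed timer delay in seconds */

/* Nonzero if the measured interval (usecs) is off by more than 0.5 s. */
static int interval_out_of_range(long long diff)
{
	return llabs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2;
}
```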
      
      Fixes: 0bc4b0cf ("selftests: add basic posix timers selftests")
      Signed-off-by: John Stultz <jstultz@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20240410232637.4135564-3-jstultz@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n · 70688450
      Sean Christopherson authored
      commit f337a6a2 upstream.
      
      Initialize cpu_mitigations to CPU_MITIGATIONS_OFF if the kernel is built
      with CONFIG_SPECULATION_MITIGATIONS=n, as the help text quite clearly
      states that disabling SPECULATION_MITIGATIONS is supposed to turn off all
      mitigations by default.
      
        │ If you say N, all mitigations will be disabled. You really
        │ should know what you are doing to say so.
      
      As is, the kernel still defaults to CPU_MITIGATIONS_AUTO, which results in
      some mitigations being enabled in spite of SPECULATION_MITIGATIONS=n.
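A sketch of making the default depend on the build option, assuming a hypothetical `SPECULATION_MITIGATIONS_ENABLED` macro standing in for the Kconfig symbol (kernel code would typically test such a symbol with IS_ENABLED()). The enum names are also hypothetical.

```c
/* Hypothetical stand-in for the Kconfig symbol: 1 = y, 0 = n. */
#ifndef SPECULATION_MITIGATIONS_ENABLED
#define SPECULATION_MITIGATIONS_ENABLED 0
#endif

enum cpu_mitigations_sketch {
	MITIGATIONS_OFF,
	MITIGATIONS_AUTO,
};

/* The default follows the build option instead of always being AUTO. */
static enum cpu_mitigations_sketch cpu_mitigations_default =
	SPECULATION_MITIGATIONS_ENABLED ? MITIGATIONS_AUTO : MITIGATIONS_OFF;
```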
      
      Fixes: f43b9876 ("x86/retbleed: Add fine grained Kconfig knobs")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Cc: stable@vger.kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/r/20240409175108.1512861-2-seanjc@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86: Fix out of range data · e8f4a290
      Namhyung Kim authored
      commit dec8ced8 upstream.
      
      On x86 each struct cpu_hw_events maintains a table for counter assignment,
      but x86_pmu_del() failed to update it for the deleted event.  This
      can make perf_clear_dirty_counters() reset a counter that is still in use
      if it's called before event scheduling or enabling.  Reading that counter
      then returns out-of-range data that doesn't make sense.
      
      The following code can reproduce the problem.
      
        $ cat repro.c
        #include <pthread.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>
        #include <linux/perf_event.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>
        #include <sys/syscall.h>
      
        struct perf_event_attr attr = {
        	.type = PERF_TYPE_HARDWARE,
        	.config = PERF_COUNT_HW_CPU_CYCLES,
        	.disabled = 1,
        };
      
        void *worker(void *arg)
        {
        	int cpu = (long)arg;
        	int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
        	int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
        	void *p;
      
        	do {
        		ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0);
        		p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
        		ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);
      
        		ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0);
        		munmap(p, 4096);
        		ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0);
        	} while (1);
      
        	return NULL;
        }
      
        int main(void)
        {
        	int i;
        	int n = sysconf(_SC_NPROCESSORS_ONLN);
        	pthread_t *th = calloc(n, sizeof(*th));
      
        	for (i = 0; i < n; i++)
        		pthread_create(&th[i], NULL, worker, (void *)(long)i);
        	for (i = 0; i < n; i++)
        		pthread_join(th[i], NULL);
      
        	free(th);
        	return 0;
        }
      
      You can then see the out-of-range data using perf stat, as below.
      It is probably easier to see on a large machine.
      
        $ gcc -o repro repro.c -pthread
        $ ./repro &
        $ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }'
             1.001028462 CPU6   196,719,295,683,763      cycles                           # 194290.996 GHz                       (71.54%)
             1.001028462 CPU3   396,077,485,787,730      branch-misses                    # 15804359784.80% of all branches      (71.07%)
             1.001028462 CPU17  197,608,350,727,877      branch-misses                    # 14594186554.56% of all branches      (71.22%)
             2.020064073 CPU4   198,372,472,612,140      cycles                           # 194681.113 GHz                       (70.95%)
             2.020064073 CPU6   199,419,277,896,696      cycles                           # 195720.007 GHz                       (70.57%)
             2.020064073 CPU20  198,147,174,025,639      cycles                           # 194474.654 GHz                       (71.03%)
             2.020064073 CPU20  198,421,240,580,145      stalled-cycles-frontend          #  100.14% frontend cycles idle        (70.93%)
             3.037443155 CPU4   197,382,689,923,416      cycles                           # 194043.065 GHz                       (71.30%)
             3.037443155 CPU20  196,324,797,879,414      cycles                           # 193003.773 GHz                       (71.69%)
             3.037443155 CPU5   197,679,956,608,205      stalled-cycles-backend           # 1315606428.66% backend cycles idle   (71.19%)
             3.037443155 CPU5   198,571,860,474,851      instructions                     # 13215422.58  insn per cycle
      
      x86_pmu_del() should shift the contents of cpuc->assign[] as well.
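A minimal sketch of the fix, assuming a reduced, hypothetical model of struct cpu_hw_events with only an event list and a parallel counter-assignment array. When an event is removed from the middle, both arrays must be shifted together; otherwise assign[] describes the wrong events.

```c
#define MAX_EVENTS 8

/* Hypothetical, reduced model of the per-CPU event bookkeeping. */
struct cpu_events_sketch {
	int n_events;
	int event_list[MAX_EVENTS];	/* event ids, in scheduling order */
	int assign[MAX_EVENTS];		/* counter index for each event */
};

/* Remove event 'id', keeping event_list[] and assign[] aligned. */
static void events_del_sketch(struct cpu_events_sketch *c, int id)
{
	int i;

	for (i = 0; i < c->n_events; i++)
		if (c->event_list[i] == id)
			break;
	if (i == c->n_events)
		return;	/* not found */

	for (i++; i < c->n_events; i++) {
		c->event_list[i - 1] = c->event_list[i];
		c->assign[i - 1] = c->assign[i];	/* the step that was missing */
	}
	c->n_events--;
}
```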
      
      Fixes: 5471eea5 ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task")
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20240306061003.1894224-1-namhyung@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vhost: Add smp_rmb() in vhost_vq_avail_empty() · acf9b01d
      Gavin Shan authored
      commit 22e1992c upstream.
      
      An smp_rmb() is missing in vhost_vq_avail_empty(), as spotted by
      Will. Without it, there is no guarantee that the available ring entries
      pushed by the guest are observed by vhost in time, leading to stale
      available ring entries being fetched by vhost in vhost_get_vq_desc(), as
      reported by Yihuang Yu on NVIDIA's Grace Hopper (ARM64) platform.
      
        /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
        -accel kvm -machine virt,gic-version=host -cpu host          \
        -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
        -m 4096M,slots=16,maxmem=64G                                 \
        -object memory-backend-ram,id=mem0,size=4096M                \
         :                                                           \
        -netdev tap,id=vnet0,vhost=true                              \
        -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
         :
        guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
        virtio_net virtio0: output.0:id 100 is not a head!
      
      Add the missing smp_rmb() in vhost_vq_avail_empty(). When tx_can_batch()
      returns true, it means there are still pending tx buffers. Since it may
      read the indices, it can still bypass the smp_rmb() in vhost_get_vq_desc().
      Note that this was safe until vq->avail_idx was changed by commit
      275bf960 ("vhost: better detection of available buffers").
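The required ordering can be sketched with C11 atomics. This is an approximation, not vhost code: `struct avail_ring_sketch` and `avail_empty_sketch()` are hypothetical, and a C11 acquire fence is used in place of the kernel's smp_rmb() (the producer would need a matching release, not shown). When the check reports that entries are available, the caller is about to read them, so the index load must be ordered before those reads.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16

/* Hypothetical, reduced model of a virtqueue's available ring. */
struct avail_ring_sketch {
	_Atomic uint16_t idx;		/* published by the producer (guest) */
	uint16_t ring[RING_SIZE];	/* entries written before idx */
};

static bool avail_empty_sketch(struct avail_ring_sketch *r, uint16_t last_avail)
{
	uint16_t idx = atomic_load_explicit(&r->idx, memory_order_relaxed);

	if (idx == last_avail)
		return true;
	/* Entries will be read next: order them after the idx load (~smp_rmb()). */
	atomic_thread_fence(memory_order_acquire);
	return false;
}
```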
      
      Fixes: 275bf960 ("vhost: better detection of available buffers")
      Cc: <stable@kernel.org> # v4.11+
      Reported-by: Yihuang Yu <yihyu@redhat.com>
      Suggested-by: Will Deacon <will@kernel.org>
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20240328002149.1141302-2-gshan@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/client: Fully protect modes[] with dev->mode_config.mutex · d2dc6600
      Ville Syrjälä authored
      commit 3eadd887 upstream.
      
      The modes[] array contains pointers to modes on the connectors'
      mode lists, which are protected by dev->mode_config.mutex.
      Thus we need to extend the same protection to modes[], or by the
      time we use it the elements may already be pointing to
      freed/reused memory.
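A userspace sketch of the locking rule, assuming a pthread mutex in place of dev->mode_config.mutex and hypothetical struct names: pointers taken from a lock-protected list stay valid only while the lock is held, so the lock must cover both picking the mode and using it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: a device's mode list guarded by one mutex. */
struct mode_sketch {
	int hdisplay, vdisplay;
};

struct dev_sketch {
	pthread_mutex_t mode_mutex;	/* stands in for mode_config.mutex */
	struct mode_sketch modes_list[4];
	int n_modes;
};

/* Returns the chosen mode's width, or -1 if there are no modes. */
static int pick_and_use_mode(struct dev_sketch *dev)
{
	const struct mode_sketch *modes[1] = { NULL };
	int width = -1;

	pthread_mutex_lock(&dev->mode_mutex);
	if (dev->n_modes > 0)
		modes[0] = &dev->modes_list[0];	/* pick */
	if (modes[0])
		width = modes[0]->hdisplay;	/* use: still under the lock */
	pthread_mutex_unlock(&dev->mode_mutex);

	return width;
}
```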
      
      Cc: stable@vger.kernel.org
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10583
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240404203336.10454-2-ville.syrjala@linux.intel.com
      Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
      Reviewed-by: Jani Nikula <jani.nikula@intel.com>
      Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: qgroup: correctly model root qgroup rsv in convert · 773d38f4
      Boris Burkov authored
      commit 141fb8cd upstream.
      
      We use add_root_meta_rsv and sub_root_meta_rsv to track prealloc and
      pertrans reservations for subvolumes when quotas are enabled. The
      convert function does not properly increment pertrans after decrementing
      prealloc, so the count is not accurate.
      
      Note: we check that the fs is not read-only to mirror the logic in
      qgroup_convert_meta, which checks that before adding to the pertrans rsv.
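A minimal sketch of the corrected bookkeeping, with hypothetical names: converting a reservation means moving bytes from the prealloc counter to the pertrans counter, and the read-only check mirrors the one described for qgroup_convert_meta. The clamp on the decrement is an assumption added to keep the counters from underflowing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-root counters for the two metadata reservation types. */
struct root_rsv_sketch {
	uint64_t prealloc;
	uint64_t pertrans;
};

/* Move 'bytes' of prealloc reservation into the pertrans reservation. */
static void convert_meta_sketch(struct root_rsv_sketch *r, uint64_t bytes,
				bool fs_readonly)
{
	if (bytes > r->prealloc)
		bytes = r->prealloc;	/* assumed clamp: never underflow */
	r->prealloc -= bytes;
	/* The half that was missing: add to pertrans unless read-only. */
	if (!fs_readonly)
		r->pertrans += bytes;
}
```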
      
      Fixes: 8287475a ("btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space")
      CC: stable@vger.kernel.org # 6.1+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iommu/vt-d: Allocate local memory for page request queue · 23b57c55
      Jacob Pan authored
      [ Upstream commit a34f3e20 ]
      
      The page request queue is per IOMMU, so its allocation should be made
      NUMA-aware for performance reasons.
      
      Fixes: a222a7f0 ("iommu/vt-d: Implement page request handling")
      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Link: https://lore.kernel.org/r/20240403214007.985600-1-jacob.jun.pan@linux.intel.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tracing: hide unused ftrace_event_id_fops · 81f3ad64
      Arnd Bergmann authored
      [ Upstream commit 5281ec83 ]
      
      When CONFIG_PERF_EVENTS is disabled, a 'make W=1' build produces a
      warning about the unused ftrace_event_id_fops variable:
      
      kernel/trace/trace_events.c:2155:37: error: 'ftrace_event_id_fops' defined but not used [-Werror=unused-const-variable=]
       2155 | static const struct file_operations ftrace_event_id_fops = {
      
      Hide this in the same #ifdef as the reference to it.
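A sketch of the pattern, assuming a hypothetical `PERF_EVENTS_SKETCH` switch in place of CONFIG_PERF_EVENTS: the static object and its only user are guarded by the same condition, so builds with the option off define nothing that goes unused.

```c
/* Hypothetical switch standing in for CONFIG_PERF_EVENTS (1 = enabled). */
#ifndef PERF_EVENTS_SKETCH
#define PERF_EVENTS_SKETCH 0
#endif

struct fops_sketch {
	int (*open)(void);
};

#if PERF_EVENTS_SKETCH
/* Defined under the same condition as its only user below. */
static int id_open(void)
{
	return 0;
}

static const struct fops_sketch id_fops = { .open = id_open };
#endif

/* Returns how many optional event files were "created". */
static int create_event_files(void)
{
	int created = 0;

#if PERF_EVENTS_SKETCH
	if (id_fops.open() == 0)
		created++;	/* the "id" file */
#endif
	return created;
}
```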
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240403080702.3509288-7-arnd@kernel.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Zheng Yejian <zhengyejian1@huawei.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ajay Kaher <akaher@vmware.com>
      Cc: Jinjie Ruan <ruanjinjie@huawei.com>
      Cc: Clément Léger <cleger@rivosinc.com>
      Cc: Dan Carpenter <dan.carpenter@linaro.org>
      Cc: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
      Fixes: 620a30e9 ("tracing: Don't pass file_operations array to event_create_dir()")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: ena: Fix incorrect descriptor free behavior · fdfbf54d
      David Arinzon authored
      [ Upstream commit bf02d9fe ]
      
      ENA has two types of TX queues:
      - queues which only process TX packets arriving from the network stack
      - queues which only process TX packets forwarded to it by XDP_REDIRECT
        or XDP_TX instructions
      
      ena_free_tx_bufs() cycles through all descriptors in a TX queue
      and unmaps + frees every descriptor that hasn't been acknowledged yet
      by the device (uncompleted TX transactions).
      The function assumes that the processed TX queue is necessarily from
      the first category listed above and ends up using napi_consume_skb()
      for descriptors belonging to an XDP specific queue.
      
      This patch solves a bug in which, in case of a VF reset, the
      descriptors aren't freed correctly, leading to crashes.
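A minimal sketch of the corrected logic, assuming a hypothetical queue model: the release path is chosen per queue type instead of assuming every queue carries skbs. The counters stand in for the actual unmap and free calls.

```c
#include <stdbool.h>

/* Hypothetical queue model; not the ena driver's structs. */
struct tx_queue_sketch {
	bool is_xdp_queue;	/* fed only by XDP_TX / XDP_REDIRECT */
	int n_pending;		/* descriptors not yet completed by the device */
	int freed_skb;
	int freed_xdp;
};

/* Free every pending descriptor with the release path matching the queue. */
static void free_tx_bufs_sketch(struct tx_queue_sketch *q)
{
	for (int i = 0; i < q->n_pending; i++) {
		if (q->is_xdp_queue)
			q->freed_xdp++;	/* XDP frame release path */
		else
			q->freed_skb++;	/* skb path (napi_consume_skb() in the driver) */
	}
	q->n_pending = 0;
}
```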
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: ena: Wrong missing IO completions check order · ec25a9ce
      David Arinzon authored
      [ Upstream commit f7e41718 ]
      
      The missing IO completions check runs every second (HZ jiffies).
      This commit fixes several issues with this check:
      
      1. Duplicate queues check:
         A maximum of 4 queues is scanned on each check due to the monitor budget.
         Once reaching the budget, this check exits under the assumption that
         the next check will continue to scan the remainder of the queues,
         but in practice, the next check first rescans the last already-scanned
         queue, which is unnecessary and may make the full queue scan
         take a couple of seconds longer.
         The fix is to start every check with the next queue to scan.
         For example, on 8 IO queues:
         Bug: [0,1,2,3], [3,4,5,6], [6,7]
         Fix: [0,1,2,3], [4,5,6,7]
      
      2. Unbalanced queues check:
         In case the number of active IO queues is not a multiple of the budget,
         there will be checks which don't utilize the full budget
         because the full scan exits when reaching the last queue id.
         The fix is to run every TX completion check with the exact queue budget
         regardless of the queue id.
         For example, on 7 IO queues:
         Bug: [0,1,2,3], [4,5,6], [0,1,2,3]
         Fix: [0,1,2,3], [4,5,6,0], [1,2,3,4]
         The budget may be lowered in case the number of IO queues is less
         than the budget (4) to make sure there are no duplicate queues on
         the same check.
         For example, on 3 IO queues:
         Bug: [0,1,2,0], [1,2,0,1]
         Fix: [0,1,2], [0,1,2]
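The fixed scan order can be sketched as follows. The state struct and function are hypothetical; the sketch only models which queue ids each periodic check visits: the budget is capped at the number of queues, the scan wraps around, and the next check resumes right after the last queue scanned. With 7 queues it yields [0,1,2,3], [4,5,6,0], [1,2,3,4], matching the example above.

```c
#define MONITOR_BUDGET 4

/* Hypothetical scanner state: the next queue to check persists across calls. */
struct scan_state_sketch {
	int io_queue_count;
	int next_qid;
};

/* One periodic check: writes scanned queue ids to out[], returns the count. */
static int check_missing_completions_sketch(struct scan_state_sketch *s,
					    int out[MONITOR_BUDGET])
{
	int budget = s->io_queue_count < MONITOR_BUDGET ?
		     s->io_queue_count : MONITOR_BUDGET;
	int qid = s->next_qid;

	for (int i = 0; i < budget; i++) {
		out[i] = qid;
		qid = (qid + 1) % s->io_queue_count;	/* wrap around */
	}
	s->next_qid = qid;	/* resume after the last queue scanned */
	return budget;
}
```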
      
      Fixes: 1738cd3e
      
       ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
      Signed-off-by: default avatarAmit Bernstein <amitbern@amazon.com>
      Signed-off-by: default avatarDavid Arinzon <darinzon@amazon.com>
      Reviewed-by: default avatarShannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
    • net: ena: Fix potential sign extension issue · e667a05c
      David Arinzon authored
      [ Upstream commit 713a8519 ]
      
      Small unsigned types such as u16 are promoted to (signed) int when
      multiplied, so the product may overflow. If the result of such a
      multiplication has its MSB set, converting it to a wider type such
      as size_t sign-extends it with 1s, which changes the result.
      
      Code example of the phenomenon:
      -------------------------------
      u16 x, y;
      size_t z1, z2;
      
      x = y = 0xffff;
      printk("x=%x y=%x\n", x, y);
      
      z1 = x*y;
      z2 = (size_t)x*y;
      
      printk("z1=%zx z2=%zx\n", z1, z2);
      
      Output:
      -------
      x=ffff y=ffff
      z1=fffffffffffe0001 z2=fffe0001
      
      The expected result of ffff*ffff is fffe0001, but without an
      explicit cast to avoid the unwanted sign extension we get
      fffffffffffe0001.
      
      This commit adds an explicit cast to avoid the sign extension
      issue.
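      The promotion rule above can be reproduced outside the kernel. This
      is a minimal userspace sketch, assuming uint16_t in place of the
      kernel's u16; mul_promoted() and mul_cast() are hypothetical helpers,
      not functions from the driver. In ISO C the overflowing int
      multiplication in mul_promoted() is undefined behaviour, so its
      result for 0xffff * 0xffff is only what common compilers happen to
      produce and must not be relied on.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper showing the bug: both uint16_t operands are
 * promoted to (signed) int before the multiplication. For 0xffff * 0xffff
 * the product does not fit in int (undefined behaviour); in practice it
 * often wraps to a negative int, which is then sign-extended when it is
 * converted to a 64-bit size_t. Kept only for contrast.
 */
size_t mul_promoted(uint16_t x, uint16_t y)
{
	return x * y;
}

/*
 * Hypothetical helper showing the fix: casting one operand to size_t
 * first converts the other operand to size_t as well, so the
 * multiplication is done in unsigned size_t arithmetic and the result
 * is never sign-extended.
 */
size_t mul_cast(uint16_t x, uint16_t y)
{
	return (size_t)x * y;
}
```

      mul_cast(0xffff, 0xffff) returns 0xfffe0001, the expected value from
      the commit message. Casting to a 32-bit unsigned type would also be
      enough for this particular product; size_t mirrors the commit's
      example.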
      
      Fixes: 689b2bda ("net: ena: add functions for handling Low Latency Queues in ena_com")
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • af_unix: Fix garbage collector racing against connect() · e76c2678
      Michal Luczaj authored
      [ Upstream commit 47d8ac01 ]
      
      The garbage collector does not take into account the risk of an embryo
      being enqueued during garbage collection. If such an embryo has a peer
      that carries SCM_RIGHTS, two consecutive passes of scan_children() may
      see different sets of children, leading to an incorrectly elevated
      inflight count and then a dangling pointer within gc_inflight_list.
      
      All sockets are AF_UNIX/SOCK_STREAM.
      S is an unconnected socket.
      L is a listening in-flight socket bound to addr, not in the fdtable.
      V's fd will be passed via sendmsg(), which bumps its inflight count.
      
      connect(S, addr)	sendmsg(S, [V]); close(V)	__unix_gc()
      ----------------	-------------------------	-----------
      
      NS = unix_create1()
      skb1 = sock_wmalloc(NS)
      L = unix_find_other(addr)
      unix_state_lock(L)
      unix_peer(S) = NS
      			// V count=1 inflight=0
      
      			NS = unix_peer(S)
      			skb2 = sock_alloc()
      			skb_queue_tail(NS, skb2[V])
      
      			// V became in-flight
      			// V count=2 inflight=1
      
      			close(V)
      
      			// V count=1 inflight=1
      			// GC candidate condition met
      
      						for u in gc_inflight_list:
      						  if (total_refs == inflight_refs)
      						    add u to gc_candidates
      
      						// gc_candidates={L, V}
      
      						for u in gc_candidates:
      						  scan_children(u, dec_inflight)
      
      						// embryo (skb1) was not
      						// reachable from L yet, so V's
      						// inflight remains unchanged
      __skb_queue_tail(L, skb1)
      unix_state_unlock(L)
      						for u in gc_candidates:
      						  if (u.inflight)
      						    scan_children(u, inc_inflight_move_tail)
      
      						// V count=1 inflight=2 (!)
      
      If there is a GC-candidate listening socket, lock and unlock its state.
      This makes GC wait until the end of any ongoing connect() to that
      socket. Once the lock has been flipped, a possibly SCM-laden embryo is
      already enqueued, and any embryo arriving later cannot carry
      SCM_RIGHTS: at this point unix_inflight() cannot happen, because
      unix_gc_lock is already taken. The inflight graph remains unaffected.
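      The effect of the lock flip on the two scan passes can be illustrated
      with a deliberately tiny, single-threaded model. Everything below is
      hypothetical: struct toy_gc, toy_collect() and the helpers are not
      kernel code, the connect() race window is modelled as a fixed point
      between the two passes, and details such as the "if (u.inflight)"
      filter on the second pass are omitted.

```c
#include <stdbool.h>

/* Hypothetical toy state: one listener L, one embryo carrying V. */
struct toy_gc {
	bool embryo_queued;	/* skb1 (embryo whose peer holds V) is on L's queue */
	bool connect_pending;	/* connect() holds L's state lock, embryo not queued yet */
	int v_inflight;		/* V's inflight count */
};

/* Stand-in for scan_children(L, ...): V is a child of L only once the
 * embryo has been enqueued on L. */
static bool v_reachable_from_l(const struct toy_gc *g)
{
	return g->embryo_queued;
}

/* Stand-in for connect() finishing: __skb_queue_tail(L, skb1), then unlock. */
static void finish_connect(struct toy_gc *g)
{
	if (g->connect_pending) {
		g->embryo_queued = true;
		g->connect_pending = false;
	}
}

/* Stand-in for unix_state_lock(L); unix_state_unlock(L): taking the lock
 * means waiting for any ongoing connect() to finish first. */
static void flip_listener_lock(struct toy_gc *g)
{
	finish_connect(g);
}

/* Run both passes and return V's final inflight count. */
static int toy_collect(struct toy_gc *g, bool flip_lock_first)
{
	if (flip_lock_first)
		flip_listener_lock(g);

	if (v_reachable_from_l(g))	/* pass 1: dec_inflight */
		g->v_inflight--;

	finish_connect(g);		/* race window between the passes */

	if (v_reachable_from_l(g))	/* pass 2: inc_inflight_move_tail */
		g->v_inflight++;

	return g->v_inflight;
}
```

      Starting from V's "count=1 inflight=1" state, the racy ordering ends
      with inflight=2, as in the "(!)" line of the diagram; flipping the
      listener lock first makes both passes see the same children, so V
      returns to inflight=1.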
      
      Fixes: 1fd05ba5 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>