- Apr 13, 2024
Dave Airlie authored
commit 3b0daecf upstream. This uses calloc instead of doing the multiplication which might overflow. Cc: stable@vger.kernel.org Signed-off-by:
Dave Airlie <airlied@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
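
The overflow concern in the calloc change above can be illustrated with a small userspace sketch. The helper name `alloc_array` is hypothetical and is not taken from the patched code; it only shows why passing the element count and element size separately lets an oversized request be rejected instead of silently wrapping.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the patch): allocate n zeroed elements. */
static void *alloc_array(size_t n, size_t elem_size)
{
    /* Explicit guard: n * elem_size would wrap around size_t. */
    if (elem_size != 0 && n > SIZE_MAX / elem_size)
        return NULL;

    /* calloc() takes count and size separately, so the caller never multiplies. */
    return calloc(n, elem_size);
}
```

A caller that wrote `malloc(n * elem_size)` with the same oversized arguments could receive a buffer far smaller than intended, since the product wraps before the allocator sees it.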
-
- Apr 10, 2024
Greg Kroah-Hartman authored
Link: https://lore.kernel.org/r/20240408125359.506372836@linuxfoundation.org Tested-by:
SeongJae Park <sj@kernel.org> Tested-by:
Kelsey Steele <kelseysteele@linux.microsoft.com> Tested-by:
Ron Economos <re@w6rz.net> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Tested-by:
Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20240409172909.473227113@linuxfoundation.org Link: https://lore.kernel.org/r/20240409173628.028890390@linuxfoundation.org Tested-by:
Florian Fainelli <florian.fainelli@broadcom.com> Tested-by:
kernelci.org bot <bot@kernelci.org> Tested-by:
Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kroah-Hartman authored
commit 2bb69f5f upstream. Part of a merge commit from Linus that adjusted the default setting of SPECTRE_BHI_ON. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Sneddon authored
commit ed2e8d49 upstream. Intel processors that aren't vulnerable to BHI will set MSR_IA32_ARCH_CAPABILITIES[BHI_NO] = 1. Guests may use this BHI_NO bit to determine if they need to implement BHI mitigations or not. Allow this bit to be passed to the guests. Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
commit 95a6ccbd upstream. BHI mitigation mode spectre_bhi=auto does not deploy the software mitigation by default. In a cloud environment, a likely scenario is that userspace is trusted but the guests are not. Deploying a system-wide mitigation in such cases is not desirable. Update the auto mode to unconditionally mitigate against malicious guests. Deploy the software sequence at VMexit in auto mode also, when hardware mitigation is not available. Unlike the forced =on mode, the software sequence is not deployed at syscalls in auto mode. Suggested-by:
Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
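
The mode semantics described in the spectre_bhi commit messages (including the updated auto behaviour) can be sketched as a small decision function. This is a reading of the commit text only, not the kernel's code: the type and function names (`bhi_mode`, `bhi_plan`, `bhi_select`) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the kernel's identifiers. */
enum bhi_mode { BHI_OFF, BHI_ON, BHI_AUTO };

struct bhi_plan {
    bool hw_bhi_dis_s;   /* use the BHI_DIS_S hardware control */
    bool sw_at_syscall;  /* run the clearing sequence at syscall entry */
    bool sw_at_vmexit;   /* run the clearing sequence at VMexit */
};

static struct bhi_plan bhi_select(enum bhi_mode mode, bool has_bhi_dis_s)
{
    struct bhi_plan plan = { false, false, false };

    if (mode == BHI_OFF)
        return plan;               /* no mitigation at all */

    if (has_bhi_dis_s) {
        plan.hw_bhi_dis_s = true;  /* hardware control preferred in on and auto */
        return plan;
    }

    /* No hardware control: both modes cover VMexit, only "on" covers syscalls. */
    plan.sw_at_vmexit = true;
    plan.sw_at_syscall = (mode == BHI_ON);
    return plan;
}
```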
-
Pawan Gupta authored
commit ec9404e4 upstream. Branch history clearing software sequences and hardware control BHI_DIS_S were defined to mitigate Branch History Injection (BHI). Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation:
auto - Deploy the hardware mitigation BHI_DIS_S, if available.
on - Deploy the hardware mitigation BHI_DIS_S, if available, otherwise deploy the software sequence at syscall entry and VMexit.
off - Turn off BHI mitigation.
The default is auto mode which does not deploy the software sequence mitigation. This is because of the hardening done in the syscall dispatch path, which is the likely target of BHI. Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
commit be482ff9 upstream. Mitigation for BHI is selected based on the bug enumeration. Add bits needed to enumerate BHI bug. Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Sneddon authored
commit 0f4a8376 upstream. Newer processors support a hardware control BHI_DIS_S to mitigate Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel from userspace BHI attacks without having to manually overwrite the branch history. Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL. Mitigation is enabled later. Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
commit 7390db8a upstream. Branch History Injection (BHI) attacks may allow a malicious application to influence indirect branch prediction in the kernel by poisoning the branch history. eIBRS isolates indirect branch targets in ring0. The BHB can still influence the choice of indirect branch predictor entry, and although branch predictor entries are isolated between modes when eIBRS is enabled, the BHB itself is not isolated between modes. Alder Lake and newer processors support a hardware control BHI_DIS_S to mitigate BHI. For older processors Intel has released a software sequence to clear the branch history on parts that don't support BHI_DIS_S. Add support to execute the software sequence at syscall entry and VMexit to overwrite the branch history. For now, branch history is not cleared at interrupt entry, as malicious applications are not believed to have sufficient control over the registers, since previous register state is cleared at interrupt entry. Researchers continue to poke at this area and it may become necessary to clear at interrupt entry as well in the future. This mitigation is only defined here. It is enabled later. Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Co-developed-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit 1e3ad783 upstream. Make <asm/syscall.h> build a switch statement instead, and the compiler can either decide to generate an indirect jump, or - more likely these days due to mitigations - just a series of conditional branches. Yes, the conditional branches also have branch prediction, but the branch prediction is much more controlled, in that it just causes speculatively running the wrong system call (harmless), rather than speculatively running possibly wrong random less controlled code gadgets. This doesn't mitigate other indirect calls, but the system call indirection is the first and most easily triggered case. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
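
The contrast drawn in the syscall-dispatch commit above, an indirect call through a table versus a switch statement, can be sketched in plain C. The handlers and dispatcher names here are hypothetical stand-ins, not the contents of <asm/syscall.h>.

```c
/* Hypothetical handlers standing in for system calls. */
static long sys_add(long a, long b) { return a + b; }
static long sys_sub(long a, long b) { return a - b; }

/* Table-based dispatch: one indirect call through a function pointer. */
static long (*const call_table[])(long, long) = { sys_add, sys_sub };

static long dispatch_indirect(unsigned int nr, long a, long b)
{
    if (nr >= sizeof(call_table) / sizeof(call_table[0]))
        return -1;
    return call_table[nr](a, b);
}

/* Switch-based dispatch: direct calls; the compiler decides how to branch. */
static long dispatch_switch(unsigned int nr, long a, long b)
{
    switch (nr) {
    case 0:
        return sys_add(a, b);
    case 1:
        return sys_sub(a, b);
    default:
        return -1;
    }
}
```

Both dispatchers return the same results; the difference the commit message cares about is that a mispredicted conditional branch in the switch form can only speculatively run another listed handler.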
-
Josh Poimboeuf authored
commit 0cd01ac5 upstream. Change the format of the 'spectre_v2' vulnerabilities sysfs file slightly by converting the commas to semicolons, so that mitigations for future variants can be grouped together and separated by commas. Signed-off-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
min15.li authored
commit 31a59782 upstream. In the function nvme_passthru_end(), only the value of the command opcode is checked, without checking the command type (IO command or Admin command). When we send a Dataset Management command (the opcode of the Dataset Management command is the same as that of the Set Features command), the kernel thinks it is a Set Features command, then sets the controller's keep alive interval, and calls nvme_keep_alive_work(). Signed-off-by:
min15.li <min15.li@samsung.com> Reviewed-by:
Kanchan Joshi <joshi.k@samsung.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Keith Busch <kbusch@kernel.org> Fixes: b58da2d2 ("nvme: update keep alive interval when kato is modified") Signed-off-by:
Tokunori Ikegami <ikegami.t@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
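
The opcode collision described in the nvme commit above can be sketched as follows. The macro and function names are hypothetical; the value 0x09 for both commands reflects my understanding of the NVMe admin and I/O command sets and should be checked against the specification. The point is only that the opcode must be interpreted together with the command type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: both commands use opcode 0x09 in their own command set. */
#define OPC_ADMIN_SET_FEATURES 0x09
#define OPC_IO_DSM             0x09

/* Hypothetical check: an opcode alone is ambiguous, so test the type too. */
static bool is_set_features(bool is_admin_cmd, uint8_t opcode)
{
    return is_admin_cmd && opcode == OPC_ADMIN_SET_FEATURES;
}
```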
-
Antoine Tenart authored
commit ed4cccef upstream. If packets are GROed with fraglist they might be segmented later on and continue their journey in the stack. In skb_segment_list those skbs can be reused as-is. This is an issue as their destructor was removed in skb_gro_receive_list but not the reference to their socket, and then they can't be orphaned. Fix this by also removing the reference to the socket. For example this could be observed:
kernel BUG at include/linux/skbuff.h:3131! (skb_orphan)
RIP: 0010:ip6_rcv_core+0x11bc/0x19a0
Call Trace:
ipv6_list_rcv+0x250/0x3f0
__netif_receive_skb_list_core+0x49d/0x8f0
netif_receive_skb_list_internal+0x634/0xd40
napi_complete_done+0x1d2/0x7d0
gro_cell_poll+0x118/0x1f0
A similar construction is found in skb_gro_receive, apply the same change there. Fixes: 5e10da53 ("skbuff: allow 'slow_gro' for skb carring sock reference") Signed-off-by:
Antoine Tenart <atenart@kernel.org> Reviewed-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Hildenbrand authored
commit 65291dcf upstream. folio_is_secretmem() currently relies on secretmem folios being LRU folios, to save some cycles. However, folios might reside in a folio batch without the LRU flag set, or temporarily have their LRU flag cleared. Consequently, the LRU flag is unreliable for this purpose. In particular, this is the case when secretmem_fault() allocates a fresh page and calls filemap_add_folio()->folio_add_lru(). The folio might be added to the per-cpu folio batch and won't get the LRU flag set until the batch was drained using e.g., lru_add_drain(). Consequently, folio_is_secretmem() might not detect secretmem folios and GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel when we would later try reading/writing to the folio, because the folio has been unmapped from the directmap. Fix it by removing that unreliable check. Link: https://lkml.kernel.org/r/20240326143210.291116-2-david@redhat.com Fixes: 1507f512 ("mm: introduce memfd_secret system call to create "secret" memory areas") Signed-off-by:
David Hildenbrand <david@redhat.com> Reported-by:
xingwei lee <xrivendell7@gmail.com> Reported-by:
yue sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com/ Debugged-by:
Miklos Szeredi <miklos@szeredi.hu> Tested-by:
Miklos Szeredi <mszeredi@redhat.com> Reviewed-by:
Mike Rapoport (IBM) <rppt@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Davide Caratti authored
commit 7a1b3490 upstream. Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they accept non-MPC connections. As reported by Christoph, this is "surprising" because the counter might become greater than MPTcpExtMPCapableSYNRX. MPTcpExtMPCapableFallbackACK counter's name suggests it should only be incremented when a connection was seen using MPTCP options, then a fallback to TCP has been done. Let's do that by incrementing it when the subflow context of an inbound MPC connection attempt is dropped. Also, update mptcp_connect.sh kselftest, to ensure that the above MIB does not increment in case a pure TCP client connects to a MPTCP server. Fixes: fc518953 ("mptcp: add and use MIB counter infrastructure") Cc: stable@vger.kernel.org Reported-by:
Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449 Signed-off-by:
Davide Caratti <dcaratti@redhat.com> Reviewed-by:
Mat Martineau <martineau@kernel.org> Reviewed-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov (AMD) authored
commit 0e110732 upstream. The srso_alias_untrain_ret() dummy thunk in the !CONFIG_MITIGATION_SRSO case is there only for the alternative in CALL_UNTRAIN_RET to have a symbol to resolve. However, testing with kernels which don't have CONFIG_MITIGATION_SRSO enabled leads to the warning in patch_return() to fire:
missing return thunk: srso_alias_untrain_ret+0x0/0x10-0x0: eb 0e 66 66 2e
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:826 apply_returns (arch/x86/kernel/alternative.c:826
Put in a plain "ret" there so that gcc doesn't put a return thunk in its place, which is special and gets checked. In addition:
ERROR: modpost: "srso_alias_untrain_ret" [arch/x86/kvm/kvm-amd.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Chyba 1
make[1]: *** [/usr/src/linux-6.8.3/Makefile:1873: modpost] Chyba 2
make: *** [Makefile:240: __sub-make] Chyba 2
since !SRSO builds would use the dummy return thunk as reported by petr.pisar@atlas.cz, https://bugzilla.kernel.org/show_bug.cgi?id=218679 . Reported-by:
kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202404020901.da75a60f-oliver.sang@intel.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/202404020901.da75a60f-oliver.sang@intel.com/ Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov (AMD) authored
commit 4535e1a4 upstream. The original version of the mitigation would patch in the calls to the untraining routines directly. That is, the alternative() in UNTRAIN_RET will patch in the CALL to srso_alias_untrain_ret() directly. However, even if commit e7c25c44 ("x86/cpu: Cleanup the untrain mess") meant well in trying to clean up the situation, due to microarchitectural reasons, the untraining routine srso_alias_untrain_ret() must be the target of a CALL instruction and not of a JMP instruction as it is done now. Reshuffle the alternative macros to accomplish that. Fixes: e7c25c44 ("x86/cpu: Cleanup the untrain mess") Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Ingo Molnar <mingo@kernel.org> Cc: stable@kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stefan O'Rear authored
commit d14fa1fc upstream. childregs represents the registers which are active for the new thread in user context. For a kernel thread, childregs->gp is never used since the kernel gp is not touched by switch_to. For a user mode helper, the gp value can be observed in user space after execve or possibly by other means. [From the email thread] The /* Kernel thread */ comment is somewhat inaccurate in that it is also used for user_mode_helper threads, which exec a user process, e.g. /sbin/init or when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have PF_KTHREAD set and are valid targets for ptrace etc. even before they exec. childregs is the *user* context during syscall execution and it is observable from userspace in at least five ways:
1. kernel_execve does not currently clear integer registers, so the starting register state for PID 1 and other user processes started by the kernel has sp = user stack, gp = kernel __global_pointer$, all other integer registers zeroed by the memset in the patch comment. This is a bug in its own right, but I'm unwilling to bet that it is the only way to exploit the issue addressed by this patch.
2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread before it execs, but ptrace requires SIGSTOP to be delivered which can only happen at user/kernel boundaries.
3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for user_mode_helpers before the exec completes, but gp is not one of the registers it returns.
4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses are also exposed via PERF_SAMPLE_REGS_USER which is permitted under LOCKDOWN_PERF. I have not attempted to write exploit code.
5. Much of the tracing infrastructure allows access to user registers. I have not attempted to determine which forms of tracing allow access to user registers without already allowing access to kernel registers.
Fixes: 7db91e57 ("RISC-V: Task implementation") Cc: stable@vger.kernel.org Signed-off-by:
Stefan O'Rear <sorear@fastmail.com> Reviewed-by:
Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com Signed-off-by:
Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Samuel Holland authored
commit d080a08b upstream. These macros did not initialize __kr_err, so they could fail even if the access did not fault. Cc: stable@vger.kernel.org Fixes: d464118c ("riscv: implement __get_kernel_nofault and __put_user_nofault") Signed-off-by:
Samuel Holland <samuel.holland@sifive.com> Reviewed-by:
Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by:
Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240312022030.320789-1-samuel.holland@sifive.com Signed-off-by:
Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sumanth Korikkar authored
commit 378ca2d2 upstream. Align system call table on 8 bytes. With sys_call_table entry size of 8 bytes that eliminates the possibility of a system call pointer crossing cache line boundary. Cc: stable@kernel.org Suggested-by:
Ulrich Weigand <ulrich.weigand@de.ibm.com> Reviewed-by:
Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by:
Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by:
Vasily Gorbik <gor@linux.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov (AMD) authored
commit 3ddf944b upstream. Modifying a MCA bank's MCA_CTL bits which control which error types to be reported is done over
/sys/devices/system/machinecheck/
├── machinecheck0
│   ├── bank0
│   ├── bank1
│   ├── bank10
│   ├── bank11
...
sysfs nodes by writing the new bit mask of events to enable. When the write is accepted, the kernel deletes all current timers and reinits all banks. Doing that in parallel can lead to initializing a timer which is already armed and in the timer wheel, i.e., in use already:
ODEBUG: init active (active state 0) object: ffff888063a28000 object type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642
WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514 debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514
Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code does. Reported by: Yue Sun <samsun1006219@gmail.com> Reported by: xingwei lee <xrivendell7@gmail.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%2Bx_-PqMv98gKig55=2vbzffRw@mail.gmail.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Herve Codina authored
commit 8917e738 upstream. In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove() During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ... Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job. In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_changeset_destroy() with the devlink removals. Fixes: 80dd33cf ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by:
Herve Codina <herve.codina@bootlin.com> Reviewed-by:
Saravana Kannan <saravanak@google.com> Tested-by:
Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by:
Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20240325152140.198219-3-herve.codina@bootlin.com Signed-off-by:
Rob Herring <robh@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Herve Codina authored
commit 0462c56c upstream. The commit 80dd33cf ("drivers: base: Fix device link removal") introduces a workqueue to release the consumer and supplier devices used in the devlink. In the job queued, devices are released and in turn, when all the references to these devices are dropped, the release function of the device itself is called. Nothing is present to provide some synchronisation with this workqueue in order to ensure that all ongoing releasing operations are done and so, some other operations can be started safely. For instance, in the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove() During the step 1, devices are released and related devlinks are removed (jobs pushed in the workqueue). During the step 2, OF nodes are destroyed but, without any synchronisation with devlink removal jobs, of_overlay_remove() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 Indeed, the missing of_node_put() call is going to be done, too late, from the workqueue job execution. Introduce device_link_wait_removal() to offer a way to synchronize operations waiting for the end of devlink removals (i.e. end of workqueue jobs). Also, as a flushing operation is done on the workqueue, the workqueue used is moved from a system-wide workqueue to a local one. Cc: stable@vger.kernel.org Signed-off-by:
Herve Codina <herve.codina@bootlin.com> Tested-by:
Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by:
Nuno Sa <nuno.sa@analog.com> Reviewed-by:
Saravana Kannan <saravanak@google.com> Acked-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20240325152140.198219-2-herve.codina@bootlin.com Signed-off-by:
Rob Herring <robh@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
I Gede Agastya Darma Laksana authored
commit 1576f263 upstream. This patch addresses an issue with the Panasonic CF-SZ6's existing quirk, specifically its headset microphone functionality. Previously, the quirk used ALC269_FIXUP_HEADSET_MODE, which does not support the CF-SZ6's design of a single 3.5mm jack for both mic and audio output effectively. The device uses pin 0x19 for the headset mic without jack detection. Following verification on the CF-SZ6 and discussions with the original patch author, I determined that the update to ALC269_FIXUP_ASPIRE_HEADSET_MIC is the appropriate solution. This change is custom-designed for the CF-SZ6's unique hardware setup, which includes a single 3.5mm jack for both mic and audio output, connecting the headset microphone to pin 0x19 without the use of jack detection. Fixes: 0fca97a2 ("ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk") Signed-off-by:
I Gede Agastya Darma Laksana <gedeagas22@gmail.com> Cc: <stable@vger.kernel.org> Message-ID: <20240401174602.14133-1-gedeagas22@gmail.com> Signed-off-by:
Takashi Iwai <tiwai@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jann Horn authored
[ Upstream commit 055ca835 ] When you try to splice between a normal pipe and a notification pipe, get_pipe_info(..., true) fails, so splice() falls back to treating the notification pipe like a normal pipe - so we end up in iter_file_splice_write(), which first locks the input pipe, then calls vfs_iter_write(), which locks the output pipe. Lockdep complains about that, because we're taking a pipe lock while already holding another pipe lock. I think this probably (?) can't actually lead to deadlocks, since you'd need another way to nest locking a normal pipe into locking a watch_queue pipe, but the lockdep annotations don't make that clear. Bail out earlier in pipe_write() for notification pipes, before taking the pipe lock. Reported-and-tested-by:
<syzbot+011e4ea1da6692cf881c@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=011e4ea1da6692cf881c Fixes: c73be61c ("pipe: Add general notification queue support") Signed-off-by:
Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20231124150822.2121798-1-jannh@google.com Signed-off-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jann Horn authored
[ Upstream commit 28148a17 ] Since commit 8782fb61 ("mm: pagewalk: Fix race between unmap and page walker"), walk_page_range() on kernel ranges won't work anymore, walk_page_range_novma() must be used instead. Note: I don't have an openrisc development setup, so this is completely untested. Fixes: 8782fb61 ("mm: pagewalk: Fix race between unmap and page walker") Signed-off-by:
Jann Horn <jannh@google.com> Signed-off-by:
Stafford Horne <shorne@gmail.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jann Horn authored
[ Upstream commit c8e7ff41 ] The flag uhid->running can be set to false by uhid_device_add_worker() without holding the uhid->devlock. Mark all reads/writes of the flag that might race with READ_ONCE()/WRITE_ONCE() for clarity and correctness. Signed-off-by:
Jann Horn <jannh@google.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz> Signed-off-by:
Sasha Levin <sashal@kernel.org>
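
The READ_ONCE()/WRITE_ONCE() marking mentioned in the uhid commit above can be approximated in userspace. The macros below are simplified stand-ins, not the kernel's definitions; they rely on the GNU `__typeof__` extension, and the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Simplified userspace approximations of the kernel macros (GCC/Clang only). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical device state with a flag that may be accessed without a lock. */
struct dev_state {
    bool running;
};

static void stop_device(struct dev_state *d)
{
    /* Marked write: the compiler must emit exactly one store. */
    WRITE_ONCE(d->running, false);
}

static bool device_running(struct dev_state *d)
{
    /* Marked read: the compiler must emit exactly one load, not a cached value. */
    return READ_ONCE(d->running);
}
```

The volatile access only constrains the compiler; it documents the intentional race and does not by itself order the access against other memory operations.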
-
Jeff Layton authored
[ Upstream commit 10396f4d ] Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the client. While a callback job is technically an RPC that counter is really more for client-driven RPCs, and this has the effect of preventing the client from being unhashed until the callback completes. If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we can end up in a situation where the callback can't complete on the (now dead) callback channel, but the new client can't connect because the old client can't be unhashed. This usually manifests as a NFS4ERR_DELAY return on the CREATE_SESSION operation. The job is only holding a reference to the client so it can clear a flag after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of reference when dealing with the nfsdfs info files, but it should work appropriately here to ensure that the nfs4_client doesn't disappear. Fixes: 44df6f43 ("NFSD: add delegation reaper to react to low memory condition") Reported-by:
Vladimir Benes <vbenes@redhat.com> Signed-off-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Arnd Bergmann authored
[ Upstream commit 3137b83a ] Building with W=1 shows a warning for an unused variable when CONFIG_PCI is disabled: drivers/ata/sata_mv.c:790:35: error: unused variable 'mv_pci_tbl' [-Werror,-Wunused-const-variable] static const struct pci_device_id mv_pci_tbl[] = { Move the table into the same block that contains the pci_driver definition. Fixes: 7bb3c529 ("sata_mv: Remove PCI dependency") Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Damien Le Moal <dlemoal@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Arnd Bergmann authored
[ Upstream commit 1197c5b2 ] The myrb and myrs drivers use an odd way of implementing their sysfs files, calling snprintf() with a fixed length of 32 bytes to print into a page sized buffer. One of the strings is actually longer than 32 bytes, which clang can warn about: drivers/scsi/myrb.c:1906:10: error: 'snprintf' will always be truncated; specified size is 32, but format string expands to at least 34 [-Werror,-Wformat-truncation] drivers/scsi/myrs.c:1089:10: error: 'snprintf' will always be truncated; specified size is 32, but format string expands to at least 34 [-Werror,-Wformat-truncation] These could all be plain sprintf() without a length as the buffer is always long enough. On the other hand, sysfs files should not be overly long either, so just double the length to make sure the longest strings don't get truncated here. Fixes: 77266186 ("scsi: myrs: Add Mylex RAID controller (SCSI interface)") Fixes: 081ff398 ("scsi: myrb: Add Mylex RAID controller (block interface)") Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240326223825.4084412-8-arnd@kernel.org Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
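
The truncation that clang warns about in the myrb/myrs commit above follows from how snprintf() reports its result: it returns the length the full output would have needed. A minimal sketch, with a hypothetical helper name and a made-up 33-character string rather than the drivers' actual format strings:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if "text\n" does not fit in a buffer of `size` bytes. */
static bool format_truncated(char *buf, size_t size, const char *text)
{
    /* snprintf() returns the length the complete output would have had. */
    int needed = snprintf(buf, size, "%s\n", text);

    return needed < 0 || (size_t)needed >= size;
}
```

With a 33-character string plus the newline, 34 bytes are needed, so a limit of 32 truncates while a limit of 64 does not, which mirrors the doubling described in the commit message.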
-
Arnd Bergmann authored
[ Upstream commit 52f80bb1 ] gcc warns about a memcpy() with overlapping pointers because of an incorrect size calculation:
In file included from include/linux/string.h:369,
                 from drivers/ata/sata_sx4.c:66:
In function 'memcpy_fromio',
    inlined from 'pdc20621_get_from_dimm.constprop' at drivers/ata/sata_sx4.c:962:2:
include/linux/fortify-string.h:97:33: error: '__builtin_memcpy' accessing 4294934464 bytes at offsets 0 and [16, 16400] overlaps 6442385281 bytes at offset -2147450817 [-Werror=restrict]
   97 | #define __underlying_memcpy __builtin_memcpy
      | ^
include/linux/fortify-string.h:620:9: note: in expansion of macro '__underlying_memcpy'
  620 | __underlying_##op(p, q, __fortify_size); \
      | ^~~~~~~~~~~~~
include/linux/fortify-string.h:665:26: note: in expansion of macro '__fortify_memcpy_chk'
  665 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \
      | ^~~~~~~~~~~~~~~~~~~~
include/asm-generic/io.h:1184:9: note: in expansion of macro 'memcpy'
 1184 | memcpy(buffer, __io_virt(addr), size);
      | ^~~~~~
The problem here is the overflow of an unsigned 32-bit number to a negative that gets converted into a signed 'long', keeping a large positive number. Replace the complex calculation with a more readable min() variant that avoids the warning. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Damien Le Moal <dlemoal@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
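The class of bug described above can be sketched in plain C. The commit message does not quote the driver's expression, so the two helpers below (`chunk_len_wrapping()` and `chunk_len_min()`) and their parameter names are assumptions chosen for illustration. A 64-bit `long` is modelled with `int64_t` so the result does not depend on the platform.

```c
#include <stdint.h>

/*
 * Fragile variant: decide whether 'size' bytes fit in the window by
 * testing the sign of a difference. The subtraction is done in 32-bit
 * unsigned arithmetic, so a "negative" result wraps to a large positive
 * value, and widening it to a 64-bit signed type keeps it positive.
 */
static uint32_t chunk_len_wrapping(uint32_t window, uint32_t offset,
				   uint32_t size)
{
	int64_t remaining = (int64_t)(uint32_t)(window - (offset + size));

	return remaining >= 0 ? size : window - offset;
}

/*
 * min()-style variant: clamp 'size' to the room left in the window.
 * Assumes offset <= window.
 */
static uint32_t chunk_len_min(uint32_t window, uint32_t offset,
			      uint32_t size)
{
	uint32_t room = window - offset;

	return size < room ? size : room;
}
```

For a 32768-byte window, an offset of 32000 and a size of 1000, the first helper returns 1000 even though only 768 bytes remain, while the second returns 768.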
-
Stephen Lee authored
[ Upstream commit fc563aa9 ] In snd_soc_info_volsw(), mask is generated by finding the index of the most significant bit set in max and converting that index to a bitmask by left-shifting 1. Unintended wraparound occurs when max is an integer value with its most significant bit set. Since the constant 1 being shifted has type int, the left shift wraps around and sets mask to 0 instead of all 1's. To fix this, write the constant as `1ULL` so that the shift is done in 64 bits and does not wrap. Fixes: 7077148f ("ASoC: core: Split ops out of soc-core.c") Signed-off-by:
Stephen Lee <slee08177@gmail.com> Link: https://msgid.link/r/20240326010131.6211-1-slee08177@gmail.com Signed-off-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
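A minimal user-space sketch of the mask computation described above, assuming the mask is formed as "(1 shifted left by the MSB index) minus 1". `fls_u32()` is a portable stand-in for the kernel's fls(), and `mask_for_max()` is a hypothetical helper, not the function in the ASoC core. In C, shifting a 32-bit int by 32 is undefined, which is why the sketch only shows the 64-bit form.

```c
#include <stdint.h>

/* 1-based index of the most significant set bit, or 0 if v is 0. */
static unsigned int fls_u32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/*
 * Mask covering every bit up to and including the MSB of max.
 * The shift is done on a 64-bit constant, so a shift count of 32
 * (max with bit 31 set) is well defined and yields 0xFFFFFFFF.
 */
static uint32_t mask_for_max(uint32_t max)
{
	return (uint32_t)((1ULL << fls_u32(max)) - 1);
}
```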
-
Pierre-Louis Bossart authored
[ Upstream commit aae86cfd ] The disable_irq_lock protects the 'disable_irq' value, so we need to take the lock before testing it. Fixes: b69de265 ("ASoC: rt711: fix for JD event handling in ClockStop Mode0") Signed-off-by:
Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by:
Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by:
Chao Song <chao.song@linux.intel.com> Link: https://msgid.link/r/20240325221817.206465-4-pierre-louis.bossart@linux.intel.com Signed-off-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
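The "lock before testing" pattern can be sketched in user space with a POSIX mutex standing in for the kernel mutex. The structure, its `irq_enable_count` field, and the function name are hypothetical and only illustrate the ordering: the flag is read and cleared while the lock is held, so no other thread can change it between the test and the update.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical driver state; only the lock/flag pairing matters here. */
struct jack_state {
	pthread_mutex_t disable_irq_lock; /* protects disable_irq */
	bool disable_irq;
	int irq_enable_count;             /* counts simulated re-enables */
};

/* Re-enable the (simulated) interrupt if it was disabled. */
static bool reenable_irq_if_disabled(struct jack_state *s)
{
	bool reenabled = false;

	/* Take the lock first, then test the flag it protects. */
	pthread_mutex_lock(&s->disable_irq_lock);
	if (s->disable_irq) {
		s->irq_enable_count++;
		s->disable_irq = false;
		reenabled = true;
	}
	pthread_mutex_unlock(&s->disable_irq_lock);

	return reenabled;
}
```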
-
Pierre-Louis Bossart authored
[ Upstream commit ee287771 ] The disable_irq_lock protects the 'disable_irq' value, so we need to take the lock before testing it. Fixes: 23adeb70 ("ASoC: rt711-sdca: fix for JD event handling in ClockStop Mode0") Signed-off-by:
Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by:
Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by:
Chao Song <chao.song@linux.intel.com> Link: https://msgid.link/r/20240325221817.206465-3-pierre-louis.bossart@linux.intel.com Signed-off-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Pierre-Louis Bossart authored
[ Upstream commit 310a5caa ] The disable_irq_lock protects the 'disable_irq' value, so we need to take the lock before testing it. Fixes: 02fb23d7 ("ASoC: rt5682-sdw: fix for JD event handling in ClockStop Mode0") Signed-off-by:
Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by:
Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by:
Chao Song <chao.song@linux.intel.com> Link: https://msgid.link/r/20240325221817.206465-2-pierre-louis.bossart@linux.intel.com Signed-off-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Paul Barker authored
[ Upstream commit 596a4254 ] The TX queue should be serviced each time the poll function is called, even if the full RX work budget has been consumed. This prevents starvation of the TX queue when RX bandwidth usage is high. Fixes: c156633f ("Renesas Ethernet AVB driver proper") Signed-off-by:
Paul Barker <paul.barker.ct@bp.renesas.com> Reviewed-by:
Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20240402145305.82148-1-paul.barker.ct@bp.renesas.com Signed-off-by:
Paolo Abeni <pabeni@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
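The polling behaviour described above can be sketched with a toy model. None of the names below come from the driver: `struct queue_pair`, `rx_clean()`, `tx_clean()` and `sketch_poll()` are assumptions used to show the ordering, in which TX cleanup runs on every poll call regardless of how much of the RX budget was consumed.

```c
/* Toy model of one RX/TX queue pair; the counters are illustrative. */
struct queue_pair {
	int rx_pending; /* received packets waiting to be processed */
	int tx_pending; /* transmitted descriptors waiting to be freed */
};

/* Process up to 'budget' RX packets and return how many were handled. */
static int rx_clean(struct queue_pair *q, int budget)
{
	int done = q->rx_pending < budget ? q->rx_pending : budget;

	q->rx_pending -= done;
	return done;
}

/* Free all completed TX descriptors. */
static void tx_clean(struct queue_pair *q)
{
	q->tx_pending = 0;
}

/* Poll handler sketch: RX is bounded by the budget, TX is always serviced. */
static int sketch_poll(struct queue_pair *q, int budget)
{
	int work_done = rx_clean(q, budget);

	/* Not skipped when work_done == budget, so TX cannot starve. */
	tx_clean(q);

	return work_done;
}
```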
-
Wei Fang authored
[ Upstream commit cbc17e78 ] Setting mac_managed_pm during interface up is too late. In situations where the link has not been brought up yet and the system suspends, the regular PHY power management will run. Since the FEC ETHEREN control bit is cleared (automatically) on suspend, the controller is off on resume. When the regular PHY power management resume path runs in this context, it will write to the MII_DATA register, but nothing will be transmitted on the MDIO bus. This can be observed in the following log: fec 5b040000.ethernet eth0: MDIO read timeout Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110 Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110 The data written will, however, remain in the MII_DATA register. When the link is later set to administrative up, it will trigger a call to fec_restart(), which will restore the MII_SPEED register. This triggers the quirk explained in f166f890 ("net: ethernet: fec: Replace interrupt driven MDIO with polled IO"), causing an extra MII_EVENT. This extra event desynchronizes all the MDIO register reads: fec_enet_mdio_wait() returns too early, so every read completes too early and returns 0. When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads cause the PHY to be initialized incorrectly, and the PHY will not transmit any ethernet signal in this state. It cannot be brought out of this state without a power cycle of the PHY. Fixes: 557d5dc8 ("net: fec: use mac-managed PHY PM") Closes: https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se/ Signed-off-by:
Wei Fang <wei.fang@nxp.com> [jernberg: commit message] Signed-off-by:
John Ernberg <john.ernberg@actia.se> Link: https://lore.kernel.org/r/20240328155909.59613-2-john.ernberg@actia.se Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Denis Kirjanov authored
[ Upstream commit eca485d2 ] Signed-off-by:
Dennis Kirjanov <dkirjanov@suse.de> Signed-off-by:
David S. Miller <davem@davemloft.net> Stable-dep-of: cbc17e78 ("net: fec: Set mac_managed_pm during probe") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Oleksij Rempel authored
[ Upstream commit 4d17d43d ] When an external PHY is used, we need to take care of the embedded PHY. Since there is no way to disable this PHY from the MAC side while keeping the RMII reference clock, we need to suspend it. This patch reduces electrical noise (the PHY keeps sending FLPs) and power consumption by 0.22 W. Signed-off-by:
Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by:
David S. Miller <davem@davemloft.net> Stable-dep-of: cbc17e78 ("net: fec: Set mac_managed_pm during probe") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ivan Vecera authored
[ Upstream commit ea558de7 ] The ice driver had a bug that was fixed by commit b7306b42 ("ice: manage interrupts during poll exit") followed by commit 23be7075 ("ice: fix software generating extra interrupts"), and I'm seeing a similar issue with the i40e driver. In certain situations, when busy-loop is enabled together with adaptive coalescing, the driver occasionally misses that there are outstanding descriptors to clean when exiting busy poll. Try to catch the remaining work by triggering a software interrupt when exiting busy poll. No extra interrupts will be generated when busy polling is not used. The issue was found when running the sockperf ping-pong tcp test with adaptive coalescing and busy poll enabled (50 as the value of the busy_poll and busy_read sysctl knobs) and results in huge latency spikes of more than 100000us. The fix is inspired by the ice driver and does the following: 1) During napi poll exit in case of busy-poll (napi_complete_done() returns false), records in q_vector that we were in busy loop. 2) Extends i40e_buildreg_itr() to be able to add an enforced software interrupt into the built value. 3) In i40e_update_enable_itr(), enforces a software interrupt trigger if we are exiting busy poll, to catch any pending clean-ups. 4) Reuses the unused 3rd ITR (interrupt throttle) index and sets it to 20K interrupts per second to limit the number of these sw interrupts. Test results ============ Prior: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test...
sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473 sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.571 usec sockperf: Total 2429473 observations; each percentile contains 24294.73 observations sockperf: ---> <MAX> observation = 103294.331 sockperf: ---> percentile 99.999 = 45.633 sockperf: ---> percentile 99.990 = 37.013 sockperf: ---> percentile 99.900 = 35.910 sockperf: ---> percentile 99.000 = 33.390 sockperf: ---> percentile 90.000 = 28.626 sockperf: ---> percentile 75.000 = 27.741 sockperf: ---> percentile 50.000 = 26.743 sockperf: ---> percentile 25.000 = 25.614 sockperf: ---> <MIN> observation = 12.220 After: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... 
sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186 sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.965 usec sockperf: Total 2391186 observations; each percentile contains 23911.86 observations sockperf: ---> <MAX> observation = 195.841 sockperf: ---> percentile 99.999 = 45.026 sockperf: ---> percentile 99.990 = 39.009 sockperf: ---> percentile 99.900 = 35.922 sockperf: ---> percentile 99.000 = 33.482 sockperf: ---> percentile 90.000 = 28.902 sockperf: ---> percentile 75.000 = 27.821 sockperf: ---> percentile 50.000 = 26.860 sockperf: ---> percentile 25.000 = 25.685 sockperf: ---> <MIN> observation = 12.277 Fixes: 0bcd952f ("ethernet/intel: consolidate NAPI and NAPI exit") Reported-by:
Hugo Ferreira <hferreir@redhat.com> Reviewed-by:
Michal Schmidt <mschmidt@redhat.com> Signed-off-by:
Ivan Vecera <ivecera@redhat.com> Reviewed-by:
Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-