- Apr 10, 2024
-
-
Pu Lehui authored
[ Upstream commit ea687311 ] RISC-V perf driver does not yet support branch sampling. Although the specification is in the works [0], it is best to disable such events until support is available, otherwise we will get unexpected results. Due to this reason, two riscv bpf testcases get_branch_snapshot and perf_branches/perf_branches_hw fail. Link: https://github.com/riscv/riscv-control-transfer-records [0] Fixes: f5bfa23f ("RISC-V: Add a perf core library for pmu drivers") Signed-off-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240312012053.1178140-1-pulehui@huaweicloud.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Richard Fitzgerald authored
[ Upstream commit f193957b ] wm_adsp_write_ctl() must hold the pwr_lock mutex when calling cs_dsp_get_ctl(). This was previously partially fixed by commit 781118bc ("ASoC: wm_adsp: Fix missing locking in wm_adsp_[read|write]_ctl()") but this only put locking around the call to cs_dsp_coeff_write_ctrl(), missing the call to cs_dsp_get_ctl(). Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Fixes: 781118bc ("ASoC: wm_adsp: Fix missing locking in wm_adsp_[read|write]_ctl()") Link: https://msgid.link/r/20240307110227.41421-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Dominique Martinet authored
[ Upstream commit be3193e5 ] Previous conversion to iov missed these debug statements which would now always print the requested size instead of the actual server reply. Write also added a loop in a much older commit but we didn't report these, while reads do report each iteration -- it's more coherent to keep reporting all requests to server so move that at the same time. Fixes: 7f024647 ("9p: convert to advancing variant of iov_iter_get_pages_alloc()") Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Message-ID: <20240109-9p-rw-trace-v1-1-327178114257@codewreck.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Jann Horn authored
[ Upstream commit 055ca835 ] When you try to splice between a normal pipe and a notification pipe, get_pipe_info(..., true) fails, so splice() falls back to treating the notification pipe like a normal pipe - so we end up in iter_file_splice_write(), which first locks the input pipe, then calls vfs_iter_write(), which locks the output pipe. Lockdep complains about that, because we're taking a pipe lock while already holding another pipe lock. I think this probably (?) can't actually lead to deadlocks, since you'd need another way to nest locking a normal pipe into locking a watch_queue pipe, but the lockdep annotations don't make that clear. Bail out earlier in pipe_write() for notification pipes, before taking the pipe lock. Reported-and-tested-by: <syzbot+011e4ea1da6692cf881c@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=011e4ea1da6692cf881c Fixes: c73be61c ("pipe: Add general notification queue support") Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20231124150822.2121798-1-jannh@google.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ashish Kalra authored
[ Upstream commit 0aa6b90e ] Some BIOSes allow the end user to set the minimum SEV ASID value (CPUID 0x8000001F_EDX) to be greater than the maximum number of encrypted guests, or maximum SEV ASID value (CPUID 0x8000001F_ECX) in order to dedicate all the SEV ASIDs to SEV-ES or SEV-SNP. The SEV support, as coded, does not handle the case where the minimum SEV ASID value can be greater than the maximum SEV ASID value. As a result, the following confusing message is issued: [ 30.715724] kvm_amd: SEV enabled (ASIDs 1007 - 1006) Fix the support to properly handle this case. Fixes: 916391a2 ("KVM: SVM: Add support for SEV-ES capability in KVM") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Cc: stable@vger.kernel.org Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240104190520.62510-1-Ashish.Kalra@amd.com Link: https://lore.kernel.org/r/20240131235609.4161407-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Sean Christopherson authored
[ Upstream commit 466eec4a ] Convert all local ASID variables and parameters throughout the SEV code from signed integers to unsigned integers. As ASIDs are fundamentally unsigned values, and the global min/max variables are appropriately unsigned integers, too. Functionally, this is a glorified nop as KVM guarantees min_sev_asid is non-zero, and no CPU supports -1u as the _only_ asid, i.e. the signed vs. unsigned goof won't cause problems in practice. Opportunistically use sev_get_asid() in sev_flush_encrypted_page() instead of open coding an equivalent. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240131235609.4161407-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: 0aa6b90e ("KVM: SVM: Add support for allowing zero SEV ASIDs") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Paul Barker authored
[ Upstream commit 101b7641 ] The error statistics should be updated each time the poll function is called, even if the full RX work budget has been consumed. This prevents the counts from becoming stuck when RX bandwidth usage is high. This also ensures that error counters are not updated after we've re-enabled interrupts as that could result in a race condition. Also drop an unnecessary space. Fixes: c156633f ("Renesas Ethernet AVB driver proper") Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20240402145305.82148-2-paul.barker.ct@bp.renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Paul Barker authored
[ Upstream commit 596a4254 ] The TX queue should be serviced each time the poll function is called, even if the full RX work budget has been consumed. This prevents starvation of the TX queue when RX bandwidth usage is high. Fixes: c156633f ("Renesas Ethernet AVB driver proper") Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20240402145305.82148-1-paul.barker.ct@bp.renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Claudiu Beznea authored
[ Upstream commit 2b993bfd ] ravb_poll() initial code used to interrogate the first descriptor of the RX queue in case gPTP is false to determine if ravb_rx() should be called. This is done for non-gPTP IPs. For gPTP IPs the driver PTP-specific information was used to determine if receive function should be called. As every IP has its own receive function that interrogates the RX descriptors list in the same way the ravb_poll() was doing there is no need to double check this in ravb_poll(). Removing the code from ravb_poll() leads to a cleaner code. Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: 596a4254 ("net: ravb: Always process TX descriptor ring") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vitaly Lifshits authored
[ Upstream commit 861e8086 ] Forcing SMBUS inside the ULP enabling flow leads to sporadic PHY loss on some systems. It is suspected to be caused by initiating PHY transactions before the interface settles. Separating this configuration from the ULP enabling flow and moving it to the shutdown function allows enough time for the interface to settle and avoids adding a delay. Fixes: 6607c99e ("e1000e: i219 - fix to enable both ULP and EEE in Sx state") Co-developed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vitaly Lifshits authored
[ Upstream commit 662200e3 ] Add curly braces to avoid entering to an if statement where it is not always required in e1000_shutdown function. This improves code readability and might prevent non-deterministic behaviour in the future. Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240301184806.2634508-5-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 861e8086 ("e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vitaly Lifshits authored
[ Upstream commit 6dbdd4de ] On some Meteor Lake systems accessing the PHY via the MDIO interface may result in an MDI error. This issue happens sporadically and in most cases a second access to the PHY via the MDIO interface results in success. As a workaround, introduce a retry counter which is set to 3 on Meteor Lake systems. The driver will only return an error if 3 consecutive PHY access attempts fail. The retry mechanism is disabled in specific flows, where MDI errors are expected. Fixes: cc23f4f0 ("e1000e: Add support for Meteor Lake") Suggested-by: Nikolay Mushayev <nikolay.mushayev@intel.com> Co-developed-by: Nir Efrati <nir.efrati@intel.com> Signed-off-by: Nir Efrati <nir.efrati@intel.com> Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Jesse Brandeburg authored
[ Upstream commit b9a45254 ] Refactor several older Intel drivers to use FIELD_GET(), which reduces lines of code and adds clarity of intent. This code was generated by the following coccinelle/spatch script and then manually repaired. @get@ constant shift,mask; type T; expression a; @@ ( -((T)((a) & mask) >> shift) +FIELD_GET(mask, a) and applied via: spatch --sp-file field_prep.cocci --in-place --dir \ drivers/net/ethernet/intel/ Cc: Julia Lawall <Julia.Lawall@inria.fr> CC: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 6dbdd4de ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Jesse Brandeburg authored
[ Upstream commit 3314f209 ] This series is introducing the use of FIELD_GET and FIELD_PREP which requires bitfield.h to be included. Fix all the includes in this one change, and rearrange includes into alphabetical order to ease readability and future maintenance. virtchnl.h and it's usage was modified to have it's own includes as it should. This required including bits.h for virtchnl.h. Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 6dbdd4de ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ivan Vecera authored
[ Upstream commit 56df3459 ] Similarly as for ice driver [1] there are also circular header dependencies in i40e driver: i40e.h -> i40e_virtchnl_pf.h -> i40e.h Another issue is that i40e header files does not contain their own dependencies on other header files (both private and standard) so their inclusion in .c file require to add these deps in certain order to that .c file to make it compilable. Fix both issues by removal the mentioned circular dependency, by filling i40e headers with their dependencies so they can be placed anywhere in a source code. Additionally remove bunch of includes from i40e.h super header file that are not necessary and include i40e.h only in .c files that really require it. [1] 649c87c6 ("ice: remove circular header dependencies on ice.h") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 6dbdd4de ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ivan Vecera authored
[ Upstream commit 5dfd37c3 ] Header i40e_osdep.h contains only IO primitives and couple of debug printing macros. Split this header file to i40e_io.h and i40e_debug.h and move i40e_debug_mask enum to i40e_debug.h Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 6dbdd4de ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ivan Vecera authored
[ Upstream commit ef5d5407 ] Structures i40e_dma_mem & i40e_virt_mem are defined i40e_osdep.h while memory allocation functions that use them are declared in i40e_alloc.h Move them there. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 6dbdd4de ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ivan Vecera authored
[ Upstream commit d3276f92 ] Enum i40e_memory_type enum is unused in i40e_allocate_dma_mem() thus can be safely removed. Useless macros in i40e_alloc.h can be removed as well. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 6dbdd4de ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ivan Vecera authored
[ Upstream commit 7151d87a ] The <linux/avf/virtchnl.h> uses BIT, struct_size and ETH_ALEN macros but does not include appropriate header files that defines them. Add these dependencies so this header file can be included anywhere. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 6dbdd4de ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ivan Vecera authored
[ Upstream commit 8196b5fd ] The macros I40E_MDIO_CLAUSE22* and I40E_MDIO_CLAUSE45* are using I40E_MASK together with the same values I40E_GLGEN_MSCA_STCODE_SHIFT and I40E_GLGEN_MSCA_OPCODE_SHIFT to define masks. Introduce I40E_GLGEN_MSCA_OPCODE_MASK and I40E_GLGEN_MSCA_STCODE_MASK for both shifts in i40e_register.h and use them to refactor the macros mentioned above. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 6dbdd4de ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ivan Vecera authored
[ Upstream commit 39ec612a ] The .back field placed in i40e_hw is used to get pointer to i40e_pf instance but it is not necessary as the i40e_hw is a part of i40e_pf and containerof macro can be used to obtain the pointer to i40e_pf. Remove .back field from i40e_hw structure, introduce i40e_hw_to_pf() and i40e_hw_to_dev() helpers and use them. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Stable-dep-of: 6dbdd4de ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ivan Vecera authored
[ Upstream commit ea558de7 ] As for ice bug fixed by commit b7306b42 ("ice: manage interrupts during poll exit") followed by commit 23be7075 ("ice: fix software generating extra interrupts") I'm seeing the similar issue also with i40e driver. In certain situation when busy-loop is enabled together with adaptive coalescing, the driver occasionally misses that there are outstanding descriptors to clean when exiting busy poll. Try to catch the remaining work by triggering a software interrupt when exiting busy poll. No extra interrupts will be generated when busy polling is not used. The issue was found when running sockperf ping-pong tcp test with adaptive coalescing and busy poll enabled (50 as value busy_pool and busy_read sysctl knobs) and results in huge latency spikes with more than 100000us. The fix is inspired from the ice driver and do the following: 1) During napi poll exit in case of busy-poll (napo_complete_done() returns false) this is recorded to q_vector that we were in busy loop. 2) Extends i40e_buildreg_itr() to be able to add an enforced software interrupt into built value 2) In i40e_update_enable_itr() enforces a software interrupt trigger if we are exiting busy poll to catch any pending clean-ups 3) Reuses unused 3rd ITR (interrupt throttle) index and set it to 20K interrupts per second to limit the number of these sw interrupts. Test results ============ Prior: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473 sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.571 usec sockperf: Total 2429473 observations; each percentile contains 24294.73 observations sockperf: ---> <MAX> observation = 103294.331 sockperf: ---> percentile 99.999 = 45.633 sockperf: ---> percentile 99.990 = 37.013 sockperf: ---> percentile 99.900 = 35.910 sockperf: ---> percentile 99.000 = 33.390 sockperf: ---> percentile 90.000 = 28.626 sockperf: ---> percentile 75.000 = 27.741 sockperf: ---> percentile 50.000 = 26.743 sockperf: ---> percentile 25.000 = 25.614 sockperf: ---> <MIN> observation = 12.220 After: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186 sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.965 usec sockperf: Total 2391186 observations; each percentile contains 23911.86 observations sockperf: ---> <MAX> observation = 195.841 sockperf: ---> percentile 99.999 = 45.026 sockperf: ---> percentile 99.990 = 39.009 sockperf: ---> percentile 99.900 = 35.922 sockperf: ---> percentile 99.000 = 33.482 sockperf: ---> percentile 90.000 = 28.902 sockperf: ---> percentile 75.000 = 27.821 sockperf: ---> percentile 50.000 = 26.860 sockperf: ---> percentile 25.000 = 25.685 sockperf: ---> <MIN> observation = 12.277 Fixes: 0bcd952f ("ethernet/intel: consolidate NAPI and NAPI exit") Reported-by: Hugo Ferreira <hferreir@redhat.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ivan Vecera authored
[ Upstream commit addca917 ] Enum type names should not be suffixed by '_t'. Either to use 'typedef enum name name_t' to so plain 'name_t var' instead of 'enum name_t var'. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20231113231047.548659-6-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: ea558de7 ("i40e: Enforce software interrupt during busy-poll exit") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Mario Limonciello authored
[ Upstream commit ca299b45 ] If the system hasn't entered GFXOFF when suspend starts it can cause hangs accessing GC and RLC during the suspend stage. Cc: <stable@vger.kernel.org> # 6.1.y: 5095d541 ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.1.y: cb11ca32 ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.1.y: 2ceec37b ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.1.y: 3a9626c8 ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.6.y: 5095d541 ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.6.y: cb11ca32 ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.6.y: 2ceec37b ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.6.y: 3a9626c8 ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.1+ Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132 Fixes: ab475033 ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Mario Limonciello authored
[ Upstream commit cb11ca32 ] If any IP blocks allocate memory during their hw_fini() sequence this can cause the suspend to fail under memory pressure. Introduce a new phase that IP blocks can use to allocate memory before suspend starts so that it can potentially be evicted into swap instead. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: ca299b45 ("drm/amd: Flush GFXOFF requests in prepare stage") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Mario Limonciello authored
[ Upstream commit 5095d541 ] Linux PM core has a prepare() callback run before suspend. If the system is under high memory pressure, the resources may need to be evicted into swap instead. If the storage backing for swap is offlined during the suspend() step then such a call may fail. So move this step into prepare() to move evict majority of resources and update all non-pmops callers to call the same callback. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: ca299b45 ("drm/amd: Flush GFXOFF requests in prepare stage") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Chris Park authored
[ Upstream commit 72d72e8f ] [Why] Disabling stream encoder invokes a function that no longer exists. [How] Check if the function declaration is NULL in disable stream encoder. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Chris Park <chris.park@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Dmytro Laktyushkin authored
[ Upstream commit e8d13128 ] [Why] Secondary DP2 display fails to light up in some instances [How] Clock needs to be on when DPSTREAMCLK*_EN =1. This change moves dtbclk_p enable/disable point to make sure this is the case Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Daniel Miess <daniel.miess@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 72d72e8f ("drm/amd/display: Prevent crash when disable stream") Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Krishna Kurapati authored
[ Upstream commit f5e9bda0 ] When orientation switch is enabled in ucsi glink, there is a xhci probe failure seen when booting up in host mode in reverse orientation. During bootup the following things happen in multiple drivers: a) DWC3 controller driver initializes the core in device mode when the dr_mode is set to DRD. It relies on role_switch call to change role to host. b) QMP driver initializes the lanes to TYPEC_ORIENTATION_NORMAL as a normal routine. It relies on the typec_switch_set call to get notified of orientation changes. c) UCSI core reads the UCSI_GET_CONNECTOR_STATUS via the glink and provides initial role switch to dwc3 controller. When booting up in host mode with orientation TYPEC_ORIENTATION_REVERSE, then we see the following things happening in order: a) UCSI gives initial role as host to dwc3 controller ucsi_register_port. Upon receiving this notification, the dwc3 core needs to program GCTL from PRTCAP_DEVICE to PRTCAP_HOST and as part of this change, it asserts GCTL Core soft reset and waits for it to be completed before shifting it to host. Only after the reset is done will the dwc3_host_init be invoked and xhci is probed. DWC3 controller expects that the usb phy's are stable during this process i.e., the phy init is already done. b) During the 100ms wait for GCTL core soft reset, the actual notification from PPM is received by ucsi_glink via pmic glink for changing role to host. The pmic_glink_ucsi_notify routine first sends the orientation change to QMP and then sends role to dwc3 via ucsi framework. This is happening exactly at the time GCTL core soft reset is being processed. c) When QMP driver receives typec switch to TYPEC_ORIENTATION_REVERSE, it then re-programs the phy at the instant GCTL core soft reset has been asserted by dwc3 controller due to which the QMP PLL lock fails in qmp_combo_usb_power_on. d) After the 100ms of GCTL core soft reset is completed, the dwc3 core goes for initializing the host mode and invokes xhci probe. But at this point the QMP is non-responsive and as a result, the xhci plat probe fails during xhci_reset. Fix this by passing orientation switch to available ucsi instances if their gpio configuration is available before ucsi_register is invoked so that by the time, the pmic_glink_ucsi_notify provides typec_switch to QMP, the lane is already configured and the call would be a NOP thus not racing with role switch. Cc: stable@vger.kernel.org Fixes: c6165ed2 ("usb: ucsi: glink: use the connector orientation GPIO to provide switch events") Suggested-by: Wesley Cheng <quic_wcheng@quicinc.com> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20240301040914.458492-1-quic_kriskura@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Aleksandr Loktionov authored
commit f37c4eac upstream. To fix the regression introduced by commit 52424f97, which causes servers hang in very hard to reproduce conditions with resets races. Using two sources for the information is the root cause. In this function before the fix bumping v didn't mean bumping vf pointer. But the code used this variables interchangeably, so stale vf could point to different/not intended vf. Remove redundant "v" variable and iterate via single VF pointer across whole function instead to guarantee VF pointer validity. Fixes: 52424f97 ("i40e: Fix VF hang when reset is triggered on another VF") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Aleksandr Loktionov authored
commit eb58c598 upstream. The bug usually affects untrusted VFs, because they are limited to 18 MACs, it affects them badly, not letting to create MAC all filters. Not stable to reproduce, it happens when VF user creates MAC filters when other MACVLAN operations are happened in parallel. But consequence is that VF can't receive desired traffic. Fix counter to be bumped only for new or active filters. Fixes: 621650ca ("i40e: Refactoring VF MAC filters counting to make more reliable") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Aleksandr Mishin authored
commit ef15ddee upstream. In rvu_map_cgx_lmac_pf() the 'iter', which is used as an array index, can reach value (up to 14) that exceed the size (MAX_LMAC_COUNT = 8) of the array. Fix this bug by adding 'iter' value check. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 91c6945e ("octeontx2-af: cn10k: Add RPM MAC support") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Su Hui authored
commit e709acbd upstream. otx2_rxtx_enable() return negative error code such as -EIO, check -EIO rather than EIO to fix this problem. Fixes: c9262522 ("octeontx2-pf: Disable packet I/O for graceful exit") Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20240328020620.4054692-1-suhui@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hariprasad Kelam authored
commit 0ba80d96 upstream. The current implementation for loading coalesced KPU profiles has a limitation. The "offset" field, which is used to locate profiles within the profile is restricted to a u16. This restricts the number of profiles that can be loaded. This patch addresses this limitation by increasing the size of the "offset" field. Fixes: 11c730bf ("octeontx2-af: support for coalescing KPU profiles") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Antoine Tenart authored
commit 64235eab upstream. GRO has a fundamental issue with UDP tunnel packets as it can't detect those in a foolproof way and GRO could happen before they reach the tunnel endpoint. Previous commits have fixed issues when UDP tunnel packets come from a remote host, but if those packets are issued locally they could run into checksum issues. If the inner packet has a partial checksum the information will be lost in the GRO logic, either in udp4/6_gro_complete or in udp_gro_complete_segment and packets will have an invalid checksum when leaving the host. Prevent local UDP tunnel packets from ever being GROed at the outer UDP level. Due to skb->encapsulation being wrongly used in some drivers this is actually only preventing UDP tunnel packets with a partial checksum to be GROed (see iptunnel_handle_offloads) but those were also the packets triggering issues so in practice this should be sufficient. Fixes: 9fd1ff5d ("udp: Support UDP fraglist GRO/GSO.") Fixes: 36707061 ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets") Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Antoine Tenart authored
commit f0b8c303 upstream. UDP GRO validates checksums and in udp4/6_gro_complete fraglist packets are converted to CHECKSUM_UNNECESSARY to avoid later checks. However this is an issue for CHECKSUM_PARTIAL packets as they can be looped in an egress path and then their partial checksums are not fixed. Different issues can be observed, from invalid checksum on packets to traces like: gen01: hw csum failure skb len=3008 headroom=160 headlen=1376 tailroom=0 mac=(106,14) net=(120,40) trans=160 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0xffff232e ip_summed=2 complete_sw=0 valid=0 level=0) hash(0x77e3d716 sw=1 l4=1) proto=0x86dd pkttype=0 iif=12 ... Fix this by only converting CHECKSUM_NONE packets to CHECKSUM_UNNECESSARY by reusing __skb_incr_checksum_unnecessary. All other checksum types are kept as-is, including CHECKSUM_COMPLETE as fraglist packets being segmented back would have their skb->csum valid. Fixes: 9fd1ff5d ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Antoine Tenart authored
commit 3d010c80 upstream. When rx-udp-gro-forwarding is enabled UDP packets might be GROed when being forwarded. If such packets might land in a tunnel this can cause various issues and udp_gro_receive makes sure this isn't the case by looking for a matching socket. This is performed in udp4/6_gro_lookup_skb but only in the current netns. This is an issue with tunneled packets when the endpoint is in another netns. In such cases the packets will be GROed at the UDP level, which leads to various issues later on. The same thing can happen with rx-gro-list. We saw this with geneve packets being GROed at the UDP level. In such case gso_size is set; later the packet goes through the geneve rx path, the geneve header is pulled, the offset are adjusted and frag_list skbs are not adjusted with regard to geneve. When those skbs hit skb_fragment, it will misbehave. Different outcomes are possible depending on what the GROed skbs look like; from corrupted packets to kernel crashes. One example is a BUG_ON[1] triggered in skb_segment while processing the frag_list. Because gso_size is wrong (geneve header was pulled) skb_segment thinks there is "geneve header size" of data in frag_list, although it's in fact the next packet. The BUG_ON itself has nothing to do with the issue. This is only one of the potential issues. Looking up for a matching socket in udp_gro_receive is fragile: the lookup could be extended to all netns (not speaking about performances) but nothing prevents those packets from being modified in between and we could still not find a matching socket. It's OK to keep the current logic there as it should cover most cases but we also need to make sure we handle tunnel packets being GROed too early. This is done by extending the checks in udp_unexpected_gso: GSO packets lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must be segmented. [1] kernel BUG at net/core/skbuff.c:4408! RIP: 0010:skb_segment+0xd2a/0xf70 __udp_gso_segment+0xaa/0x560 Fixes: 9fd1ff5d ("udp: Support UDP fraglist GRO/GSO.") Fixes: 36707061 ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Atlas Yu authored
commit 5e864d90 upstream. On devices that support DASH, the current code in the "rtl_loop_wait" function raises false alarms when DASH is disabled. This occurs because the function attempts to wait for the DASH firmware to be ready, even though it's not relevant in this case. r8169 0000:0c:00.0 eth0: RTL8168ep/8111ep, 38:7c:76:49:08:d9, XID 502, IRQ 86 r8169 0000:0c:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko] r8169 0000:0c:00.0 eth0: DASH disabled ... r8169 0000:0c:00.0 eth0: rtl_ep_ocp_read_cond == 0 (loop: 30, delay: 10000). This patch modifies the driver start/stop functions to skip checking the DASH firmware status when DASH is explicitly disabled. This prevents unnecessary delays and false alarms. The patch has been tested on several ThinkStation P8/PX workstations. Fixes: 0ab0c45d ("r8169: add handling DASH when DASH is disabled") Signed-off-by: Atlas Yu <atlas.yu@canonical.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/20240328055152.18443-1-atlas.yu@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Thompson authored
commit 09ba28e1 upstream. The mlxbf_gige driver intermittantly encounters a NULL pointer exception while the system is shutting down via "reboot" command. The mlxbf_driver will experience an exception right after executing its shutdown() method. One example of this exception is: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000 [0000000000000070] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] SMP CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G S OE 5.15.0-bf.6.gef6992a #1 Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.0.2.12669 Apr 21 2023 pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige] lr : mlxbf_gige_poll+0x54/0x160 [mlxbf_gige] sp : ffff8000080d3c10 x29: ffff8000080d3c10 x28: ffffcce72cbb7000 x27: ffff8000080d3d58 x26: ffff0000814e7340 x25: ffff331cd1a05000 x24: ffffcce72c4ea008 x23: ffff0000814e4b40 x22: ffff0000814e4d10 x21: ffff0000814e4128 x20: 0000000000000000 x19: ffff0000814e4a80 x18: ffffffffffffffff x17: 000000000000001c x16: ffffcce72b4553f4 x15: ffff80008805b8a7 x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101 x11: 7f7f7f7f7f7f7f7f x10: c2ac898b17576267 x9 : ffffcce720fa5404 x8 : ffff000080812138 x7 : 0000000000002e9a x6 : 0000000000000080 x5 : ffff00008de3b000 x4 : 0000000000000000 x3 : 0000000000000001 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: mlxbf_gige_handle_tx_complete+0xc8/0x170 [mlxbf_gige] mlxbf_gige_poll+0x54/0x160 [mlxbf_gige] __napi_poll+0x40/0x1c8 net_rx_action+0x314/0x3a0 __do_softirq+0x128/0x334 run_ksoftirqd+0x54/0x6c smpboot_thread_fn+0x14c/0x190 kthread+0x10c/0x110 ret_from_fork+0x10/0x20 Code: 8b070000 f9000ea0 f95056c0 f86178a1 (b9407002) ---[ end trace 7cc3941aa0d8e6a4 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Kernel Offset: 0x4ce722520000 from 0xffff800008000000 PHYS_OFFSET: 0x80000000 CPU features: 0x000005c1,a3330e5a Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- During system shutdown, the mlxbf_gige driver's shutdown() is always executed. However, the driver's stop() method will only execute if networking interface configuration logic within the Linux distribution has been setup to do so. If shutdown() executes but stop() does not execute, NAPI remains enabled and this can lead to an exception if NAPI is scheduled while the hardware interface has only been partially deinitialized. The networking interface managed by the mlxbf_gige driver must be properly stopped during system shutdown so that IFF_UP is cleared, the hardware interface is put into a clean state, and NAPI is fully deinitialized. Fixes: f92e1869 ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20240325210929.25362-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kuniyuki Iwashima authored
commit d21d4060 upstream. syzkaller reported infinite recursive calls of fib6_dump_done() during netlink socket destruction. [1] From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then the response was generated. The following recvmmsg() resumed the dump for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due to the fault injection. [0] 12:01:34 executing program 3: r0 = socket$nl_route(0x10, 0x3, 0x0) sendmsg$nl_route(r0, ... snip ...) recvmmsg(r0, ... snip ...) (fail_nth: 8) Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3]. syzkaller stopped receiving the response halfway through, and finally netlink_sock_destruct() called nlk_sk(sk)->cb.done(). fib6_dump_done() calls fib6_dump_end() and nlk_sk(sk)->cb.done() if it is still not NULL. fib6_dump_end() rewrites nlk_sk(sk)->cb.done() by nlk_sk(sk)->cb.args[3], but it has the same function, not NULL, calling itself recursively and hitting the stack guard page. To avoid the issue, let's set the destructor after kzalloc(). [0]: FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:117) should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153) should_failslab (mm/slub.c:3733) kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992) inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662) rtnl_dump_all (net/core/rtnetlink.c:4029) netlink_dump (net/netlink/af_netlink.c:2269) netlink_recvmsg (net/netlink/af_netlink.c:1988) ____sys_recvmsg (net/socket.c:1046 net/socket.c:2801) ___sys_recvmsg (net/socket.c:2846) do_recvmmsg (net/socket.c:2943) __x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034) [1]: BUG: TASK stack guard page was hit at 00000000f2fa9af1 (stack is 00000000b7912430..000000009a436beb) stack guard page: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 223719 Comm: kworker/1:3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: events netlink_sock_destruct_work RIP: 0010:fib6_dump_done (net/ipv6/ip6_fib.c:570) Code: 3c 24 e8 f3 e9 51 fd e9 28 fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 48 89 fd <53> 48 8d 5d 60 e8 b6 4d 07 fd 48 89 da 48 b8 00 00 00 00 00 fc ff RSP: 0018:ffffc9000d980000 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffffffff84405990 RCX: ffffffff844059d3 RDX: ffff8881028e0000 RSI: ffffffff84405ac2 RDI: ffff88810c02f358 RBP: ffff88810c02f358 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000224 R12: 0000000000000000 R13: ffff888007c82c78 R14: ffff888007c82c68 R15: ffff888007c82c68 FS: 0000000000000000(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc9000d97fff8 CR3: 0000000102309002 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <#DF> </#DF> <TASK> fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) ... fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) netlink_sock_destruct (net/netlink/af_netlink.c:401) __sk_destruct (net/core/sock.c:2177 (discriminator 2)) sk_destruct (net/core/sock.c:2224) __sk_free (net/core/sock.c:2235) sk_free (net/core/sock.c:2246) process_one_work (kernel/workqueue.c:3259) worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416) kthread (kernel/kthread.c:388) ret_from_fork (arch/x86/kernel/process.c:153) ret_from_fork_asm (arch/x86/entry/entry_64.S:256) Modules linked in: Fixes: 1da177e4 ("Linux-2.6.12-rc2") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240401211003.25274-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-