Skip to content
  1. Jun 22, 2023
  2. Jun 21, 2023
  3. Jun 20, 2023
  4. Jun 19, 2023
  5. Jun 18, 2023
  6. Jun 17, 2023
    • Íñigo Huguet's avatar
      sfc: use budget for TX completions · 4aaf2c52
      Íñigo Huguet authored
      
      
      When running workloads heavy unbalanced towards TX (high TX, low RX
      traffic), sfc driver can retain the CPU during too long times. Although
      in many cases this is not enough to be visible, it can affect
      performance and system responsiveness.
      
      A way to reproduce it is to use a debug kernel and run some parallel
      netperf TX tests. In some systems, this will lead to this message being
      logged:
        kernel:watchdog: BUG: soft lockup - CPU#12 stuck for 22s!
      
      The reason is that sfc driver doesn't account any NAPI budget for the TX
      completion events work. With high-TX/low-RX traffic, this makes that the
      CPU is held for long time for NAPI poll.
      
      Documentations says "drivers can process completions for any number of Tx
      packets but should only process up to budget number of Rx packets".
      However, many drivers do limit the amount of TX completions that they
      process in a single NAPI poll.
      
      In the same way, this patch adds a limit for the TX work in sfc. With
      the patch applied, the watchdog warning never appears.
      
      Tested with netperf in different combinations: single process / parallel
      processes, TCP / UDP and different sizes of UDP messages. Repeated the
      tests before and after the patch, without any noticeable difference in
      network or CPU performance.
      
      Test hardware:
      Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz (4 cores, 2 threads/core)
      Solarflare Communications XtremeScale X2522-25G Network Adapter
      
      Fixes: 5227eccc ("sfc: remove tx and MCDI handling from NAPI budget consideration")
      Fixes: d19a5372 ("sfc_ef100: TX path for EF100 NICs")
      Reported-by: default avatarFei Liu <feliu@redhat.com>
      Signed-off-by: default avatarÍñigo Huguet <ihuguet@redhat.com>
      Acked-by: default avatarMartin Habets <habetsm.xilinx@gmail.com>
      Link: https://lore.kernel.org/r/20230615084929.10506-1-ihuguet@redhat.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4aaf2c52
    • Azeem Shaikh's avatar
      ieee802154: Replace strlcpy with strscpy · cd912503
      Azeem Shaikh authored
      strlcpy() reads the entire source buffer first.
      This read may exceed the destination size limit.
      This is both inefficient and can lead to linear read
      overflows if a source string is not NUL-terminated [1].
      In an effort to remove strlcpy() completely [2], replace
      strlcpy() here with strscpy().
      
      Direct replacement is safe here since the return values
      from the helper macros are ignored by the callers.
      
      [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
      [2] https://github.com/KSPP/linux/issues/89
      
      
      
      Signed-off-by: default avatarAzeem Shaikh <azeemshaikh38@gmail.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230613003326.3538391-1-azeemshaikh38@gmail.com
      
      
      Signed-off-by: default avatarStefan Schmidt <stefan@datenfreihafen.org>
      cd912503
    • Leon Romanovsky's avatar
      net/mlx5e: Fix scheduling of IPsec ASO query while in atomic · a128f9d4
      Leon Romanovsky authored
      
      
      ASO query can be scheduled in atomic context as such it can't use usleep.
      Use udelay as recommended in Documentation/timers/timers-howto.rst.
      
      Fixes: 76e463f6 ("net/mlx5e: Overcome slow response for first IPsec ASO WQE")
      Signed-off-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      a128f9d4
    • Leon Romanovsky's avatar
      net/mlx5e: Drop XFRM state lock when modifying flow steering · c75b9425
      Leon Romanovsky authored
      
      
      XFRM state which is changed to be XFRM_STATE_EXPIRED doesn't really
      need to hold lock while modifying flow steering rules to drop traffic.
      
      That state can be deleted only and as such mlx5e_ipsec_handle_tx_limit()
      work will be canceled anyway and won't run in parallel.
      
      Fixes: b2f7b01d ("net/mlx5e: Simulate missing IPsec TX limits hardware functionality")
      Signed-off-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      c75b9425
    • Patrisious Haddad's avatar
      net/mlx5e: Fix ESN update kernel panic · fef06678
      Patrisious Haddad authored
      
      
      Previously during mlx5e_ipsec_handle_event the driver tried to execute
      an operation that could sleep, while holding a spinlock, which caused
      the kernel panic mentioned below.
      
      Move the function call that can sleep outside of the spinlock context.
      
       Call Trace:
       <TASK>
       dump_stack_lvl+0x49/0x6c
       __schedule_bug.cold+0x42/0x4e
       schedule_debug.constprop.0+0xe0/0x118
       __schedule+0x59/0x58a
       ? __mod_timer+0x2a1/0x3ef
       schedule+0x5e/0xd4
       schedule_timeout+0x99/0x164
       ? __pfx_process_timeout+0x10/0x10
       __wait_for_common+0x90/0x1da
       ? __pfx_schedule_timeout+0x10/0x10
       wait_func+0x34/0x142 [mlx5_core]
       mlx5_cmd_invoke+0x1f3/0x313 [mlx5_core]
       cmd_exec+0x1fe/0x325 [mlx5_core]
       mlx5_cmd_do+0x22/0x50 [mlx5_core]
       mlx5_cmd_exec+0x1c/0x40 [mlx5_core]
       mlx5_modify_ipsec_obj+0xb2/0x17f [mlx5_core]
       mlx5e_ipsec_update_esn_state+0x69/0xf0 [mlx5_core]
       ? wake_affine+0x62/0x1f8
       mlx5e_ipsec_handle_event+0xb1/0xc0 [mlx5_core]
       process_one_work+0x1e2/0x3e6
       ? __pfx_worker_thread+0x10/0x10
       worker_thread+0x54/0x3ad
       ? __pfx_worker_thread+0x10/0x10
       kthread+0xda/0x101
       ? __pfx_kthread+0x10/0x10
       ret_from_fork+0x29/0x37
       </TASK>
       BUG: workqueue leaked lock or atomic: kworker/u256:4/0x7fffffff/189754#012     last function: mlx5e_ipsec_handle_event [mlx5_core]
       CPU: 66 PID: 189754 Comm: kworker/u256:4 Kdump: loaded Tainted: G        W          6.2.0-2596.20230309201517_5.el8uek.rc1.x86_64 #2
       Hardware name: Oracle Corporation ORACLE SERVER X9-2/ASMMBX9-2, BIOS 61070300 08/17/2022
       Workqueue: mlx5e_ipsec: eth%d mlx5e_ipsec_handle_event [mlx5_core]
       Call Trace:
       <TASK>
       dump_stack_lvl+0x49/0x6c
       process_one_work.cold+0x2b/0x3c
       ? __pfx_worker_thread+0x10/0x10
       worker_thread+0x54/0x3ad
       ? __pfx_worker_thread+0x10/0x10
       kthread+0xda/0x101
       ? __pfx_kthread+0x10/0x10
       ret_from_fork+0x29/0x37
       </TASK>
       BUG: scheduling while atomic: kworker/u256:4/189754/0x00000000
      
      Fixes: cee137a6 ("net/mlx5e: Handle ESN update events")
      Signed-off-by: default avatarPatrisious Haddad <phaddad@nvidia.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      fef06678
    • Leon Romanovsky's avatar
      net/mlx5e: Don't delay release of hardware objects · cf5bb023
      Leon Romanovsky authored
      
      
      XFRM core provides two callbacks to release resources, one is .xdo_dev_policy_delete()
      and another is .xdo_dev_policy_free(). This separation allows delayed release so
      "ip xfrm policy free" commands won't starve. Unfortunately, mlx5 command interface
      can't run in .xdo_dev_policy_free() callbacks as the latter runs in ATOMIC context.
      
       BUG: scheduling while atomic: swapper/7/0/0x00000100
       Modules linked in: act_mirred act_tunnel_key cls_flower sch_ingress vxlan mlx5_vdpa vringh vhost_iotlb vdpa rpcrdma rdma_ucm ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse
       CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.3.0+ #1
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        <IRQ>
        dump_stack_lvl+0x33/0x50
        __schedule_bug+0x4e/0x60
        __schedule+0x5d5/0x780
        ? __mod_timer+0x286/0x3d0
        schedule+0x50/0x90
        schedule_timeout+0x7c/0xf0
        ? __bpf_trace_tick_stop+0x10/0x10
        __wait_for_common+0x88/0x190
        ? usleep_range_state+0x90/0x90
        cmd_exec+0x42e/0xb40 [mlx5_core]
        mlx5_cmd_do+0x1e/0x40 [mlx5_core]
        mlx5_cmd_exec+0x18/0x30 [mlx5_core]
        mlx5_cmd_delete_fte+0xa8/0xd0 [mlx5_core]
        del_hw_fte+0x60/0x120 [mlx5_core]
        mlx5_del_flow_rules+0xec/0x270 [mlx5_core]
        ? default_send_IPI_single_phys+0x26/0x30
        mlx5e_accel_ipsec_fs_del_pol+0x1a/0x60 [mlx5_core]
        mlx5e_xfrm_free_policy+0x15/0x20 [mlx5_core]
        xfrm_policy_destroy+0x5a/0xb0
        xfrm4_dst_destroy+0x7b/0x100
        dst_destroy+0x37/0x120
        rcu_core+0x2d6/0x540
        __do_softirq+0xcd/0x273
        irq_exit_rcu+0x82/0xb0
        sysvec_apic_timer_interrupt+0x72/0x90
        </IRQ>
        <TASK>
        asm_sysvec_apic_timer_interrupt+0x16/0x20
       RIP: 0010:default_idle+0x13/0x20
       Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 7a 4d ee 00 85 c0 7e 07 0f 00 2d 2f 98 2e 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 40 b4 02 00
       RSP: 0018:ffff888100843ee0 EFLAGS: 00000242
       RAX: 0000000000000001 RBX: ffff888100812b00 RCX: 4000000000000000
       RDX: 0000000000000001 RSI: 0000000000000083 RDI: 000000000002d2ec
       RBP: 0000000000000007 R08: 00000021daeded59 R09: 0000000000000001
       R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
        default_idle_call+0x30/0xb0
        do_idle+0x1c1/0x1d0
        cpu_startup_entry+0x19/0x20
        start_secondary+0xfe/0x120
        secondary_startup_64_no_verify+0xf3/0xfb
        </TASK>
       bad: scheduling from the idle thread!
      
      Fixes: a5b8ca94 ("net/mlx5e: Add XFRM policy offload logic")
      Signed-off-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      cf5bb023
    • Saeed Mahameed's avatar
      net/mlx5: Free IRQ rmap and notifier on kernel shutdown · 314ded53
      Saeed Mahameed authored
      
      
      The kernel IRQ system needs the irq affinity notifier to be clear
      before attempting to free the irq, see WARN_ON log below.
      
      On a normal driver unload we don't have this issue since we do the
      complete cleanup of the irq resources.
      
      To fix this, put the important resources cleanup in a helper function
      and use it in both normal driver unload and shutdown flows.
      
      [ 4497.498434] ------------[ cut here ]------------
      [ 4497.498726] WARNING: CPU: 0 PID: 9 at kernel/irq/manage.c:2034 free_irq+0x295/0x340
      [ 4497.499193] Modules linked in:
      [ 4497.499386] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.4.0-rc4+ #10
      [ 4497.499876] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
      [ 4497.500518] Workqueue: events do_poweroff
      [ 4497.500849] RIP: 0010:free_irq+0x295/0x340
      [ 4497.501132] Code: 85 c0 0f 84 1d ff ff ff 48 89 ef ff d0 0f 1f 00 e9 10 ff ff ff 0f 0b e9 72 ff ff ff 49 8d 7f 28 ff d0 0f 1f 00 e9 df fd ff ff <0f> 0b 48 c7 80 c0 008
      [ 4497.502269] RSP: 0018:ffffc90000053da0 EFLAGS: 00010282
      [ 4497.502589] RAX: ffff888100949600 RBX: ffff88810330b948 RCX: 0000000000000000
      [ 4497.503035] RDX: ffff888100949600 RSI: ffff888100400490 RDI: 0000000000000023
      [ 4497.503472] RBP: ffff88810330c7e0 R08: ffff8881004005d0 R09: ffffffff8273a260
      [ 4497.503923] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881009ae000
      [ 4497.504359] R13: ffff8881009ae148 R14: 0000000000000000 R15: ffff888100949600
      [ 4497.504804] FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
      [ 4497.505302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4497.505671] CR2: 00007fce98806298 CR3: 000000000262e005 CR4: 0000000000370ef0
      [ 4497.506104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 4497.506540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 4497.507002] Call Trace:
      [ 4497.507158]  <TASK>
      [ 4497.507299]  ? free_irq+0x295/0x340
      [ 4497.507522]  ? __warn+0x7c/0x130
      [ 4497.507740]  ? free_irq+0x295/0x340
      [ 4497.507963]  ? report_bug+0x171/0x1a0
      [ 4497.508197]  ? handle_bug+0x3c/0x70
      [ 4497.508417]  ? exc_invalid_op+0x17/0x70
      [ 4497.508662]  ? asm_exc_invalid_op+0x1a/0x20
      [ 4497.508926]  ? free_irq+0x295/0x340
      [ 4497.509146]  mlx5_irq_pool_free_irqs+0x48/0x90
      [ 4497.509421]  mlx5_irq_table_free_irqs+0x38/0x50
      [ 4497.509714]  mlx5_core_eq_free_irqs+0x27/0x40
      [ 4497.509984]  shutdown+0x7b/0x100
      [ 4497.510184]  pci_device_shutdown+0x30/0x60
      [ 4497.510440]  device_shutdown+0x14d/0x240
      [ 4497.510698]  kernel_power_off+0x30/0x70
      [ 4497.510938]  process_one_work+0x1e6/0x3e0
      [ 4497.511183]  worker_thread+0x49/0x3b0
      [ 4497.511407]  ? __pfx_worker_thread+0x10/0x10
      [ 4497.511679]  kthread+0xe0/0x110
      [ 4497.511879]  ? __pfx_kthread+0x10/0x10
      [ 4497.512114]  ret_from_fork+0x29/0x50
      [ 4497.512342]  </TASK>
      
      Fixes: 9c2d0801 ("net/mlx5: Free irqs only on shutdown callback")
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: default avatarShay Drory <shayd@nvidia.com>
      314ded53
    • Yevgeny Kliteynik's avatar
      net/mlx5: DR, Fix wrong action data allocation in decap action · ef4c5afc
      Yevgeny Kliteynik authored
      
      
      When TUNNEL_L3_TO_L2 decap action was created, a pointer to a local
      variable was passed as its HW action data, resulting in attempt to
      free invalid address:
      
        BUG: KASAN: invalid-free in mlx5dr_action_destroy+0x318/0x410 [mlx5_core]
      
      Fixes: 4781df92 ("net/mlx5: DR, Move STEv0 modify header logic")
      Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: default avatarAlex Vesker <valex@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      ef4c5afc
    • Yevgeny Kliteynik's avatar
      net/mlx5: DR, Support SW created encap actions for FW table · 87cd0649
      Yevgeny Kliteynik authored
      
      
      In some cases, steering might need to use SW-created action in
      FW table, which results in wrong packet reformat being used:
      
        mlx5_core 0000:81:00.1: mlx5_cmd_check:756:(pid 1154):
            SET_FLOW_TABLE_ENTRY(0×936) op_mod(0×0) failed,
            status bad resource(0×5), syndrome (0xf2ff71)
      
      This patch adds support for usage of SW-created packet reformat (encap)
      actions in FW tables, and adds clear error flow for attempt to use
      SW-created modify header on FW tables.
      
      Fixes: 6a48faee ("net/mlx5: Add direct rule fs_cmd implementation")
      Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: default avatarErez Shitrit <erezsh@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      87cd0649
    • Chris Mi's avatar
      net/mlx5e: TC, Cleanup ct resources for nic flow · fb7be476
      Chris Mi authored
      
      
      The cited commit removes special handling of CT action. But it
      removes too much. Pre ct/ct_nat tables and some other resources
      are not destroyed due to the cited commit.
      
      Fix it by adding it back.
      
      Fixes: 08fe94ec ("net/mlx5e: TC, Remove special handling of CT action")
      Signed-off-by: default avatarChris Mi <cmi@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      fb7be476
    • Chris Mi's avatar
      net/mlx5e: TC, Add null pointer check for hardware miss support · b100573a
      Chris Mi authored
      
      
      The cited commits add hardware miss support to tc action. But if
      the rules can't be offloaded, the pointers are null and system
      will panic when accessing them.
      
      Fix it by checking null pointer.
      
      Fixes: 08fe94ec ("net/mlx5e: TC, Remove special handling of CT action")
      Fixes: 67027828 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
      Signed-off-by: default avatarChris Mi <cmi@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      b100573a
    • Eli Cohen's avatar
      net/mlx5: Fix driver load with single msix vector · 0ab999d4
      Eli Cohen authored
      
      
      When a PCI device has just one msix vector available, we want to share
      this vector between async and completion events. Current code fails to
      do that assuming it will always have at least one dedicated vector for
      completion events. Fix this by detecting when the pool contains just a
      single vector.
      
      Fixes: 3354822c ("net/mlx5: Use dynamic msix vectors allocation")
      Signed-off-by: default avatarEli Cohen <elic@nvidia.com>
      Reviewed-by: default avatarShay Drory <shayd@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      0ab999d4
    • Maxim Mikityanskiy's avatar
      net/mlx5e: xsk: Set napi_id to support busy polling on XSK RQ · 62a522d3
      Maxim Mikityanskiy authored
      
      
      The cited commit missed setting napi_id on XSK RQs, it only affected
      regular RQs. Add the missing part to support socket busy polling on XSK
      RQs.
      
      Fixes: a2740f52 ("net/mlx5e: xsk: Set napi_id to support busy polling")
      Signed-off-by: default avatarMaxim Mikityanskiy <maxtram95@gmail.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      62a522d3
    • Maxim Mikityanskiy's avatar
      net/mlx5e: XDP, Allow growing tail for XDP multi buffer · 4e7401fc
      Maxim Mikityanskiy authored
      
      
      The cited commits missed passing frag_size to __xdp_rxq_info_reg, which
      is required by bpf_xdp_adjust_tail to support growing the tail pointer
      in fragmented packets. Pass the missing parameter when the current RQ
      mode allows XDP multi buffer.
      
      Fixes: ea5d49bd ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
      Fixes: 9cb9482e ("net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP")
      Signed-off-by: default avatarMaxim Mikityanskiy <maxtram95@gmail.com>
      Cc: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      4e7401fc
  7. Jun 16, 2023