Skip to content
  1. Dec 28, 2021
  2. Dec 27, 2021
    • Krzysztof Kozlowski's avatar
      nfc: uapi: use kernel size_t to fix user-space builds · 79b69a83
      Krzysztof Kozlowski authored
      Fix user-space builds if it includes /usr/include/linux/nfc.h before
      some of other headers:
      
        /usr/include/linux/nfc.h:281:9: error: unknown type name ‘size_t’
          281 |         size_t service_name_len;
              |         ^~~~~~
      
      Fixes: d646960f
      
       ("NFC: Initial LLCP support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      79b69a83
    • Dmitry V. Levin's avatar
      uapi: fix linux/nfc.h userspace compilation errors · 7175f02c
      Dmitry V. Levin authored
      Replace sa_family_t with __kernel_sa_family_t to fix the following
      linux/nfc.h userspace compilation errors:
      
      /usr/include/linux/nfc.h:266:2: error: unknown type name 'sa_family_t'
        sa_family_t sa_family;
      /usr/include/linux/nfc.h:274:2: error: unknown type name 'sa_family_t'
        sa_family_t sa_family;
      
      Fixes: 23b7869c ("NFC: add the NFC socket raw protocol")
      Fixes: d646960f
      
       ("NFC: Initial LLCP support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Reviewed-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7175f02c
    • Matthias-Christian Ott's avatar
      net: usb: pegasus: Do not drop long Ethernet frames · ca506fca
      Matthias-Christian Ott authored
      The D-Link DSB-650TX (2001:4002) is unable to receive Ethernet frames
      that are longer than 1518 octets, for example, Ethernet frames that
      contain 802.1Q VLAN tags.
      
      The frames are sent to the pegasus driver via USB but the driver
      discards them because they have the Long_pkt field set to 1 in the
      received status report. The function read_bulk_callback of the pegasus
      driver treats such received "packets" (in the terminology of the
      hardware) as errors but the field simply does just indicate that the
      Ethernet frame (MAC destination to FCS) is longer than 1518 octets.
      
      It seems that in the 1990s there was a distinction between
      "giant" (> 1518) and "runt" (< 64) frames and the hardware includes
      flags to indicate this distinction. It seems that the purpose of the
      distinction "giant" frames was to not allow infinitely long frames due
      to transmission errors and to allow hardware to have an upper limit of
      the frame size. However, the hardware already has such limit with its
      2048 octet receive buffer and, therefore, Long_pkt is merely a
      convention and should not be treated as a receive error.
      
      Actually, the hardware is even able to receive Ethernet frames with 2048
      octets which exceeds the claimed limit frame size limit of the driver of
      1536 octets (PEGASUS_MTU).
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarMatthias-Christian Ott <ott@mirix.org>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ca506fca
    • Zekun Shen's avatar
      atlantic: Fix buff_ring OOB in aq_ring_rx_clean · 5f501532
      Zekun Shen authored
      
      
      The function obtain the next buffer without boundary check.
      We should return with I/O error code.
      
      The bug is found by fuzzing and the crash report is attached.
      It is an OOB bug although reported as use-after-free.
      
      [    4.804724] BUG: KASAN: use-after-free in aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
      [    4.805661] Read of size 4 at addr ffff888034fe93a8 by task ksoftirqd/0/9
      [    4.806505]
      [    4.806703] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W         5.6.0 #34
      [    4.809030] Call Trace:
      [    4.809343]  dump_stack+0x76/0xa0
      [    4.809755]  print_address_description.constprop.0+0x16/0x200
      [    4.810455]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
      [    4.811234]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
      [    4.813183]  __kasan_report.cold+0x37/0x7c
      [    4.813715]  ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
      [    4.814393]  kasan_report+0xe/0x20
      [    4.814837]  aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
      [    4.815499]  ? hw_atl_b0_hw_ring_rx_receive+0x9a5/0xb90 [atlantic]
      [    4.816290]  aq_vec_poll+0x179/0x5d0 [atlantic]
      [    4.816870]  ? _GLOBAL__sub_I_65535_1_aq_pci_func_init+0x20/0x20 [atlantic]
      [    4.817746]  ? __next_timer_interrupt+0xba/0xf0
      [    4.818322]  net_rx_action+0x363/0xbd0
      [    4.818803]  ? call_timer_fn+0x240/0x240
      [    4.819302]  ? __switch_to_asm+0x40/0x70
      [    4.819809]  ? napi_busy_loop+0x520/0x520
      [    4.820324]  __do_softirq+0x18c/0x634
      [    4.820797]  ? takeover_tasklets+0x5f0/0x5f0
      [    4.821343]  run_ksoftirqd+0x15/0x20
      [    4.821804]  smpboot_thread_fn+0x2f1/0x6b0
      [    4.822331]  ? smpboot_unregister_percpu_thread+0x160/0x160
      [    4.823041]  ? __kthread_parkme+0x80/0x100
      [    4.823571]  ? smpboot_unregister_percpu_thread+0x160/0x160
      [    4.824301]  kthread+0x2b5/0x3b0
      [    4.824723]  ? kthread_create_on_node+0xd0/0xd0
      [    4.825304]  ret_from_fork+0x35/0x40
      
      Signed-off-by: default avatarZekun Shen <bruceshenzk@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5f501532
    • yangxingwu's avatar
      net: udp: fix alignment problem in udp4_seq_show() · 6c25449e
      yangxingwu authored
      
      
      $ cat /pro/net/udp
      
      before:
      
        sl  local_address rem_address   st tx_queue rx_queue tr tm->when
      26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000
      26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000
      27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000
      
      after:
      
         sl  local_address rem_address   st tx_queue rx_queue tr tm->when
      26050: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000
      26320: 0100007F:0143 00000000:0000 07 00000000:00000000 00:00000000
      27135: 00000000:8472 00000000:0000 07 00000000:00000000 00:00000000
      
      Signed-off-by: default avataryangxingwu <xingwu.yang@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6c25449e
    • Karsten Graul's avatar
      net/smc: fix using of uninitialized completions · 6d7373da
      Karsten Graul authored
      In smc_wr_tx_send_wait() the completion on index specified by
      pend->idx is initialized and after smc_wr_tx_send() was called the wait
      for completion starts. pend->idx is used to get the correct index for
      the wait, but the pend structure could already be cleared in
      smc_wr_tx_process_cqe().
      Introduce pnd_idx to hold and use a local copy of the correct index.
      
      Fixes: 09c61d24
      
       ("net/smc: wait for departure of an IB message")
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d7373da
    • William Zhao's avatar
      ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate · c1833c39
      William Zhao authored
      
      
      The "__ip6_tnl_parm" struct was left uninitialized causing an invalid
      load of random data when the "__ip6_tnl_parm" struct was used elsewhere.
      As an example, in the function "ip6_tnl_xmit_ctl()", it tries to access
      the "collect_md" member. With "__ip6_tnl_parm" being uninitialized and
      containing random data, the UBSAN detected that "collect_md" held a
      non-boolean value.
      
      The UBSAN issue is as follows:
      ===============================================================
      UBSAN: invalid-load in net/ipv6/ip6_tunnel.c:1025:14
      load of value 30 is not a valid value for type '_Bool'
      CPU: 1 PID: 228 Comm: kworker/1:3 Not tainted 5.16.0-rc4+ #8
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      Workqueue: ipv6_addrconf addrconf_dad_work
      Call Trace:
      <TASK>
      dump_stack_lvl+0x44/0x57
      ubsan_epilogue+0x5/0x40
      __ubsan_handle_load_invalid_value+0x66/0x70
      ? __cpuhp_setup_state+0x1d3/0x210
      ip6_tnl_xmit_ctl.cold.52+0x2c/0x6f [ip6_tunnel]
      vti6_tnl_xmit+0x79c/0x1e96 [ip6_vti]
      ? lock_is_held_type+0xd9/0x130
      ? vti6_rcv+0x100/0x100 [ip6_vti]
      ? lock_is_held_type+0xd9/0x130
      ? rcu_read_lock_bh_held+0xc0/0xc0
      ? lock_acquired+0x262/0xb10
      dev_hard_start_xmit+0x1e6/0x820
      __dev_queue_xmit+0x2079/0x3340
      ? mark_lock.part.52+0xf7/0x1050
      ? netdev_core_pick_tx+0x290/0x290
      ? kvm_clock_read+0x14/0x30
      ? kvm_sched_clock_read+0x5/0x10
      ? sched_clock_cpu+0x15/0x200
      ? find_held_lock+0x3a/0x1c0
      ? lock_release+0x42f/0xc90
      ? lock_downgrade+0x6b0/0x6b0
      ? mark_held_locks+0xb7/0x120
      ? neigh_connected_output+0x31f/0x470
      ? lockdep_hardirqs_on+0x79/0x100
      ? neigh_connected_output+0x31f/0x470
      ? ip6_finish_output2+0x9b0/0x1d90
      ? rcu_read_lock_bh_held+0x62/0xc0
      ? ip6_finish_output2+0x9b0/0x1d90
      ip6_finish_output2+0x9b0/0x1d90
      ? ip6_append_data+0x330/0x330
      ? ip6_mtu+0x166/0x370
      ? __ip6_finish_output+0x1ad/0xfb0
      ? nf_hook_slow+0xa6/0x170
      ip6_output+0x1fb/0x710
      ? nf_hook.constprop.32+0x317/0x430
      ? ip6_finish_output+0x180/0x180
      ? __ip6_finish_output+0xfb0/0xfb0
      ? lock_is_held_type+0xd9/0x130
      ndisc_send_skb+0xb33/0x1590
      ? __sk_mem_raise_allocated+0x11cf/0x1560
      ? dst_output+0x4a0/0x4a0
      ? ndisc_send_rs+0x432/0x610
      addrconf_dad_completed+0x30c/0xbb0
      ? addrconf_rs_timer+0x650/0x650
      ? addrconf_dad_work+0x73c/0x10e0
      addrconf_dad_work+0x73c/0x10e0
      ? addrconf_dad_completed+0xbb0/0xbb0
      ? rcu_read_lock_sched_held+0xaf/0xe0
      ? rcu_read_lock_bh_held+0xc0/0xc0
      process_one_work+0x97b/0x1740
      ? pwq_dec_nr_in_flight+0x270/0x270
      worker_thread+0x87/0xbf0
      ? process_one_work+0x1740/0x1740
      kthread+0x3ac/0x490
      ? set_kthread_struct+0x100/0x100
      ret_from_fork+0x22/0x30
      </TASK>
      ===============================================================
      
      The solution is to initialize "__ip6_tnl_parm" struct to zeros in the
      "vti6_siocdevprivate()" function.
      
      Signed-off-by: default avatarWilliam Zhao <wizhao@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1833c39
  3. Dec 26, 2021
    • Ma Xinjian's avatar
      selftests: mptcp: Remove the deprecated config NFT_COUNTER · e6007b85
      Ma Xinjian authored
      
      
      NFT_COUNTER was removed since
      390ad4295aa ("netfilter: nf_tables: make counter support built-in")
      LKP/0Day will check if all configs listing under selftests are able to
      be enabled properly.
      
      For the missing configs, it will report something like:
      LKP WARN miss config CONFIG_NFT_COUNTER= of net/mptcp/config
      
      - it's not reasonable to keep the deprecated configs.
      - configs under kselftests are recommended by corresponding tests.
      So if some configs are missing, it will impact the testing results
      
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarMa Xinjian <xinjianx.ma@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e6007b85
    • Xin Long's avatar
      sctp: use call_rcu to free endpoint · 5ec7d18d
      Xin Long authored
      
      
      This patch is to delay the endpoint free by calling call_rcu() to fix
      another use-after-free issue in sctp_sock_dump():
      
        BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
        Call Trace:
          __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
          lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
          __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
          _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
          spin_lock_bh include/linux/spinlock.h:334 [inline]
          __lock_sock+0x203/0x350 net/core/sock.c:2253
          lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
          lock_sock include/net/sock.h:1492 [inline]
          sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324
          sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091
          sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527
          __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049
          inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065
          netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
          __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
          netlink_dump_start include/linux/netlink.h:216 [inline]
          inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170
          __sock_diag_cmd net/core/sock_diag.c:232 [inline]
          sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
          netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
          sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274
      
      This issue occurs when asoc is peeled off and the old sk is freed after
      getting it by asoc->base.sk and before calling lock_sock(sk).
      
      To prevent the sk free, as a holder of the sk, ep should be alive when
      calling lock_sock(). This patch uses call_rcu() and moves sock_put and
      ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to
      hold the ep under rcu_read_lock in sctp_transport_traverse_process().
      
      If sctp_endpoint_hold() returns true, it means this ep is still alive
      and we have held it and can continue to dump it; If it returns false,
      it means this ep is dead and can be freed after rcu_read_unlock, and
      we should skip it.
      
      In sctp_sock_dump(), after locking the sk, if this ep is different from
      tsp->asoc->ep, it means during this dumping, this asoc was peeled off
      before calling lock_sock(), and the sk should be skipped; If this ep is
      the same with tsp->asoc->ep, it means no peeloff happens on this asoc,
      and due to lock_sock, no peeloff will happen either until release_sock.
      
      Note that delaying endpoint free won't delay the port release, as the
      port release happens in sctp_endpoint_destroy() before calling call_rcu().
      Also, freeing endpoint by call_rcu() makes it safe to access the sk by
      asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv().
      
      Thanks Jones to bring this issue up.
      
      v1->v2:
        - improve the changelog.
        - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed.
      
      Reported-by: default avatar <syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com>
      Reported-by: default avatarLee Jones <lee.jones@linaro.org>
      Fixes: d25adbeb
      
       ("sctp: fix an use-after-free issue in sctp_sock_dump")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ec7d18d
  4. Dec 25, 2021
  5. Dec 24, 2021
  6. Dec 23, 2021
    • Christophe JAILLET's avatar
      net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()' · 4390c6ed
      Christophe JAILLET authored
      All the error handling paths of 'mlx5e_tc_add_fdb_flow()' end to 'err_out'
      where 'flow_flag_set(flow, FAILED);' is called.
      
      All but the new error handling paths added by the commits given in the
      Fixes tag below.
      
      Fix these error handling paths and branch to 'err_out'.
      
      Fixes: 166f431e ("net/mlx5e: Add indirect tc offload of ovs internal port")
      Fixes: b16eb3c8
      
       ("net/mlx5: Support internal port as decap route device")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      (cherry picked from commit 31108d14)
      4390c6ed
    • Chris Mi's avatar
      net/mlx5e: Delete forward rule for ct or sample action · 2820110d
      Chris Mi authored
      When there is ct or sample action, the ct or sample rule will be deleted
      and return. But if there is an extra mirror action, the forward rule can't
      be deleted because of the return.
      
      Fix it by removing the return.
      
      Fixes: 69e2916e ("net/mlx5: CT: Add support for mirroring")
      Fixes: f94d6389
      
       ("net/mlx5e: TC, Add support to offload sample action")
      Signed-off-by: default avatarChris Mi <cmi@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      2820110d
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Fix ICOSQ recovery flow for XSK · 19c4aba2
      Maxim Mikityanskiy authored
      There are two ICOSQs per channel: one is needed for RX, and the other
      for async operations (XSK TX, kTLS offload). Currently, the recovery
      flow for both is the same, and async ICOSQ is mistakenly treated like
      the regular ICOSQ.
      
      This patch prevents running the regular ICOSQ recovery on async ICOSQ.
      The purpose of async ICOSQ is to handle XSK wakeup requests and post
      kTLS offload RX parameters, it has nothing to do with RQ and XSKRQ UMRs,
      so the regular recovery sequence is not applicable here.
      
      Fixes: be5323c8
      
       ("net/mlx5e: Report and recover from CQE error on ICOSQ")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: default avatarAya Levin <ayal@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      19c4aba2
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow · 17958d7c
      Maxim Mikityanskiy authored
      Both regular RQ and XSKRQ use the same ICOSQ for UMRs. When doing
      recovery for the ICOSQ, don't forget to deactivate XSKRQ.
      
      XSK can be opened and closed while channels are active, so a new mutex
      prevents the ICOSQ recovery from running at the same time. The ICOSQ
      recovery deactivates and reactivates XSKRQ, so any parallel change in
      XSK state would break consistency. As the regular RQ is running, it's
      not enough to just flush the recovery work, because it can be
      rescheduled.
      
      Fixes: be5323c8
      
       ("net/mlx5e: Report and recover from CQE error on ICOSQ")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      17958d7c
    • Gal Pressman's avatar
      net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled · a0cb9096
      Gal Pressman authored
      When TC classifier action offloads are disabled (CONFIG_MLX5_CLS_ACT in
      Kconfig), the mlx5e_rep_tc_receive() function which is responsible for
      passing the skb to the stack (or freeing it) is defined as a nop, and
      results in leaking the skb memory. Replace the nop with a call to
      napi_gro_receive() to resolve the leak.
      
      Fixes: 28e7606f
      
       ("net/mlx5e: Refactor rx handler of represetor device")
      Signed-off-by: default avatarGal Pressman <gal@nvidia.com>
      Reviewed-by: default avatarAriel Levkovich <lariel@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      a0cb9096
    • Amir Tzin's avatar
      net/mlx5e: Wrap the tx reporter dump callback to extract the sq · 918fc385
      Amir Tzin authored
      Function mlx5e_tx_reporter_dump_sq() casts its void * argument to struct
      mlx5e_txqsq *, but in TX-timeout-recovery flow the argument is actually
      of type struct mlx5e_tx_timeout_ctx *.
      
       mlx5_core 0000:08:00.1 enp8s0f1: TX timeout detected
       mlx5_core 0000:08:00.1 enp8s0f1: TX timeout on queue: 1, SQ: 0x11ec, CQ: 0x146d, SQ Cons: 0x0 SQ Prod: 0x1, usecs since last trans: 21565000
       BUG: stack guard page was hit at 0000000093f1a2de (stack is 00000000b66ea0dc..000000004d932dae)
       kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
       CPU: 5 PID: 95 Comm: kworker/u20:1 Tainted: G W OE 5.13.0_mlnx #1
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
       RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
       [mlx5_core]
       Call Trace:
       mlx5e_tx_reporter_dump+0x43/0x1c0 [mlx5_core]
       devlink_health_do_dump.part.91+0x71/0xd0
       devlink_health_report+0x157/0x1b0
       mlx5e_reporter_tx_timeout+0xb9/0xf0 [mlx5_core]
       ? mlx5e_tx_reporter_err_cqe_recover+0x1d0/0x1d0
       [mlx5_core]
       ? mlx5e_health_queue_dump+0xd0/0xd0 [mlx5_core]
       ? update_load_avg+0x19b/0x550
       ? set_next_entity+0x72/0x80
       ? pick_next_task_fair+0x227/0x340
       ? finish_task_switch+0xa2/0x280
         mlx5e_tx_timeout_work+0x83/0xb0 [mlx5_core]
         process_one_work+0x1de/0x3a0
         worker_thread+0x2d/0x3c0
       ? process_one_work+0x3a0/0x3a0
         kthread+0x115/0x130
       ? kthread_park+0x90/0x90
         ret_from_fork+0x1f/0x30
       --[ end trace 51ccabea504edaff ]---
       RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
       PKRU: 55555554
       Kernel panic - not syncing: Fatal exception
       Kernel Offset: disabled
       end Kernel panic - not syncing: Fatal exception
      
      To fix this bug add a wrapper for mlx5e_tx_reporter_dump_sq() which
      extracts the sq from struct mlx5e_tx_timeout_ctx and set it as the
      TX-timeout-recovery flow dump callback.
      
      Fixes: 5f29458b
      
       ("net/mlx5e: Support dump callback in TX reporter")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Signed-off-by: default avatarAmir Tzin <amirtz@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      918fc385
    • Chris Mi's avatar
      net/mlx5: Fix tc max supported prio for nic mode · d671e109
      Chris Mi authored
      Only prio 1 is supported if firmware doesn't support ignore flow
      level for nic mode. The offending commit removed the check wrongly.
      Add it back.
      
      Fixes: 9a99c8f1
      
       ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported")
      Signed-off-by: default avatarChris Mi <cmi@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      d671e109
    • Moshe Shemesh's avatar
      net/mlx5: Fix SF health recovery flow · 33de865f
      Moshe Shemesh authored
      SF do not directly control the PCI device. During recovery flow SF
      should not be allowed to do pci disable or pci reset, its PF will do it.
      
      It fixes the following kernel trace:
      mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:387:(pid 40948): starting health recovery flow
      mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called
      mlx5_core 0000:03:00.0: wait vital counter value 0xab175 after 1 iterations
      mlx5_core.sf mlx5_core.sf.25: firmware version: 24.32.532
      mlx5_core.sf mlx5_core.sf.23: mlx5_health_try_recover:387:(pid 40946): starting health recovery flow
      mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called
      mlx5_core 0000:03:00.0: wait vital counter value 0xab193 after 1 iterations
      mlx5_core.sf mlx5_core.sf.23: firmware version: 24.32.532
      mlx5_core.sf mlx5_core.sf.25: mlx5_cmd_check:813:(pid 40948): ENABLE_HCA(0x104) op_mod(0x0) failed,
      status bad resource state(0x9), syndrome (0x658908)
      mlx5_core.sf mlx5_core.sf.25: mlx5_function_setup:1292:(pid 40948): enable hca failed
      mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:389:(pid 40948): health recovery failed
      
      Fixes: 1958fc2f
      
       ("net/mlx5: SF, Add auxiliary device driver")
      Signed-off-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      33de865f
    • Shay Drory's avatar
      net/mlx5: Fix error print in case of IRQ request failed · aa968f92
      Shay Drory authored
      In case IRQ layer failed to find or to request irq, the driver is
      printing the first cpu of the provided affinity as part of the error
      print. Empty affinity is a valid input for the IRQ layer, and it is
      an error to call cpumask_first() on empty affinity.
      
      Remove the first cpu print from the error message.
      
      Fixes: c36326d3
      
       ("net/mlx5: Round-Robin EQs over IRQs")
      Signed-off-by: default avatarShay Drory <shayd@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      aa968f92
    • Shay Drory's avatar
      net/mlx5: Use first online CPU instead of hard coded CPU · 26a7993c
      Shay Drory authored
      Hard coded CPU (0 in our case) might be offline. Hence, use the first
      online CPU instead.
      
      Fixes: f891b7cd
      
       ("net/mlx5: Enable single IRQ for PCI Function")
      Signed-off-by: default avatarShay Drory <shayd@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      26a7993c