  1. Oct 28, 2021
  2. Oct 27, 2021
    • Merge tag 'mac80211-for-net-2021-10-27' of... · afe8ca11
      Jakub Kicinski authored
      
      Merge tag 'mac80211-for-net-2021-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Two fixes:
       * bridge vs. 4-addr mode check was wrong
       * management frame registrations locking was
         wrong, causing list corruption/crashes
      ====================
      
      Link: https://lore.kernel.org/r/20211027143756.91711-1-johannes@sipsolutions.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      afe8ca11
    • Merge branch 'hns3-fixes' · 424a4f52
      David S. Miller authored
      
      
      Guangbin Huang says:
      
      ====================
      net: hns3: add some fixes for -net
      
      This series adds some fixes for the HNS3 ethernet driver.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      424a4f52
    • net: hns3: adjust string spaces of some parameters of tx bd info in debugfs · 630a6738
      Guangbin Huang authored
      
      
      This patch adjusts the string spaces of some parameters of tx bd info in
      debugfs according to their maximum needs.
      
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      630a6738
    • net: hns3: expand buffer len for some debugfs command · c7a6e397
      Guangbin Huang authored
      The specified buffer length for three debugfs files fd_tcam, uc and tqp
      is not enough for their maximum needs, so this patch fixes them.
      
      Fixes: b5a0b70d ("net: hns3: refactor dump fd tcam of debugfs")
      Fixes: 1556ea91 ("net: hns3: refactor dump mac list of debugfs")
      Fixes: d96b0e59 ("net: hns3: refactor dump reg of debugfs")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c7a6e397
    • net: hns3: add more string spaces for dumping packets number of queue info in debugfs · 6754614a
      Jie Wang authored
      As the packet-count registers are 32 bits wide, they need at most
      10 characters for decimal printing, but the string space currently
      reserved is not enough, so this patch fixes it.

      Fixes: e44c495d ("net: hns3: refactor queue info of debugfs")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6754614a
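
The "at most 10 characters" figure can be checked with a small user-space sketch. The helper name u32_decimal_width() is hypothetical and is not part of the hns3 driver; it only measures how many characters a 32-bit value occupies when printed in decimal.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: number of characters needed to print a 32-bit
 * unsigned value in decimal, not counting the terminating NUL.
 */
static int u32_decimal_width(uint32_t v)
{
	char buf[16]; /* comfortably larger than any 32-bit decimal string */

	return snprintf(buf, sizeof(buf), "%" PRIu32, v);
}
```

Calling u32_decimal_width(UINT32_MAX) measures the widest possible value (4294967295), which is the case a fixed-width debugfs column has to accommodate.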
    • net: hns3: fix data endian problem of some functions of debugfs · 2a21dab5
      Jie Wang authored
      The member data in struct hclge_desc is of type __le32 and needs endian
      conversion before use, but some debugfs functions did not do that,
      so this patch fixes it.

      Fixes: c0ebebb9 ("net: hns3: Add "dcb register" status information query function")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a21dab5
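
In the kernel this kind of conversion is normally done with le32_to_cpu(). The user-space sketch below only illustrates why the conversion matters; le32_load() is a hypothetical helper, not a kernel or driver function. It assembles the value byte by byte, so the result is the same on little-endian and big-endian hosts.

```c
#include <stdint.h>

/*
 * Hypothetical helper: read a 32-bit little-endian field from raw bytes.
 * Building the value from individual bytes makes the result independent
 * of the host byte order, which is the property le32_to_cpu() provides
 * for __le32 fields in the kernel.
 */
static uint32_t le32_load(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}
```

Reading the field directly as a host integer would give a byte-swapped value on a big-endian CPU, which is the class of bug the commit describes.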
    • net: hns3: ignore reset event before initialization process is done · 0251d196
      Guangbin Huang authored
      
      
      Currently, if a reset event is triggered by RAS while the device is
      still initializing, the driver may run the reset process concurrently
      with the initialization process, which can cause problems. For example,
      the RSS indirection table may not have been allocated by the
      initialization process yet when the reset process uses it, which causes
      a call trace like this:
      
      [61228.744836] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      ...
      [61228.897677] Workqueue: hclgevf hclgevf_service_task [hclgevf]
      [61228.911390] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--)
      [61228.918670] pc : hclgevf_set_rss_indir_table+0xb4/0x190 [hclgevf]
      [61228.927812] lr : hclgevf_set_rss_indir_table+0x90/0x190 [hclgevf]
      [61228.937248] sp : ffff8000162ebb50
      [61228.941087] x29: ffff8000162ebb50 x28: ffffb77add72dbc0 x27: ffff0820c7dc8080
      [61228.949516] x26: 0000000000000000 x25: ffff0820ad4fc880 x24: ffff0820c7dc8080
      [61228.958220] x23: ffff0820c7dc8090 x22: 00000000ffffffff x21: 0000000000000040
      [61228.966360] x20: ffffb77add72b9c0 x19: 0000000000000000 x18: 0000000000000030
      [61228.974646] x17: 0000000000000000 x16: ffffb77ae713feb0 x15: ffff0820ad4fcce8
      [61228.982808] x14: ffffffffffffffff x13: ffff8000962eb7f7 x12: 00003834ec70c960
      [61228.991990] x11: 00e0fafa8c206982 x10: 9670facc78a8f9a8 x9 : ffffb77add717530
      [61229.001123] x8 : ffff0820ad4fd6b8 x7 : 0000000000000000 x6 : 0000000000000011
      [61229.010249] x5 : 00000000000cb1b0 x4 : 0000000000002adb x3 : 0000000000000049
      [61229.018662] x2 : ffff8000162ebbb8 x1 : 0000000000000000 x0 : 0000000000000480
      [61229.027002] Call trace:
      [61229.030177]  hclgevf_set_rss_indir_table+0xb4/0x190 [hclgevf]
      [61229.039009]  hclgevf_rss_init_hw+0x128/0x1b4 [hclgevf]
      [61229.046809]  hclgevf_reset_rebuild+0x17c/0x69c [hclgevf]
      [61229.053862]  hclgevf_reset_service_task+0x4cc/0xa80 [hclgevf]
      [61229.061306]  hclgevf_service_task+0x6c/0x630 [hclgevf]
      [61229.068491]  process_one_work+0x1dc/0x48c
      [61229.074121]  worker_thread+0x15c/0x464
      [61229.078562]  kthread+0x168/0x16c
      [61229.082873]  ret_from_fork+0x10/0x18
      [61229.088221] Code: 7900e7f6 f904a683 d503201f 9101a3e2 (38616b43)
      [61229.095357] ---[ end trace 153661a538f6768c ]---
      
      To fix this problem, don't schedule reset task before initialization
      process is done.
      
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0251d196
    • net: hns3: change hclge/hclgevf workqueue to WQ_UNBOUND mode · f29da408
      Yufeng Mo authored
      
      
      Currently, the workqueue of hclge/hclgevf is executed on
      the CPU that initiates scheduling requests by default. In
      stress scenarios, the CPU may be busy and workqueue scheduling
      is completed after a long period of time. To avoid this
      situation and implement proper scheduling, use the WQ_UNBOUND
      mode instead. In this way, the workqueue can be performed on
      a relatively idle CPU.
      
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f29da408
    • net: hns3: fix pause config problem after autoneg disabled · 3bda2e5d
      Guangbin Huang authored
      If a TP port is configured by the following steps:
      1. ethtool -s ethx autoneg off speed 100 duplex full
      2. ethtool -A ethx rx on tx on
      3. ethtool -s ethx autoneg on (rx & tx negotiated pause results are off)
      4. ethtool -s ethx autoneg off speed 100 duplex full
      
      In step 3, driver will set rx&tx pause parameters of hardware to off as
      pause parameters negotiated with link partner are off.
      
      After step 4, the "ethtool -a ethx" command shows both rx and tx pause
      parameters are on. However, pause parameters of hardware are still off
      and port has no flow control function actually.
      
      To fix this problem, if autoneg is disabled, the driver uses its saved
      parameters to restore the hardware pause settings. If the speed is not
      changed in this case, the phy sees no link state change and the pause
      parameters do not take effect, so we need to force the phy to go down
      and up.

      Fixes: aacbe27e ("net: hns3: modify how pause options is displayed")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3bda2e5d
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 440ffcdd
      Jakub Kicinski authored
      
      
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-10-26
      
      We've added 12 non-merge commits during the last 7 day(s) which contain
      a total of 23 files changed, 118 insertions(+), 98 deletions(-).
      
      The main changes are:
      
      1) Fix potential race window in BPF tail call compatibility check, from Toke Høiland-Jørgensen.
      
      2) Fix memory leak in cgroup fs due to missing cgroup_bpf_offline(), from Quanyang Wang.
      
      3) Fix file descriptor reference counting in generic_map_update_batch(), from Xu Kuohai.
      
      4) Fix bpf_jit_limit knob to the max supported limit by the arch's JIT, from Lorenz Bauer.
      
      5) Fix BPF sockmap ->poll callbacks for UDP and AF_UNIX sockets, from Cong Wang and Yucong Sun.
      
      6) Fix BPF sockmap concurrency issue in TCP on non-blocking sendmsg calls, from Liu Jian.
      
      7) Fix build failure of INODE_STORAGE and TASK_STORAGE maps on !CONFIG_NET, from Tejun Heo.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Fix potential race in tail call compatibility check
        bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NET
        selftests/bpf: Use recv_timeout() instead of retries
        net: Implement ->sock_is_readable() for UDP and AF_UNIX
        skmsg: Extract and reuse sk_msg_is_readable()
        net: Rename ->stream_memory_read to ->sock_is_readable
        tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function
        cgroup: Fix memory leak caused by missing cgroup_bpf_offline
        bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch()
        bpf: Prevent increasing bpf_jit_limit above max
        bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT
        bpf: Define bpf_jit_alloc_exec_limit for riscv JIT
      ====================
      
      Link: https://lore.kernel.org/r/20211026201920.11296-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      440ffcdd
    • bpf: Fix potential race in tail call compatibility check · 54713c85
      Toke Høiland-Jørgensen authored
      Lorenzo noticed that the code testing for program type compatibility of
      tail call maps is potentially racy in that two threads could encounter a
      map with an unset type simultaneously and both return true even though they
      are inserting incompatible programs.
      
      The race window is quite small, but artificially enlarging it by adding a
      usleep_range() inside the check in bpf_prog_array_compatible() makes it
      trivial to trigger from userspace with a program that does, essentially:
      
              map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0);
              pid = fork();
              if (pid) {
                      key = 0;
                      value = xdp_fd;
              } else {
                      key = 1;
                      value = tc_fd;
              }
              err = bpf_map_update_elem(map_fd, &key, &value, 0);
      
      While the race window is small, it has potentially serious ramifications in
      that triggering it would allow a BPF program to tail call to a program of a
      different type. So let's get rid of it by protecting the update with a
      spinlock. The commit in the Fixes tag is the last commit that touches the
      code in question.
      
      v2:
      - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei)
      v3:
      - Put lock and the members it protects into an embedded 'owner' struct (Daniel)
      
      Fixes: 3324b584 ("ebpf: misc core cleanup")
      Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
      54713c85
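
The check-and-set pattern described above can be sketched in user space as follows. This is an illustration only, not the kernel code: struct prog_array_owner, its fields, and the helper names are hypothetical, and a C11 atomic_flag stands in for the kernel spinlock. The point is that testing for an unset owner type and recording the first type happen under the same lock, so two callers cannot both see "unset" and both succeed with different types.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the embedded 'owner' struct mentioned in v3. */
struct prog_array_owner {
	atomic_flag lock; /* simple spinlock */
	int type;         /* 0 means "no owner type recorded yet" (assumption) */
};

static void prog_array_owner_init(struct prog_array_owner *o)
{
	atomic_flag_clear(&o->lock);
	o->type = 0;
}

static void owner_lock(struct prog_array_owner *o)
{
	/* Spin until the flag was previously clear, i.e. we took the lock. */
	while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire))
		;
}

static void owner_unlock(struct prog_array_owner *o)
{
	atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

/*
 * Returns true if prog_type may be inserted. The first caller records its
 * type; later callers must match it. Check and update are one critical
 * section, which closes the race window described in the commit message.
 */
static bool prog_array_compatible(struct prog_array_owner *o, int prog_type)
{
	bool ok;

	owner_lock(o);
	if (o->type == 0) {
		o->type = prog_type;
		ok = true;
	} else {
		ok = (o->type == prog_type);
	}
	owner_unlock(o);
	return ok;
}
```

Without the lock, two threads could both read type == 0 before either writes it, which is the interleaving the fork()-based reproducer above provokes.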
    • bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NET · 99d0a383
      Tejun Heo authored
      
      
      bpf_types.h has BPF_MAP_TYPE_INODE_STORAGE and BPF_MAP_TYPE_TASK_STORAGE
      declared inside #ifdef CONFIG_NET although they are built regardless of
      CONFIG_NET. So, when CONFIG_BPF_SYSCALL && !CONFIG_NET, they are built
      without the declarations leading to spurious build failures and not
      registered to bpf_map_types making them unavailable.
      
      Fix it by moving the BPF_MAP_TYPE for the two map types outside of
      CONFIG_NET.
      
      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: a10787e6 ("bpf: Enable task local storage for tracing programs")
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/YXG1cuuSJDqHQfRY@slm.duckdns.org
      99d0a383
    • Merge branch 'sock_map: fix ->poll() and update selftests' · a94b5aae
      Alexei Starovoitov authored
      Cong Wang says:
      
      ====================
      This patchset fixes ->poll() for sockets in sockmap and updates
      selftests accordingly with select(). Please check each patch
      for more details.
      
      Fixes: c50524ec ("Merge branch 'sockmap: add sockmap support for unix datagram socket'")
      Fixes: 89d69c5d ("Merge branch 'sockmap: introduce BPF_SK_SKB_VERDICT and support UDP'")
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      
      ---
      v4: add a comment in udp_poll()
      
      v3: drop sk_psock_get_checked()
          reuse tcp_bpf_sock_is_readable()
      
      v2: rename and reuse ->stream_memory_read()
          fix a compile error in sk_psock_get_checked()
      
      Cong Wang (3):
        net: rename ->stream_memory_read to ->sock_is_readable
        skmsg: extract and reuse sk_msg_is_readable()
        net: implement ->sock_is_readable() for UDP and AF_UNIX
      
      ====================
      
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      a94b5aae
    • selftests/bpf: Use recv_timeout() instead of retries · 67b82150
      Yucong Sun authored
      
      
      We use non-blocking sockets in those tests, retrying for
      EAGAIN is ugly because there is no upper bound for the packet
      arrival time, at least in theory. After we fix poll() on
      sockmap sockets, now we can switch to select()+recv().
      
      Signed-off-by: Yucong Sun <sunyucong@gmail.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211008203306.37525-5-xiyou.wangcong@gmail.com
      67b82150
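
A select()+recv() helper of the kind this commit describes might look roughly like the sketch below. It is not the selftest's recv_timeout() itself; the name recv_with_timeout() and its exact signature are assumptions made for illustration. The upper bound on waiting comes from the select() timeout rather than from a retry loop on EAGAIN.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable, then receive. Returns the recv() result, or -1 with errno set
 * to ETIMEDOUT if nothing arrived in time.
 */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int flags,
				 unsigned int timeout_sec)
{
	struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
	fd_set rfds;
	int n;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);

	n = select(fd + 1, &rfds, NULL, NULL, &tv);
	if (n < 0)
		return -1;          /* select() failed, errno already set */
	if (n == 0) {
		errno = ETIMEDOUT;  /* no data within the bound */
		return -1;
	}
	return recv(fd, buf, len, flags);
}
```

This only works for sockmap sockets once poll()/select() actually reports queued psock data as readable, which is what the rest of the series fixes.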
    • net: Implement ->sock_is_readable() for UDP and AF_UNIX · af493388
      Cong Wang authored
      
      
      Yucong noticed we can't poll() sockets in sockmap even
      when they are the destination sockets of redirections.
      This is because we never poll any psock queues in ->poll(),
      except for TCP. With ->sock_is_readable() now available, we can
      override it, and invoke and implement it for
      both UDP and AF_UNIX sockets.

      Reported-by: Yucong Sun <sunyucong@gmail.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211008203306.37525-4-xiyou.wangcong@gmail.com
      af493388
    • skmsg: Extract and reuse sk_msg_is_readable() · fb4e0a5e
      Cong Wang authored
      
      
      tcp_bpf_sock_is_readable() is pretty much generic,
      we can extract it and reuse it for non-TCP sockets.
      
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211008203306.37525-3-xiyou.wangcong@gmail.com
      fb4e0a5e
    • net: Rename ->stream_memory_read to ->sock_is_readable · 7b50ecfc
      Cong Wang authored
      
      
      The proto ops ->stream_memory_read() is currently only used
      by TCP to check whether psock queue is empty or not. We need
      to rename it before reusing it for non-TCP protocols, and
      adjust the existing users accordingly.

      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211008203306.37525-2-xiyou.wangcong@gmail.com
      7b50ecfc
    • tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function · cd9733f5
      Liu Jian authored
      With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls (or
      multiple cores) on a single socket 'sk' we could get the following flow.
      
       msgA, sk                               msgB, sk
       -----------                            ---------------
       tcp_bpf_sendmsg()
       lock(sk)
       psock = sk->psock
                                              tcp_bpf_sendmsg()
                                              lock(sk) ... blocking
       tcp_bpf_send_verdict
       if (psock->eval == NONE)
         psock->eval = sk_psock_msg_verdict
       ..
       < handle SK_REDIRECT case >
         release_sock(sk)                     < lock dropped so grab here >
         ret = tcp_bpf_sendmsg_redir
                                              psock = sk->psock
                                              tcp_bpf_send_verdict
       lock_sock(sk) ... blocking on B
                                              if (psock->eval == NONE) <- boom.
                                               psock->eval will have msgA state
      
      The problem here is we dropped the lock on msgA and grabbed it with msgB.
      Now we have old state in psock and importantly psock->eval has not been
      cleared. So msgB will run whatever action was done on A and the verdict
      program may never see it.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20211012052019.184398-1-liujian56@huawei.com
      cd9733f5
  3. Oct 26, 2021
    • Vadym Kochan's avatar
    • net: lan78xx: fix division by zero in send path · db6c3c06
      Johan Hovold authored
      Add the missing endpoint max-packet sanity check to probe() to avoid
      division by zero in lan78xx_tx_bh() in case a malicious device has
      broken descriptors (or when doing descriptor fuzz testing).
      
      Note that USB core will reject URBs submitted for endpoints with zero
      wMaxPacketSize but that drivers doing packet-size calculations still
      need to handle this (cf. commit 2548288b ("USB: Fix: Don't skip
      endpoint descriptors with maxpacket=0")).
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Cc: stable@vger.kernel.org      # 4.3
      Cc: Woojung.Huh@microchip.com <Woojung.Huh@microchip.com>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      db6c3c06
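
The shape of such a sanity check can be sketched in plain C. Both helpers below are hypothetical and are not the lan78xx code; the choice of -ENODEV as the error and the modulo-based packet-size calculation are assumptions made for illustration. The idea is that a zero max-packet size is rejected once at probe time, so later arithmetic that divides by it is safe.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical probe-time check: refuse a device whose endpoint reports a
 * max-packet size of zero. The error code is an assumption.
 */
static int check_maxpacket(unsigned int maxpacket)
{
	return maxpacket ? 0 : -ENODEV;
}

/*
 * Hypothetical send-path calculation: true when the transfer length is an
 * exact multiple of the max-packet size. This divides by maxpacket, so it
 * must only run after check_maxpacket() has succeeded.
 */
static int is_multiple_of_maxpacket(size_t len, unsigned int maxpacket)
{
	return len % maxpacket == 0;
}
```

Validating once in probe() keeps the hot transmit path free of extra branches while still protecting against broken or malicious descriptors.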
    • net: batman-adv: fix error handling · 6f68cd63
      Pavel Skripkin authored
      
      
      Syzbot reported ODEBUG warning in batadv_nc_mesh_free(). The problem was
      in wrong error handling in batadv_mesh_init().
      
      Before this patch batadv_mesh_init() was calling batadv_mesh_free() in case
      of any batadv_*_init() calls failure. This approach may work well, when
      there is some kind of indicator, which can tell which parts of batadv are
      initialized; but there isn't any.
      
      As a result, uninitialized fields were being cleaned up. Even if we hide
      the ODEBUG warning by initializing bat_priv->nc.work, syzbot was able to
      hit a GPF in batadv_nc_purge_paths(), because the hash pointer is still
      NULL. [1]

      To fix these bugs we can unwind batadv_*_init() calls one by one.
      It is a good approach for 2 reasons: 1) It fixes bugs on the error
      handling path 2) It improves the performance, since we won't call
      unneeded batadv_*_free() functions.

      So, this patch makes every batadv_*_init() clean up all allocated memory
      before returning with an error, so that the corresponding batadv_*_free()
      need not be called, and open-codes batadv_mesh_free() in the proper order
      to avoid touching uninitialized fields.
      
      Link: https://lore.kernel.org/netdev/000000000000c87fbd05cef6bcb0@google.com/ [1]
      Reported-and-tested-by: <syzbot+28b0702ada0bf7381f58@syzkaller.appspotmail.com>
      Fixes: c6c8fea2 ("net: Add batman-adv meshing protocol")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Acked-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f68cd63
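
The one-by-one unwinding described above follows the usual goto-based error-handling idiom. The sketch below is a generic illustration, not batman-adv code: struct mesh, step_init(), step_free() and the fail_at parameter are all hypothetical. Each failure jumps to a label that frees only the steps that already succeeded, in reverse order.

```c
#include <errno.h>

/* Hypothetical subsystem state: one flag per initialized component. */
struct mesh {
	int a_ready;
	int b_ready;
	int c_ready;
};

/* Hypothetical init step: fails with -ENOMEM when asked to, else marks ready. */
static int step_init(int *ready, int fail)
{
	if (fail)
		return -ENOMEM;
	*ready = 1;
	return 0;
}

static void step_free(int *ready)
{
	*ready = 0;
}

/*
 * Initialize three components in order. fail_at (1..3) injects a failure
 * into that step; 0 means no failure. On error, only components that were
 * actually initialized are torn down.
 */
static int mesh_init(struct mesh *m, int fail_at)
{
	int ret;

	ret = step_init(&m->a_ready, fail_at == 1);
	if (ret)
		goto err_a;
	ret = step_init(&m->b_ready, fail_at == 2);
	if (ret)
		goto err_b;
	ret = step_init(&m->c_ready, fail_at == 3);
	if (ret)
		goto err_c;
	return 0;

err_c:
	step_free(&m->b_ready);
err_b:
	step_free(&m->a_ready);
err_a:
	return ret;
}
```

Compared with calling one catch-all free function on any failure, this never touches a component whose init did not run.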
    • tipc: fix size validations for the MSG_CRYPTO type · fa40d973
      Max VA authored
      The function tipc_crypto_key_rcv is used to parse MSG_CRYPTO messages
      to receive keys from other nodes in the cluster in order to decrypt any
      further messages from them.
      This patch verifies that any supplied sizes in the message body are
      valid for the received message.
      
      Fixes: 1ef6f7c9 ("tipc: add automatic session key exchange")
      Signed-off-by: Max VA <maxv@sentinelone.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa40d973
    • nfc: port100: fix using -ERRNO as command type mask · 2195f206
      Krzysztof Kozlowski authored
      During probing, the driver tries to get a list (mask) of supported
      command types in port100_get_command_type_mask() function.  The value
      is u64 and 0 is treated as invalid mask (no commands supported).  The
      function however also returns -ERRNO as u64, which will be interpreted
      as a valid command mask.
      
      Return 0 on every error case of port100_get_command_type_mask(), so the
      probing will stop.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 0347a6ab ("NFC: port100: Commands mechanism implementation")
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2195f206
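
The underlying pitfall is easy to reproduce outside the driver. In the sketch below all function names and the 0x3 mask value are hypothetical; only the pattern is taken from the commit message. A negative errno converted to an unsigned 64-bit value becomes a large non-zero number, so a caller that treats 0 as "invalid" accepts it as a mask.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical buggy getter: the error path leaks -ENOMEM into a u64 mask. */
static uint64_t get_mask_buggy(int fail)
{
	if (fail)
		return (uint64_t)-ENOMEM; /* wraps to a huge non-zero value */
	return 0x3;
}

/* Hypothetical fixed getter: every error path returns 0, the invalid mask. */
static uint64_t get_mask_fixed(int fail)
{
	if (fail)
		return 0;
	return 0x3;
}

/* Hypothetical probe step: a zero mask means no commands are supported. */
static int probe_with_mask(uint64_t mask)
{
	return mask ? 0 : -ENODEV;
}
```

With the buggy getter, probe_with_mask() succeeds even though the query failed; with the fixed getter the failure stops probing, as the commit intends.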
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · eacd68b7
      David S. Miller authored
      
      
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-10-25
      
      This series contains updates to ice driver only.
      
      Dave adds event handler for LAG NETDEV_UNREGISTER to unlink device from
      link aggregate.
      
      Yongxin Liu adds a check for PTP support during release; without it,
      release causes a call trace on devices that do not support PTP.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eacd68b7
    • net: multicast: calculate csum of looped-back and forwarded packets · 9122a70a
      Cyril Strejc authored
      While testing a user-space application which transmits UDP
      multicast datagrams and utilizes multicast routing to send the UDP
      datagrams out of defined network interfaces, I found that a multicast
      router does not fill in the UDP checksum of locally produced, looped-back
      and forwarded UDP datagrams if the original output NIC the datagrams
      are sent to has UDP TX checksum offload enabled.
      
      The datagrams are sent malformed out of the NIC the datagrams have been
      forwarded to.
      
      It is because:
      
      1. If TX checksum offload is enabled on the output NIC, UDP checksum
         is not calculated by kernel and is not filled into skb data.
      
      2. dev_loopback_xmit(), which is called solely by
         ip_mc_finish_output(), sets skb->ip_summed = CHECKSUM_UNNECESSARY
         unconditionally.
      
      3. Since 35fc92a9 ("[NET]: Allow forwarding of ip_summed except
         CHECKSUM_COMPLETE"), the ip_summed value is preserved during
         forwarding.
      
      4. If ip_summed != CHECKSUM_PARTIAL, checksum is not calculated during
         a packet egress.
      
      The minimum fix in dev_loopback_xmit():
      
      1. Preserves skb->ip_summed CHECKSUM_PARTIAL. This is the
         case when the original output NIC has TX checksum offload enabled.
         The effects are:
      
           a) If the forwarding destination interface supports TX checksum
              offloading, the NIC driver is responsible to fill-in the
              checksum.
      
           b) If the forwarding destination interface does NOT support TX
              checksum offloading, checksums are filled-in by kernel before
              skb is submitted to the NIC driver.
      
           c) For local delivery, checksum validation is skipped as in the
              case of CHECKSUM_UNNECESSARY, thanks to skb_csum_unnecessary().
      
      2. Translates ip_summed CHECKSUM_NONE to CHECKSUM_UNNECESSARY. It
         means, for CHECKSUM_NONE, the behavior is unmodified and is there
         to skip a looped-back packet local delivery checksum validation.
      
      Signed-off-by: Cyril Strejc <cyril.strejc@skoda.cz>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9122a70a
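
The translation rule in the two numbered points above can be written as a small state mapping. This is a sketch of the described behaviour, not the kernel's dev_loopback_xmit(): the enum and function names are hypothetical, and passing any state other than "none" through unchanged is an assumption that goes beyond the two cases the message spells out.

```c
/* Hypothetical mirror of the checksum states named in the commit message. */
enum csum_state {
	CSUM_NONE,
	CSUM_UNNECESSARY,
	CSUM_COMPLETE,
	CSUM_PARTIAL,
};

/*
 * Checksum state of a looped-back packet:
 *  - CSUM_NONE becomes CSUM_UNNECESSARY, so local delivery still skips
 *    validation exactly as before the fix.
 *  - CSUM_PARTIAL is preserved, so a later forwarding step still knows the
 *    checksum has to be filled in by the NIC or by software.
 *  - Other states pass through unchanged (assumption).
 */
static enum csum_state loopback_csum_state(enum csum_state in)
{
	if (in == CSUM_NONE)
		return CSUM_UNNECESSARY;
	return in;
}
```

Before the fix, the equivalent of this function returned CSUM_UNNECESSARY unconditionally, which is what hid the missing checksum from the forwarding path.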
    • mlxsw: pci: Recycle received packet upon allocation failure · 75963576
      Ido Schimmel authored
      When the driver fails to allocate a new Rx buffer, it passes an empty Rx
      descriptor (contains zero address and size) to the device and marks it
      as invalid by setting the skb pointer in the descriptor's metadata to
      NULL.
      
      After processing enough Rx descriptors, the driver will try to process
      the invalid descriptor, but will return immediately seeing that the skb
      pointer is NULL. Since the driver no longer passes new Rx descriptors to
      the device, the Rx queue will eventually become full and the device will
      start to drop packets.
      
      Fix this by recycling the received packet if allocation of the new
      packet failed. This means that allocation is no longer performed at the
      end of the Rx routine, but at the start, before tearing down the DMA
      mapping of the received packet.
      
      Remove the comment about the descriptor being zeroed as it is no longer
      correct. This is OK because we either use the descriptor as-is (when
      recycling) or overwrite its address and size fields with that of the
      newly allocated Rx buffer.
      
      The issue was discovered when a process ("perf") consumed too much
      memory and put the system under memory pressure. It can be reproduced by
      injecting slab allocation failures [1]. After the fix, the Rx queue no
      longer comes to a halt.
      
      [1]
       # echo 10 > /sys/kernel/debug/failslab/times
       # echo 1000 > /sys/kernel/debug/failslab/interval
       # echo 100 > /sys/kernel/debug/failslab/probability
      
       FAULT_INJECTION: forcing a failure.
       name failslab, interval 1000, probability 100, space 0, times 8
       [...]
       Call Trace:
        <IRQ>
        dump_stack_lvl+0x34/0x44
        should_fail.cold+0x32/0x37
        should_failslab+0x5/0x10
        kmem_cache_alloc_node+0x23/0x190
        __alloc_skb+0x1f9/0x280
        __netdev_alloc_skb+0x3a/0x150
        mlxsw_pci_rdq_skb_alloc+0x24/0x90
        mlxsw_pci_cq_tasklet+0x3dc/0x1200
        tasklet_action_common.constprop.0+0x9f/0x100
        __do_softirq+0xb5/0x252
        irq_exit_rcu+0x7a/0xa0
        common_interrupt+0x83/0xa0
        </IRQ>
        asm_common_interrupt+0x1e/0x40
       RIP: 0010:cpuidle_enter_state+0xc8/0x340
       [...]
       mlxsw_spectrum2 0000:06:00.0: Failed to alloc skb for RDQ
      
      Fixes: eda6500a ("mlxsw: Add PCI bus implementation")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/20211024064014.1060919-1-idosch@idosch.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      75963576
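
The allocate-first, recycle-on-failure ordering can be sketched without any driver details. Everything below is hypothetical (struct rx_slot, rx_process(), failing_alloc()) and ignores DMA mapping entirely; it only shows that when the replacement buffer cannot be allocated, the received buffer stays in the ring, so the slot never becomes empty.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical Rx ring slot holding the buffer currently given to hardware. */
struct rx_slot {
	void *buf;
};

/* Allocator that always fails, standing in for memory pressure. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Process one received buffer. The replacement is allocated first:
 *  - on success, the slot gets the fresh buffer and the received one is
 *    returned for delivery up the stack;
 *  - on failure, NULL is returned (packet dropped) and the received buffer
 *    is left in the slot to be reused, so the ring keeps its full depth.
 */
static void *rx_process(struct rx_slot *slot, void *(*alloc)(size_t),
			size_t size)
{
	void *fresh = alloc(size);
	void *received;

	if (!fresh)
		return NULL; /* recycle: slot->buf is untouched */

	received = slot->buf;
	slot->buf = fresh;
	return received;
}
```

Dropping one packet under memory pressure is the trade-off; the alternative described in the commit message was a ring that gradually filled with invalid descriptors until Rx stopped.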