  1. Feb 21, 2022
    • gso: do not skip outer ip header in case of ipip and net_failover · cc20cced
      Tao Liu authored
      We encountered a TCP drop issue in our cloud environment. A packet GROed
      on the host is forwarded to a VM virtio_net NIC with net_failover enabled.
      The VM acts as an IPVS LB with ipip encapsulation. The full path is:
      host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
       -> ipip encap -> net_failover tx -> virtio_net tx
      
      When net_failover transmits an ipip pkt (gso_type = 0x0103, which means
      SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), no GSO is performed
      because the device supports TSO and GSO_IPXIP4. But network_header is
      left pointing to the inner ip header.
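
To make the flag arithmetic above concrete, here is a small stand-alone C sketch that decodes gso_type 0x0103. The bit positions are copied by hand from include/linux/skbuff.h as of roughly v5.17 and should be treated as an assumption of this sketch; the helper name gso_type_has() is hypothetical and not part of the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirrored by hand from include/linux/skbuff.h (circa v5.17).
 * This is an assumption of the sketch; check your own tree. */
enum {
	SKB_GSO_TCPV4  = 1 << 0,
	SKB_GSO_DODGY  = 1 << 1,
	SKB_GSO_IPXIP4 = 1 << 8,
};

/* Hypothetical helper: true if every bit in 'flags' is set in 'gso_type'. */
static bool gso_type_has(uint32_t gso_type, uint32_t flags)
{
	return (gso_type & flags) == flags;
}
```

Under these assumed values, SKB_GSO_TCPV4 | SKB_GSO_DODGY | SKB_GSO_IPXIP4 evaluates to 0x0103, which matches the value quoted in the message.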
      
      Call Trace:
       tcp4_gso_segment        ------> return NULL
       inet_gso_segment        ------> inner iph, network_header points to
       ipip_gso_segment
       inet_gso_segment        ------> outer iph
       skb_mac_gso_segment
      
      Afterwards, when virtio_net transmits the pkt, only the inner ip header
      is modified and the outer one is left unchanged. The pkt will be dropped
      by the remote host.
      
      Call Trace:
       inet_gso_segment        ------> inner iph, outer iph is skipped
       skb_mac_gso_segment
       __skb_gso_segment
       validate_xmit_skb
       validate_xmit_skb_list
       sch_direct_xmit
       __qdisc_run
       __dev_queue_xmit        ------> virtio_net
       dev_hard_start_xmit
       __dev_queue_xmit        ------> net_failover
       ip_finish_output2
       ip_output
       iptunnel_xmit
       ip_tunnel_xmit
       ipip_tunnel_xmit        ------> ipip
       dev_hard_start_xmit
       __dev_queue_xmit
       ip_finish_output2
       ip_output
       ip_forward
       ip_rcv
       __netif_receive_skb_one_core
       netif_receive_skb_internal
       napi_gro_receive
       receive_buf
       virtnet_poll
       net_rx_action
      
      The root cause of this issue is specific to the rare combination of
      SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option.
      SKB_GSO_DODGY is set from external virtio_net. We need to reset network
      header when callbacks.gso_segment() returns NULL.
      
      This patch also includes ipv6_gso_segment(), considering SIT, etc.
      
      Fixes: cb32f511 ("ipip: add GSO/TSO support")
      Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Feb 20, 2022
    • Merge branch 'bnxt_en-fixes' · 5a344973
      David S. Miller authored
      
      
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes
      
      This series contains bug fixes for FEC reporting, ethtool self test,
      multicast setup, devlink health reporting and live patching, and
      a firmware response timeout.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix devlink fw_activate · 1278d17a
      Kalesh AP authored
      To install a livepatch, first flash the package to NVM, and then
      activate the patch through the "HWRM_FW_LIVEPATCH" fw command.
      To uninstall a patch from NVM, flash the removal package and then
      activate it through the "HWRM_FW_LIVEPATCH" fw command.
      
      The "HWRM_FW_LIVEPATCH" fw command has to consider the following scenarios:
      
      1. no patch in NVM and no patch active. Do nothing.
      2. patch in NVM, but not active. Activate the patch currently in NVM.
      3. patch is not in NVM, but active. Deactivate the patch.
      4. patch in NVM and the patch active. Do nothing.
      
      Fix the code to handle these scenarios during devlink "fw_activate".
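
The four scenarios above reduce to a small decision table. The following is a minimal user-space sketch of that table only; the enum and function names are hypothetical and are not taken from the bnxt_en driver.

```c
#include <stdbool.h>

/* Hypothetical names; the driver's real identifiers may differ. */
enum livepatch_action {
	LIVEPATCH_NONE,
	LIVEPATCH_ACTIVATE,
	LIVEPATCH_DEACTIVATE,
};

/* Map (patch present in NVM, patch currently active) to an action. */
static enum livepatch_action livepatch_decide(bool in_nvm, bool active)
{
	if (in_nvm && !active)
		return LIVEPATCH_ACTIVATE;	/* scenario 2 */
	if (!in_nvm && active)
		return LIVEPATCH_DEACTIVATE;	/* scenario 3 */
	return LIVEPATCH_NONE;			/* scenarios 1 and 4 */
}
```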
      
      To install and activate a live patch:
      devlink dev flash pci/0000:c1:00.0 file thor_patch.pkg
      devlink -f dev reload pci/0000:c1:00.0 action fw_activate limit no_reset
      
      To remove and deactivate a live patch:
      devlink dev flash pci/0000:c1:00.0 file thor_patch_rem.pkg
      devlink -f dev reload pci/0000:c1:00.0 action fw_activate limit no_reset
      
      Fixes: 3c415339 ("bnxt_en: implement firmware live patching")
      Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Increase firmware message response DMA wait time · b891106d
      Michael Chan authored
      When polling for the firmware message response, we first poll for the
      response message header.  Once the valid length is detected in the
      header, we poll for the valid bit at the end of the message which
      signals DMA completion.  Normally, this poll time for DMA completion
      is extremely short (0 to a few usec).  But on some devices under some
      rare conditions, it can be up to about 20 msec.
      
      Increase this delay to 50 msec and use udelay() for the first 10 usec
      for the common case, and usleep_range() beyond that.
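
As a rough illustration of the schedule described above (busy-wait for the first 10 usec, sleep afterwards, give up at 50 msec), the sketch below classifies an elapsed time into the waiting primitive that would apply. It is a user-space model under those stated numbers, not the driver code; the names are hypothetical.

```c
#define DMA_WAIT_MAX_USEC   50000u	/* 50 msec total budget, from the text */
#define DMA_BUSY_WAIT_USEC  10u		/* busy-wait window, from the text */

enum wait_kind { WAIT_BUSY, WAIT_SLEEP, WAIT_TIMEOUT };

/* Hypothetical helper: which primitive applies after 'elapsed_usec'. */
static enum wait_kind dma_wait_kind(unsigned int elapsed_usec)
{
	if (elapsed_usec >= DMA_WAIT_MAX_USEC)
		return WAIT_TIMEOUT;
	if (elapsed_usec < DMA_BUSY_WAIT_USEC)
		return WAIT_BUSY;	/* would map to udelay() in the kernel */
	return WAIT_SLEEP;		/* would map to usleep_range() */
}
```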
      
      Also, change the error message to include the above delay time when
      printing the timeout value.
      
      Fixes: 3c8c20db ("bnxt_en: move HWRM API implementation into separate file")
      Reviewed-by: Vladimir Olovyannikov <vladimir.olovyannikov@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Restore the resets_reliable flag in bnxt_open() · 0e0e3c53
      Kalesh AP authored
      During ifdown, we call bnxt_inv_fw_health_reg() which will clear
      both the status_reliable and resets_reliable flags if these
      registers are mapped.  This is correct because a FW reset during
      ifdown will clear these register mappings.  If we detect that FW
      has gone through reset during the next ifup, we will remap these
      registers.
      
      But during normal ifup with no FW reset, we need to restore the
      resets_reliable flag otherwise we will not show the reset counter
      during devlink diagnose.
      
      Fixes: 8cc95ceb ("bnxt_en: improve fw diagnose devlink health messages")
      Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix incorrect multicast rx mask setting when not requested · 8cdb1592
      Pavan Chebbi authored
      We should set up multicast only when the net_device flags explicitly
      have IFF_MULTICAST set. Otherwise we will incorrectly turn it on
      even when not asked.  Fix it by only passing the multicast table
      to the firmware if IFF_MULTICAST is set.
      
      Fixes: 7d2837dd ("bnxt_en: Setup multicast properly after resetting device.")
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix occasional ethtool -t loopback test failures · cfcab3b3
      Michael Chan authored
      In the current code, we set up the port in PHY or MAC loopback mode
      and then transmit a test broadcast packet for the loopback test.  This
      scheme sometimes fails if the port is shared with management firmware
      that can also send packets.  The driver may receive the management
      firmware's packet and the test will fail when the contents don't
      match the test packet.
      
      Change the test packet to use its own MAC address as the destination
      and set up the port to only receive its own MAC address.  This should
      filter out other packets sent by management firmware.
      
      Fixes: 91725d89 ("bnxt_en: Add PHY loopback to ethtool self-test.")
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix offline ethtool selftest with RDMA enabled · 6758f937
      Michael Chan authored
      For offline (destructive) self tests, we need to stop the RDMA driver
      first.  Otherwise, the RDMA driver will run into unrecoverable errors
      when destructive firmware tests are being performed.
      
      The irq_re_init parameter used in the half close and half open
      sequence when preparing the NIC for offline tests should be set to
      true because the RDMA driver will free all IRQs before the offline
      tests begin.
      
      Fixes: 55fd0cf3 ("bnxt_en: Add external loopback test to ethtool selftest.")
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Reviewed-by: Ben Li <ben.li@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix active FEC reporting to ethtool · 84d3c83e
      Somnath Kotur authored
      ethtool --show-fec <interface> does not show anything when the Active
      FEC setting in the chip is set to None.  Fix it to properly return
      ETHTOOL_FEC_OFF in that case.
      
      Fixes: 8b277589 ("bnxt_en: Report FEC settings to ethtool.")
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held · 8940e6b6
      Vladimir Oltean authored
      If the DSA master doesn't support IFF_UNICAST_FLT, then the following
      call path is possible:
      
      dsa_slave_switchdev_event_work
      -> dsa_port_host_fdb_add
         -> dev_uc_add
            -> __dev_set_rx_mode
               -> __dev_set_promiscuity
      
      Since the blamed commit, dsa_slave_switchdev_event_work() no longer
      holds rtnl_lock(), which triggers the ASSERT_RTNL() from
      __dev_set_promiscuity().
      
      Taking rtnl_lock() around dev_uc_add() is impossible, because all the
      code paths that call dsa_flush_workqueue() do so from contexts where the
      rtnl_mutex is already held - so this would lead to an instant deadlock.
      
      dev_uc_add() in itself doesn't require the rtnl_mutex for protection.
      There is this comment in __dev_set_rx_mode() which assumes so:
      
      		/* Unicast addresses changes may only happen under the rtnl,
      		 * therefore calling __dev_set_promiscuity here is safe.
      		 */
      
      but it is from commit 4417da66 ("[NET]: dev: secondary unicast
      address support") dated June 2007, and in the meantime, commit
      f1f28aa3 ("netdev: Add addr_list_lock to struct net_device."), dated
      July 2008, has added &dev->addr_list_lock to protect this instead of the
      global rtnl_mutex.
      
      Nonetheless, __dev_set_promiscuity() does assume rtnl_mutex protection,
      but it is the uncommon path of what we typically expect dev_uc_add()
      to do. So since only the uncommon path requires rtnl_lock(), just check
      ahead of time whether dev_uc_add() would result in a call to
      __dev_set_promiscuity(), and handle that condition separately.
      
      DSA already configures the master interface to be promiscuous if the
      tagger requires this. We can extend this to also cover the case where
      the master doesn't handle dev_uc_add() (doesn't support IFF_UNICAST_FLT),
      and on the premise that we'd end up making it promiscuous during
      operation anyway, either if a DSA slave has a non-inherited MAC address,
      or if the bridge notifies local FDB entries for its own MAC address, the
      address of a station learned on a foreign port, etc.
      
      Fixes: 0faf890f ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
      Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: fix bridging with more than two member ports · 3d00827a
      Svenning Sørensen authored
      Commit b3612ccd ("net: dsa: microchip: implement multi-bridge support")
      plugged a packet leak between ports that were members of different bridges.
      Unfortunately, this broke another use case, namely that of more than two
      ports that are members of the same bridge.
      
      After that commit, when a port is added to a bridge, hardware bridging
      between other member ports of that bridge will be cleared, preventing
      packet exchange between them.
      
      Fix by ensuring that the Port VLAN Membership bitmap includes any existing
      ports in the bridge, not just the port being added.
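
The idea of the fix can be sketched as a bitmap computation: the membership mask for a bridge must be the union of the ports already in it and the port being added. This is a simplified user-space model; bridge_member_mask() is a hypothetical name, and the real driver also has to account for details (such as the CPU port) that are left out here.

```c
#include <stdint.h>

/* Hypothetical helper: 'existing' has one bit set per port already in the
 * bridge; 'new_port' is the index of the port being added. The result keeps
 * the existing members instead of replacing them with the new port alone. */
static uint8_t bridge_member_mask(uint8_t existing, unsigned int new_port)
{
	return existing | (uint8_t)(1u << new_port);
}
```

For example, adding port 2 to a bridge that already contains ports 0 and 1 yields 0x07 rather than 0x04, so ports 0 and 1 can still exchange packets.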
      
      Fixes: b3612ccd ("net: dsa: microchip: implement multi-bridge support")
      Signed-off-by: Svenning Sørensen <sss@secomea.com>
      Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Force inlining of checksum functions in net/checksum.h · 5486f5bf
      Christophe Leroy authored
      All functions defined as static inline in net/checksum.h are
      meant to be inlined for performance reasons.
      
      But since commit ac7c3e4f ("compiler: enable
      CONFIG_OPTIMIZE_INLINING forcibly") the compiler is allowed to
      uninline functions when it wants.
      
      Fair enough in the general case, but for tiny performance critical
      checksum helpers that's counter-productive.
      
      The problem mainly arises when selecting CONFIG_CC_OPTIMIZE_FOR_SIZE:
      because those helpers are 'static inline' in header files, you suddenly
      find them duplicated many times in the resulting vmlinux.

      Here is a typical example when building powerpc pmac32_defconfig
      with CONFIG_CC_OPTIMIZE_FOR_SIZE. csum_sub() appears 4 times:
      
      	c04a23cc <csum_sub>:
      	c04a23cc:	7c 84 20 f8 	not     r4,r4
      	c04a23d0:	7c 63 20 14 	addc    r3,r3,r4
      	c04a23d4:	7c 63 01 94 	addze   r3,r3
      	c04a23d8:	4e 80 00 20 	blr
      		...
      	c04a2ce8:	4b ff f6 e5 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d2c:	4b ff f6 a1 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d54:	4b ff f6 79 	bl      c04a23cc <csum_sub>
      		...
      	c04a754c <csum_sub>:
      	c04a754c:	7c 84 20 f8 	not     r4,r4
      	c04a7550:	7c 63 20 14 	addc    r3,r3,r4
      	c04a7554:	7c 63 01 94 	addze   r3,r3
      	c04a7558:	4e 80 00 20 	blr
      		...
      	c04ac930:	4b ff ac 1d 	bl      c04a754c <csum_sub>
      		...
      	c04ad264:	4b ff a2 e9 	bl      c04a754c <csum_sub>
      		...
      	c04e3b08 <csum_sub>:
      	c04e3b08:	7c 84 20 f8 	not     r4,r4
      	c04e3b0c:	7c 63 20 14 	addc    r3,r3,r4
      	c04e3b10:	7c 63 01 94 	addze   r3,r3
      	c04e3b14:	4e 80 00 20 	blr
      		...
      	c04e5788:	4b ff e3 81 	bl      c04e3b08 <csum_sub>
      		...
      	c04e65c8:	4b ff d5 41 	bl      c04e3b08 <csum_sub>
      		...
      	c0512d34 <csum_sub>:
      	c0512d34:	7c 84 20 f8 	not     r4,r4
      	c0512d38:	7c 63 20 14 	addc    r3,r3,r4
      	c0512d3c:	7c 63 01 94 	addze   r3,r3
      	c0512d40:	4e 80 00 20 	blr
      		...
      	c0512dfc:	4b ff ff 39 	bl      c0512d34 <csum_sub>
      		...
      	c05138bc:	4b ff f4 79 	bl      c0512d34 <csum_sub>
      		...
      
      Restore the expected behaviour by using __always_inline for all
      functions defined in net/checksum.h
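
For readers outside the kernel tree, the effect can be sketched in user space with the GCC/Clang always_inline attribute. The macro name my_always_inline is a stand-in chosen for this sketch (the kernel's own macro is __always_inline), and the two helpers below are simplified 32-bit versions written for illustration rather than copies of net/checksum.h.

```c
#include <stdint.h>

/* User-space stand-in for the kernel's __always_inline (GCC/Clang only). */
#define my_always_inline inline __attribute__((__always_inline__))

/* 32-bit ones'-complement add: fold the carry back into the sum. */
static my_always_inline uint32_t csum_add(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* Subtract by adding the bitwise complement of the addend. */
static my_always_inline uint32_t csum_sub(uint32_t csum, uint32_t addend)
{
	return csum_add(csum, ~addend);
}
```

With the attribute in place the compiler may not emit an out-of-line copy per translation unit just because it is optimizing for size, which is the duplication shown in the disassembly above.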
      
      vmlinux size is even reduced by 256 bytes with this patch:
      
      	   text	   data	    bss	    dec	    hex	filename
      	6980022	2515362	 194384	9689768	 93daa8	vmlinux.before
      	6979862	2515266	 194384	9689512	 93d9a8	vmlinux.now
      
      Fixes: ac7c3e4f ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. Feb 19, 2022
  4. Feb 18, 2022
    • net: ll_temac: check the return value of devm_kmalloc() · b352c346
      Xiaoke Wang authored
      devm_kmalloc() returns a pointer to allocated memory on success, NULL
      on failure. However, lp->indirect_lock is allocated by devm_kmalloc()
      without a proper check. It is better to check its value to
      prevent a potential invalid memory access.
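
The pattern being added is the usual allocate-then-check idiom. A minimal user-space analogue, with malloc() standing in for devm_kmalloc() and a hypothetical struct fake_priv standing in for the driver's private data, might look like this:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's private structure. */
struct fake_priv {
	int *indirect_lock;
};

/* Allocate the lock and fail the probe cleanly if allocation fails,
 * instead of dereferencing a NULL pointer later. */
static int fake_probe(struct fake_priv *lp)
{
	lp->indirect_lock = malloc(sizeof(*lp->indirect_lock));
	if (!lp->indirect_lock)
		return -ENOMEM;

	*lp->indirect_lock = 0;
	return 0;
}
```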
      
      Fixes: f14f5c11 ("net: ll_temac: Support indirect_mutex share within TEMAC IP")
      Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-timestamp: convert sk->sk_tskey to atomic_t · a1cdec57
      Eric Dumazet authored
      UDP sendmsg() can be lockless, which causes all kinds
      of data races.
      
      This patch converts sk->sk_tskey to remove one of these races.
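
The general technique can be sketched in user space with C11 atomics: reading the key and advancing it become one indivisible step, so two lockless senders cannot both observe the same value. The struct and function names here are hypothetical, and the kernel uses its own atomic_t API rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the single socket field discussed here. */
struct fake_sock {
	atomic_uint tskey;
};

/* Return the current key and advance it in one atomic operation. */
static uint32_t tskey_next(struct fake_sock *sk)
{
	return atomic_fetch_add(&sk->tskey, 1u);
}
```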
      
      BUG: KCSAN: data-race in __ip_append_data / __ip_append_data
      
      read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1:
       __ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0:
       __ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000054d -> 0x0000054e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85fa6f-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 09c2d251 ("net-timestamp: add key to disambiguate concurrent datagrams")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sr9700: sanity check for packet length · e9da0b56
      Oliver Neukum authored
      
      
      A malicious device can leak heap data to user space
      by providing bogus frame lengths. Introduce a sanity check.
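
A sanity check of this kind generally means refusing any device-reported length that would reach past the data actually received. The helper below is a generic sketch of that bounds check, not the sr9700 code; frame_len_ok() and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: accept a device-reported frame length only if the
 * frame, starting at 'offset', lies entirely inside the received buffer.
 * Written to avoid overflow in 'offset + frame_len'. */
static bool frame_len_ok(size_t buf_len, size_t offset, size_t frame_len)
{
	if (offset > buf_len)
		return false;
	return frame_len <= buf_len - offset;
}
```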
      
      Signed-off-by: Oliver Neukum <oneukum@suse.com>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_ct: Fix flow table lookup after ct clear or switching zones · 2f131de3
      Paul Blakey authored
      Flow table lookup is skipped if the packet either went through the ct
      clear action (which sets the IP_CT_UNTRACKED flag on the packet), or is
      switching zones while a connection is already associated with
      the packet. This results in no SW offload of the connection,
      and in the connection not being removed from the flow table on
      TCP teardown (fin/rst packet).

      To fix the above, remove these unnecessary checks in flow
      table lookup.
      
      Fixes: 46475bb2 ("net/sched: act_ct: Software offload of established flows")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-sysfs: add check for netdevice being present to speed_show · 4224cfd7
      suresh kumar authored
      
      
      When bringing down the netdevice or system shutdown, a panic can be
      triggered while accessing the sysfs path because the device is already
      removed.
      
          [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
          [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
          ...
          [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
          [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
      
          crash> bt
          ...
          PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
          ...
           #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
              [exception RIP: dma_pool_alloc+0x1ab]
              RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
              RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
              RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
              RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
              R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
              R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
              ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
          #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
          #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
          #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
          #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
          #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
          #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
          #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
          #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
          #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
          #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
          #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
          #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
          #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
          #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
          #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
          #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
          #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92
      
          crash> net_device.state ffff89443b0c0000
            state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)
      
      To prevent this scenario, we also make sure that the netdevice is present.
      
      Signed-off-by: suresh kumar <suresh2514@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: hamradio: 6pack: fix UAF bug caused by mod_timer() · efe4186e
      Duoming Zhou authored
      
      
      When a 6pack device is detaching, sixpack_close() cleans up the
      necessary resources. Although del_timer_sync() in sixpack_close()
      won't return while a timer is active, one could use mod_timer() in
      sp_xmit_on_air() to wake up the timer again by calling userspace syscalls
      such as ax25_sendmsg(), ax25_connect() and ax25_ioctl().

      This unexpectedly woken handler, sp_xmit_on_air(), knows nothing about
      the ongoing cleanup and may still call pty_write() to use driver layer
      resources that have already been released.
      
      One of the possible race conditions is shown below:
      
            (USE)                      |      (FREE)
      ax25_sendmsg()                   |
       ax25_queue_xmit()               |
        ...                            |
        sp_xmit()                      |
         sp_encaps()                   | sixpack_close()
          sp_xmit_on_air()             |  del_timer_sync(&sp->tx_t)
           mod_timer(&sp->tx_t,...)    |  ...
                                       |  unregister_netdev()
                                       |  ...
           (wait a while)              | tty_release()
                                       |  tty_release_struct()
                                       |   release_tty()
          sp_xmit_on_air()             |    tty_kref_put(tty_struct) //FREE
           pty_write(tty_struct) //USE |    ...
      
      The corresponding fail log is shown below:
      ===============================================================
      BUG: KASAN: use-after-free in __run_timers.part.0+0x170/0x470
      Write of size 8 at addr ffff88800a652ab8 by task swapper/2/0
      ...
      Call Trace:
        ...
        queue_work_on+0x3f/0x50
        pty_write+0xcd/0xe0
        sp_xmit_on_air+0xb2/0x1f0
        call_timer_fn+0x28/0x150
        __run_timers.part.0+0x3c2/0x470
        run_timer_softirq+0x3b/0x80
        __do_softirq+0xf1/0x380
        ...
      
      This patch moves del_timer_sync() after unregister_netdev()
      to avoid UAF bugs. Because unregister_netdev() is well synchronized,
      it flushes out any pending queues, waits for the refcount of net_device
      to drop to zero and removes net_device from the kernel. No routines
      are still running after unregister_netdev() has executed. Therefore,
      the timer cannot be woken up from userspace again.
      
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 7a2fb912
      Jakub Kicinski authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2022-02-17
      
      We've added 8 non-merge commits during the last 7 day(s) which contain
      a total of 8 files changed, 119 insertions(+), 15 deletions(-).
      
      The main changes are:
      
      1) Add schedule points in map batch ops, from Eric.
      
      2) Fix bpf_msg_push_data with len 0, from Felix.
      
      3) Fix crash due to incorrect copy_map_value, from Kumar.
      
      4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar.
      
      5) Fix a bpf_timer initialization issue with clang, from Yonghong.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Add schedule points in batch ops
        bpf: Fix crash due to out of bounds access into reg2btf_ids.
        selftests: bpf: Check bpf_msg_push_data return value
        bpf: Fix a bpf_timer initialization issue
        bpf: Emit bpf_timer in vmlinux BTF
        selftests/bpf: Add test for bpf_timer overwriting crash
        bpf: Fix crash due to incorrect copy_map_value
        bpf: Do not try bpf_msg_push_data with len 0
      ====================
      
      Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8b97cae3
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wireless and netfilter.
      
        Current release - regressions:
      
         - dsa: lantiq_gswip: fix use after free in gswip_remove()
      
         - smc: avoid overwriting the copies of clcsock callback functions
      
        Current release - new code bugs:
      
         - iwlwifi:
            - fix use-after-free when no FW is present
            - mei: fix the pskb_may_pull check in ipv4
            - mei: retry mapping the shared area
            - mvm: don't feed the hardware RFKILL into iwlmei
      
        Previous releases - regressions:
      
         - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
      
         - tipc: fix wrong publisher node address in link publications
      
         - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
           assertion
      
         - bgmac: make idm and nicpm resource optional again
      
         - atl1c: fix tx timeout after link flap
      
        Previous releases - always broken:
      
         - vsock: remove vsock from connected table when connect is
           interrupted by a signal
      
         - ping: change destination interface checks to match raw sockets
      
         - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
           semantics (and null-deref) after SO_RESERVE_MEM was added
      
         - ipv6: make exclusive flowlabel checks per-netns
      
         - bonding: force carrier update when releasing slave
      
         - sched: limit TC_ACT_REPEAT loops
      
         - bridge: multicast: notify switchdev driver whenever MC processing
           gets disabled because of max entries reached
      
         - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found
      
         - iwlwifi: fix locking when "HW not ready"
      
         - phy: mediatek: remove PHY mode check on MT7531
      
         - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
      
         - dsa: lan9303:
            - fix polarity of reset during probe
            - fix accelerated VLAN handling"
      
      * tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
        bonding: force carrier update when releasing slave
        nfp: flower: netdev offload check for ip6gretap
        ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
        ipv4: fix data races in fib_alias_hw_flags_set
        net: dsa: lan9303: add VLAN IDs to master device
        net: dsa: lan9303: handle hwaccel VLAN tags
        vsock: remove vsock from connected table when connect is interrupted by a signal
        Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
        ping: fix the dif and sdif check in ping_lookup
        net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
        net: sched: limit TC_ACT_REPEAT loops
        tipc: fix wrong notification node addresses
        net: dsa: lantiq_gswip: fix use after free in gswip_remove()
        ipv6: per-netns exclusive flowlabel checks
        net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
        CDC-NCM: avoid overflow in sanity checking
        mctp: fix use after free
        net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
        bonding: fix data-races around agg_select_timer
        dpaa2-eth: Initialize mutex used in one step timestamping path
        ...
    • Zhang Changzhong's avatar
      bonding: force carrier update when releasing slave · a6ab75ce
      Zhang Changzhong authored
      In __bond_release_one(), bond_set_carrier() is only called when the bond
      device has no slaves left. Therefore, if we remove the up slave from a
      master with two slaves and keep the down slave, the master will remain up.
      
      Fix this by moving bond_set_carrier() out of the
      if (!bond_has_slaves(bond)) statement.
      
      Reproducer:
      $ insmod bonding.ko mode=0 miimon=100 max_bonds=2
      $ ifconfig bond0 up
      $ ifenslave bond0 eth0 eth1
      $ ifconfig eth0 down
      $ ifenslave -d bond0 eth1
      $ cat /proc/net/bonding/bond0
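
The carrier logic described above can be modelled in userspace. This is a minimal sketch, not the kernel code: `struct toy_bond`, `toy_set_carrier()`, `toy_enslave()` and `toy_release()` are hypothetical names invented for illustration, and the `fixed` flag only switches between the behaviour the message describes as buggy and the behaviour after the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SLAVES 4

/* Hypothetical userspace model; this is not the kernel's struct bonding. */
struct toy_bond {
    bool slave_up[TOY_MAX_SLAVES]; /* link state of each enslaved device */
    size_t n_slaves;               /* number of enslaved devices */
    bool carrier;                  /* carrier state reported by the master */
};

/* Recompute the master's carrier: up if at least one slave is up. */
static void toy_set_carrier(struct toy_bond *b)
{
    b->carrier = false;
    for (size_t i = 0; i < b->n_slaves; i++)
        if (b->slave_up[i])
            b->carrier = true;
}

/* Add a slave with the given link state and refresh the carrier. */
static void toy_enslave(struct toy_bond *b, bool up)
{
    assert(b->n_slaves < TOY_MAX_SLAVES);
    b->slave_up[b->n_slaves++] = up;
    toy_set_carrier(b);
}

/*
 * Remove slave idx. With fixed == false the carrier is refreshed only
 * when no slaves remain (the behaviour described as buggy); with
 * fixed == true it is refreshed on every release.
 */
static void toy_release(struct toy_bond *b, size_t idx, bool fixed)
{
    assert(idx < b->n_slaves);
    for (size_t i = idx; i + 1 < b->n_slaves; i++)
        b->slave_up[i] = b->slave_up[i + 1];
    b->n_slaves--;

    if (fixed || b->n_slaves == 0)
        toy_set_carrier(b);
}
```

In this model, enslaving one down device and one up device and then releasing the up one leaves `carrier` stale (still true) when `fixed` is false, and correctly false when `fixed` is true, which mirrors the reproducer.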
      
      Fixes: ff59c456 ("[PATCH] bonding: support carrier state for master")
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Eric Dumazet's avatar
      bpf: Add schedule points in batch ops · 75134f16
      Eric Dumazet authored
      syzbot reported various soft lockups caused by bpf batch operations.
      
       INFO: task kworker/1:1:27 blocked for more than 140 seconds.
       INFO: task hung in rcu_barrier
      
      Nothing prevents batch ops from processing a huge amount of data,
      so we need to add schedule points in them.
      
      Note that the maybe_wait_bpf_programs(map) calls from
      generic_map_delete_batch() can be factored out by moving
      the call after the loop.
      
      This will be done later in the -next tree once this fix is merged,
      unless there is a strong opinion in favor of doing this optimization sooner.
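
The idea of a schedule point inside a long batch loop can be sketched in userspace. This is an illustration under stated assumptions, not the patch itself: `toy_batch_sum()` and its `yield_every` parameter are hypothetical, and POSIX `sched_yield()` stands in for whatever in-kernel primitive the fix uses (the message above does not name it).

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical batch helper: sums n values and offers the CPU to other
 * runnable tasks every yield_every elements. It models the pattern only
 * and is not the kernel's generic batch code.
 * Returns the number of schedule points that were hit.
 */
static size_t toy_batch_sum(const int *vals, size_t n, long *sum,
                            size_t yield_every)
{
    size_t yields = 0;

    assert(yield_every > 0);
    *sum = 0;
    for (size_t i = 0; i < n; i++) {
        *sum += vals[i];
        /* Schedule point: userspace stand-in for a kernel reschedule hint. */
        if ((i + 1) % yield_every == 0) {
            sched_yield();
            yields++;
        }
    }
    return yields;
}
```

Without such a point, a caller passing a very large batch keeps the loop running uninterrupted for its whole duration; with it, the time between opportunities to run other work is bounded by `yield_every` iterations.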
      
      Fixes: aa2e93b8 ("bpf: Add generic support for update and delete batch ops")
      Fixes: cb4d03ab ("bpf: Add generic support for lookup batch op")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Brian Vazquez <brianvv@google.com>
      Link: https://lore.kernel.org/bpf/20220217181902.808742-1-eric.dumazet@gmail.com
    • Luis Chamberlain's avatar
      fs/file_table: fix adding missing kmemleak_not_leak() · a3580ac9
      Luis Chamberlain authored
      Commit b42bc9a3 ("Fix regression due to "fs: move binfmt_misc sysctl
      to its own file"") fixed a regression, but it failed to add a
      kmemleak_not_leak() call.
      
      Fixes: b42bc9a3 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file"")
      Reported-by: Tong Zhang <ztong0001@gmail.com>
      Cc: Tong Zhang <ztong0001@gmail.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>