  1. Dec 22, 2023
    • Merge tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 7c5e046b
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from WiFi and bpf.
      
        Current release - regressions:
      
         - bpf: syzkaller found null ptr deref in unix_bpf proto add
      
         - eth: i40e: fix ST code value for clause 45
      
        Previous releases - regressions:
      
         - core: return error from sk_stream_wait_connect() if sk_wait_event()
           fails
      
         - ipv6: revert remove expired routes with a separated list of routes
      
         - wifi rfkill:
             - set GPIO direction
             - fix crash with WED rx support enabled
      
         - bluetooth:
             - fix deadlock in vhci_send_frame
             - fix use-after-free in bt_sock_recvmsg
      
         - eth: mlx5e: fix a race in command alloc flow
      
         - eth: ice: fix PF with enabled XDP going no-carrier after reset
      
         - eth: bnxt_en: do not map packet buffers twice
      
        Previous releases - always broken:
      
         - core:
             - check vlan filter feature in vlan_vids_add_by_dev() and
               vlan_vids_del_by_dev()
             - check dev->gso_max_size in gso_features_check()
      
         - mptcp: fix inconsistent state on fastopen race
      
         - phy: skip LED triggers on PHYs on SFP modules
      
         - eth: mlx5e:
             - fix double free of encap_header
             - fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()"
      
      * tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
        net: check dev->gso_max_size in gso_features_check()
        kselftest: rtnetlink.sh: use grep_fail when expecting the cmd fail
        net/ipv6: Revert remove expired routes with a separated list of routes
        net: avoid build bug in skb extension length calculation
        net: ethernet: mtk_wed: fix possible NULL pointer dereference in mtk_wed_wo_queue_tx_clean()
        net: stmmac: fix incorrect flag check in timestamp interrupt
        selftests: add vlan hw filter tests
        net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()
        net: hns3: add new maintainer for the HNS3 ethernet driver
        net: mana: select PAGE_POOL
        net: ks8851: Fix TX stall caused by TX buffer overrun
        ice: Fix PF with enabled XDP going no-carrier after reset
        ice: alter feature support check for SRIOV and LAG
        ice: stop trashing VF VSI aggregator node ID information
        mailmap: add entries for Geliang Tang
        mptcp: fill in missing MODULE_DESCRIPTION()
        mptcp: fix inconsistent state on fastopen race
        selftests: mptcp: join: fix subflow_send_ack lookup
        net: phy: skip LED triggers on PHYs on SFP modules
        bpf: Add missing BPF_LINK_TYPE invocations
        ...
  2. Dec 21, 2023
  3. Dec 20, 2023
  4. Dec 19, 2023
    • Merge branch 'check-vlan-filter-feature-in-vlan_vids_add_by_dev-and-vlan_vids_del_by_dev' · 8353c2ab
      Paolo Abeni authored
      Liu Jian says:
      
      ====================
      check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()
      
      v2->v3:
      	Filter using vlan_hw_filter_capable().
      	Add one basic test.
      ====================
      
      Link: https://lore.kernel.org/r/20231216075219.2379123-1-liujian56@huawei.com
      
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests: add vlan hw filter tests · 2258b666
      Liu Jian authored
      
      
      Add one basic vlan hw filter test.
      
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() · 01a564ba
      Liu Jian authored
      
      
      I got the below warning trace:
      
      WARNING: CPU: 4 PID: 4056 at net/core/dev.c:11066 unregister_netdevice_many_notify
      CPU: 4 PID: 4056 Comm: ip Not tainted 6.7.0-rc4+ #15
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:unregister_netdevice_many_notify+0x9a4/0x9b0
      Call Trace:
       rtnl_dellink
       rtnetlink_rcv_msg
       netlink_rcv_skb
       netlink_unicast
       netlink_sendmsg
       __sock_sendmsg
       ____sys_sendmsg
       ___sys_sendmsg
       __sys_sendmsg
       do_syscall_64
       entry_SYSCALL_64_after_hwframe
      
      It can be reproduced via:
      
          ip netns add ns1
          ip netns exec ns1 ip link add bond0 type bond mode 0
          ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
          ip netns exec ns1 ip link set bond_slave_1 master bond0
      [1] ip netns exec ns1 ethtool -K bond0 rx-vlan-filter off
      [2] ip netns exec ns1 ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0
      [3] ip netns exec ns1 ip link add link bond0 name bond0.0 type vlan id 0
      [4] ip netns exec ns1 ip link set bond_slave_1 nomaster
      [5] ip netns exec ns1 ip link del veth2
          ip netns del ns1
      
      This is all caused by command [1] turning off the rx-vlan-filter function
      of bond0. The reason is the same as commit 01f4fd27 ("bonding: Fix
      incorrect deletion of ETH_P_8021AD protocol vid from slaves"). Commands
      [2] and [3] add the same vid to slave and master respectively, causing
      command [4] to empty slave->vlan_info. The following command [5] triggers
      this problem.
      
      To fix this problem, we should add VLAN_FILTER feature checks in
      vlan_vids_add_by_dev() and vlan_vids_del_by_dev() to prevent incorrect
      addition or deletion of vlan_vid information.
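      
      The effect of the check can be sketched with a small userspace model.
      This is not the kernel code: the Dev class, the feature strings and the
      check_filter switch are hypothetical, and the model assumes the check
      is applied to the device whose vid list is being walked (bond0 here).

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the VLAN hardware filter feature flags.
FILTER_FOR_PROTO = {"802.1Q": "rx-vlan-filter", "802.1ad": "rx-vlan-stag-filter"}


@dataclass
class Dev:
    """Toy model of a net device: feature flags plus refcounted VLAN ids."""
    name: str
    features: set = field(default_factory=set)
    vid_refs: dict = field(default_factory=dict)  # (proto, vid) -> refcount


def hw_filter_capable(dev, proto):
    """Model of the capability test: is the filter for this proto enabled?"""
    return FILTER_FOR_PROTO[proto] in dev.features


def vid_del(dev, proto, vid):
    """Drop one reference to (proto, vid) on dev, removing it at zero."""
    key = (proto, vid)
    if key in dev.vid_refs:
        dev.vid_refs[key] -= 1
        if dev.vid_refs[key] == 0:
            del dev.vid_refs[key]


def vids_del_by_dev(dev, by_dev, check_filter=True):
    """Remove from dev every vid that by_dev carries.

    With check_filter=True, vids that by_dev never pushed down (because its
    filter feature is off) are skipped, which is the idea of the fix.
    """
    for proto, vid in list(by_dev.vid_refs):
        if check_filter and not hw_filter_capable(by_dev, proto):
            continue
        vid_del(dev, proto, vid)


def build_repro():
    """State after commands [1]-[3]: both devices hold vid 0, bond0 filter off."""
    slave = Dev("bond_slave_1", {"rx-vlan-filter"}, {("802.1Q", 0): 1})
    bond = Dev("bond0", set(), {("802.1Q", 0): 1})
    return slave, bond
```

      In this model, releasing the slave without the check empties its vid
      table even though bond_slave_1.0 still needs vid 0; with the check the
      entry is left alone.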
      
      Fixes: 348a1443 ("vlan: introduce functions to do mass addition/deletion of vids by another device")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: hns3: add new maintainer for the HNS3 ethernet driver · fa94a0c8
      Jijie Shao authored
      
      
      Jijie Shao will be responsible for
      maintaining the hns3 driver's code in the future,
      so add Jijie to the hns3 driver's maintainer list.
      
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Link: https://lore.kernel.org/r/20231216070413.233668-1-shaojijie@huawei.com
      
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: mana: select PAGE_POOL · 340943fb
      Yury Norov authored
      
      
      Mana uses PAGE_POOL API. x86_64 defconfig doesn't select it:
      
      ld: vmlinux.o: in function `mana_create_page_pool.isra.0':
      mana_en.c:(.text+0x9ae36f): undefined reference to `page_pool_create'
      ld: vmlinux.o: in function `mana_get_rxfrag':
      mana_en.c:(.text+0x9afed1): undefined reference to `page_pool_alloc_pages'
      make[3]: *** [/home/yury/work/linux/scripts/Makefile.vmlinux:37: vmlinux] Error 1
      make[2]: *** [/home/yury/work/linux/Makefile:1154: vmlinux] Error 2
      make[1]: *** [/home/yury/work/linux/Makefile:234: __sub-make] Error 2
      make[1]: Leaving directory '/home/yury/work/build-linux-x86_64'
      make: *** [Makefile:234: __sub-make] Error 2
      
      So we need to select it explicitly.
      
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Simon Horman <horms@kernel.org> # build-tested
      Fixes: ca9c54d2 ("net: mana: Add a driver for Microsoft Azure Network Adapter")
      Link: https://lore.kernel.org/r/20231215203353.635379-1-yury.norov@gmail.com
      
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ks8851: Fix TX stall caused by TX buffer overrun · 3dc5d445
      Ronald Wahl authored
      
      
      There is a bug in the ks8851 Ethernet driver: more data is written to
      the hardware TX buffer than there is space available. This is caused
      by wrong accounting of the free TX buffer space.
      
      The driver maintains a tx_space variable that represents the TX buffer
      space that is deemed to be free. The ks8851_start_xmit_spi() function
      adds an SKB to a queue if tx_space is large enough and reduces tx_space
      by the amount of buffer space it will later need in the TX buffer and
      then schedules a work item. If there is not enough space then the TX
      queue is stopped.
      
      The worker function ks8851_tx_work() dequeues all the SKBs and writes
      the data into the hardware TX buffer. The last packet will trigger an
      interrupt after it has been sent. Here it is assumed that all data fits
      into the TX buffer.
      
      In the interrupt routine (which runs asynchronously because it is a
      threaded interrupt) tx_space is updated with the current value from the
      hardware. Also the TX queue is woken up again.
      
      Now it can happen that, after data has been sent to the hardware and
      before the TX interrupt is handled, new data is queued in
      ks8851_start_xmit_spi() while the TX buffer still had some space left.
      When the interrupt
      is actually handled tx_space is updated from the hardware but now we
      already have new SKBs queued that have not been written to the hardware
      TX buffer yet. Since tx_space has been overwritten by the value from the
      hardware the space is not accounted for.
      
      Now we have more data queued than buffer space available in the hardware
      and ks8851_tx_work() will potentially overrun the hardware TX buffer. In
      many cases it will still work because often the buffer is written out
      fast enough so that no overrun occurs but for example if the peer
      throttles us via flow control then an overrun may happen.
      
      This can be fixed in different ways. The simplest way would be to set
      tx_space to 0 before writing data to the hardware TX buffer preventing
      the queuing of more SKBs until the TX interrupt has been handled. I have
      chosen a slightly more efficient (and still rather simple) way and
      track the amount of data that is already queued and not yet written to
      the hardware. When new SKBs are to be queued the already queued amount
      of data is honoured when checking free TX buffer space.
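      
      The race and the chosen accounting can be illustrated with a toy
      model. It is only a sketch under stated assumptions, not the driver
      code: TxModel, queued_len, hw_free and the byte counts are
      hypothetical, and the peer is assumed to throttle us so that the
      hardware buffer does not drain during the sequence.

```python
from collections import deque


class TxModel:
    """Toy model of the TX accounting described above (not the driver)."""

    def __init__(self, hw_buffer_size, track_queued):
        self.hw_free = hw_buffer_size    # real free space in the chip
        self.tx_space = hw_buffer_size   # driver's view of the free space
        self.queued_len = 0              # queued, not yet written to hardware
        self.track_queued = track_queued
        self.txq = deque()
        self.overrun = False

    def start_xmit(self, needed):
        """Queue a packet; return False if the TX queue has to be stopped."""
        if self.track_queued:
            # New scheme: honour data that is queued but not yet written.
            if self.queued_len + needed > self.tx_space:
                return False
            self.queued_len += needed
        else:
            # Old scheme: only compare against tx_space and reduce it.
            if needed > self.tx_space:
                return False
            self.tx_space -= needed
        self.txq.append(needed)
        return True

    def tx_work(self):
        """Write every queued packet into the hardware TX buffer."""
        while self.txq:
            needed = self.txq.popleft()
            if needed > self.hw_free:
                self.overrun = True
            self.hw_free = max(0, self.hw_free - needed)
            if self.track_queued:
                self.queued_len -= needed
        if self.track_queued:
            self.tx_space = self.hw_free

    def tx_irq(self):
        """TX interrupt: refresh tx_space from the hardware value."""
        self.tx_space = self.hw_free


def run_race(track_queued):
    """Replay the sequence from the text; return True on a buffer overrun."""
    m = TxModel(100, track_queued)
    m.start_xmit(60)
    m.tx_work()        # 60 bytes now sit in the hardware buffer
    m.start_xmit(30)   # queued before the TX interrupt is handled
    m.tx_irq()         # tx_space overwritten with the hardware value
    m.start_xmit(30)   # old scheme accepts this, new scheme refuses it
    m.tx_work()
    return m.overrun
```

      With these numbers, run_race(False) ends in an overrun while
      run_race(True) stops the queue before the second 30-byte packet.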
      
      I tested this with a setup of two linked KS8851 running iperf3 between
      the two in bidirectional mode. Before the fix I got a stall after some
      minutes. With the fix I saw no issues anymore after hours.
      
      Fixes: 3ba81f3e ("net: Micrel KS8851 SPI network driver")
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Ben Dooks <ben.dooks@codethink.co.uk>
      Cc: Tristram Ha <Tristram.Ha@microchip.com>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org # 5.10+
      Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20231214181112.76052-1-rwahl@gmx.de
      
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ring-buffer: Fix slowpath of interrupted event · b803d7c6
      Steven Rostedt (Google) authored
      To synchronize the timestamps with the ring buffer reservation, there are
      two timestamps that are saved in the buffer meta data.
      
      1. before_stamp
      2. write_stamp
      
      When the two are equal, the write_stamp is considered valid, as in, it may
      be used to calculate the delta of the next event as the write_stamp is the
      timestamp of the previous reserved event on the buffer.
      
      This is done by the following:
      
       /*A*/	w = current position on the ring buffer
      	before = before_stamp
      	after = write_stamp
      	ts = read current timestamp
      
      	if (before != after) {
      		write_stamp is not valid, force adding an absolute
      		timestamp.
      	}
      
       /*B*/	before_stamp = ts
      
       /*C*/	write = local_add_return(event length, position on ring buffer)
      
      	if (w == write - event length) {
      		/* Nothing interrupted between A and C */
       /*E*/		write_stamp = ts;
      		delta = ts - after
      		/*
      		 * If nothing interrupted again,
      		 * before_stamp == write_stamp and write_stamp
      		 * can be used to calculate the delta for
      		 * events that come in after this one.
      		 */
      	} else {
      
      		/*
      		 * The slow path!
      		 * Was interrupted between A and C.
      		 */
      
      This is where the bug is. We currently have:
      
      		after = write_stamp
      		ts = read current timestamp
      
       /*F*/		if (write == current position on the ring buffer &&
      		    after < ts && cmpxchg(write_stamp, after, ts)) {
      
      			delta = ts - after;
      
      		} else {
      			delta = 0;
      		}
      
      The assumption is that if the current position on the ring buffer hasn't
      moved between C and F, then it also was not interrupted, and that the last
      event written has a timestamp that matches the write_stamp. That is, the
      write_stamp is valid.
      
      But this may not be the case:
      
      Suppose a task context event was interrupted by a softirq between B and C,

      the softirq wrote an event that got interrupted by a hard irq between
      C and E,

      and the hard irq wrote an event (it does not need to be interrupted).
      
      We have:
      
       /*B*/ before_stamp = ts of normal context
      
         ---> interrupted by softirq
      
      	/*B*/ before_stamp = ts of softirq context
      
      	  ---> interrupted by hardirq
      
      		/*B*/ before_stamp = ts of hard irq context
      		/*E*/ write_stamp = ts of hard irq context
      
      		/* matches and write_stamp valid */
      	  <----
      
      	/*E*/ write_stamp = ts of softirq context
      
      	/* No longer matches before_stamp, write_stamp is not valid! */
      
         <---
      
       w != write - length, go to slow path
      
      // Right now the order of events in the ring buffer is:
      //
      // |-- softirq event --|-- hard irq event --|-- normal context event --|
      //
      
       after = write_stamp (this is the ts of softirq)
       ts = read current timestamp
      
       if (write == current position on the ring buffer [true] &&
           after < ts [true] && cmpxchg(write_stamp, after, ts) [true]) {
      
      	delta = ts - after  [Wrong!]
      
      The delta is to be between the hard irq event and the normal context
      event, but the above logic made the delta between the softirq event and
      the normal context event, where the hard irq event is between the two. This
      will shift all the remaining event timestamps on the sub-buffer
      incorrectly.
      
      The write_stamp is only valid if it matches the before_stamp. The cmpxchg
      does nothing to help this.
      
      Instead, the following logic can be done to fix this:
      
      	before = before_stamp
      	ts = read current timestamp
      	before_stamp = ts
      
      	after = write_stamp
      
      	if (write == current position on the ring buffer &&
      	    after == before && after < ts) {
      
      		delta = ts - after
      
      	} else {
      		delta = 0;
      	}
      
      The above will only use the write_stamp if it still matches before_stamp
      and was tested to not have changed since C.
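      
      The difference between the two slow paths can be replayed in a
      single-threaded toy model. This is only a sketch: the state dict, the
      function names and the timestamp values are hypothetical, the ring
      buffer position check is reduced to a flag, and the cmpxchg is modelled
      as a plain compare and store.

```python
def slow_path_old(state, ts, position_unchanged=True):
    """Old logic: trusts write_stamp whenever the position has not moved."""
    after = state["write_stamp"]
    if position_unchanged and after < ts and state["write_stamp"] == after:
        state["write_stamp"] = ts  # stands in for cmpxchg(write_stamp, after, ts)
        return ts - after
    return 0


def slow_path_new(state, ts, position_unchanged=True):
    """New logic: uses write_stamp only if it still matches before_stamp."""
    before = state["before_stamp"]
    state["before_stamp"] = ts
    after = state["write_stamp"]
    if position_unchanged and after == before and after < ts:
        return ts - after
    return 0


# State left behind by the nested interruption described above:
# the hard irq stamped before_stamp last, the softirq stamped write_stamp last.
TS_SOFTIRQ, TS_HARDIRQ, TS_NOW = 110, 120, 150


def nested_state():
    return {"before_stamp": TS_HARDIRQ, "write_stamp": TS_SOFTIRQ}
```

      Here slow_path_old(nested_state(), TS_NOW) yields a delta measured from
      the softirq event, although the hard irq event sits between the two,
      while slow_path_new() falls back to a delta of 0 because the two stamps
      disagree.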
      
      As a bonus, with this logic we do not need any 64-bit cmpxchg() at all!
      
      This means the 32-bit rb_time_t workaround can finally be removed. But
      that's for a later time.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20231218175229.58ec3daf@gandalf.local.home/
      Link: https://lore.kernel.org/linux-trace-kernel/20231218230712.3a76b081@gandalf.local.home
      
      
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Fixes: dd939425 ("ring-buffer: Do not try to put back write_stamp")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • Merge tag 'hid-for-linus-2023121901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 3f10e214
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - fix for division by zero in Nintendo driver when generic joycon is
         attached, reported and fixed by SteamOS folks (Guilherme G. Piccoli)
      
       - GCC-7 build fix (which is a good cleanup anyway) for Nintendo driver
         (Ryan McClelland)
      
      * tag 'hid-for-linus-2023121901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: nintendo: Prevent divide-by-zero on code
        HID: nintendo: fix initializer element is not constant error