Skip to content
  1. Jul 18, 2023
    • Ahmed Zaki's avatar
      iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies · d1639a17
      Ahmed Zaki authored
      A driver's lock (crit_lock) is used to serialize all the driver's tasks.
      Lockdep, however, shows a circular dependency between rtnl and
      crit_lock. This happens when an ndo that already holds the rtnl requests
      the driver to reset, since the reset task (in some paths) tries to grab
      rtnl to either change real number of queues of update netdev features.
      
        [566.241851] ======================================================
        [566.241893] WARNING: possible circular locking dependency detected
        [566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G           OE
        [566.241984] ------------------------------------------------------
        [566.242025] repro.sh/2604 is trying to acquire lock:
        [566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf]
        [566.242167]
                     but task is already holding lock:
        [566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf]
        [566.242300]
                     which lock already depends on the new lock.
      
        [566.242353]
                     the existing dependency chain (in reverse order) is:
        [566.242401]
                     -> #1 (rtnl_mutex){+.+.}-{3:3}:
        [566.242451]        __mutex_lock+0xc1/0xbb0
        [566.242489]        iavf_init_interrupt_scheme+0x179/0x440 [iavf]
        [566.242560]        iavf_watchdog_task+0x80b/0x1400 [iavf]
        [566.242627]        process_one_work+0x2b3/0x560
        [566.242663]        worker_thread+0x4f/0x3a0
        [566.242696]        kthread+0xf2/0x120
        [566.242730]        ret_from_fork+0x29/0x50
        [566.242763]
                     -> #0 (&adapter->crit_lock){+.+.}-{3:3}:
        [566.242815]        __lock_acquire+0x15ff/0x22b0
        [566.242869]        lock_acquire+0xd2/0x2c0
        [566.242901]        __mutex_lock+0xc1/0xbb0
        [566.242934]        iavf_close+0x3c/0x240 [iavf]
        [566.242997]        __dev_close_many+0xac/0x120
        [566.243036]        dev_close_many+0x8b/0x140
        [566.243071]        unregister_netdevice_many_notify+0x165/0x7c0
        [566.243116]        unregister_netdevice_queue+0xd3/0x110
        [566.243157]        iavf_remove+0x6c1/0x730 [iavf]
        [566.243217]        pci_device_remove+0x33/0xa0
        [566.243257]        device_release_driver_internal+0x1bc/0x240
        [566.243299]        pci_stop_bus_device+0x6c/0x90
        [566.243338]        pci_stop_and_remove_bus_device+0xe/0x20
        [566.243380]        pci_iov_remove_virtfn+0xd1/0x130
        [566.243417]        sriov_disable+0x34/0xe0
        [566.243448]        ice_free_vfs+0x2da/0x330 [ice]
        [566.244383]        ice_sriov_configure+0x88/0xad0 [ice]
        [566.245353]        sriov_numvfs_store+0xde/0x1d0
        [566.246156]        kernfs_fop_write_iter+0x15e/0x210
        [566.246921]        vfs_write+0x288/0x530
        [566.247671]        ksys_write+0x74/0xf0
        [566.248408]        do_syscall_64+0x58/0x80
        [566.249145]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [566.249886]
                       other info that might help us debug this:
      
        [566.252014]  Possible unsafe locking scenario:
      
        [566.253432]        CPU0                    CPU1
        [566.254118]        ----                    ----
        [566.254800]   lock(rtnl_mutex);
        [566.255514]                                lock(&adapter->crit_lock);
        [566.256233]                                lock(rtnl_mutex);
        [566.256897]   lock(&adapter->crit_lock);
        [566.257388]
                        *** DEADLOCK ***
      
      The deadlock can be triggered by a script that is continuously resetting
      the VF adapter while doing other operations requiring RTNL, e.g:
      
      	while :; do
      		ip link set $VF up
      		ethtool --set-channels $VF combined 2
      		ip link set $VF down
      		ip link set $VF up
      		ethtool --set-channels $VF combined 4
      		ip link set $VF down
      	done
      
      Any operation that triggers a reset can substitute "ethtool --set-channles"
      
      As a fix, add a new task "finish_config" that do all the work which
      needs rtnl lock. With the exception of iavf_remove(), all work that
      require rtnl should be called from this task.
      
      As for iavf_remove(), at the point where we need to call
      unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config
      task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent
      finish_config work cannot restart after that since the task is guarded
      by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config().
      
      Fixes: 5ac49f3c
      
       ("iavf: use mutexes for locking of critical sections")
      Signed-off-by: default avatarAhmed Zaki <ahmed.zaki@intel.com>
      Signed-off-by: default avatarMateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      d1639a17
    • Marcin Szycik's avatar
      Revert "iavf: Do not restart Tx queues after reset task failure" · d916d273
      Marcin Szycik authored
      This reverts commit 08f1c147
      
      .
      
      Netdev is no longer being detached during reset, so this fix can be
      reverted. We leave the removal of "hacky" IFF_UP flag update.
      
      Signed-off-by: default avatarMarcin Szycik <marcin.szycik@linux.intel.com>
      Signed-off-by: default avatarMateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      d916d273
    • Marcin Szycik's avatar
      Revert "iavf: Detach device during reset task" · d2806d96
      Marcin Szycik authored
      This reverts commit aa626da9
      
      .
      
      Detaching device during reset was not fully fixing the rtnl locking issue,
      as there could be a situation where callback was already in progress before
      detaching netdev.
      
      Furthermore, detaching netdevice causes TX timeouts if traffic is running.
      To reproduce:
      
      ip netns exec ns1 iperf3 -c $PEER_IP -t 600 --logfile /dev/null &
      while :; do
              for i in 200 7000 400 5000 300 3000 ; do
      		ip netns exec ns1 ip link set $VF1 mtu $i
                      sleep 2
              done
              sleep 10
      done
      
      Currently, callbacks such as iavf_change_mtu() wait for the reset.
      If the reset fails to acquire the rtnl_lock, they schedule the netdev
      update for later while continuing the reset flow. Operations like MTU
      changes are performed under the rtnl_lock. Therefore, when the operation
      finishes, another callback that uses rtnl_lock can start.
      
      Signed-off-by: default avatarDawid Wesierski <dawidx.wesierski@intel.com>
      Signed-off-by: default avatarMarcin Szycik <marcin.szycik@linux.intel.com>
      Signed-off-by: default avatarMateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      d2806d96
    • Marcin Szycik's avatar
      iavf: Wait for reset in callbacks which trigger it · c2ed2403
      Marcin Szycik authored
      There was a fail when trying to add the interface to bonding
      right after changing the MTU on the interface. It was caused
      by bonding interface unable to open the interface due to
      interface being in __RESETTING state because of MTU change.
      
      Add new reset_waitqueue to indicate that reset has finished.
      
      Add waiting for reset to finish in callbacks which trigger hw reset:
      iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam().
      We use a 5000ms timeout period because on Hyper-V based systems,
      this operation takes around 3000-4000ms. In normal circumstances,
      it doesn't take more than 500ms to complete.
      
      Add a function iavf_wait_for_reset() to reuse waiting for reset code and
      use it also in iavf_set_channels(), which already waits for reset.
      We don't use error handling in iavf_set_channels() as this could
      cause the device to be in incorrect state if the reset was scheduled
      but hit timeout or the waitng function was interrupted by a signal.
      
      Fixes: 4e5e6b5d
      
       ("iavf: Fix return of set the new channel count")
      Signed-off-by: default avatarMarcin Szycik <marcin.szycik@linux.intel.com>
      Co-developed-by: default avatarDawid Wesierski <dawidx.wesierski@intel.com>
      Signed-off-by: default avatarDawid Wesierski <dawidx.wesierski@intel.com>
      Signed-off-by: default avatarSylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Signed-off-by: default avatarKamil Maziarz <kamil.maziarz@intel.com>
      Signed-off-by: default avatarMateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      c2ed2403
    • Ahmed Zaki's avatar
      iavf: use internal state to free traffic IRQs · a77ed5c5
      Ahmed Zaki authored
      If the system tries to close the netdev while iavf_reset_task() is
      running, __LINK_STATE_START will be cleared and netif_running() will
      return false in iavf_reinit_interrupt_scheme(). This will result in
      iavf_free_traffic_irqs() not being called and a leak as follows:
      
          [7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0'
          [7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0
      
      is shown when pci_disable_msix() is later called. Fix by using the
      internal adapter state. The traffic IRQs will always exist if
      state == __IAVF_RUNNING.
      
      Fixes: 5b36e8d0
      
       ("i40evf: Enable VF to request an alternate queue allocation")
      Signed-off-by: default avatarAhmed Zaki <ahmed.zaki@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      a77ed5c5
    • Ding Hui's avatar
      iavf: Fix out-of-bounds when setting channels on remove · 7c4bced3
      Ding Hui authored
      If we set channels greater during iavf_remove(), and waiting reset done
      would be timeout, then returned with error but changed num_active_queues
      directly, that will lead to OOB like the following logs. Because the
      num_active_queues is greater than tx/rx_rings[] allocated actually.
      
      Reproducer:
      
        [root@host ~]# cat repro.sh
        #!/bin/bash
      
        pf_dbsf="0000:41:00.0"
        vf0_dbsf="0000:41:02.0"
        g_pids=()
      
        function do_set_numvf()
        {
            echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
            echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
        }
      
        function do_set_channel()
        {
            local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
            [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
            ifconfig $nic 192.168.18.5 netmask 255.255.255.0
            ifconfig $nic up
            ethtool -L $nic combined 1
            ethtool -L $nic combined 4
            sleep $((RANDOM%3))
        }
      
        function on_exit()
        {
            local pid
            for pid in "${g_pids[@]}"; do
                kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
            done
            g_pids=()
        }
      
        trap "on_exit; exit" EXIT
      
        while :; do do_set_numvf ; done &
        g_pids+=($!)
        while :; do do_set_channel ; done &
        g_pids+=($!)
      
        wait
      
      Result:
      
      [ 3506.152887] iavf 0000:41:02.0: Removing device
      [ 3510.400799] ==================================================================
      [ 3510.400820] BUG: KASAN: slab-out-of-bounds in iavf_free_all_tx_resources+0x156/0x160 [iavf]
      [ 3510.400823] Read of size 8 at addr ffff88b6f9311008 by task repro.sh/55536
      [ 3510.400823]
      [ 3510.400830] CPU: 101 PID: 55536 Comm: repro.sh Kdump: loaded Tainted: G           O     --------- -t - 4.18.0 #1
      [ 3510.400832] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
      [ 3510.400835] Call Trace:
      [ 3510.400851]  dump_stack+0x71/0xab
      [ 3510.400860]  print_address_description+0x6b/0x290
      [ 3510.400865]  ? iavf_free_all_tx_resources+0x156/0x160 [iavf]
      [ 3510.400868]  kasan_report+0x14a/0x2b0
      [ 3510.400873]  iavf_free_all_tx_resources+0x156/0x160 [iavf]
      [ 3510.400880]  iavf_remove+0x2b6/0xc70 [iavf]
      [ 3510.400884]  ? iavf_free_all_rx_resources+0x160/0x160 [iavf]
      [ 3510.400891]  ? wait_woken+0x1d0/0x1d0
      [ 3510.400895]  ? notifier_call_chain+0xc1/0x130
      [ 3510.400903]  pci_device_remove+0xa8/0x1f0
      [ 3510.400910]  device_release_driver_internal+0x1c6/0x460
      [ 3510.400916]  pci_stop_bus_device+0x101/0x150
      [ 3510.400919]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 3510.400924]  pci_iov_remove_virtfn+0x187/0x420
      [ 3510.400927]  ? pci_iov_add_virtfn+0xe10/0xe10
      [ 3510.400929]  ? pci_get_subsys+0x90/0x90
      [ 3510.400932]  sriov_disable+0xed/0x3e0
      [ 3510.400936]  ? bus_find_device+0x12d/0x1a0
      [ 3510.400953]  i40e_free_vfs+0x754/0x1210 [i40e]
      [ 3510.400966]  ? i40e_reset_all_vfs+0x880/0x880 [i40e]
      [ 3510.400968]  ? pci_get_device+0x7c/0x90
      [ 3510.400970]  ? pci_get_subsys+0x90/0x90
      [ 3510.400982]  ? pci_vfs_assigned.part.7+0x144/0x210
      [ 3510.400987]  ? __mutex_lock_slowpath+0x10/0x10
      [ 3510.400996]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
      [ 3510.401001]  sriov_numvfs_store+0x214/0x290
      [ 3510.401005]  ? sriov_totalvfs_show+0x30/0x30
      [ 3510.401007]  ? __mutex_lock_slowpath+0x10/0x10
      [ 3510.401011]  ? __check_object_size+0x15a/0x350
      [ 3510.401018]  kernfs_fop_write+0x280/0x3f0
      [ 3510.401022]  vfs_write+0x145/0x440
      [ 3510.401025]  ksys_write+0xab/0x160
      [ 3510.401028]  ? __ia32_sys_read+0xb0/0xb0
      [ 3510.401031]  ? fput_many+0x1a/0x120
      [ 3510.401032]  ? filp_close+0xf0/0x130
      [ 3510.401038]  do_syscall_64+0xa0/0x370
      [ 3510.401041]  ? page_fault+0x8/0x30
      [ 3510.401043]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [ 3510.401073] RIP: 0033:0x7f3a9bb842c0
      [ 3510.401079] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
      [ 3510.401080] RSP: 002b:00007ffc05f1fe18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 3510.401083] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3a9bb842c0
      [ 3510.401085] RDX: 0000000000000002 RSI: 0000000002327408 RDI: 0000000000000001
      [ 3510.401086] RBP: 0000000002327408 R08: 00007f3a9be53780 R09: 00007f3a9c8a4700
      [ 3510.401086] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
      [ 3510.401087] R13: 0000000000000001 R14: 00007f3a9be52620 R15: 0000000000000001
      [ 3510.401090]
      [ 3510.401093] Allocated by task 76795:
      [ 3510.401098]  kasan_kmalloc+0xa6/0xd0
      [ 3510.401099]  __kmalloc+0xfb/0x200
      [ 3510.401104]  iavf_init_interrupt_scheme+0x26f/0x1310 [iavf]
      [ 3510.401108]  iavf_watchdog_task+0x1d58/0x4050 [iavf]
      [ 3510.401114]  process_one_work+0x56a/0x11f0
      [ 3510.401115]  worker_thread+0x8f/0xf40
      [ 3510.401117]  kthread+0x2a0/0x390
      [ 3510.401119]  ret_from_fork+0x1f/0x40
      [ 3510.401122]  0xffffffffffffffff
      [ 3510.401123]
      
      In timeout handling, we should keep the original num_active_queues
      and reset num_req_queues to 0.
      
      Fixes: 4e5e6b5d
      
       ("iavf: Fix return of set the new channel count")
      Signed-off-by: default avatarDing Hui <dinghui@sangfor.com.cn>
      Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
      Cc: Huang Cun <huangcun@sangfor.com.cn>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      7c4bced3
    • Ding Hui's avatar
      iavf: Fix use-after-free in free_netdev · 5f4fa167
      Ding Hui authored
      We do netif_napi_add() for all allocated q_vectors[], but potentially
      do netif_napi_del() for part of them, then kfree q_vectors and leave
      invalid pointers at dev->napi_list.
      
      Reproducer:
      
        [root@host ~]# cat repro.sh
        #!/bin/bash
      
        pf_dbsf="0000:41:00.0"
        vf0_dbsf="0000:41:02.0"
        g_pids=()
      
        function do_set_numvf()
        {
            echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
            echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
        }
      
        function do_set_channel()
        {
            local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
            [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
            ifconfig $nic 192.168.18.5 netmask 255.255.255.0
            ifconfig $nic up
            ethtool -L $nic combined 1
            ethtool -L $nic combined 4
            sleep $((RANDOM%3))
        }
      
        function on_exit()
        {
            local pid
            for pid in "${g_pids[@]}"; do
                kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
            done
            g_pids=()
        }
      
        trap "on_exit; exit" EXIT
      
        while :; do do_set_numvf ; done &
        g_pids+=($!)
        while :; do do_set_channel ; done &
        g_pids+=($!)
      
        wait
      
      Result:
      
      [ 4093.900222] ==================================================================
      [ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390
      [ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699
      [ 4093.900233]
      [ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G           O     --------- -t - 4.18.0 #1
      [ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
      [ 4093.900239] Call Trace:
      [ 4093.900244]  dump_stack+0x71/0xab
      [ 4093.900249]  print_address_description+0x6b/0x290
      [ 4093.900251]  ? free_netdev+0x308/0x390
      [ 4093.900252]  kasan_report+0x14a/0x2b0
      [ 4093.900254]  free_netdev+0x308/0x390
      [ 4093.900261]  iavf_remove+0x825/0xd20 [iavf]
      [ 4093.900265]  pci_device_remove+0xa8/0x1f0
      [ 4093.900268]  device_release_driver_internal+0x1c6/0x460
      [ 4093.900271]  pci_stop_bus_device+0x101/0x150
      [ 4093.900273]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 4093.900275]  pci_iov_remove_virtfn+0x187/0x420
      [ 4093.900277]  ? pci_iov_add_virtfn+0xe10/0xe10
      [ 4093.900278]  ? pci_get_subsys+0x90/0x90
      [ 4093.900280]  sriov_disable+0xed/0x3e0
      [ 4093.900282]  ? bus_find_device+0x12d/0x1a0
      [ 4093.900290]  i40e_free_vfs+0x754/0x1210 [i40e]
      [ 4093.900298]  ? i40e_reset_all_vfs+0x880/0x880 [i40e]
      [ 4093.900299]  ? pci_get_device+0x7c/0x90
      [ 4093.900300]  ? pci_get_subsys+0x90/0x90
      [ 4093.900306]  ? pci_vfs_assigned.part.7+0x144/0x210
      [ 4093.900309]  ? __mutex_lock_slowpath+0x10/0x10
      [ 4093.900315]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
      [ 4093.900318]  sriov_numvfs_store+0x214/0x290
      [ 4093.900320]  ? sriov_totalvfs_show+0x30/0x30
      [ 4093.900321]  ? __mutex_lock_slowpath+0x10/0x10
      [ 4093.900323]  ? __check_object_size+0x15a/0x350
      [ 4093.900326]  kernfs_fop_write+0x280/0x3f0
      [ 4093.900329]  vfs_write+0x145/0x440
      [ 4093.900330]  ksys_write+0xab/0x160
      [ 4093.900332]  ? __ia32_sys_read+0xb0/0xb0
      [ 4093.900334]  ? fput_many+0x1a/0x120
      [ 4093.900335]  ? filp_close+0xf0/0x130
      [ 4093.900338]  do_syscall_64+0xa0/0x370
      [ 4093.900339]  ? page_fault+0x8/0x30
      [ 4093.900341]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [ 4093.900357] RIP: 0033:0x7f16ad4d22c0
      [ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
      [ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0
      [ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001
      [ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700
      [ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
      [ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001
      [ 4093.900367]
      [ 4093.900368] Allocated by task 820:
      [ 4093.900371]  kasan_kmalloc+0xa6/0xd0
      [ 4093.900373]  __kmalloc+0xfb/0x200
      [ 4093.900376]  iavf_init_interrupt_scheme+0x63b/0x1320 [iavf]
      [ 4093.900380]  iavf_watchdog_task+0x3d51/0x52c0 [iavf]
      [ 4093.900382]  process_one_work+0x56a/0x11f0
      [ 4093.900383]  worker_thread+0x8f/0xf40
      [ 4093.900384]  kthread+0x2a0/0x390
      [ 4093.900385]  ret_from_fork+0x1f/0x40
      [ 4093.900387]  0xffffffffffffffff
      [ 4093.900387]
      [ 4093.900388] Freed by task 6699:
      [ 4093.900390]  __kasan_slab_free+0x137/0x190
      [ 4093.900391]  kfree+0x8b/0x1b0
      [ 4093.900394]  iavf_free_q_vectors+0x11d/0x1a0 [iavf]
      [ 4093.900397]  iavf_remove+0x35a/0xd20 [iavf]
      [ 4093.900399]  pci_device_remove+0xa8/0x1f0
      [ 4093.900400]  device_release_driver_internal+0x1c6/0x460
      [ 4093.900401]  pci_stop_bus_device+0x101/0x150
      [ 4093.900402]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 4093.900403]  pci_iov_remove_virtfn+0x187/0x420
      [ 4093.900404]  sriov_disable+0xed/0x3e0
      [ 4093.900409]  i40e_free_vfs+0x754/0x1210 [i40e]
      [ 4093.900415]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
      [ 4093.900416]  sriov_numvfs_store+0x214/0x290
      [ 4093.900417]  kernfs_fop_write+0x280/0x3f0
      [ 4093.900418]  vfs_write+0x145/0x440
      [ 4093.900419]  ksys_write+0xab/0x160
      [ 4093.900420]  do_syscall_64+0xa0/0x370
      [ 4093.900421]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [ 4093.900422]  0xffffffffffffffff
      [ 4093.900422]
      [ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200
                      which belongs to the cache kmalloc-8k of size 8192
      [ 4093.900425] The buggy address is located 5184 bytes inside of
                      8192-byte region [ffff88b4dc144200, ffff88b4dc146200)
      [ 4093.900425] The buggy address belongs to the page:
      [ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0
      [ 4093.900430] flags: 0x10000000008100(slab|head)
      [ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80
      [ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
      [ 4093.900434] page dumped because: kasan: bad access detected
      [ 4093.900435]
      [ 4093.900435] Memory state around the buggy address:
      [ 4093.900436]  ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900437]  ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900438]                                            ^
      [ 4093.900439]  ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900440]  ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900440] ==================================================================
      
      Although the patch #2 (of 2) can avoid the issue triggered by this
      repro.sh, there still are other potential risks that if num_active_queues
      is changed to less than allocated q_vectors[] by unexpected, the
      mismatched netif_napi_add/del() can also cause UAF.
      
      Since we actually call netif_napi_add() for all allocated q_vectors
      unconditionally in iavf_alloc_q_vectors(), so we should fix it by
      letting netif_napi_del() match to netif_napi_add().
      
      Fixes: 5eae00c5
      
       ("i40evf: main driver core")
      Signed-off-by: default avatarDing Hui <dinghui@sangfor.com.cn>
      Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
      Cc: Huang Cun <huangcun@sangfor.com.cn>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Reviewed-by: default avatarMadhu Chittim <madhu.chittim@intel.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      5f4fa167
  2. Jul 17, 2023
  3. Jul 15, 2023
    • Jakub Kicinski's avatar
      Merge branch 'net-fix-kernel-doc-problems-in-include-net' · 0dd1805f
      Jakub Kicinski authored
      
      
      Randy Dunlap says:
      
      ====================
      net: fix kernel-doc problems in include/net/
      
      Fix many (but not all) kernel-doc warnings in include/net/.
      
       [PATCH v2 net 1/9] net: bonding: remove kernel-doc comment marker
       [PATCH v2 net 2/9] net: cfg802154: fix kernel-doc notation warnings
       [PATCH v2 net 3/9] codel: fix kernel-doc notation warnings
       [PATCH v2 net 4/9] devlink: fix kernel-doc notation warnings
       [PATCH v2 net 5/9] inet: frags: remove kernel-doc comment marker
       [PATCH v2 net 6/9] net: llc: fix kernel-doc notation warnings
       [PATCH v2 net 7/9] net: NSH: fix kernel-doc notation warning
       [PATCH v2 net 8/9] pie: fix kernel-doc notation warning
       [PATCH v2 net 9/9] rsi: remove kernel-doc comment marker
      ====================
      
      Link: https://lore.kernel.org/r/20230714045127.18752-1-rdunlap@infradead.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0dd1805f
    • Randy Dunlap's avatar
      rsi: remove kernel-doc comment marker · 04be3c95
      Randy Dunlap authored
      Change an errant kernel-doc comment marker (/**) to a regular
      comment to prevent a kernel-doc warning.
      
      rsi_91x.h:3: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
       * Copyright (c) 2017 Redpine Signals Inc.
      
      Fixes: 4c10d56a
      
       ("rsi: add header file rsi_91x")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Prameela Rani Garnepudi <prameela.j04cs@gmail.com>
      Cc: Siva Rebbagondla <siva.rebbagondla@redpinesignals.com>
      Acked-by: default avatarKalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20230714045127.18752-10-rdunlap@infradead.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      04be3c95
    • Randy Dunlap's avatar
      pie: fix kernel-doc notation warning · d1cca974
      Randy Dunlap authored
      Spell a struct member's name correctly to prevent a kernel-doc
      warning.
      
      pie.h:38: warning: Function parameter or member 'tupdate' not described in 'pie_params'
      
      Fixes: b42a3d7c
      
       ("pie: improve comments and commenting style")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Leslie Monis <lesliemonis@gmail.com>
      Cc: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
      Cc: Gautam Ramakrishnan <gautamramk@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Link: https://lore.kernel.org/r/20230714045127.18752-9-rdunlap@infradead.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d1cca974
    • Randy Dunlap's avatar
      net: NSH: fix kernel-doc notation warning · d1533d72
      Randy Dunlap authored
      Use the struct member's name and the correct format to prevent a
      kernel-doc warning.
      
      nsh.h:200: warning: Function parameter or member 'context' not described in 'nsh_md1_ctx'
      
      Fixes: 1f0b7744
      
       ("net: add NSH header structures and helpers")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Jiri Benc <jbenc@redhat.com>
      Link: https://lore.kernel.org/r/20230714045127.18752-8-rdunlap@infradead.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d1533d72
    • Randy Dunlap's avatar
      net: llc: fix kernel-doc notation warnings · 201a0883
      Randy Dunlap authored
      Use the corrent function parameter name or format to prevent
      kernel-doc warnings.
      Add 2 function parameter descriptions to prevent kernel-doc warnings.
      
      llc_pdu.h:278: warning: Function parameter or member 'da' not described in 'llc_pdu_decode_da'
      llc_pdu.h:278: warning: Excess function parameter 'sa' description in 'llc_pdu_decode_da'
      llc_pdu.h:330: warning: Function parameter or member 'skb' not described in 'llc_pdu_init_as_test_cmd'
      llc_pdu.h:379: warning: Function parameter or member 'svcs_supported' not described in 'llc_pdu_init_as_xid_cmd'
      llc_pdu.h:379: warning: Function parameter or member 'rx_window' not described in 'llc_pdu_init_as_xid_cmd'
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Link: https://lore.kernel.org/r/20230714045127.18752-7-rdunlap@infradead.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      201a0883
    • Randy Dunlap's avatar
      inet: frags: eliminate kernel-doc warning · d20909a0
      Randy Dunlap authored
      Modify the anonymous enum kernel-doc content so that it doesn't cause
      a kernel-doc warning.
      
      inet_frag.h:33: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
      
      Fixes: 1ab1934e
      
       ("inet: frags: enum the flag definitions and add descriptions")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://lore.kernel.org/r/20230714045127.18752-6-rdunlap@infradead.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d20909a0
    • Randy Dunlap's avatar
      devlink: fix kernel-doc notation warnings · 839f55c5
      Randy Dunlap authored
      Spell function or struct member names correctly.
      Use ':' instead of '-' for struct member entries.
      Mark one field as private in kernel-doc.
      Add a few entries that were missing.
      Fix a typo.
      
      These changes prevent kernel-doc warnings:
      
      devlink.h:252: warning: Function parameter or member 'field_id' not described in 'devlink_dpipe_match'
      devlink.h:267: warning: Function parameter or member 'field_id' not described in 'devlink_dpipe_action'
      devlink.h:310: warning: Function parameter or member 'match_values_count' not described in 'devlink_dpipe_entry'
      devlink.h:355: warning: Function parameter or member 'list' not described in 'devlink_dpipe_table'
      devlink.h:374: warning: Function parameter or member 'actions_dump' not described in 'devlink_dpipe_table_ops'
      devlink.h:374: warning: Function parameter or member 'matches_dump' not described in 'devlink_dpipe_table_ops'
      devlink.h:374: warning: Function parameter or member 'entries_dump' not described in 'devlink_dpipe_table_ops'
      devlink.h:374: warning: Function parameter or member 'counters_set_update' not described in 'devlink_dpipe_table_ops'
      devlink.h:374: warning: Function parameter or member 'size_get' not described in 'devlink_dpipe_table_ops'
      devlink.h:384: warning: Function parameter or member 'headers' not described in 'devlink_dpipe_headers'
      devlink.h:384: warning: Function parameter or member 'headers_count' not described in 'devlink_dpipe_headers'
      devlink.h:398: warning: Function parameter or member 'unit' not described in 'devlink_resource_size_params'
      devlink.h:487: warning: Function parameter or member 'id' not described in 'devlink_param'
      devlink.h:645: warning: Function parameter or member 'overwrite_mask' not described in 'devlink_flash_update_params'
      
      Fixes: 1555d204 ("devlink: Support for pipeline debug (dpipe)")
      Fixes: d9f9b9a4 ("devlink: Add support for resource abstraction")
      Fixes: eabaef18 ("devlink: Add devlink_param register and unregister")
      Fixes: 5d5b4128
      
       ("devlink: introduce flash update overwrite mask")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Moshe Shemesh <moshe@mellanox.com>
      Cc: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20230714045127.18752-5-rdunlap@infradead.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      839f55c5
    • Randy Dunlap's avatar
      codel: fix kernel-doc notation warnings · cfe57122
      Randy Dunlap authored
      Use '@' before the struct member names in kernel-doc notation
      to prevent kernel-doc warnings.
      
      codel.h:158: warning: Function parameter or member 'ecn_mark' not described in 'codel_stats'
      codel.h:158: warning: Function parameter or member 'ce_mark' not described in 'codel_stats'
      
      Fixes: 76e3cc12
      
       ("codel: Controlled Delay AQM")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Dave Taht <dave.taht@bufferbloat.net>
      Link: https://lore.kernel.org/r/20230714045127.18752-4-rdunlap@infradead.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      cfe57122
    • Randy Dunlap's avatar
      net: cfg802154: fix kernel-doc notation warnings · a63e4044
      Randy Dunlap authored
      Add an enum heading to the kernel-doc comments to prevent
      kernel-doc warnings.
      
      cfg802154.h:174: warning: Cannot understand  * @WPAN_PHY_FLAG_TRANSMIT_POWER: Indicates that transceiver will support
       on line 174 - I thought it was a doc line
      
      cfg802154.h:192: warning: Enum value 'WPAN_PHY_FLAG_TXPOWER' not described in enum 'wpan_phy_flags'
      cfg802154.h:192: warning: Excess enum value 'WPAN_PHY_FLAG_TRANSMIT_POWER' description in 'wpan_phy_flags'
      
      Fixes: edea8f7c
      
       ("cfg802154: introduce wpan phy flags")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@datenfreihafen.org>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Acked-by: default avatarMiquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/r/20230714045127.18752-3-rdunlap@infradead.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a63e4044
    • Randy Dunlap's avatar
      net: bonding: remove kernel-doc comment marker · a66557c7
      Randy Dunlap authored
      Change an errant kernel-doc comment marker (/**) to a regular
      comment to prevent a kernel-doc warning.
      
      bonding.h:282: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
       * Returns NULL if the net_device does not belong to any of the bond's slaves
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Link: https://lore.kernel.org/r/20230714045127.18752-2-rdunlap@infradead.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a66557c7
  4. Jul 14, 2023
    • Yan Zhai's avatar
      gso: fix dodgy bit handling for GSO_UDP_L4 · 98400367
      Yan Zhai authored
      Commit 1fd54773 ("udp: allow header check for dodgy GSO_UDP_L4
      packets.") checks DODGY bit for UDP, but for packets that can be fed
      directly to the device after gso_segs reset, it actually falls through
      to fragmentation:
      
      https://lore.kernel.org/all/CAJPywTKDdjtwkLVUW6LRA2FU912qcDmQOQGt2WaDo28KzYDg+A@mail.gmail.com/
      
      This change restores the expected behavior of GSO_UDP_L4 packets.
      
      Fixes: 1fd54773
      
       ("udp: allow header check for dodgy GSO_UDP_L4 packets.")
      Suggested-by: default avatarWillem de Bruijn <willemdebruijn.kernel@gmail.com>
      Signed-off-by: default avatarYan Zhai <yan@cloudflare.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      98400367
    • Wang Ming's avatar
      net: ethernet: Remove repeating expression · a822551c
      Wang Ming authored
      
      
      Identify issues that arise by using the tests/doublebitand.cocci
      semantic patch. Need to remove duplicate expression in if statement.
      
      Signed-off-by: default avatarWang Ming <machel@vivo.com>
      Reviewed-by: default avatarJiawen Wu <jiawenwu@trustnetic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a822551c
    • Wang Ming's avatar
      bna: Remove error checking for debugfs_create_dir() · 4ad23d23
      Wang Ming authored
      
      
      It is expected that most callers should _ignore_ the errors return by
      debugfs_create_dir() in bnad_debugfs_init().
      
      Signed-off-by: default avatarWang Ming <machel@vivo.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4ad23d23
    • Daniel Golle's avatar
      net: ethernet: mtk_eth_soc: handle probe deferral · 1d6d537d
      Daniel Golle authored
      Move the call to of_get_ethdev_address to mtk_add_mac which is part of
      the probe function and can hence itself return -EPROBE_DEFER should
      of_get_ethdev_address return -EPROBE_DEFER. This allows us to entirely
      get rid of the mtk_init function.
      
      The problem of of_get_ethdev_address returning -EPROBE_DEFER surfaced
      in situations in which the NVMEM provider holding the MAC address has
      not yet be loaded at the time mtk_eth_soc is initially probed. In this
      case probing of mtk_eth_soc should be deferred instead of falling back
      to use a random MAC address, so once the NVMEM provider becomes
      available probing can be repeated.
      
      Fixes: 656e7052
      
       ("net-next: mediatek: add support for MT7623 ethernet")
      Signed-off-by: default avatarDaniel Golle <daniel@makrotopia.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1d6d537d
    • Kuniyuki Iwashima's avatar
      bridge: Add extack warning when enabling STP in netns. · 56a16035
      Kuniyuki Iwashima authored
      When we create an L2 loop on a bridge in netns, we will see packets storm
      even if STP is enabled.
      
        # unshare -n
        # ip link add br0 type bridge
        # ip link add veth0 type veth peer name veth1
        # ip link set veth0 master br0 up
        # ip link set veth1 master br0 up
        # ip link set br0 type bridge stp_state 1
        # ip link set br0 up
        # sleep 30
        # ip -s link show br0
        2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
            link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff
            RX: bytes  packets  errors  dropped missed  mcast
            956553768  12861249 0       0       0       12861249  <-. Keep
            TX: bytes  packets  errors  dropped carrier collsns     |  increasing
            1027834    11951    0       0       0       0         <-'   rapidly
      
      This is because llc_rcv() drops all packets in non-root netns and BPDU
      is dropped.
      
      Let's add extack warning when enabling STP in netns.
      
        # unshare -n
        # ip link add br0 type bridge
        # ip link set br0 type bridge stp_state 1
        Warning: bridge: STP does not work in non-root netns.
      
      Note this commit will be reverted later when we namespacify the whole LLC
      infra.
      
      Fixes: e730c155
      
       ("[NET]: Make packet reception network namespace safe")
      Suggested-by: default avatarHarry Coin <hcoin@quietfountain.com>
      Link: https://lore.kernel.org/netdev/0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com/
      Suggested-by: default avatarIdo Schimmel <idosch@idosch.org>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Acked-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      56a16035
    • Tanmay Patil's avatar
      net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field() · b685f1a5
      Tanmay Patil authored
      CPSW ALE has 75 bit ALE entries which are stored within three 32 bit words.
      The cpsw_ale_get_field() and cpsw_ale_set_field() functions assume that the
      field will be strictly contained within one word. However, this is not
      guaranteed to be the case and it is possible for ALE field entries to span
      across up to two words at the most.
      
      Fix the methods to handle getting/setting fields spanning up to two words.
      
      Fixes: db82173f
      
       ("netdev: driver: ethernet: add cpsw address lookup engine support")
      Signed-off-by: default avatarTanmay Patil <t-patil@ti.com>
      [s-vadapalli@ti.com: rephrased commit message and added Fixes tag]
      Signed-off-by: default avatarSiddharth Vadapalli <s-vadapalli@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b685f1a5
    • Mark Brown's avatar
      net: dsa: ar9331: Use explict flags for regmap single read/write · 9845217d
      Mark Brown authored
      
      
      The at9331 is only able to read or write a single register at once.  The
      driver has a custom regmap bus and chooses to tell the regmap core about
      this by reporting the maximum transfer sizes rather than the explicit
      flags that exist at the regmap level.  Since there are a number of
      problems with the raw transfer limits and the regmap level flags are
      better integrated anyway convert the driver to use the flags.
      
      No functional change.
      
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9845217d
    • Alan Stern's avatar
      net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb · 5e1627cb
      Alan Stern authored
      
      
      The syzbot fuzzer identified a problem in the usbnet driver:
      
      usb 1-1: BOGUS urb xfer, pipe 3 != type 1
      WARNING: CPU: 0 PID: 754 at drivers/usb/core/urb.c:504 usb_submit_urb+0xed6/0x1880 drivers/usb/core/urb.c:504
      Modules linked in:
      CPU: 0 PID: 754 Comm: kworker/0:2 Not tainted 6.4.0-rc7-syzkaller-00014-g692b7dc87ca6 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
      Workqueue: mld mld_ifc_work
      RIP: 0010:usb_submit_urb+0xed6/0x1880 drivers/usb/core/urb.c:504
      Code: 7c 24 18 e8 2c b4 5b fb 48 8b 7c 24 18 e8 42 07 f0 fe 41 89 d8 44 89 e1 4c 89 ea 48 89 c6 48 c7 c7 a0 c9 fc 8a e8 5a 6f 23 fb <0f> 0b e9 58 f8 ff ff e8 fe b3 5b fb 48 81 c5 c0 05 00 00 e9 84 f7
      RSP: 0018:ffffc9000463f568 EFLAGS: 00010086
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
      RDX: ffff88801eb28000 RSI: ffffffff814c03b7 RDI: 0000000000000001
      RBP: ffff8881443b7190 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000003
      R13: ffff88802a77cb18 R14: 0000000000000003 R15: ffff888018262500
      FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000556a99c15a18 CR3: 0000000028c71000 CR4: 0000000000350ef0
      Call Trace:
       <TASK>
       usbnet_start_xmit+0xfe5/0x2190 drivers/net/usb/usbnet.c:1453
       __netdev_start_xmit include/linux/netdevice.h:4918 [inline]
       netdev_start_xmit include/linux/netdevice.h:4932 [inline]
       xmit_one net/core/dev.c:3578 [inline]
       dev_hard_start_xmit+0x187/0x700 net/core/dev.c:3594
      ...
      
      This bug is caused by the fact that usbnet trusts the bulk endpoint
      addresses its probe routine receives in the driver_info structure, and
      it does not check to see that these endpoints actually exist and have
      the expected type and directions.
      
      The fix is simply to add such a check.
      
      Reported-and-tested-by: default avatar <syzbot+63ee658b9a100ffadbe2@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/linux-usb/000000000000a56e9105d0cec021@google.com/
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      CC: Oliver Neukum <oneukum@suse.com>
      Link: https://lore.kernel.org/r/ea152b6d-44df-4f8a-95c6-4db51143dcc1@rowland.harvard.edu
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5e1627cb
    • Linus Walleij's avatar
      dsa: mv88e6xxx: Do a final check before timing out · 95ce158b
      Linus Walleij authored
      
      
      I get sporadic timeouts from the driver when using the
      MV88E6352. Reading the status again after the loop fixes the
      problem: the operation is successful but goes undetected.
      
      Some added prints show things like this:
      
      [   58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
          for switch, addr 1b reg 0b, mask 8000, val 0000, data c000
      [   58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for
          ATU op 4000, fid 0001
      (...)
      [   61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
          for switch, addr 1c reg 18, mask 8000, val 0000, data 9860
      [   61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting
          for PHY command 1860 to complete
      
      The reason is probably not the commands: I think those are
      mostly fine with the 50+50ms timeout, but the problem
      appears when OpenWrt brings up several interfaces in
      parallel on a system with 7 populated ports: if one of
      them take more than 50 ms and waits one or more of the
      others can get stuck on the mutex for the switch and then
      this can easily multiply.
      
      As we sleep and wait, the function loop needs a final
      check after exiting the loop if we were successful.
      
      Suggested-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Cc: Tobias Waldekranz <tobias@waldekranz.com>
      Fixes: 35da1dfd
      
       ("net: dsa: mv88e6xxx: Improve performance of busy bit polling")
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230712223405.861899-1-linus.walleij@linaro.org
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      95ce158b
    • Linus Torvalds's avatar
      Merge tag 'net-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b1983d42
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from netfilter, wireless and ebpf.
      
        Current release - regressions:
      
         - netfilter: conntrack: gre: don't set assured flag for clash entries
      
         - wifi: iwlwifi: remove 'use_tfh' config to fix crash
      
        Previous releases - regressions:
      
         - ipv6: fix a potential refcount underflow for idev
      
         - icmp6: ifix null-ptr-deref of ip6_null_entry->rt6i_idev in
           icmp6_dev()
      
         - bpf: fix max stack depth check for async callbacks
      
         - eth: mlx5e:
            - check for NOT_READY flag state after locking
            - fix page_pool page fragment tracking for XDP
      
         - eth: igc:
            - fix tx hang issue when QBV gate is closed
            - fix corner cases for TSN offload
      
         - eth: octeontx2-af: Move validation of ptp pointer before its usage
      
         - eth: ena: fix shift-out-of-bounds in exponential backoff
      
        Previous releases - always broken:
      
         - core: prevent skb corruption on frag list segmentation
      
         - sched:
            - cls_fw: fix improper refcount update leads to use-after-free
            - sch_qfq: account for stab overhead in qfq_enqueue
      
         - netfilter:
            - report use refcount overflow
            - prevent OOB access in nft_byteorder_eval
      
         - wifi: mt7921e: fix init command fail with enabled device
      
         - eth: ocelot: fix oversize frame dropping for preemptible TCs
      
         - eth: fec: recycle pages for transmitted XDP frames"
      
      * tag 'net-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
        selftests: tc-testing: add test for qfq with stab overhead
        net/sched: sch_qfq: account for stab overhead in qfq_enqueue
        selftests: tc-testing: add tests for qfq mtu sanity check
        net/sched: sch_qfq: reintroduce lmax bound check for MTU
        wifi: cfg80211: fix receiving mesh packets without RFC1042 header
        wifi: rtw89: debug: fix error code in rtw89_debug_priv_send_h2c_set()
        net: txgbe: fix eeprom calculation error
        net/sched: make psched_mtu() RTNL-less safe
        net: ena: fix shift-out-of-bounds in exponential backoff
        netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write()
        net/sched: flower: Ensure both minimum and maximum ports are specified
        MAINTAINERS: Add another mailing list for QUALCOMM ETHQOS ETHERNET DRIVER
        docs: netdev: update the URL of the status page
        wifi: iwlwifi: remove 'use_tfh' config to fix crash
        xdp: use trusted arguments in XDP hints kfuncs
        bpf: cpumap: Fix memory leak in cpu_map_update_elem
        wifi: airo: avoid uninitialized warning in airo_get_rate()
        octeontx2-pf: Add additional check for MCAM rules
        net: dsa: Removed unneeded of_node_put in felix_parse_ports_node
        net: fec: use netdev_err_once() instead of netdev_err()
        ...
      b1983d42
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · ebc27aac
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix some missing-prototype warnings
      
       - Fix user events struct args (did not include size of struct)
      
         When creating a user event, the "struct" keyword is to denote that
         the size of the field will be passed in. But the parsing failed to
         handle this case.
      
       - Add selftest to struct sizes for user events
      
       - Fix sample code for direct trampolines.
      
         The sample code for direct trampolines attached to handle_mm_fault().
         But the prototype changed and the direct trampoline sample code was
         not updated. Direct trampolines needs to have the arguments correct
         otherwise it can fail or crash the system.
      
       - Remove unused ftrace_regs_caller_ret() prototype.
      
       - Quiet false positive of FORTIFY_SOURCE
      
         Due to backward compatibility, the structure used to save stack
         traces in the kernel had a fixed size of 8. This structure is
         exported to user space via the tracing format file. A change was made
         to allow more than 8 functions to be recorded, and user space now
         uses the size field to know how many functions are actually in the
         stack.
      
         But the structure still has size of 8 (even though it points into the
         ring buffer that has the required amount allocated to hold a full
         stack.
      
         This was fine until the fortifier noticed that the
         memcpy(&entry->caller, stack, size) was greater than the 8 functions
         and would complain at runtime about it.
      
         Hide this by using a pointer to the stack location on the ring buffer
         instead of using the address of the entry structure caller field.
      
       - Fix a deadloop in reading trace_pipe that was caused by a mismatch
         between ring_buffer_empty() returning false which then asked to read
         the data, but the read code uses rb_num_of_entries() that returned
         zero, and causing a infinite "retry".
      
       - Fix a warning caused by not using all pages allocated to store ftrace
         functions, where this can happen if the linker inserts a bunch of
         "NULL" entries, causing the accounting of how many pages needed to be
         off.
      
       - Fix histogram synthetic event crashing when the start event is
         removed and the end event is still using a variable from it
      
       - Fix memory leak in freeing iter->temp in tracing_release_pipe()
      
      * tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Fix memory leak of iter->temp when reading trace_pipe
        tracing/histograms: Add histograms to hist_vars if they have referenced variables
        tracing: Stop FORTIFY_SOURCE complaining about stack trace caller
        ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
        ring-buffer: Fix deadloop issue on reading trace_pipe
        tracing: arm64: Avoid missing-prototype warnings
        selftests/user_events: Test struct size match cases
        tracing/user_events: Fix struct arg size match check
        x86/ftrace: Remove unsued extern declaration ftrace_regs_caller_ret()
        arm64: ftrace: Add direct call trampoline samples support
        samples: ftrace: Save required argument registers in sample trampolines
      ebc27aac
    • Linus Torvalds's avatar
      Merge tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 15999328
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - a cleanup of the Xen related ELF-notes
      
       - a fix for virtio handling in Xen dom0 when running Xen in a VM
      
      * tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent
        x86/Xen: tidy xen-head.S
      15999328
    • Linus Torvalds's avatar
      Merge tag 'sh-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux · 9350cd01
      Linus Torvalds authored
      Pull sh fixes from John Paul Adrian Glaubitz:
       "The sh updates introduced multiple regressions.
      
        In particular, the change a8ac2961 ("sh: Avoid using IRQ0 on SH3
        and SH4") causes several boards to hang during boot due to incorrect
        IRQ numbers.
      
        Geert Uytterhoeven has contributed patches that handle the virq offset
        in the IRQ code for the dreamcast, highlander and r2d boards while
        Artur Rojek has contributed a patch which handles the virq offset for
        the hd64461 companion chip"
      
      * tag 'sh-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
        sh: hd64461: Handle virq offset for offchip IRQ base and HD64461 IRQ
        sh: mach-dreamcast: Handle virq offset in cascaded IRQ demux
        sh: mach-highlander: Handle virq offset in cascaded IRL demux
        sh: mach-r2d: Handle virq offset in cascaded IRL demux
      9350cd01
  5. Jul 13, 2023
    • Zheng Yejian's avatar
      tracing: Fix memory leak of iter->temp when reading trace_pipe · d5a82189
      Zheng Yejian authored
      kmemleak reports:
        unreferenced object 0xffff88814d14e200 (size 256):
          comm "cat", pid 336, jiffies 4294871818 (age 779.490s)
          hex dump (first 32 bytes):
            04 00 01 03 00 00 00 00 08 00 00 00 00 00 00 00  ................
            0c d8 c8 9b ff ff ff ff 04 5a ca 9b ff ff ff ff  .........Z......
          backtrace:
            [<ffffffff9bdff18f>] __kmalloc+0x4f/0x140
            [<ffffffff9bc9238b>] trace_find_next_entry+0xbb/0x1d0
            [<ffffffff9bc9caef>] trace_print_lat_context+0xaf/0x4e0
            [<ffffffff9bc94490>] print_trace_line+0x3e0/0x950
            [<ffffffff9bc95499>] tracing_read_pipe+0x2d9/0x5a0
            [<ffffffff9bf03a43>] vfs_read+0x143/0x520
            [<ffffffff9bf04c2d>] ksys_read+0xbd/0x160
            [<ffffffff9d0f0edf>] do_syscall_64+0x3f/0x90
            [<ffffffff9d2000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      when reading file 'trace_pipe', 'iter->temp' is allocated or relocated
      in trace_find_next_entry() but not freed before 'trace_pipe' is closed.
      
      To fix it, free 'iter->temp' in tracing_release_pipe().
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230713141435.1133021-1-zhengyejian1@huawei.com
      
      Cc: stable@vger.kernel.org
      Fixes: ff895103
      
       ("tracing: Save off entry when peeking at next entry")
      Signed-off-by: default avatarZheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      d5a82189
    • Paolo Abeni's avatar
      Merge branch 'net-sched-fixes-for-sch_qfq' · 9d23aac8
      Paolo Abeni authored
      
      
      Pedro Tammela says:
      
      ====================
      net/sched: fixes for sch_qfq
      
      Patch 1 fixes a regression introduced in 6.4 where the MTU size could be
      bigger than 'lmax'.
      
      Patch 3 fixes an issue where the code doesn't account for qdisc_pkt_len()
      returning a size bigger then 'lmax'.
      
      Patches 2 and 4 are selftests for the issues above.
      ====================
      
      Link: https://lore.kernel.org/r/20230711210103.597831-1-pctammela@mojatatu.com
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      9d23aac8