  1. Aug 24, 2022
  2. Aug 23, 2022
    • net: dsa: don't dereference NULL extack in dsa_slave_changeupper() · 855a28f9
      Vladimir Oltean authored
      When a driver returns -EOPNOTSUPP in dsa_port_bridge_join() but fails
      to provide a reason for it, DSA attempts to set the extack to say that
      software fallback will kick in.
      
      The problem is, when we use brctl and the legacy bridge ioctls, the
      extack will be NULL, and DSA dereferences it in the process of setting
      it.
      
      Sergei Antonov demonstrates this with the following stack trace:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      PC is at dsa_slave_changeupper+0x5c/0x158
      
       dsa_slave_changeupper from raw_notifier_call_chain+0x38/0x6c
       raw_notifier_call_chain from __netdev_upper_dev_link+0x198/0x3b4
       __netdev_upper_dev_link from netdev_master_upper_dev_link+0x50/0x78
       netdev_master_upper_dev_link from br_add_if+0x430/0x7f4
       br_add_if from br_ioctl_stub+0x170/0x530
       br_ioctl_stub from br_ioctl_call+0x54/0x7c
       br_ioctl_call from dev_ifsioc+0x4e0/0x6bc
       dev_ifsioc from dev_ioctl+0x2f8/0x758
       dev_ioctl from sock_ioctl+0x5f0/0x674
       sock_ioctl from sys_ioctl+0x518/0xe40
       sys_ioctl from ret_fast_syscall+0x0/0x1c
      
      Fix the problem by only overriding the extack if non-NULL.
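
      The fix is the usual C pattern of testing an optional output pointer before writing through it. A minimal userspace sketch, assuming a simplified stand-in for struct netlink_ext_ack (struct ext_ack_model, set_fallback_msg() and the message string are hypothetical, not the DSA code):

```c
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_ext_ack: only the message field. */
struct ext_ack_model {
	const char *msg;
};

/*
 * Model of the fixed logic: if the driver returned -EOPNOTSUPP without a
 * reason, fill in a generic one, but only when the caller supplied an
 * extack at all. Legacy bridge ioctls (brctl) pass NULL here.
 */
static void set_fallback_msg(struct ext_ack_model *extack)
{
	if (extack && !extack->msg)
		extack->msg = "Offloading not supported";
}
```

      Passing NULL, as the legacy ioctl path does, is now a no-op, and a reason already set by the driver is left untouched.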
      
      Fixes: 1c6e8088 ("net: dsa: allow port_bridge_join() to override extack message")
      Link: https://lore.kernel.org/netdev/CABikg9wx7vB5eRDAYtvAm7fprJ09Ta27a4ZazC=NX5K4wn6pWA@mail.gmail.com/
      Reported-by: Sergei Antonov <saproj@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Sergei Antonov <saproj@gmail.com>
      Link: https://lore.kernel.org/r/20220819173925.3581871-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipvtap - add __init/__exit annotations to module init/exit funcs · 4b2e3a17
      Maciej Żenczykowski authored
      These annotations appear to have been left out by oversight.
      
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Sainath Grandhi <sainath.grandhi@intel.com>
      Fixes: 235a9d89 ('ipvtap: IP-VLAN based tap driver')
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Link: https://lore.kernel.org/r/20220821130808.12143-1-zenczykowski@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'bonding-802-3ad-fix-no-transmission-of-lacpdus' · 5003e52c
      Jakub Kicinski authored
      
      
      Jonathan Toppins says:
      
      ====================
      bonding: 802.3ad: fix no transmission of LACPDUs
      
      Configuring a bond in a specific order can leave the bond in a state
      where it never transmits LACPDUs.
      
      The first patch adds some kselftest infrastructure and the reproducer
      that demonstrates the problem. The second patch fixes the issue. The
      new third patch makes ad_ticks_per_sec a static const and removes the
      passing of this variable via the stack.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1660919940.git.jtoppins@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bonding: 3ad: make ad_ticks_per_sec a const · f2e44dff
      Jonathan Toppins authored
      
      
      The value is only ever set once in bond_3ad_initialize and only ever
      read otherwise. There seems to be no reason to set the variable via
      bond_3ad_initialize when setting the global variable will do. Change
      ad_ticks_per_sec to a const to enforce its read-only usage.
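
      As a rough sketch of what the const buys (assuming the 802.3ad state machines tick every 100 ms; AD_TIMER_INTERVAL_MS and timer_to_ticks() are illustrative names, not the driver's), the tick rate becomes a compile-time value that no initialization path can leave at zero:

```c
/* Assumption: one state-machine tick every 100 ms. */
#define AD_TIMER_INTERVAL_MS 100

/* Read-only by construction; cannot be skipped by a conditional init path. */
static const int ad_ticks_per_sec = 1000 / AD_TIMER_INTERVAL_MS;

/* Hypothetical helper in the spirit of __ad_timer_to_ticks(): seconds -> ticks. */
static unsigned int timer_to_ticks(unsigned int seconds)
{
	return seconds * (unsigned int)ad_ticks_per_sec;
}
```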
      
      Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bonding: 802.3ad: fix no transmission of LACPDUs · d745b506
      Jonathan Toppins authored
      This is caused by the global variable ad_ticks_per_sec being zero, as
      demonstrated by the reproducer script discussed below. This causes
      all timer values in __ad_timer_to_ticks to be zero, resulting
      in the periodic timer never firing.
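
      Why a zero tick rate silences LACPDUs can be seen in a small model, assuming the periodic timer is a down-counter armed with seconds * ticks_per_sec that fires when it is decremented to zero and is treated as disarmed while it is zero (struct periodic_timer and its helpers are hypothetical, not the bonding code):

```c
#include <stdbool.h>

/* Hypothetical down-counting timer; a counter of 0 means "disarmed". */
struct periodic_timer {
	unsigned int counter;
};

static void timer_arm(struct periodic_timer *t, unsigned int seconds,
		      unsigned int ticks_per_sec)
{
	t->counter = seconds * ticks_per_sec; /* 0 whenever ticks_per_sec == 0 */
}

/* Called once per tick; returns true on the tick at which the timer expires. */
static bool timer_tick(struct periodic_timer *t)
{
	return t->counter && !--t->counter;
}

/* Count expirations over n_ticks, re-arming for 1 s after each one. */
static unsigned int fires_in(unsigned int n_ticks, unsigned int ticks_per_sec)
{
	struct periodic_timer t;
	unsigned int fired = 0;

	timer_arm(&t, 1, ticks_per_sec);
	for (unsigned int i = 0; i < n_ticks; i++) {
		if (timer_tick(&t)) {
			fired++;
			timer_arm(&t, 1, ticks_per_sec);
		}
	}
	return fired;
}
```

      With ticks_per_sec = 10, the timer fires every 10 ticks; with ticks_per_sec = 0, it is armed at zero and never fires.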
      
      To reproduce:
      Run the script in
      `tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
      puts bonding into a state where it never transmits LACPDUs.
      
      line 44: ip link add fbond type bond mode 4 miimon 200 \
                  xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
      setting bond param: ad_actor_sys_prio
      given:
          params.ad_actor_system = 0
      call stack:
          bond_option_ad_actor_sys_prio()
          -> bond_3ad_update_ad_actor_settings()
             -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
             -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
                  params.ad_actor_system == 0
      results:
           ad.system.sys_mac_addr = bond->dev->dev_addr
      
      line 48: ip link set fbond address 52:54:00:3B:7C:A6
      setting bond MAC addr
      call stack:
          bond->dev->dev_addr = new_mac
      
      line 52: ip link set fbond type bond ad_actor_sys_prio 65535
      setting bond param: ad_actor_sys_prio
      given:
          params.ad_actor_system = 0
      call stack:
          bond_option_ad_actor_sys_prio()
          -> bond_3ad_update_ad_actor_settings()
             -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
             -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
                  params.ad_actor_system == 0
      results:
           ad.system.sys_mac_addr = bond->dev->dev_addr
      
      line 60: ip link set veth1-bond down master fbond
      given:
          params.ad_actor_system = 0
          params.mode = BOND_MODE_8023AD
          ad.system.sys_mac_addr == bond->dev->dev_addr
      call stack:
          bond_enslave
          -> bond_3ad_initialize(); because first slave
             -> if ad.system.sys_mac_addr != bond->dev->dev_addr
                return
      results:
           Nothing is run in bond_3ad_initialize() because dev_addr equals
           sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
           never initialized anywhere else.
      
      The if check around the contents of bond_3ad_initialize() is no longer
      needed due to commit 5ee14e6d ("bonding: 3ad: apply ad_actor settings
      changes immediately"), which sets ad.system.sys_mac_addr whenever any
      bonding parameter whose set function calls
      bond_3ad_update_ad_actor_settings() is changed. This is because, if
      ad.system.sys_mac_addr is zero, it is set to the current bond MAC
      address, which causes the if check never to be true.
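
      The sequence traced above can be replayed against a simplified model of the old guard (struct bond_model and its helpers are hypothetical; ticks_per_sec stands in for ad_ticks_per_sec, and the value 10 assumes a 100 ms tick):

```c
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, heavily simplified bond state. */
struct bond_model {
	unsigned char dev_addr[ETH_ALEN];     /* bond->dev->dev_addr */
	unsigned char sys_mac_addr[ETH_ALEN]; /* ad.system.sys_mac_addr */
	int ticks_per_sec;                    /* stands in for ad_ticks_per_sec */
};

/* Models bond_3ad_update_ad_actor_settings() with params.ad_actor_system == 0. */
static void update_actor_settings(struct bond_model *b)
{
	memcpy(b->sys_mac_addr, b->dev_addr, ETH_ALEN);
}

/* Old behaviour: the body only runs when the two addresses differ. */
static void initialize_guarded(struct bond_model *b)
{
	if (memcmp(b->sys_mac_addr, b->dev_addr, ETH_ALEN) != 0) {
		memcpy(b->sys_mac_addr, b->dev_addr, ETH_ALEN);
		b->ticks_per_sec = 10;
	}
}

/* Fixed behaviour: initialize unconditionally. */
static void initialize_unguarded(struct bond_model *b)
{
	memcpy(b->sys_mac_addr, b->dev_addr, ETH_ALEN);
	b->ticks_per_sec = 10;
}
```

      Replaying the reproducer's order (update settings, change the MAC, update settings again, then enslave) leaves sys_mac_addr equal to dev_addr, so initialize_guarded() returns without setting ticks_per_sec, while initialize_unguarded() always sets it.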
      
      Fixes: 5ee14e6d ("bonding: 3ad: apply ad_actor settings changes immediately")
      Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: include bonding tests into the kselftest infra · c078290a
      Jonathan Toppins authored
      
      
      This creates a test collection in drivers/net/bonding for
      bonding-specific kernel selftests.
      
      The first test is a reproducer that provisions a bond such that, given
      the specific order in which the ip-link(8) commands are issued, the
      bond never transmits an LACPDU frame on any of its slaves.
      
      Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: moxa: get rid of asymmetry in DMA mapping/unmapping · 0ee7828d
      Sergei Antonov authored
      Since priv->rx_mapping[i] is mapped in moxart_mac_open(), we
      should unmap it from moxart_mac_stop(). This fixes 2 warnings.
      
      1. During error unwinding in moxart_mac_probe(): "goto init_fail;",
      then moxart_mac_free_memory() calls dma_unmap_single() with
      priv->rx_mapping[i] pointers zeroed.
      
      WARNING: CPU: 0 PID: 1 at kernel/dma/debug.c:963 check_unmap+0x704/0x980
      DMA-API: moxart-ethernet 92000000.mac: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=1600 bytes]
      CPU: 0 PID: 1 Comm: swapper Not tainted 5.19.0+ #60
      Hardware name: Generic DT based system
       unwind_backtrace from show_stack+0x10/0x14
       show_stack from dump_stack_lvl+0x34/0x44
       dump_stack_lvl from __warn+0xbc/0x1f0
       __warn from warn_slowpath_fmt+0x94/0xc8
       warn_slowpath_fmt from check_unmap+0x704/0x980
       check_unmap from debug_dma_unmap_page+0x8c/0x9c
       debug_dma_unmap_page from moxart_mac_free_memory+0x3c/0xa8
       moxart_mac_free_memory from moxart_mac_probe+0x190/0x218
       moxart_mac_probe from platform_probe+0x48/0x88
       platform_probe from really_probe+0xc0/0x2e4
      
      2. After commands:
       ip link set dev eth0 down
       ip link set dev eth0 up
      
      WARNING: CPU: 0 PID: 55 at kernel/dma/debug.c:570 add_dma_entry+0x204/0x2ec
      DMA-API: moxart-ethernet 92000000.mac: cacheline tracking EEXIST, overlapping mappings aren't supported
      CPU: 0 PID: 55 Comm: ip Not tainted 5.19.0+ #57
      Hardware name: Generic DT based system
       unwind_backtrace from show_stack+0x10/0x14
       show_stack from dump_stack_lvl+0x34/0x44
       dump_stack_lvl from __warn+0xbc/0x1f0
       __warn from warn_slowpath_fmt+0x94/0xc8
       warn_slowpath_fmt from add_dma_entry+0x204/0x2ec
       add_dma_entry from dma_map_page_attrs+0x110/0x328
       dma_map_page_attrs from moxart_mac_open+0x134/0x320
       moxart_mac_open from __dev_open+0x11c/0x1ec
       __dev_open from __dev_change_flags+0x194/0x22c
       __dev_change_flags from dev_change_flags+0x14/0x44
       dev_change_flags from devinet_ioctl+0x6d4/0x93c
       devinet_ioctl from inet_ioctl+0x1ac/0x25c
      
      v1 -> v2:
      Extraneous change removed.
      
      Fixes: 6c821bd9 ("net: Add MOXA ART SoCs ethernet driver")
      Signed-off-by: Sergei Antonov <saproj@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220819110519.1230877-1-saproj@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: Don't WARN for PHY_READY state in mdio_bus_phy_resume() · 6dbe852c
      Xiaolei Wang authored
      Some MAC drivers set mac_managed_pm to true in their ->ndo_open()
      callback, so before mac_managed_pm is set, we still want to leverage
      mdio_bus_phy_suspend()/resume() for PHY device suspend and resume.
      In this case, the PHY device is in PHY_READY, and we shouldn't warn
      about it. The check of mac_managed_pm in WARN_ON also seems redundant,
      since we already check it at the entry of mdio_bus_phy_resume(), so
      drop it.
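
      The relaxed condition can be sketched as a pure function of the PHY state. This is an illustrative model, not the phylib source: the enum mirrors a subset of phylib's state names, and the accepted set (PHY_HALTED, PHY_READY, PHY_UP) is an assumption based on the description above.

```c
#include <stdbool.h>

/* Local copy of a subset of phylib's state names, for illustration only. */
enum phy_state_model {
	PHY_DOWN,
	PHY_READY,
	PHY_HALTED,
	PHY_UP,
	PHY_RUNNING,
};

/*
 * Would the resume-time sanity check warn? PHY_READY (attached, but the MAC
 * driver's ->ndo_open() has not run yet) is now an expected state.
 * mac_managed_pm is not part of the condition: the caller is assumed to
 * have returned early already when it is set.
 */
static bool resume_state_would_warn(enum phy_state_model state)
{
	return state != PHY_HALTED && state != PHY_READY && state != PHY_UP;
}
```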
      
      Fixes: 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
      Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220819082451.1992102-1-xiaolei.wang@windriver.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: don't assume SMEM is page-aligned · b8d43803
      Alex Elder authored
      In ipa_smem_init(), a Qualcomm SMEM region is allocated (if needed)
      and then its virtual address is fetched using qcom_smem_get().  The
      physical address associated with that region is also fetched.
      
      The physical address is adjusted so that it is page-aligned, and an
      attempt is made to update the size of the region to compensate for
      any non-zero adjustment.
      
      But that adjustment isn't done properly.  The physical address is
      aligned twice, and as a result the size is never actually adjusted.
      
      Fix this by *not* aligning the "addr" local variable, and instead
      making the "phys" local variable be the adjusted "addr" value.
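
      The intended arithmetic can be checked with a short sketch, assuming 4 KiB pages. PAGE_SIZE_MODEL, the macros and both functions are hypothetical; align_region_buggy() models the description above (an adjustment computed between two already-aligned values) rather than reproducing the original driver code.

```c
#include <stdint.h>

#define PAGE_SIZE_MODEL 4096u
#define ALIGN_DOWN_MODEL(x, a) ((uint64_t)(x) & ~((uint64_t)(a) - 1))
#define PAGE_ALIGN_MODEL(x) \
	ALIGN_DOWN_MODEL((uint64_t)(x) + PAGE_SIZE_MODEL - 1, PAGE_SIZE_MODEL)

/* Intended: round the start down, grow the size by the same amount,
 * then round the size up, so the mapping covers the whole region. */
static void align_region(uint64_t addr, uint64_t size,
			 uint64_t *phys, uint64_t *mapped_size)
{
	*phys = ALIGN_DOWN_MODEL(addr, PAGE_SIZE_MODEL);
	*mapped_size = PAGE_ALIGN_MODEL(size + (addr - *phys));
}

/* Model of the bug: the adjustment is always 0, so the size never grows. */
static void align_region_buggy(uint64_t addr, uint64_t size,
			       uint64_t *phys, uint64_t *mapped_size)
{
	uint64_t aligned = ALIGN_DOWN_MODEL(addr, PAGE_SIZE_MODEL);

	*phys = aligned;
	*mapped_size = PAGE_ALIGN_MODEL(size + (*phys - aligned));
}
```

      For a region at 0x86000800 of size 0x2000, align_region() yields phys = 0x86000000 and a 0x3000-byte mapping that covers the region's last byte at 0x860027ff; the buggy variant maps only 0x2000 bytes and stops 0x800 bytes short.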
      
      Fixes: a0036bb4 ("net: ipa: define SMEM memory region for IPA")
      Signed-off-by: Alex Elder <elder@linaro.org>
      Link: https://lore.kernel.org/r/20220818134206.567618-1-elder@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: microchip: keep compatibility with device tree blobs with no phy-mode · 5fbb08eb
      Vladimir Oltean authored
      DSA has multiple ways of specifying a MAC connection to an internal PHY.
      One requires a DT description like this:
      
      	port@0 {
      		reg = <0>;
      		phy-handle = <&internal_phy>;
      		phy-mode = "internal";
      	};
      
      (which is IMO the recommended approach, as it is the clearest
      description)
      
      but it is also possible to leave the specification as just:
      
      	port@0 {
      		reg = <0>;
      	};
      
      and if the driver implements ds->ops->phy_read and ds->ops->phy_write,
      the DSA framework "knows" it should create a ds->slave_mii_bus, and it
      should connect to a non-OF-based internal PHY on this MDIO bus, at an
      MDIO address equal to the port address.
      
      There is also an intermediary way of describing things:
      
      	port@0 {
      		reg = <0>;
      		phy-handle = <&internal_phy>;
      	};
      
      In the second case, DSA calls phylink_connect_phy(), and in the third,
      it calls phylink_of_phy_connect(). In both cases, phylink_create() has
      been called with a phy_interface_t of PHY_INTERFACE_MODE_NA, and in both
      cases, PHY_INTERFACE_MODE_NA is translated into phy->interface.
      
      It is important to note that phy_device_create() initializes
      dev->interface = PHY_INTERFACE_MODE_GMII, and so, when we use
      phylink_create(PHY_INTERFACE_MODE_NA), no one will override this, and we
      will end up with a PHY_INTERFACE_MODE_GMII interface inherited from the
      PHY.
      
      All this means that in order to maintain compatibility with device tree
      blobs where the phy-mode property is missing, we need to allow the
      "gmii" phy-mode and treat it as "internal".
      
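      The compatibility argument can be modelled in a few lines. This sketch assumes a simplified enum in place of phy_interface_t and a hypothetical internal_phy_port_accepts() check in place of the driver's phylink validation; only the GMII default and the NA pass-through come from the description above.

```c
#include <stdbool.h>

/* Simplified stand-in for phy_interface_t. */
enum iface_model {
	IFACE_NA,       /* PHY_INTERFACE_MODE_NA: no phy-mode in the DT */
	IFACE_INTERNAL, /* PHY_INTERFACE_MODE_INTERNAL */
	IFACE_GMII,     /* PHY_INTERFACE_MODE_GMII */
	IFACE_RGMII,
};

/* phy_device_create() default, per the description above. */
#define PHY_DEFAULT_IFACE IFACE_GMII

/* phylink created with NA: the PHY's own interface ends up being used. */
static enum iface_model resolve_iface(enum iface_model from_dt)
{
	return from_dt == IFACE_NA ? PHY_DEFAULT_IFACE : from_dt;
}

/* Hypothetical check for a port with an internal PHY: accept "internal",
 * and also "gmii" so that old blobs without phy-mode keep working. */
static bool internal_phy_port_accepts(enum iface_model iface)
{
	return iface == IFACE_INTERNAL || iface == IFACE_GMII;
}
```

      Without the IFACE_GMII case, a port described by an old blob with no phy-mode would resolve to GMII and be rejected.
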
      Fixes: 2c709e0b ("net: dsa: microchip: ksz8795: add phylink support")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216320
      Reported-by: Craig McQueen <craig@mcqueen.id.au>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Tested-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Link: https://lore.kernel.org/r/20220818143250.2797111-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5: Unlock on error in mlx5_sriov_enable() · 35419025
      Dan Carpenter authored
      Unlock before returning if mlx5_device_enable_sriov() fails.
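
      The bug class is a lock taken on entry but not released on one return path. A sketch of the usual single-exit shape, with a plain flag standing in for the devlink lock (struct sriov_model, sriov_enable_model() and the enable callbacks are hypothetical; the actual patch may unlock inline instead of using a label):

```c
#include <errno.h>

/* Hypothetical state: a flag stands in for the lock. */
struct sriov_model {
	int lock_held;
	int enabled;
};

static int sriov_enable_model(struct sriov_model *s, int (*enable_fn)(void))
{
	int err;

	s->lock_held = 1;          /* lock */
	err = enable_fn();         /* stands in for mlx5_device_enable_sriov() */
	if (err)
		goto out_unlock;   /* the error path must still unlock */
	s->enabled = 1;            /* further work done under the lock */
out_unlock:
	s->lock_held = 0;          /* unlock on every path */
	return err;
}

static int enable_fails(void)    { return -ENOMEM; }
static int enable_succeeds(void) { return 0; }
```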
      
      Fixes: 84a433a4 ("net/mlx5: Lock mlx5 devlink reload callbacks")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix use after free in mlx5e_fs_init() · 21234e3a
      Dan Carpenter authored
      Call mlx5e_fs_vlan_free(fs) before kvfree(fs).
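
      The rule is that anything reached through fs must be released while fs is still valid. A userspace sketch with hypothetical types (struct flow_steering_model and struct vlan_table_model stand in for the mlx5e structures, free() for kvfree()):

```c
#include <stdlib.h>

/* Hypothetical, simplified containers. */
struct vlan_table_model {
	int *entries;
};

struct flow_steering_model {
	struct vlan_table_model *vlan;
};

/* Frees the VLAN table; this has to read fs->vlan. */
static void fs_vlan_free(struct flow_steering_model *fs)
{
	if (!fs->vlan)
		return;
	free(fs->vlan->entries);
	free(fs->vlan);
	fs->vlan = NULL;
}

/* Safe order: release what fs points to, then fs itself.
 * Swapping the two calls would read fs after free(). */
static void fs_teardown(struct flow_steering_model *fs)
{
	fs_vlan_free(fs);
	free(fs);
}

static struct flow_steering_model *fs_create(void)
{
	struct flow_steering_model *fs = calloc(1, sizeof(*fs));

	if (!fs)
		return NULL;
	fs->vlan = calloc(1, sizeof(*fs->vlan));
	if (fs->vlan)
		fs->vlan->entries = calloc(16, sizeof(int));
	return fs;
}
```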
      
      Fixes: af8bbf73 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: kTLS, Use _safe() iterator in mlx5e_tls_priv_tx_list_cleanup() · 6514210b
      Dan Carpenter authored
      Use the list_for_each_entry_safe() macro to prevent dereferencing "obj"
      after it has been freed.
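
      list_for_each_entry_safe() differs from list_for_each_entry() by caching the next entry before the loop body runs, so the body may free the current one. The same idea on a hypothetical singly linked list, with the iteration written out by hand rather than using the kernel macro:

```c
#include <stdlib.h>

/* Hypothetical list node standing in for an entry on a list_head. */
struct obj {
	int id;
	struct obj *next;
};

/* Free every node; tmp holds the successor so obj->next is never read
 * after free(obj). Returns the number of nodes freed. */
static int free_all(struct obj *head)
{
	struct obj *obj, *tmp;
	int freed = 0;

	for (obj = head; obj; obj = tmp) {
		tmp = obj->next; /* cache next while obj is still valid */
		free(obj);
		freed++;
	}
	return freed;
}

static struct obj *build_list(int n)
{
	struct obj *head = NULL;

	for (int i = 0; i < n; i++) {
		struct obj *o = malloc(sizeof(*o));

		if (!o)
			break;
		o->id = i;
		o->next = head;
		head = o;
	}
	return head;
}
```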
      
      Fixes: c4dfe704 ("net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: unlock on error path in esw_vfs_changed_event_handler() · b868c8fe
      Dan Carpenter authored
      Unlock before returning on this error path.
      
      Fixes: f1bc646c ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off · 550f9643
      Maor Dickman authored
      The cited commit reintroduced the ability to set hw-tc-offload
      in switchdev mode by reusing NIC mode calls without modifying them
      to support both modes. This can cause an illegal memory access
      when trying to turn hw-tc-offload off.
      
      Fix this by using the right TC_FLAG when checking if tc rules
      are installed while disabling hw-tc-offload.
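
      A sketch of the flag selection, assuming that disabling hw-tc-offload is refused while rules of the active mode are installed. The enum, struct and helpers are hypothetical stand-ins for MLX5_TC_FLAG(NIC_OFFLOAD), MLX5_TC_FLAG(ESW_OFFLOAD) and the driver's filter count:

```c
#include <stdbool.h>

/* Hypothetical flags for the two offload modes. */
enum tc_flag_model {
	TC_FLAG_NIC_OFFLOAD = 1 << 0,
	TC_FLAG_ESW_OFFLOAD = 1 << 1,
};

struct tc_state_model {
	bool is_uplink_rep; /* switchdev uplink representor? */
	int nic_rules;      /* rules installed in NIC mode */
	int esw_rules;      /* rules installed in switchdev (eswitch) mode */
};

/* Pick the flag matching the mode the netdev is actually in. */
static enum tc_flag_model tc_flag_for(const struct tc_state_model *s)
{
	return s->is_uplink_rep ? TC_FLAG_ESW_OFFLOAD : TC_FLAG_NIC_OFFLOAD;
}

static int num_filters(const struct tc_state_model *s, enum tc_flag_model flag)
{
	return flag == TC_FLAG_ESW_OFFLOAD ? s->esw_rules : s->nic_rules;
}

/* Turning hw-tc-offload off is allowed only with no active-mode rules. */
static bool can_disable_hw_tc(const struct tc_state_model *s)
{
	return num_filters(s, tc_flag_for(s)) == 0;
}
```

      For an uplink representor, counting NIC-mode rules reports zero even when switchdev rules exist, which is the mismatch the fix removes.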
      
      Fixes: d3cbd425 ("net/mlx5e: Add ndo_set_feature for uplink representor")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Add missing policer validation · f7a4e867
      Roi Dayan authored
      Policer validation is missing when offloading a police action
      with the tc action API. Add it.
      
      Fixes: 7d1a5ce4 ("net/mlx5e: TC, Support tc action api for police")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix wrong application of the LRO state · 7b3707fc
      Aya Levin authored
      The driver caches the packet merge type in its mlx5e_params instance,
      which must be in perfect sync with the corresponding netdev_features bit.
      Prior to this patch, in certain conditions (*) the LRO state was set in
      mlx5e_params while the netdev_features bit was off, causing LRO to
      be applied on the RQs (HW level).
      
      (*) This can happen only on profile init (mlx5e_build_nic_params()),
      when the RQ expects non-linear SKBs and PCI is fast enough in comparison
      to link width.
      
      Solution: remove the setting of the packet merge type from
      mlx5e_build_nic_params(), as netdev features are not updated there.
      
      Fixes: 619a8f2a ("net/mlx5e: Use linear SKB in Striding RQ")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Avoid false positive lockdep warning by adding lock_class_key · d59b73a6
      Moshe Shemesh authored
      Add a lock_class_key per mlx5 device to avoid a false positive
      "possible circular locking dependency" warning by lockdep, on flows
      which lock more than one mlx5 device, such as adding SF.
      
      kernel log:
       ======================================================
       WARNING: possible circular locking dependency detected
       5.19.0-rc8+ #2 Not tainted
       ------------------------------------------------------
       kworker/u20:0/8 is trying to acquire lock:
       ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core]
      
       but task is already holding lock:
       ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&(&notifier->n_head)->rwsem){++++}-{3:3}:
              down_write+0x90/0x150
              blocking_notifier_chain_register+0x53/0xa0
              mlx5_sf_table_init+0x369/0x4a0 [mlx5_core]
              mlx5_init_one+0x261/0x490 [mlx5_core]
              probe_one+0x430/0x680 [mlx5_core]
              local_pci_probe+0xd6/0x170
              work_for_cpu_fn+0x4e/0xa0
              process_one_work+0x7c2/0x1340
              worker_thread+0x6f6/0xec0
              kthread+0x28f/0x330
              ret_from_fork+0x1f/0x30
      
       -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
              __lock_acquire+0x2fc7/0x6720
              lock_acquire+0x1c1/0x550
              __mutex_lock+0x12c/0x14b0
              mlx5_init_one+0x2e/0x490 [mlx5_core]
              mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
              auxiliary_bus_probe+0x9d/0xe0
              really_probe+0x1e0/0xaa0
              __driver_probe_device+0x219/0x480
              driver_probe_device+0x49/0x130
              __device_attach_driver+0x1b8/0x280
              bus_for_each_drv+0x123/0x1a0
              __device_attach+0x1a3/0x460
              bus_probe_device+0x1a2/0x260
              device_add+0x9b1/0x1b40
              __auxiliary_device_add+0x88/0xc0
              mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
              blocking_notifier_call_chain+0xd5/0x130
              mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
              process_one_work+0x7c2/0x1340
              worker_thread+0x59d/0xec0
              kthread+0x28f/0x330
              ret_from_fork+0x1f/0x30
      
        other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&(&notifier->n_head)->rwsem);
                                      lock(&dev->intf_state_mutex);
                                      lock(&(&notifier->n_head)->rwsem);
         lock(&dev->intf_state_mutex);
      
        *** DEADLOCK ***
      
       4 locks held by kworker/u20:0/8:
        #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
        #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
        #2: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
        #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460
      
       stack backtrace:
       CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
       Call Trace:
        <TASK>
        dump_stack_lvl+0x57/0x7d
        check_noncircular+0x278/0x300
        ? print_circular_bug+0x460/0x460
        ? lock_chain_count+0x20/0x20
        ? register_lock_class+0x1880/0x1880
        __lock_acquire+0x2fc7/0x6720
        ? register_lock_class+0x1880/0x1880
        ? register_lock_class+0x1880/0x1880
        lock_acquire+0x1c1/0x550
        ? mlx5_init_one+0x2e/0x490 [mlx5_core]
        ? lockdep_hardirqs_on_prepare+0x400/0x400
        __mutex_lock+0x12c/0x14b0
        ? mlx5_init_one+0x2e/0x490 [mlx5_core]
        ? mlx5_init_one+0x2e/0x490 [mlx5_core]
        ? _raw_read_unlock+0x1f/0x30
        ? mutex_lock_io_nested+0x1320/0x1320
        ? __ioremap_caller.constprop.0+0x306/0x490
        ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core]
        ? iounmap+0x160/0x160
        mlx5_init_one+0x2e/0x490 [mlx5_core]
        mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
        ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core]
        auxiliary_bus_probe+0x9d/0xe0
        really_probe+0x1e0/0xaa0
        __driver_probe_device+0x219/0x480
        ? auxiliary_match_id+0xe9/0x140
        driver_probe_device+0x49/0x130
        __device_attach_driver+0x1b8/0x280
        ? driver_allows_async_probing+0x140/0x140
        bus_for_each_drv+0x123/0x1a0
        ? bus_for_each_dev+0x1a0/0x1a0
        ? lockdep_hardirqs_on_prepare+0x286/0x400
        ? trace_hardirqs_on+0x2d/0x100
        __device_attach+0x1a3/0x460
        ? device_driver_attach+0x1e0/0x1e0
        ? kobject_uevent_env+0x22d/0xf10
        bus_probe_device+0x1a2/0x260
        device_add+0x9b1/0x1b40
        ? dev_set_name+0xab/0xe0
        ? __fw_devlink_link_to_suppliers+0x260/0x260
        ? memset+0x20/0x40
        ? lockdep_init_map_type+0x21a/0x7d0
        __auxiliary_device_add+0x88/0xc0
        ? auxiliary_device_init+0x86/0xa0
        mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
        blocking_notifier_call_chain+0xd5/0x130
        mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
        ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core]
        ? lock_downgrade+0x6e0/0x6e0
        ? lockdep_hardirqs_on_prepare+0x286/0x400
        process_one_work+0x7c2/0x1340
        ? lockdep_hardirqs_on_prepare+0x400/0x400
        ? pwq_dec_nr_in_flight+0x230/0x230
        ? rwlock_bug.part.0+0x90/0x90
        worker_thread+0x59d/0xec0
        ? process_one_work+0x1340/0x1340
        kthread+0x28f/0x330
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
        </TASK>
      
      Fixes: 6a327321 ("net/mlx5: SF, Port function state change support")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix cmd error logging for manage pages cmd · 090f3e4f
      Roy Novich authored
      When the driver unloads, give/reclaim_pages may fail because the PF
      driver is in its teardown flow, and the current code then prints the
      following kernel log: 'failed reclaiming pages: err 0'.
      
      Fix it to get the same behavior as before the cited commits,
      by calling mlx5_cmd_check before handling the error state.
      mlx5_cmd_check verifies whether the returned error is an actual error
      that needs to be handled by the driver, and returns an
      appropriate value.
      
      Fixes: 8d564292 ("net/mlx5: Remove redundant error on reclaim pages")
      Fixes: 4dac2f10 ("net/mlx5: Remove redundant notify fail on give pages")
      Signed-off-by: Roy Novich <royno@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Disable irq when locking lag_lock · 8e93f294
      Vlad Buslov authored
      The lag_lock is taken from both process and softirq contexts, which
      results in a lockdep warning[0] about a potential deadlock. However,
      just disabling softirqs by using the *_bh spinlock API is not enough,
      since it will cause a warning in some contexts where the lock is
      obtained with hard irqs disabled. To fix the issue, save the current
      irq state, disable irqs before obtaining the lock, and re-enable them
      from the saved state after releasing it.
      
      [0]:
      
      [Sun Aug  7 13:12:29 2022] ================================
      [Sun Aug  7 13:12:29 2022] WARNING: inconsistent lock state
      [Sun Aug  7 13:12:29 2022] 5.19.0_for_upstream_debug_2022_08_04_16_06 #1 Not tainted
      [Sun Aug  7 13:12:29 2022] --------------------------------
      [Sun Aug  7 13:12:29 2022] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [Sun Aug  7 13:12:29 2022] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [Sun Aug  7 13:12:29 2022] ffffffffa06dc0d8 (lag_lock){+.?.}-{2:2}, at: mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
      [Sun Aug  7 13:12:29 2022] {SOFTIRQ-ON-W} state was registered at:
      [Sun Aug  7 13:12:29 2022]   lock_acquire+0x1c1/0x550
      [Sun Aug  7 13:12:29 2022]   _raw_spin_lock+0x2c/0x40
      [Sun Aug  7 13:12:29 2022]   mlx5_lag_add_netdev+0x13b/0x480 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   mlx5e_nic_enable+0x114/0x470 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   mlx5e_attach_netdev+0x30e/0x6a0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   mlx5e_resume+0x105/0x160 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   mlx5e_probe+0xac3/0x14f0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   auxiliary_bus_probe+0x9d/0xe0
      [Sun Aug  7 13:12:29 2022]   really_probe+0x1e0/0xaa0
      [Sun Aug  7 13:12:29 2022]   __driver_probe_device+0x219/0x480
      [Sun Aug  7 13:12:29 2022]   driver_probe_device+0x49/0x130
      [Sun Aug  7 13:12:29 2022]   __driver_attach+0x1e4/0x4d0
      [Sun Aug  7 13:12:29 2022]   bus_for_each_dev+0x11e/0x1a0
      [Sun Aug  7 13:12:29 2022]   bus_add_driver+0x3f4/0x5a0
      [Sun Aug  7 13:12:29 2022]   driver_register+0x20f/0x390
      [Sun Aug  7 13:12:29 2022]   __auxiliary_driver_register+0x14e/0x260
      [Sun Aug  7 13:12:29 2022]   mlx5e_init+0x38/0x90 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   vhost_iotlb_itree_augment_rotate+0xcb/0x180 [vhost_iotlb]
      [Sun Aug  7 13:12:29 2022]   do_one_initcall+0xc4/0x400
      [Sun Aug  7 13:12:29 2022]   do_init_module+0x18a/0x620
      [Sun Aug  7 13:12:29 2022]   load_module+0x563a/0x7040
      [Sun Aug  7 13:12:29 2022]   __do_sys_finit_module+0x122/0x1d0
      [Sun Aug  7 13:12:29 2022]   do_syscall_64+0x3d/0x90
      [Sun Aug  7 13:12:29 2022]   entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [Sun Aug  7 13:12:29 2022] irq event stamp: 3596508
      [Sun Aug  7 13:12:29 2022] hardirqs last  enabled at (3596508): [<ffffffff813687c2>] __local_bh_enable_ip+0xa2/0x100
      [Sun Aug  7 13:12:29 2022] hardirqs last disabled at (3596507): [<ffffffff813687da>] __local_bh_enable_ip+0xba/0x100
      [Sun Aug  7 13:12:29 2022] softirqs last  enabled at (3596488): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
      [Sun Aug  7 13:12:29 2022] softirqs last disabled at (3596495): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
      [Sun Aug  7 13:12:29 2022]
                                 other info that might help us debug this:
      [Sun Aug  7 13:12:29 2022]  Possible unsafe locking scenario:
      
      [Sun Aug  7 13:12:29 2022]        CPU0
      [Sun Aug  7 13:12:29 2022]        ----
      [Sun Aug  7 13:12:29 2022]   lock(lag_lock);
      [Sun Aug  7 13:12:29 2022]   <Interrupt>
      [Sun Aug  7 13:12:29 2022]     lock(lag_lock);
      [Sun Aug  7 13:12:29 2022]
                                  *** DEADLOCK ***
      
      [Sun Aug  7 13:12:29 2022] 4 locks held by swapper/0/0:
      [Sun Aug  7 13:12:29 2022]  #0: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: mlx5e_napi_poll+0x43/0x20a0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  #1: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb_list_internal+0x2d7/0xd60
      [Sun Aug  7 13:12:29 2022]  #2: ffff888144a18b58 (&br->hash_lock){+.-.}-{2:2}, at: br_fdb_update+0x301/0x570
      [Sun Aug  7 13:12:29 2022]  #3: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: atomic_notifier_call_chain+0x5/0x1d0
      [Sun Aug  7 13:12:29 2022]
                                 stack backtrace:
      [Sun Aug  7 13:12:29 2022] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0_for_upstream_debug_2022_08_04_16_06 #1
      [Sun Aug  7 13:12:29 2022] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [Sun Aug  7 13:12:29 2022] Call Trace:
      [Sun Aug  7 13:12:29 2022]  <IRQ>
      [Sun Aug  7 13:12:29 2022]  dump_stack_lvl+0x57/0x7d
      [Sun Aug  7 13:12:29 2022]  mark_lock.part.0.cold+0x5f/0x92
      [Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? unwind_next_frame+0x1c4/0x1b50
      [Sun Aug  7 13:12:29 2022]  ? secondary_startup_64_no_verify+0xcd/0xdb
      [Sun Aug  7 13:12:29 2022]  ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? stack_access_ok+0x1d0/0x1d0
      [Sun Aug  7 13:12:29 2022]  ? start_kernel+0x3a7/0x3c5
      [Sun Aug  7 13:12:29 2022]  __lock_acquire+0x1260/0x6720
      [Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
      [Sun Aug  7 13:12:29 2022]  ? mark_lock.part.0+0xed/0x3060
      [Sun Aug  7 13:12:29 2022]  ? stack_trace_save+0x91/0xc0
      [Sun Aug  7 13:12:29 2022]  lock_acquire+0x1c1/0x550
      [Sun Aug  7 13:12:29 2022]  ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? lockdep_hardirqs_on_prepare+0x400/0x400
      [Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
      [Sun Aug  7 13:12:29 2022]  _raw_spin_lock+0x2c/0x40
      [Sun Aug  7 13:12:29 2022]  ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  mlx5_esw_bridge_rep_vport_num_vhca_id_get+0x1a0/0x600 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? mlx5_esw_bridge_update_work+0x90/0x90 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? lock_acquire+0x1c1/0x550
      [Sun Aug  7 13:12:29 2022]  mlx5_esw_bridge_switchdev_event+0x185/0x8f0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? mlx5_esw_bridge_port_obj_attr_set+0x3e0/0x3e0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
      [Sun Aug  7 13:12:29 2022]  atomic_notifier_call_chain+0xd7/0x1d0
      [Sun Aug  7 13:12:29 2022]  br_switchdev_fdb_notify+0xea/0x100
      [Sun Aug  7 13:12:29 2022]  ? br_switchdev_set_port_flag+0x310/0x310
      [Sun Aug  7 13:12:29 2022]  fdb_notify+0x11b/0x150
      [Sun Aug  7 13:12:29 2022]  br_fdb_update+0x34c/0x570
      [Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? br_fdb_add_local+0x50/0x50
      [Sun Aug  7 13:12:29 2022]  ? br_allowed_ingress+0x5f/0x1070
      [Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
      [Sun Aug  7 13:12:29 2022]  br_handle_frame_finish+0x786/0x18e0
      [Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
      [Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
      [Sun Aug  7 13:12:29 2022]  ? sctp_inet_bind_verify+0x4d/0x190
      [Sun Aug  7 13:12:29 2022]  ? xlog_unpack_data+0x2e0/0x310
      [Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  br_nf_hook_thresh+0x227/0x380 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? setup_pre_routing+0x460/0x460 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? br_nf_pre_routing_ipv6+0x48b/0x69c [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  br_nf_pre_routing_finish_ipv6+0x5c2/0xbf0 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  br_nf_pre_routing_ipv6+0x4c6/0x69c [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? br_validate_ipv6+0x9e0/0x9e0 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? br_nf_forward_arp+0xb70/0xb70 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? br_nf_pre_routing+0xacf/0x1160 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  br_handle_frame+0x8a9/0x1270
      [Sun Aug  7 13:12:29 2022]  ? br_handle_frame_finish+0x18e0/0x18e0
      [Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
      [Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? bond_handle_frame+0xf9/0xac0 [bonding]
      [Sun Aug  7 13:12:29 2022]  ? br_handle_frame_finish+0x18e0/0x18e0
      [Sun Aug  7 13:12:29 2022]  __netif_receive_skb_core+0x7c0/0x2c70
      [Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
      [Sun Aug  7 13:12:29 2022]  ? generic_xdp_tx+0x5b0/0x5b0
      [Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
      [Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
      [Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
      [Sun Aug  7 13:12:29 2022]  __netif_receive_skb_list_core+0x2d7/0x8a0
      [Sun Aug  7 13:12:29 2022]  ? lock_acquire+0x1c1/0x550
      [Sun Aug  7 13:12:29 2022]  ? process_backlog+0x960/0x960
      [Sun Aug  7 13:12:29 2022]  ? lockdep_hardirqs_on_prepare+0x129/0x400
      [Sun Aug  7 13:12:29 2022]  ? kvm_clock_get_cycles+0x14/0x20
      [Sun Aug  7 13:12:29 2022]  netif_receive_skb_list_internal+0x5f4/0xd60
      [Sun Aug  7 13:12:29 2022]  ? do_xdp_generic+0x150/0x150
      [Sun Aug  7 13:12:29 2022]  ? mlx5e_poll_rx_cq+0xf6b/0x2960 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? mlx5e_poll_ico_cq+0x3d/0x1590 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  napi_complete_done+0x188/0x710
      [Sun Aug  7 13:12:29 2022]  mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? __queue_work+0x53c/0xeb0
      [Sun Aug  7 13:12:29 2022]  __napi_poll+0x9f/0x540
      [Sun Aug  7 13:12:29 2022]  net_rx_action+0x420/0xb70
      [Sun Aug  7 13:12:29 2022]  ? napi_threaded_poll+0x470/0x470
      [Sun Aug  7 13:12:29 2022]  ? __common_interrupt+0x79/0x1a0
      [Sun Aug  7 13:12:29 2022]  __do_softirq+0x271/0x92c
      [Sun Aug  7 13:12:29 2022]  irq_exit_rcu+0x11a/0x170
      [Sun Aug  7 13:12:29 2022]  common_interrupt+0x7d/0xa0
      [Sun Aug  7 13:12:29 2022]  </IRQ>
      [Sun Aug  7 13:12:29 2022]  <TASK>
      [Sun Aug  7 13:12:29 2022]  asm_common_interrupt+0x22/0x40
      [Sun Aug  7 13:12:29 2022] RIP: 0010:default_idle+0x42/0x60
      [Sun Aug  7 13:12:29 2022] Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 6b f1 22 02 85 c0 7e 07 0f 00 2d 80 3b 4a 00 fb f4 <c3> 48 c7 c7 e0 07 7e 85 e8 21 bd 40 fe eb de 66 66 2e 0f 1f 84 00
      [Sun Aug  7 13:12:29 2022] RSP: 0018:ffffffff84407e18 EFLAGS: 00000242
      [Sun Aug  7 13:12:29 2022] RAX: 0000000000000001 RBX: ffffffff84ec4a68 RCX: 1ffffffff0afc0fc
      [Sun Aug  7 13:12:29 2022] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835b1fac
      [Sun Aug  7 13:12:29 2022] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8884d2c44ac3
      [Sun Aug  7 13:12:29 2022] R10: ffffed109a588958 R11: 00000000ffffffff R12: 0000000000000000
      [Sun Aug  7 13:12:29 2022] R13: ffffffff84efac20 R14: 0000000000000000 R15: dffffc0000000000
      [Sun Aug  7 13:12:29 2022]  ? default_idle_call+0xcc/0x460
      [Sun Aug  7 13:12:29 2022]  default_idle_call+0xec/0x460
      [Sun Aug  7 13:12:29 2022]  do_idle+0x394/0x450
      [Sun Aug  7 13:12:29 2022]  ? arch_cpu_idle_exit+0x40/0x40
      [Sun Aug  7 13:12:29 2022]  cpu_startup_entry+0x19/0x20
      [Sun Aug  7 13:12:29 2022]  rest_init+0x156/0x250
      [Sun Aug  7 13:12:29 2022]  arch_call_rest_init+0xf/0x15
      [Sun Aug  7 13:12:29 2022]  start_kernel+0x3a7/0x3c5
      [Sun Aug  7 13:12:29 2022]  secondary_startup_64_no_verify+0xcd/0xdb
      [Sun Aug  7 13:12:29 2022]  </TASK>
      
      Fixes: ff9b7521 ("net/mlx5: Bridge, support LAG")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      8e93f294