  1. Aug 30, 2023
    • netfilter: nf_tables: fix out of memory error handling · 16cc42cc
      Florian Westphal authored
      [ Upstream commit 5e1be4cd ]
      
      Several instances of pipapo_resize() don't propagate allocation failures;
      this causes a crash when fault injection is enabled for GFP_KERNEL slabs.
      
      Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      16cc42cc
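The error-propagation pattern described in the commit above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct table`, `table_resize()`, `table_insert()` and the `fail_next_alloc` switch are hypothetical names, not the nf_tables pipapo code. The point is that the caller returns the resize error instead of continuing to use an array that was never grown.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lookup table that can be resized. */
struct table {
	long *rules;
	size_t n_rules;
};

/* Set to nonzero to simulate one allocation failure (fault injection). */
static int fail_next_alloc;

/* Resize the rule array; returns 0 on success or -ENOMEM on failure. */
int table_resize(struct table *t, size_t new_rules)
{
	long *p;

	if (fail_next_alloc) {
		fail_next_alloc = 0;
		return -ENOMEM;
	}

	p = realloc(t->rules, new_rules * sizeof(*p));
	if (!p && new_rules)
		return -ENOMEM;

	t->rules = p;
	t->n_rules = new_rules;
	return 0;
}

/* The caller propagates the error instead of writing past the old array. */
int table_insert(struct table *t, long rule)
{
	int err = table_resize(t, t->n_rules + 1);

	if (err)
		return err;

	t->rules[t->n_rules - 1] = rule;
	return 0;
}
```

On failure the table is left unchanged, so the caller can report the error and keep using the old contents.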
    • netfilter: nf_tables: use correct lock to protect gc_list · e05b2a9f
      Pablo Neira Ayuso authored
      [ Upstream commit 8357bc94 ]
      
      Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to
      protect the gc list.
      
      Fixes: 5f68718b ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e05b2a9f
    • netfilter: nf_tables: GC transaction race with abort path · e07e6882
      Pablo Neira Ayuso authored
      [ Upstream commit 72034434 ]
      
      The abort path is missing a synchronization point with GC transactions. Add
      the GC sequence number there, so that any GC transaction that loses the
      race is discarded.
      
      Fixes: 5f68718b ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e07e6882
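The idea of discarding a GC transaction that lost a race can be sketched with a plain sequence counter. This is a minimal single-threaded illustration, not the nf_tables implementation: `struct ctrl_state`, `struct gc_trans` and the three helpers are hypothetical names, and real code would also need locking or memory ordering around the counter.

```c
#include <stdbool.h>

/* Hypothetical per-namespace state; only the sequence counter matters here. */
struct ctrl_state {
	unsigned int gc_seq;
};

/* Hypothetical GC transaction that remembers the sequence it started under. */
struct gc_trans {
	unsigned int seq;
};

/* GC side: snapshot the sequence number when the transaction is created. */
struct gc_trans gc_trans_start(const struct ctrl_state *st)
{
	struct gc_trans t = { .seq = st->gc_seq };

	return t;
}

/* Control plane (commit or abort): bump the sequence when it changes state. */
void ctrl_plane_update(struct ctrl_state *st)
{
	st->gc_seq++;
}

/* GC side: the transaction is applied only if no update raced with it. */
bool gc_trans_valid(const struct ctrl_state *st, const struct gc_trans *t)
{
	return t->seq == st->gc_seq;
}
```

If the abort path never bumps the counter, a stale GC transaction would still look valid, which is the gap the commit describes.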
    • netfilter: nf_tables: flush pending destroy work before netlink notifier · 4167aa47
      Pablo Neira Ayuso authored
      [ Upstream commit 2c9f0293 ]
      
      Destroy work waits for the RCU grace period, then releases the objects
      with no mutex held. All released objects follow this path for
      transactions; therefore, order is guaranteed and references to top-level
      objects in the hierarchy remain valid.
      
      However, the netlink notifier might interfere with pending destroy work.
      rcu_barrier() is not correct because objects are not released via RCU
      callback. Flush destroy work before releasing objects from the netlink
      notifier path.
      
      Fixes: d4bc8271 ("netfilter: nf_tables: netlink notifier might race to release objects")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4167aa47
    • netfilter: nf_tables: validate all pending tables · e290509f
      Florian Westphal authored
      [ Upstream commit 4b80ced9 ]
      
      We have to validate all tables in the transaction that are in
      VALIDATE_DO state. The blamed commit below did not move the break
      statement to its right location, so we only validate one table.
      
      Moreover, we can't init table->validate to _SKIP when a table object
      is allocated.
      
      If we do, and a transaction creates a new table and then fails,
      nfnetlink will loop and nft will hang until the user cancels the command.
      
      Add back the pernet state as a place to stash the last state encountered.
      This is either _DO (we hit an error during commit validation) or _SKIP
      (transaction passed all checks).
      
      Fixes: 00c320f9 ("netfilter: nf_tables: make validation state per table")
      Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e290509f
    • i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters() · 711ffb6f
      Andrii Staikov authored
      [ Upstream commit 9525a3c3 ]
      
      Add check for pf->vf not being NULL before dereferencing
      pf->vf[vsi->vf_id] in updating VSI filter sync.
      Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted
      in the condition for clearing promisc mode bit.
      
      Fixes: c87c938f ("i40e: Add VF VLAN pruning")
      Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
      Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      711ffb6f
    • net/sched: fix a qdisc modification with ambiguous command request · 7ac40938
      Jamal Hadi Salim authored
      [ Upstream commit da71714e ]
      
      When replacing an existing root qdisc with one of the same kind, the
      request boils down to a parameterization change, i.e. not one that
      requires allocation and grafting of a new qdisc. syzbot was able to create a
      scenario in which a taprio qdisc replaced an existing taprio qdisc
      with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL, leading to
      a create and graft scenario.
      The fix ensures that a create and graft is allowed only when the qdisc
      kinds are different; otherwise the request goes into the "change" codepath.
      
      While at it, fix the code and comments to improve readability.
      
      While syzbot was able to create the issue, it did not zero in on the root cause.
      Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down.
      
      v1->v2 changes:
      - remove "inline" function definition (Vladimir)
      - remove extraneous braces in branches (Vladimir)
      - change inline function names (Pedro)
      - Run tdc tests (Victor)
      v2->v3 changes:
      - don't break else/if (Simon)
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: <syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Victor Nogueira <victor@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7ac40938
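The rule stated in the commit above (create and graft only when the kinds differ, otherwise take the "change" path) can be reduced to a small decision function. This sketch is deliberately simplified and rests on assumptions: `enum qdisc_action` and `replace_action()` are hypothetical names, and the netlink flags and the other conditions the real net/sched code examines are ignored.

```c
#include <string.h>

enum qdisc_action {
	QDISC_CHANGE,           /* re-parameterize the existing qdisc */
	QDISC_CREATE_AND_GRAFT, /* allocate a new qdisc and graft it */
};

/*
 * Hypothetical helper: decide what a "replace" request on an existing
 * qdisc means, based only on whether the kinds match.
 */
enum qdisc_action replace_action(const char *existing_kind,
				 const char *requested_kind)
{
	if (strcmp(existing_kind, requested_kind) == 0)
		return QDISC_CHANGE;

	return QDISC_CREATE_AND_GRAFT;
}
```

With this rule, a taprio-over-taprio request such as the one syzbot produced maps to `QDISC_CHANGE` whatever flag combination accompanied it.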
    • igc: Fix the typo in the PTM Control macro · 0717a95b
      Sasha Neftin authored
      [ Upstream commit de439757 ]
      
      The IGC_PTM_CTRL_SHRT_CYC field defines the time between two consecutive PTM
      requests. The field is six bits wide, but bit five was missing from the
      mask. This patch corrects the typo in the IGC_PTM_CTRL_SHRT_CYC macro.
      
      Fixes: a90ec848 ("igc: Add support for PTP getcrosststamp()")
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0717a95b
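A mask with one bit missing silently drops part of the value, which a short sketch makes visible. The concrete numbers are assumptions chosen for illustration and are not taken from the commit text: a six-bit field placed at bit 2, a typo mask of 0x2f and a corrected mask of 0x3f. The macro and function names are hypothetical as well.

```c
#include <stdint.h>

/* Assumed layout for illustration: a six-bit field starting at bit 2. */
#define FIELD_SHIFT   2
#define MASK_TYPO     0x2f /* 0b101111: one bit of the field is missing */
#define MASK_SIX_BITS 0x3f /* 0b111111: all six bits */

/* Encoding with the faulty mask loses the 0x10 bit of the input. */
uint32_t encode_with_typo(uint32_t usec)
{
	return (usec & MASK_TYPO) << FIELD_SHIFT;
}

/* Encoding with the full six-bit mask keeps every bit of the field. */
uint32_t encode_fixed(uint32_t usec)
{
	return (usec & MASK_SIX_BITS) << FIELD_SHIFT;
}
```

Under these assumptions a cycle time of 0x10 encodes to zero with the typo mask and to 0x40 with the corrected one.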
    • igb: Avoid starting unnecessary workqueues · 8fe9d54f
      Alessio Igor Bogani authored
      [ Upstream commit b888c510 ]
      
      If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting
      PTP related workqueues.
      
      In this way we can fix this:
       BUG: unable to handle page fault for address: ffffc9000440b6f8
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0
       Oops: 0000 [#1] PREEMPT SMP
       [...]
       Workqueue: events igb_ptp_overflow_check
       RIP: 0010:igb_rd32+0x1f/0x60
       [...]
       Call Trace:
        igb_ptp_read_82580+0x20/0x50
        timecounter_read+0x15/0x60
        igb_ptp_overflow_check+0x1a/0x50
        process_one_work+0x1cb/0x3c0
        worker_thread+0x53/0x3f0
        ? rescuer_thread+0x370/0x370
        kthread+0x142/0x160
        ? kthread_associate_blkcg+0xc0/0xc0
        ret_from_fork+0x1f/0x30
      
      Fixes: 1f6e8178 ("igb: Prevent dropped Tx timestamps via work items and interrupts.")
      Fixes: d339b133 ("igb: add PTP Hardware Clock code")
      Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
      Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230821171927.2203644-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8fe9d54f
    • can: isotp: fix support for transmission of SF without flow control · ecebc084
      Oliver Hartkopp authored
      [ Upstream commit 0bfe7115 ]
      
      The original implementation had a very simple handling for single frame
      transmissions as it just sent the single frame without a timeout handling.
      
      With the new echo frame handling the echo frame was also introduced for
      single frames but the former exception ('simple without timers') has been
      maintained by accident. This leads to a 1 second timeout when closing the
      socket and to an -ECOMM error when CAN_ISOTP_WAIT_TX_DONE is selected.
      
      As the echo handling is always active (also for single frames) remove the
      wrong extra condition for single frames.
      
      Fixes: 9f39d365 ("can: isotp: add support for transmission without flow control")
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/r/20230821144547.6658-2-socketcan@hartkopp.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ecebc084
    • net: ethernet: mtk_eth_soc: fix NULL pointer on hw reset · 65009906
      Daniel Golle authored
      [ Upstream commit 604204fc ]
      
      When a hardware reset is triggered on devices not initializing WED the
      calls to mtk_wed_fe_reset and mtk_wed_fe_reset_complete dereference a
      pointer on uninitialized stack memory.
      Break out of both functions in case a hw_list entry is 0.
      
      Fixes: 08a764a7 ("net: ethernet: mtk_wed: add reset/reset_complete callbacks")
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/5465c1609b464cc7407ae1530c40821dcdf9d3e6.1692634266.git.daniel@makrotopia.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      65009906
    • tg3: Use slab_build_skb() when needed · d56f8304
      Kees Cook authored
      [ Upstream commit 99b415fe ]
      
      The tg3 driver will use kmalloc() under some conditions. Check the
      frag_size and use slab_build_skb() when frag_size is 0. Silences
      the warning introduced by commit ce098da1 ("skbuff: Introduce
      slab_build_skb()"):
      
      	Use slab_build_skb() instead
      	...
      	tg3_poll_work+0x638/0xf90 [tg3]
      
      Fixes: ce098da1 ("skbuff: Introduce slab_build_skb()")
      Reported-by: Fiona Ebner <f.ebner@proxmox.com>
      Closes: https://lore.kernel.org/all/1bd4cb9c-4eb8-3bdb-3e05-8689817242d1@proxmox.com
      Cc: Siva Reddy Kallam <siva.kallam@broadcom.com>
      Cc: Prashant Sreedharan <prashant@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Link: https://lore.kernel.org/r/20230818175417.never.273-kees@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d56f8304
    • selftests: bonding: do not set port down before adding to bond · be7d58c9
      Hangbin Liu authored
      [ Upstream commit be809424 ]
      
      Before adding a port to a bond, it needs to be set down first. In the
      lacpdu test the author set the port down specifically. But commit
      a4abfa62 ("net: rtnetlink: Enslave device before bringing it up")
      changed the operation order: the kernel now sets the port down _after_
      adding it to the bond. So all the ports end up down and the test fails.
      
      In fact, the veth interfaces are already inactive when added. This
      means there's no need to set them down again before adding to the bond.
      Let's just remove the link down operation.
      
      Fixes: a4abfa62 ("net: rtnetlink: Enslave device before bringing it up")
      Reported-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Closes: https://lore.kernel.org/netdev/a0ef07c7-91b0-94bd-240d-944a330fcabd@huawei.com/
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20230817082459.1685972-1-liuhangbin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      be7d58c9
    • ice: Fix NULL pointer deref during VF reset · b995365b
      Petr Oros authored
      [ Upstream commit 67f6317d ]
      
      During a stress test that attached and detached a VF from KVM while
      simultaneously changing the VF's spoofcheck and trust settings, there was a
      NULL pointer dereference in ice_reset_vf() because the VF's VSI was NULL.
      
      More than one instance of ice_reset_vf() can be running at a given
      time. When we rebuild the VSI in ice_reset_vf(), another reset can be
      triggered from ice_service_task. In this case we can access the currently
      uninitialized VSI and cause a panic. The window for this race condition
      has been around for a long time but it's much worse after commit
      227bf450 ("ice: move VSI delete outside deconfig") because
      the reset runs faster. ice_reset_vf() uses vf->cfg_lock, and when
      we take this lock before accessing the VF VSI, we fix the
      BUG for all cases.
      
      Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
      in ice_vsi_stop_all_rx_rings()
      
      With our reproducer, we can hit BUG:
      ~8h before commit 227bf450 ("ice: move VSI delete outside deconfig").
      ~20m after commit 227bf450 ("ice: move VSI delete outside deconfig").
      After this fix we are not able to reproduce it after ~48h
      
      There was commit cf90b743 ("ice: Fix call trace with null VSI during
      VF reset") which also tried to fix this issue, but it was only
      partially resolved and the bug still exists.
      
      [ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [ 6420.665382] #PF: supervisor read access in kernel mode
      [ 6420.670521] #PF: error_code(0x0000) - not-present page
      [ 6420.675659] PGD 0
      [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
      [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
      [ 6420.698729] Workqueue: ice ice_service_task [ice]
      [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
      [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
      [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
      [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
      [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
      [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
      [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
      [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
      [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
      [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
      [ 6420.783742] FS:  0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
      [ 6420.791829] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
      [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
      [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 6420.811841] PKRU: 55555554
      [ 6420.811842] Call Trace:
      [ 6420.811843]  <TASK>
      [ 6420.811844]  ice_reset_vf+0x9a/0x450 [ice]
      [ 6420.811876]  ice_process_vflr_event+0x8f/0xc0 [ice]
      [ 6420.841343]  ice_service_task+0x23b/0x600 [ice]
      [ 6420.845884]  ? __schedule+0x212/0x550
      [ 6420.849550]  process_one_work+0x1e2/0x3b0
      [ 6420.853563]  ? rescuer_thread+0x390/0x390
      [ 6420.857577]  worker_thread+0x50/0x3a0
      [ 6420.861242]  ? rescuer_thread+0x390/0x390
      [ 6420.865253]  kthread+0xdd/0x100
      [ 6420.868400]  ? kthread_complete_and_exit+0x20/0x20
      [ 6420.873194]  ret_from_fork+0x1f/0x30
      [ 6420.876774]  </TASK>
      [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk
       r
      
      Fixes: efe41860 ("ice: Fix memory corruption in VF driver")
      Fixes: f23df522 ("ice: Fix spurious interrupt during removal of trusted VF")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b995365b
    • Revert "ice: Fix ice VF reset during iavf initialization" · 92989287
      Petr Oros authored
      [ Upstream commit 0ecff05e ]
      
      This reverts commit 7255355a.
      
      After this commit we are not able to attach VF to VM:
      virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
      error: Failed to attach interface
      error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable
      
      ice_check_vf_ready_for_cfg() already waits for the reset.
      The new condition in ice_check_vf_ready_for_reset() only causes problems.
      
      Fixes: 7255355a ("ice: Fix ice VF reset during iavf initialization")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      92989287
    • ice: fix receive buffer size miscalculation · 8aa038c2
      Jesse Brandeburg authored
      [ Upstream commit 10083aef ]
      
      The driver is misconfiguring the hardware for some values of MTU such that
      it could use multiple descriptors to receive a packet when it could have
      simply used one.
      
      Change the driver to use a round-up instead of the result of a shift, as
      the shift can truncate the lower bits of the size, and result in the
      problem noted above. It also aligns this driver with similar code in i40e.
      
      The insidiousness of this problem is that everything works with the wrong
      size, it's just not working as well as it could, as some MTU sizes end up
      using two or more descriptors, and there is no way to tell that is
      happening without looking at ice_trace or a bus analyzer.
      
      Fixes: efc2214b ("ice: Add support for XDP")
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8aa038c2
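The difference between a shift and a round-up, as described in the commit above, shows up whenever the size is not a multiple of the hardware unit. The sketch below assumes, purely for illustration, that the buffer size is programmed in 128-byte units; `UNIT_SHIFT` and both function names are hypothetical and not taken from the ice driver.

```c
#include <stdint.h>

/* Assumed for illustration: buffer size is programmed in 128-byte units. */
#define UNIT_SHIFT 7
#define UNIT_BYTES (1u << UNIT_SHIFT)

/* Shift only: drops the low bits, so the programmed size can be too small. */
uint32_t units_truncated(uint32_t buf_len)
{
	return buf_len >> UNIT_SHIFT;
}

/* Round up: the programmed size always covers buf_len. */
uint32_t units_rounded_up(uint32_t buf_len)
{
	return (buf_len + UNIT_BYTES - 1) / UNIT_BYTES;
}
```

For a 3000-byte buffer the shift yields 23 units (2944 bytes), which is short of the buffer, while the round-up yields 24 units (3072 bytes). For exact multiples such as 2048 both agree.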
    • ipv4: fix data-races around inet->inet_id · abee4c8e
      Eric Dumazet authored
      [ Upstream commit f866fbc8 ]
      
      UDP sendmsg() is lockless, so ip_select_ident_segs()
      can very well be run from multiple cpus [1]
      
      Convert inet->inet_id to an atomic_t, but implement
      a dedicated path for TCP, avoiding cost of a locked
      instruction (atomic_add_return())
      
      Note that this patch will cause a trivial merge conflict
      because we added inet->flags in net-next tree.
      
      v2: added missing change in
      drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
      (David Ahern)
      
      [1]
      
      BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb
      
      read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1:
      ip_select_ident_segs include/net/ip.h:542 [inline]
      ip_select_ident include/net/ip.h:556 [inline]
      __ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446
      ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
      udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
      inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
      sock_sendmsg_nosec net/socket.c:725 [inline]
      sock_sendmsg net/socket.c:748 [inline]
      ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
      ___sys_sendmsg net/socket.c:2548 [inline]
      __sys_sendmmsg+0x269/0x500 net/socket.c:2634
      __do_sys_sendmmsg net/socket.c:2663 [inline]
      __se_sys_sendmmsg net/socket.c:2660 [inline]
      __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0:
      ip_select_ident_segs include/net/ip.h:541 [inline]
      ip_select_ident include/net/ip.h:556 [inline]
      __ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446
      ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
      udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
      inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
      sock_sendmsg_nosec net/socket.c:725 [inline]
      sock_sendmsg net/socket.c:748 [inline]
      ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
      ___sys_sendmsg net/socket.c:2548 [inline]
      __sys_sendmmsg+0x269/0x500 net/socket.c:2634
      __do_sys_sendmmsg net/socket.c:2663 [inline]
      __se_sys_sendmmsg net/socket.c:2660 [inline]
      __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x184d -> 0x184e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
      ==================================================================
      
      Fixes: 23f57406 ("ipv4: avoid using shared IP generator for connected sockets")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      abee4c8e
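The two paths mentioned in the commit above (an atomic update for lockless senders and a cheaper path for senders that already hold the socket lock) can be sketched with C11 atomics. This is a user-space illustration under assumptions, not the kernel change: `struct sock_ids` and both functions are hypothetical, and returning the first ID of the reserved range is a choice made for this sketch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical socket with a per-socket IP ID counter. */
struct sock_ids {
	atomic_uint inet_id;
};

/*
 * Lockless senders (the UDP case): reserve `segs` IDs with one atomic
 * read-modify-write and return the first ID of the reserved range.
 */
uint16_t next_id_lockless(struct sock_ids *sk, unsigned int segs)
{
	return (uint16_t)atomic_fetch_add(&sk->inet_id, segs);
}

/*
 * Senders that already hold the socket lock (the TCP case): a relaxed
 * load and store avoid the cost of a locked instruction.
 */
uint16_t next_id_locked(struct sock_ids *sk, unsigned int segs)
{
	unsigned int old = atomic_load_explicit(&sk->inet_id,
						memory_order_relaxed);

	atomic_store_explicit(&sk->inet_id, old + segs, memory_order_relaxed);
	return (uint16_t)old;
}
```

The locked variant is only safe when an outer lock already serializes all callers; otherwise two senders could read the same value, which is the race KCSAN reported.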
    • net: validate veth and vxcan peer ifindexes · 3844e0c5
      Jakub Kicinski authored
      [ Upstream commit f534f658 ]
      
      veth and vxcan need to make sure the ifindexes of the peer
      are not negative; the core does not validate this.
      
      Using iproute2 with user-space-level checking removed:
      
      Before:
      
        # ./ip link add index 10 type veth peer index -1
        # ip link show
        1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
          link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff
        10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff
        -1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff
      
      Now:
      
        $ ./ip link add index 10 type veth peer index -1
        Error: ifindex can't be negative.
      
      This problem surfaced in net-next because an explicit WARN()
      was added; the root cause is older.
      
      Fixes: e6f8f1a7 ("veth: Allow to create peer link with given ifindex")
      Fixes: a8f820a3 ("can: add Virtual CAN Tunnel driver (vxcan)")
      Reported-by: <syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3844e0c5
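The check described in the commit above amounts to rejecting a negative peer ifindex before the device is created. A minimal sketch follows, assuming a hypothetical helper name and assuming that zero means "let the kernel pick an index". The real drivers report the error through netlink extended acknowledgements, which this sketch leaves out.

```c
#include <errno.h>

/*
 * Hypothetical check on a peer ifindex supplied by user space.
 * Zero is treated here as "let the kernel pick one" (an assumption of
 * this sketch); negative values are rejected.
 */
int validate_peer_ifindex(int ifindex)
{
	if (ifindex < 0)
		return -EINVAL;

	return 0;
}
```

With such a check in place, the `peer index -1` request from the example above fails up front instead of creating an interface numbered -1.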
    • net: bcmgenet: Fix return value check for fixed_phy_register() · 69179921
      Ruan Jinjie authored
      [ Upstream commit 32bbe64a ]
      
      The fixed_phy_register() function returns error pointers and never
      returns NULL. Update the checks accordingly.
      
      Fixes: b0ba512e ("net: bcmgenet: enable driver to work without a device tree")
      Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Acked-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      69179921
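Why a NULL test misses an error pointer can be shown with user-space imitations of the kernel's ERR_PTR(), IS_ERR() and PTR_ERR() helpers. Everything in this sketch is a stand-in written for illustration: `err_ptr()`, `is_err()`, `ptr_err()`, `phy_register()` and `probe()` are hypothetical, and the pointer encoding assumes a platform where `long` and pointers have the same width.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space imitations of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *err_ptr(long err) { return (void *)err; }
static inline long ptr_err(const void *p) { return (long)p; }
static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct phy { int id; };
static struct phy the_phy = { .id = 1 };

/* Hypothetical registration function: an object or an error pointer, never NULL. */
struct phy *phy_register(int fail)
{
	return fail ? err_ptr(-ENODEV) : &the_phy;
}

/* The caller checks with is_err(); a NULL test would miss the failure. */
int probe(int fail)
{
	struct phy *p = phy_register(fail);

	if (is_err(p))
		return (int)ptr_err(p);

	return 0;
}
```

An error pointer is a non-NULL value near the top of the address space, so `if (!p)` treats it as success and the first dereference faults.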
    • net: bgmac: Fix return value check for fixed_phy_register() · d3a74a85
      Ruan Jinjie authored
      [ Upstream commit 23a14488 ]
      
      The fixed_phy_register() function returns error pointers and never
      returns NULL. Update the checks accordingly.
      
      Fixes: c25b23b8 ("bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets")
      Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d3a74a85
    • net: mdio: mdio-bitbang: Fix C45 read/write protocol · a7cecd33
      Serge Semin authored
      [ Upstream commit 2572ce62 ]
      
      Based on the original code semantics, in the case of Clause 45 MDIO the address
      command is supposed to be followed by the command sending the MMD address,
      not the CSR address. Commit 002dd3de ("net: mdio: mdio-bitbang:
      Separate C22 and C45 transactions") erroneously broke that: most
      likely due to an unfortunate variable name, it switched the code to sending
      the CSR address. In our case this caused a protocol malfunction, so the
      read operation always failed with the turnaround bit always being driven to
      one by the PHY instead of zero. Fix that by restoring the correct
      behaviour: sending the MMD address command right after the regular address
      command.
      
      Fixes: 002dd3de ("net: mdio: mdio-bitbang: Separate C22 and C45 transactions")
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a7cecd33
    • Arınç ÜNAL's avatar
      net: dsa: mt7530: fix handling of 802.1X PAE frames · 7e7b2b50
      Arınç ÜNAL authored
      [ Upstream commit e94b590a ]
      
      802.1X PAE frames are link-local frames, therefore they must be trapped to
      the CPU port. Currently, the MT753X switches treat 802.1X PAE frames as
      regular multicast frames, therefore flooding them to user ports. To fix
      this, set 802.1X PAE frames to be trapped to the CPU port(s).
      
      Fixes: b8f126a8 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7e7b2b50
    • Ido Schimmel's avatar
      selftests: mlxsw: Fix test failure on Spectrum-4 · b457f312
      Ido Schimmel authored
      [ Upstream commit f520489e ]
      
      Remove assumptions about shared buffer cell size and instead query the
      cell size from devlink. Adjust the test to send small packets that fit
      inside a single cell.
      
      Tested on Spectrum-{1,2,3,4}.
      
      Fixes: 47354021 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/f7dfbf3c4d1cb23838d9eb99bab09afaa320c4ca.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b457f312
    • Amit Cohen's avatar
      mlxsw: Fix the size of 'VIRT_ROUTER_MSB' · 747e71ff
      Amit Cohen authored
      [ Upstream commit 348c976b ]
      
      The field 'virtual router' was extended to 12 bits in Spectrum-4.
      Therefore, the element 'MLXSW_AFK_ELEMENT_VIRT_ROUTER_MSB' needs 3 bits for
      Spectrum < 4 and 4 bits for Spectrum >= 4.
      
      The elements are stored in an internal storage scratchpad. Currently, the
      MSB is defined there as 3 bits. It means that for Spectrum-4, only 2K VRFs
      can be used for multicast routing, as the highest bit is not really used by
      the driver. Fix the definition of 'VIRT_ROUTER_MSB' to use 4 bits. Adjust
      the definitions of 'virtual router' field in the blocks accordingly - use
      '_avoid_size_check' for Spectrum-2 instead of for Spectrum-4. Fix the mask
      in parse function to use 4 bits.
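
The effect of a 3-bit versus a 4-bit MSB mask can be checked with a small sketch. It assumes the 12-bit virtual router ID is split into an 8-bit low part and an MSB part, which is inferred from the bit counts above; the function names are hypothetical and not the driver's.

```c
#include <stdint.h>

/* Extract the MSB part of a virtual router ID using a mask of msb_bits. */
static uint8_t vr_msb(uint16_t vr_id, unsigned msb_bits)
{
    return (uint8_t)((vr_id >> 8) & ((1u << msb_bits) - 1u));
}

/* Rebuild the ID as it would be seen after storing only msb_bits of MSB. */
static uint16_t vr_roundtrip(uint16_t vr_id, unsigned msb_bits)
{
    return (uint16_t)(((unsigned)vr_msb(vr_id, msb_bits) << 8) |
                      (vr_id & 0xffu));
}
```

With 3 bits, ID 0x800 (2048) round-trips to 0, so only IDs below 2K are distinguishable; with 4 bits the full 12-bit range survives.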
      
      Fixes: 6d5d8ebb ("mlxsw: Rename virtual router flex key element")
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/79bed2b70f6b9ed58d4df02e9798a23da648015b.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      747e71ff
    • Ido Schimmel's avatar
      mlxsw: reg: Fix SSPR register layout · 5a76c525
      Ido Schimmel authored
      [ Upstream commit 0dc63b9c ]
      
      The two most significant bits of the "local_port" field in the SSPR
      register are always cleared since they are overwritten by the deprecated
      and overlapping "sub_port" field.
      
      On systems with more than 255 local ports (e.g., Spectrum-4), this
      results in the firmware maintaining invalid mappings between system port
      and local port. Specifically, two different system ports (0x1 and
      0x101) point to the same local port (0x1), which eventually leads to
      firmware errors.
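
How an overlapping field can clear the upper bits of another field is easy to reproduce in isolation. The bit positions below are assumptions chosen for illustration (an 8-bit low part, a 2-bit high part, and an 8-bit "sub_port" that overlaps the high part); they are not quoted from the register definition.

```c
#include <stdint.h>

/* Write val into a field of the given width at the given bit offset. */
static uint32_t set_field(uint32_t word, unsigned shift, unsigned width,
                          uint32_t val)
{
    uint32_t mask = ((1u << width) - 1u) << shift;

    return (word & ~mask) | ((val << shift) & mask);
}

static uint32_t get_field(uint32_t word, unsigned shift, unsigned width)
{
    return (word >> shift) & ((1u << width) - 1u);
}

/* Assumed layout: local_port[7:0] at bits 16-23, local_port[9:8] at
 * bits 12-13, deprecated sub_port at bits 8-15 (overlapping 12-13). */
static uint32_t sspr_pack_sketch(uint16_t local_port, int write_sub_port)
{
    uint32_t word = 0;

    word = set_field(word, 16, 8, local_port & 0xffu);
    word = set_field(word, 12, 2, local_port >> 8);
    if (write_sub_port)
        word = set_field(word, 8, 8, 0);  /* clobbers bits 12-13 */
    return word;
}

static uint16_t sspr_unpack_sketch(uint32_t word)
{
    return (uint16_t)((get_field(word, 12, 2) << 8) |
                      get_field(word, 16, 8));
}
```

Under these assumptions, local port 0x101 reads back as 0x1 when the overlapping field is written, and as 0x101 when it is not.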
      
      Fix by removing the deprecated "sub_port" field.
      
      Fixes: fd24b29a ("mlxsw: reg: Align existing registers to use extended local_port field")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/9b909a3033c8d3d6f67f237306bef4411c5e6ae4.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5a76c525
    • Danielle Ratson's avatar
      mlxsw: pci: Set time stamp fields also when its type is MIRROR_UTC · 40ffbae5
      Danielle Ratson authored
      [ Upstream commit bc2de151 ]
      
      Currently, in Spectrum-2 and above, time stamps are extracted from the CQE
      into the time stamp fields in 'struct mlxsw_skb_cb', only when the CQE
      time stamp type is UTC. The time stamps are read directly from the CQE and
      software can get the time stamp in UTC format using CQEv2.
      
      From Spectrum-4 onwards, the time stamps read from the CQE may also be
      of type MIRROR_UTC.

      As a result, the driver warns [1] that the time stamp fields were not
      set when an LLDP control packet is sent.
      
      Allow the time stamp type to be MIRROR_UTC and set the time stamp in this
      case as well.
      
      [1]
       WARNING: CPU: 11 PID: 0 at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:1409 mlxsw_sp2_ptp_hwtstamp_fill+0x1f/0x70 [mlxsw_spectrum]
      [...]
       Call Trace:
        <IRQ>
        mlxsw_sp2_ptp_receive+0x3c/0x80 [mlxsw_spectrum]
        mlxsw_core_skb_receive+0x119/0x190 [mlxsw_core]
        mlxsw_pci_cq_tasklet+0x3c9/0x780 [mlxsw_pci]
        tasklet_action_common.constprop.0+0x9f/0x110
        __do_softirq+0xbb/0x296
        irq_exit_rcu+0x79/0xa0
        common_interrupt+0x86/0xa0
        </IRQ>
        <TASK>
      
      Fixes: 47354021 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC")
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/bcef4d044ef608a4e258d33a7ec0ecd91f480db5.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      40ffbae5
    • Lu Wei's avatar
      ipvlan: Fix a reference count leak warning in ipvlan_ns_exit() · 3f5a3e02
      Lu Wei authored
      [ Upstream commit 043d5f68 ]
      
      There are two network devices (veth1 and veth3) in ns1, and ipvlan1 in
      L3S mode and ipvlan2 in L2 mode are created on top of them as shown in
      figure (1). In this case, ipvlan_register_nf_hook() is called to
      register the nf hook needed by ipvlans in L3S mode in ns1, and
      ipvl_nf_hook_refcnt is set to 1.
      
      (1)
                 ns1                           ns2
            ------------                  ------------
      
         veth1--ipvlan1 (L3S)
      
         veth3--ipvlan2 (L2)
      
      (2)
                 ns1                           ns2
            ------------                  ------------
      
         veth1--ipvlan1 (L3S)
      
               ipvlan2 (L2)                  veth3
           |                                  |
           |------->-------->--------->--------
                          migrate
      
      When veth3 migrates from ns1 to ns2 as in figure (2), veth3 registers in
      ns2 and calls call_netdevice_notifiers with the NETDEV_REGISTER event:
      
      dev_change_net_namespace
          call_netdevice_notifiers
              ipvlan_device_event
                  ipvlan_migrate_l3s_hook
                      ipvlan_register_nf_hook(newnet)      (I)
                      ipvlan_unregister_nf_hook(oldnet)    (II)
      
      In ipvlan_migrate_l3s_hook(), ipvl_nf_hook_refcnt in ns1 is not 0
      because veth1 with ipvlan1 is still in ns1, so (I) and (II) are called
      to register the nf_hook in ns2 and unregister the nf_hook in ns1. As a
      result, ipvl_nf_hook_refcnt in ns1 is decreased incorrectly and the one
      in ns2 is increased incorrectly. When the second net namespace is
      removed, a reference count leak warning in ipvlan_ns_exit() is
      triggered.

      This patch adds a check before ipvlan_migrate_l3s_hook() is called. The
      warning can be triggered as follows:
      
      $ ip netns add ns1
      $ ip netns add ns2
      $ ip netns exec ns1 ip link add veth1 type veth peer name veth2
      $ ip netns exec ns1 ip link add veth3 type veth peer name veth4
      $ ip netns exec ns1 ip link add ipv1 link veth1 type ipvlan mode l3s
      $ ip netns exec ns1 ip link add ipv2 link veth3 type ipvlan mode l2
      $ ip netns exec ns1 ip link set veth3 netns ns2
      $ ip net del ns2
      
      Fixes: 3133822f ("ipvlan: use pernet operations and restrict l3s hooks to master netns")
      Signed-off-by: Lu Wei <luwei32@huawei.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/20230817145449.141827-1-luwei32@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3f5a3e02
    • Eric Dumazet's avatar
      dccp: annotate data-races in dccp_poll() · 056e0ce1
      Eric Dumazet authored
      [ Upstream commit cba3f178 ]
      
      We changed tcp_poll() over time, but never updated dccp.
      
      Note that we also could remove dccp instead of maintaining it.
      
      Fixes: 7c657876 ("[DCCP]: Initial implementation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230818015820.2701595-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      056e0ce1
    • Eric Dumazet's avatar
      sock: annotate data-races around prot->memory_pressure · 2a7d2f2b
      Eric Dumazet authored
      [ Upstream commit 76f33296 ]
      
      *prot->memory_pressure is read/written locklessly, we need
      to add proper annotations.
      
      A recent commit added a new race, it is time to audit all accesses.
      
      Fixes: 2d0c88e8 ("sock: Fix misuse of sk_under_memory_pressure()")
      Fixes: 4d93df0a ("[SCTP]: Rewrite of sctp buffer management code")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Abel Wu <wuyun.abel@bytedance.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Link: https://lore.kernel.org/r/20230818015132.2699348-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2a7d2f2b
    • Vladimir Oltean's avatar
      net: dsa: felix: fix oversize frame dropping for always closed tc-taprio gates · b8bcc45a
      Vladimir Oltean authored
      [ Upstream commit d44036ca ]
      
      The blamed commit resolved a bug where frames would still get stuck at
      egress, even though they're smaller than the maxSDU[tc], because the
      driver did not take into account the extra 33 ns that the queue system
      needs for scheduling the frame.
      
      It now takes that into account, but the arithmetic that we perform in
      vsc9959_tas_remaining_gate_len_ps() is buggy, because we operate on
      64-bit unsigned integers, so gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS
      may become a very large integer if gate_len_ns < 33 ns.
      
      In practice, this means that we've introduced a regression where all
      traffic class gates which are permanently closed will not get detected
      by the driver, and we won't enable oversize frame dropping for them.
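
The wraparound is straightforward to reproduce outside the driver. In the sketch below, the 33 ns constant comes from the text above; the function names and the clamp-to-zero guard are illustrative and may differ from the actual patch.

```c
#include <stdint.h>

#define MIN_GATE_LEN_NS 33u    /* extra scheduling time named above */
#define PSEC_PER_NSEC   1000u

/* Unsigned subtraction wraps around when gate_len_ns < 33. */
static uint64_t remaining_gate_len_ps_buggy(uint64_t gate_len_ns)
{
    return (gate_len_ns - MIN_GATE_LEN_NS) * PSEC_PER_NSEC;
}

/* Guarded variant: a gate shorter than the minimum has no usable time. */
static uint64_t remaining_gate_len_ps_guarded(uint64_t gate_len_ns)
{
    if (gate_len_ns < MIN_GATE_LEN_NS)
        return 0;
    return (gate_len_ns - MIN_GATE_LEN_NS) * PSEC_PER_NSEC;
}
```

For a permanently closed gate (length 0), the unguarded version yields a huge value that looks like plenty of transmission time, while the guarded version yields 0, which lets a caller recognize the gate as too short.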
      
      Before:
      mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000
      mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 1 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 2 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 3 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 4 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 5 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 6 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
      
      After:
      mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000
      mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
      
      Fixes: 11afdc65 ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230817120111.3522827-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b8bcc45a
    • Jiri Pirko's avatar
      devlink: add missing unregister linecard notification · e3b4e527
      Jiri Pirko authored
      [ Upstream commit 2ebbc975 ]
      
      Cited fixes commit introduced linecard notifications for register,
      however it didn't add them for unregister. Fix that by adding them.
      
      Fixes: c246f9b5 ("devlink: add support to create line card and expose to user")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230817125240.2144794-1-jiri@resnulli.us
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e3b4e527
    • Hariprasad Kelam's avatar
      octeontx2-af: SDP: fix receive link config · 0f0dd7b1
      Hariprasad Kelam authored
      [ Upstream commit 05f3d5bc ]
      
      On SDP interfaces, frame oversize and undersize errors are
      observed because the driver does not consider the packet sizes of
      all subscribers of the link before updating the link config.

      This patch fixes that.
      
      Fixes: 9b7dd87a ("octeontx2-af: Support to modify min/max allowed packet lengths")
      Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230817063006.10366-1-hkelam@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0f0dd7b1
    • Zheng Yejian's avatar
      tracing: Fix memleak due to race between current_tracer and trace · 2242640e
      Zheng Yejian authored
      [ Upstream commit eecb91b9 ]
      
      Kmemleak reports a leak in graph_trace_open():
      
        unreferenced object 0xffff0040b95f4a00 (size 128):
          comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
          hex dump (first 32 bytes):
            e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
            f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
          backtrace:
            [<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
            [<000000007df90faa>] graph_trace_open+0xb0/0x344
            [<00000000737524cd>] __tracing_open+0x450/0xb10
            [<0000000098043327>] tracing_open+0x1a0/0x2a0
            [<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
            [<000000004015bcd6>] vfs_open+0x98/0xd0
            [<000000002b5f60c9>] do_open+0x520/0x8d0
            [<00000000376c7820>] path_openat+0x1c0/0x3e0
            [<00000000336a54b5>] do_filp_open+0x14c/0x324
            [<000000002802df13>] do_sys_openat2+0x2c4/0x530
            [<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
            [<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
            [<00000000313647bf>] do_el0_svc+0xac/0xec
            [<000000002ef1c651>] el0_svc+0x20/0x30
            [<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
            [<000000000c309c35>] el0_sync+0x160/0x180
      
      The root cause is described as follows:
      
        __tracing_open() {  // 1. File 'trace' is being opened;
          ...
          *iter->trace = *tr->current_trace;  // 2. Tracer 'function_graph' is
                                              //    currently set;
          ...
          iter->trace->open(iter);  // 3. Call graph_trace_open() here,
                                    //    and memory is allocated in it;
          ...
        }
      
        s_start() {  // 4. The opened file is being read;
          ...
          *iter->trace = *tr->current_trace;  // 5. If tracer is switched to
                                              //    'nop' or others, then memory
                                              //    in step 3 is leaked!!!
          ...
        }
      
      To fix it, in s_start(), close the tracer before switching, then reopen
      the new tracer after switching. Some tracers like 'wakeup' may not
      update 'iter->private' in some cases when reopened, so it should be
      cleared to avoid being mistakenly closed again.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com
      
      Fixes: d7350c3f ("tracing/core: make the read callbacks reentrants")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2242640e
    • Sven Schnelle's avatar
      tracing/synthetic: Allocate one additional element for size · 49834a2c
      Sven Schnelle authored
      [ Upstream commit c4d6b543 ]
      
      While debugging another issue I noticed that the stack trace contains one
      invalid entry at the end:
      
      <idle>-0       [008] d..4.    26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK:
      => __schedule+0xac6/0x1a98
      => schedule+0x126/0x2c0
      => schedule_timeout+0x150/0x2c0
      => kcompactd+0x9ca/0xc20
      => kthread+0x2f6/0x3d8
      => __ret_from_fork+0x8a/0xe8
      => 0x6b6b6b6b6b6b6b6b
      
      This is because the code failed to add the one element containing the
      number of entries to field_size.
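
The sizing rule can be sketched as follows, assuming the stack field is laid out as one leading count element followed by the entries, which is how the message above describes it. The names are hypothetical and do not match the tracing code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed for a stack field: one count element plus the entries. */
static size_t stack_field_size(size_t nr_entries)
{
    return (nr_entries + 1) * sizeof(uint64_t);
}

/* Store the count followed by the entries into dst.
 * Returns the bytes written, or 0 if dst is too small. */
static size_t stack_field_store(uint64_t *dst, size_t dst_bytes,
                                const uint64_t *entries, size_t nr_entries)
{
    size_t need = stack_field_size(nr_entries);

    if (dst_bytes < need)
        return 0;
    dst[0] = nr_entries;
    memcpy(&dst[1], entries, nr_entries * sizeof(uint64_t));
    return need;
}
```

Sizing the field as `nr_entries * sizeof(uint64_t)` leaves it one element short, so the last slot a reader touches lies past the reserved space. That would be consistent with the poison-like value `0x6b6b6b6b6b6b6b6b` in the trace above.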
      
      Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Fixes: 00cf3d67 ("tracing: Allow synthetic events to pass around stacktraces")
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      49834a2c
    • Sven Schnelle's avatar
      tracing/synthetic: Skip first entry for stack traces · 009e77a9
      Sven Schnelle authored
      [ Upstream commit 887f92e0 ]
      
      While debugging another issue I noticed that the stack trace output
      contains the number of entries on top:
      
               <idle>-0       [000] d..4.   203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK:
      => 0x10
      => __schedule+0xac6/0x1a98
      => schedule+0x126/0x2c0
      => schedule_timeout+0x242/0x2c0
      => __wait_for_common+0x434/0x680
      => __wait_rcu_gp+0x198/0x3e0
      => synchronize_rcu+0x112/0x138
      => ring_buffer_reset_online_cpus+0x140/0x2e0
      => tracing_reset_online_cpus+0x15c/0x1d0
      => tracing_set_clock+0x180/0x1d8
      => hist_register_trigger+0x486/0x670
      => event_hist_trigger_parse+0x494/0x1318
      => trigger_process_regex+0x1d4/0x258
      => event_trigger_write+0xb4/0x170
      => vfs_write+0x210/0xad0
      => ksys_write+0x122/0x208
      
      Fix this by skipping the first element. Also replace the pointer
      logic with an index variable which is easier to read.
      
      Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Fixes: 00cf3d67 ("tracing: Allow synthetic events to pass around stacktraces")
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      009e77a9
    • Sven Schnelle's avatar
      tracing/synthetic: Use union instead of casts · 5c2d886e
      Sven Schnelle authored
      [ Upstream commit ddeea494 ]
      
      The current code uses a lot of casts to access the fields member in struct
      synth_trace_events with different sizes.  This makes the code hard to
      read, and has already introduced an endianness bug. Use a union and struct
      instead.
      
      Link: https://lkml.kernel.org/r/20230816154928.4171614-2-svens@linux.ibm.com
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Fixes: 00cf3d67 ("tracing: Allow synthetic events to pass around stacktraces")
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Stable-dep-of: 887f92e0 ("tracing/synthetic: Skip first entry for stack traces")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5c2d886e
    • Zheng Yejian's avatar
      tracing: Fix cpu buffers unavailable due to 'record_disabled' missed · 299e0033
      Zheng Yejian authored
      [ Upstream commit b71645d6 ]
      
      The trace ring buffer can no longer record anything after executing
      the following commands at the shell prompt:
      
        # cd /sys/kernel/tracing
        # cat tracing_cpumask
        fff
        # echo 0 > tracing_cpumask
        # echo 1 > snapshot
        # echo fff > tracing_cpumask
        # echo 1 > tracing_on
        # echo "hello world" > trace_marker
        -bash: echo: write error: Bad file descriptor
      
      The root cause is as follows:
        1. After `echo 0 > tracing_cpumask`, 'record_disabled' of the cpu
           buffers in 'tr->array_buffer.buffer' becomes 1 (see
           tracing_set_cpumask());
        2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped
           with 'tr->max_buffer.buffer', so 'record_disabled' becomes 0
           (see update_max_tr());
        3. After `echo fff > tracing_cpumask`, 'record_disabled' becomes -1.
      Both array_buffer and max_buffer are then unavailable because their
      'record_disabled' values are not 0.
      
      To fix it, enable or disable both array_buffer and max_buffer at the same
      time in tracing_set_cpumask().
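
The three steps can be replayed with a toy model of one CPU. Each buffer carries a counter that is incremented on disable and decremented on enable, and the snapshot swaps the two buffer pointers. All names below are hypothetical stand-ins, not the tracing code.

```c
struct buf_sketch { int record_disabled; };

struct tracer_sketch {
    struct buf_sketch *array_buffer;
    struct buf_sketch *max_buffer;
};

/* Clearing the CPU from the mask disables recording on it. */
static void cpumask_clear_sketch(struct tracer_sketch *tr, int both)
{
    tr->array_buffer->record_disabled++;
    if (both)
        tr->max_buffer->record_disabled++;
}

/* Setting the CPU in the mask re-enables recording on it. */
static void cpumask_set_sketch(struct tracer_sketch *tr, int both)
{
    tr->array_buffer->record_disabled--;
    if (both)
        tr->max_buffer->record_disabled--;
}

/* A snapshot swaps the live buffer with the max buffer. */
static void snapshot_swap_sketch(struct tracer_sketch *tr)
{
    struct buf_sketch *tmp = tr->array_buffer;

    tr->array_buffer = tr->max_buffer;
    tr->max_buffer = tmp;
}

/* Replay: clear mask, snapshot, set mask. Returns the live buffer's
 * counter and stores the max buffer's counter in *max_out. */
static int replay_sketch(int both, int *max_out)
{
    struct buf_sketch a = { 0 }, b = { 0 };
    struct tracer_sketch tr = { &a, &b };

    cpumask_clear_sketch(&tr, both);
    snapshot_swap_sketch(&tr);
    cpumask_set_sketch(&tr, both);
    *max_out = tr.max_buffer->record_disabled;
    return tr.array_buffer->record_disabled;
}
```

Touching only the live buffer (`both == 0`) ends with counters of -1 and 1, matching the failure above; touching both buffers together (`both == 1`) ends with 0 and 0.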
      
      Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com
      
      Cc: <mhiramat@kernel.org>
      Cc: <vnagarnaik@google.com>
      Cc: <shuah@kernel.org>
      Fixes: 71babb27 ("tracing: change CPU ring buffer state from tracing_cpumask")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      299e0033
    • Randy Dunlap's avatar
      wifi: iwlwifi: mvm: add dependency for PTP clock · f3acc613
      Randy Dunlap authored
      [ Upstream commit 609a1bcd ]
      
      When the code to use the PTP HW clock was added, it didn't update
      the Kconfig entry for the PTP dependency, leading to build errors,
      so update the Kconfig entry to depend on PTP_1588_CLOCK_OPTIONAL.
      
      aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.o: in function `iwl_mvm_ptp_init':
      drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:294: undefined reference to `ptp_clock_register'
      drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:294:(.text+0xce8): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_register'
      aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:301: undefined reference to `ptp_clock_index'
      drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:301:(.text+0xd18): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_index'
      aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.o: in function `iwl_mvm_ptp_remove':
      drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:315: undefined reference to `ptp_clock_index'
      drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:315:(.text+0xe80): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_index'
      aarch64-linux-ld: drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:319: undefined reference to `ptp_clock_unregister'
      drivers/net/wireless/intel/iwlwifi/mvm/ptp.c:319:(.text+0xeac): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ptp_clock_unregister'
      
      Fixes: 1595ecce ("wifi: iwlwifi: mvm: add support for PTP HW clock (PHC)")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/all/202308110447.4QSJHmFH-lkp@intel.com/
      Cc: Krishnanand Prabhu <krishnanand.prabhu@intel.com>
      Cc: Luca Coelho <luciano.coelho@intel.com>
      Cc: Gregory Greenman <gregory.greenman@intel.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Kalle Valo <kvalo@kernel.org>
      Cc: linux-wireless@vger.kernel.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Simon Horman <horms@kernel.org> # build-tested
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Acked-by: Gregory Greenman <gregory.greenman@intel.com>
      Link: https://lore.kernel.org/r/20230812052947.22913-1-rdunlap@infradead.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f3acc613
    • Eric Dumazet's avatar
      can: raw: fix lockdep issue in raw_release() · 7f35e561
      Eric Dumazet authored
      [ Upstream commit 11c9027c ]
      
      syzbot complained about a lockdep issue [1]
      
      Since raw_bind() and raw_setsockopt() first get RTNL
      before locking the socket, we must adopt the same order in raw_release()
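
The ordering rule can be expressed as a small lock-hierarchy check: give each lock a rank and refuse to take a lock whose rank is not higher than the last one held. This is a single-threaded sketch with hypothetical names, not lockdep and not the CAN code. The "old" order shown for the release path is inferred from the report below.

```c
/* Lower rank must be taken first: RTNL, then the socket lock. */
enum lock_rank { RANK_RTNL = 1, RANK_SOCK = 2 };

struct held_locks {
    enum lock_rank stack[4];
    int depth;
};

/* Returns 0 on success, -1 if taking this lock would violate the order. */
static int acquire_ranked(struct held_locks *h, enum lock_rank rank)
{
    if (h->depth > 0 && h->stack[h->depth - 1] >= rank)
        return -1;
    if (h->depth >= 4)
        return -1;
    h->stack[h->depth++] = rank;
    return 0;
}

/* Socket lock first, then RTNL: conflicts with the bind/setsockopt order. */
static int release_path_old_order(void)
{
    struct held_locks h = { .depth = 0 };

    if (acquire_ranked(&h, RANK_SOCK))
        return -1;
    return acquire_ranked(&h, RANK_RTNL);
}

/* RTNL first, then socket lock: same order as bind/setsockopt. */
static int release_path_new_order(void)
{
    struct held_locks h = { .depth = 0 };

    if (acquire_ranked(&h, RANK_RTNL))
        return -1;
    return acquire_ranked(&h, RANK_SOCK);
}
```

Two paths that take the same pair of locks in opposite orders can deadlock when they run concurrently, which is the scenario in the report below.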
      
      [1]
      WARNING: possible circular locking dependency detected
      6.5.0-rc1-syzkaller-00192-g78adb4bcf99e #0 Not tainted
      ------------------------------------------------------
      syz-executor.0/14110 is trying to acquire lock:
      ffff88804e4b6130 (sk_lock-AF_CAN){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1708 [inline]
      ffff88804e4b6130 (sk_lock-AF_CAN){+.+.}-{0:0}, at: raw_bind+0xb1/0xab0 net/can/raw.c:435
      
      but task is already holding lock:
      ffffffff8e3df368 (rtnl_mutex){+.+.}-{3:3}, at: raw_bind+0xa7/0xab0 net/can/raw.c:434
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (rtnl_mutex){+.+.}-{3:3}:
      __mutex_lock_common kernel/locking/mutex.c:603 [inline]
      __mutex_lock+0x181/0x1340 kernel/locking/mutex.c:747
      raw_release+0x1c6/0x9b0 net/can/raw.c:391
      __sock_release+0xcd/0x290 net/socket.c:654
      sock_close+0x1c/0x20 net/socket.c:1386
      __fput+0x3fd/0xac0 fs/file_table.c:384
      task_work_run+0x14d/0x240 kernel/task_work.c:179
      resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
      exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
      exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204
      __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
      syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297
      do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      -> #0 (sk_lock-AF_CAN){+.+.}-{0:0}:
      check_prev_add kernel/locking/lockdep.c:3142 [inline]
      check_prevs_add kernel/locking/lockdep.c:3261 [inline]
      validate_chain kernel/locking/lockdep.c:3876 [inline]
      __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
      lock_acquire kernel/locking/lockdep.c:5761 [inline]
      lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
      lock_sock_nested+0x3a/0xf0 net/core/sock.c:3492
      lock_sock include/net/sock.h:1708 [inline]
      raw_bind+0xb1/0xab0 net/can/raw.c:435
      __sys_bind+0x1ec/0x220 net/socket.c:1792
      __do_sys_bind net/socket.c:1803 [inline]
      __se_sys_bind net/socket.c:1801 [inline]
      __x64_sys_bind+0x72/0xb0 net/socket.c:1801
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      other info that might help us debug this:
      
      Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(rtnl_mutex);
                                     lock(sk_lock-AF_CAN);
                                     lock(rtnl_mutex);
        lock(sk_lock-AF_CAN);
      
      *** DEADLOCK ***
      
      1 lock held by syz-executor.0/14110:
      
      stack backtrace:
      CPU: 0 PID: 14110 Comm: syz-executor.0 Not tainted 6.5.0-rc1-syzkaller-00192-g78adb4bcf99e #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
      check_noncircular+0x311/0x3f0 kernel/locking/lockdep.c:2195
      check_prev_add kernel/locking/lockdep.c:3142 [inline]
      check_prevs_add kernel/locking/lockdep.c:3261 [inline]
      validate_chain kernel/locking/lockdep.c:3876 [inline]
      __lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
      lock_acquire kernel/locking/lockdep.c:5761 [inline]
      lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
      lock_sock_nested+0x3a/0xf0 net/core/sock.c:3492
      lock_sock include/net/sock.h:1708 [inline]
      raw_bind+0xb1/0xab0 net/can/raw.c:435
      __sys_bind+0x1ec/0x220 net/socket.c:1792
      __do_sys_bind net/socket.c:1803 [inline]
      __se_sys_bind net/socket.c:1801 [inline]
      __x64_sys_bind+0x72/0xb0 net/socket.c:1801
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7fd89007cb29
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd890d2a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
      RAX: ffffffffffffffda RBX: 00007fd89019bf80 RCX: 00007fd89007cb29
      RDX: 0000000000000010 RSI: 0000000020000040 RDI: 0000000000000003
      RBP: 00007fd8900c847a R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fd89019bf80 R15: 00007ffebf8124f8
      </TASK>
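
The inversion reported above can be reproduced with a tiny lock-order graph. The following is only a sketch of the general idea behind such a check, not lockdep's actual implementation: `order_add()`, `order_reset()`, `reachable()`, the `edge` matrix and the two enum constants are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_CLASSES 8

/* edge[a][b] != 0 means "class b was acquired while class a was held". */
static int edge[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: can `to` be reached from `from` along recorded edges? */
static int reachable(int from, int to, int *seen)
{
	if (from == to)
		return 1;
	seen[from] = 1;
	for (int n = 0; n < MAX_CLASSES; n++)
		if (edge[from][n] && !seen[n] && reachable(n, to, seen))
			return 1;
	return 0;
}

/*
 * Record that `next` is taken while `held` is held.
 * Returns 0 on success, or -1 (without recording the edge) if the new
 * dependency would close a cycle, i.e. a possible ABBA deadlock.
 */
static int order_add(int held, int next)
{
	int seen[MAX_CLASSES] = { 0 };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(next >= 0 && next < MAX_CLASSES);
	if (reachable(next, held, seen))
		return -1;
	edge[held][next] = 1;
	return 0;
}

/* Forget all recorded dependencies. */
static void order_reset(void)
{
	memset(edge, 0, sizeof(edge));
}

/* Illustrative lock classes named after the two locks in the report. */
enum { SK_LOCK, RTNL_MUTEX };
```

In this model, recording `order_add(SK_LOCK, RTNL_MUTEX)` first (the CPU1 column) succeeds, and a later `order_add(RTNL_MUTEX, SK_LOCK)` (the CPU0 column, as in raw_bind()) returns -1 because it would close the cycle.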
      
      Fixes: ee8b94c8 ("can: raw: fix receiver memory leak")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ziyang Xuan <william.xuanziyang@huawei.com>
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: stable@vger.kernel.org
      Cc: Marc Kleine-Budde <mkl@pengutronix.de>
      Link: https://lore.kernel.org/all/20230720114438.172434-1-edumazet@google.com
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7f35e561
    • Ziyang Xuan's avatar
      can: raw: fix receiver memory leak · c8ddbaec
      Ziyang Xuan authored
      [ Upstream commit ee8b94c8 ]
      
      Got kmemleak errors with the following ltp can_filter testcase:
      
      for ((i=1; i<=100; i++))
      do
              ./can_filter &
              sleep 0.1
      done
      
      ==============================================================
      [<00000000db4a4943>] can_rx_register+0x147/0x360 [can]
      [<00000000a289549d>] raw_setsockopt+0x5ef/0x853 [can_raw]
      [<000000006d3d9ebd>] __sys_setsockopt+0x173/0x2c0
      [<00000000407dbfec>] __x64_sys_setsockopt+0x61/0x70
      [<00000000fd468496>] do_syscall_64+0x33/0x40
      [<00000000b7e47d51>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      The leak is caused by a race between unregister_netdevice_many()
      and raw_release(), as follows:
      
                   cpu0                                        cpu1
      unregister_netdevice_many(can_dev)
        unlist_netdevice(can_dev) // dev_get_by_index() returns NULL after this
        net_set_todo(can_dev)
                                                raw_release(can_socket)
                                                  dev = dev_get_by_index(, ro->ifindex); // dev == NULL
                                                  if (dev) { // receivers in dev_rcv_lists are not freed because dev is NULL
                                                    raw_disable_allfilters(, dev, );
                                                    dev_put(dev);
                                                  }
                                                  ...
                                                  ro->bound = 0;
                                                  ...

      call_netdevice_notifiers(NETDEV_UNREGISTER, )
        raw_notify(, NETDEV_UNREGISTER, )
          if (ro->bound) // false because ro->bound has already been set to 0
            raw_disable_allfilters(, dev, ); // receivers in dev_rcv_lists will never be freed
      
      Add a net_device pointer member in struct raw_sock to record the bound
      can_dev, and use rtnl_lock to serialize access to raw_socket members
      between raw_bind(), raw_release(), raw_setsockopt() and raw_notify().
      Use ro->dev to decide whether to free receivers in dev_rcv_lists.
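
The race and the effect of tracking the bound device in the socket can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: `struct model_dev`, `struct model_sock`, every `model_*()` function and the `receivers` counter are hypothetical names invented for this example, and a pthread mutex stands in for rtnl_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct model_dev {
	int ifindex;
	int listed;     /* cleared once the device has been unlisted */
	int receivers;  /* receivers still registered on the device */
};

struct model_sock {
	int bound;
	int ifindex;
	struct model_dev *dev;  /* models the new ro->dev member */
};

/* Models rtnl_lock. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Models dev_get_by_index(): the lookup fails once the device is unlisted. */
static struct model_dev *model_lookup(struct model_dev *d, int ifindex)
{
	return (d->listed && d->ifindex == ifindex) ? d : NULL;
}

static void model_bind(struct model_sock *s, struct model_dev *d)
{
	assert(d->listed);
	pthread_mutex_lock(&model_rtnl);
	s->ifindex = d->ifindex;
	s->dev = d;
	s->bound = 1;
	d->receivers++;
	pthread_mutex_unlock(&model_rtnl);
}

/* Old behaviour: look the device up by ifindex, then clear bound. */
static void model_release_old(struct model_sock *s, struct model_dev *d)
{
	struct model_dev *dev = model_lookup(d, s->ifindex);

	if (dev)
		dev->receivers--;
	s->bound = 0;
}

/* New behaviour: use the recorded pointer, serialized by the lock. */
static void model_release_new(struct model_sock *s)
{
	pthread_mutex_lock(&model_rtnl);
	if (s->bound && s->dev)
		s->dev->receivers--;
	s->bound = 0;
	s->dev = NULL;
	pthread_mutex_unlock(&model_rtnl);
}

/* Models the NETDEV_UNREGISTER notification reaching the socket. */
static void model_notify_unregister(struct model_sock *s, struct model_dev *d)
{
	pthread_mutex_lock(&model_rtnl);
	if (s->bound && s->dev == d) {
		d->receivers--;
		s->bound = 0;
		s->dev = NULL;
	}
	pthread_mutex_unlock(&model_rtnl);
}

/* Replays the interleaving: unlist, release, notify. Returns leaked receivers. */
static int model_run(int use_fix)
{
	struct model_dev d = { .ifindex = 1, .listed = 1, .receivers = 0 };
	struct model_sock s = { 0 };

	model_bind(&s, &d);
	d.listed = 0;                     /* device unlisted on "cpu0" */
	if (use_fix)
		model_release_new(&s);    /* release on "cpu1" */
	else
		model_release_old(&s, &d);
	model_notify_unregister(&s, &d);  /* notifier runs afterwards */
	return d.receivers;
}
```

In this model, `model_run(0)` returns 1 (one receiver is never freed, matching the leak described above), while `model_run(1)` returns 0 because the release path no longer depends on the index lookup.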
      
      Fixes: 8d0caedb ("can: bcm/raw/isotp: use per module netdevice notifier")
      Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Link: https://lore.kernel.org/all/20230711011737.1969582-1-william.xuanziyang@huawei.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c8ddbaec