  1. Aug 11, 2023
    • net: usb: lan78xx: reorder cleanup operations to avoid UAF bugs · a54bf862
      Duoming Zhou authored
      [ Upstream commit 1e7417c1 ]
      
      The timer dev->stat_monitor can schedule the delayed work dev->wq and
      the delayed work dev->wq can also arm the dev->stat_monitor timer.
      
      When the device is detaching, the net_device will be deallocated, but
      the net_device private data could still be dereferenced in the delayed
      work or timer handler. As a result, UAF bugs can occur.
      
      One racy situation is shown below:
      
            (Thread 1)                 |      (Thread 2)
      lan78xx_stat_monitor()           |
       ...                             |  lan78xx_disconnect()
       lan78xx_defer_kevent()          |    ...
        ...                            |    cancel_delayed_work_sync(&dev->wq);
        schedule_delayed_work()        |    ...
        (wait some time)               |    free_netdev(net); //free net_device
        lan78xx_delayedwork()          |
        //use net_device private data  |
        dev-> //use                    |
      
      Although we use cancel_delayed_work_sync() to cancel the delayed work
      in lan78xx_disconnect(), it could still be scheduled in timer handler
      lan78xx_stat_monitor().
      
      Another racy situation is shown below:
      
            (Thread 1)                |      (Thread 2)
      lan78xx_delayedwork             |
       mod_timer()                    |  lan78xx_disconnect()
                                      |   cancel_delayed_work_sync()
       (wait some time)               |   if (timer_pending(&dev->stat_monitor))
                                      |       del_timer_sync(&dev->stat_monitor);
       lan78xx_stat_monitor()         |   ...
        lan78xx_defer_kevent()        |   free_netdev(net); //free
         //use net_device private data|
         dev-> //use                  |
      
      Although we use del_timer_sync() to delete the timer, timer_pending()
      returns 0 while the timer handler is running, because the timer is no
      longer pending at that point. As a result, del_timer_sync() is not
      executed and the timer can be re-armed.
      
      In order to mitigate this bug, we use timer_shutdown_sync() to shut down
      the timer and then use cancel_delayed_work_sync() to cancel the delayed
      work. As a result, the net_device can be deallocated safely.
      
      What's more, EVENT_DEV_DISCONNECT is set in dev->flags in
      lan78xx_disconnect(), but EVENT_STAT_UPDATE could still be set
      in lan78xx_stat_monitor(). So this patch moves the set_bit() after
      timer_shutdown_sync().
      
      Fixes: 77dfff5b ("lan78xx: Fix race condition in disconnect handling")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/sched: taprio: Limit TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME to INT_MAX. · 57b3fe08
      Kuniyuki Iwashima authored
      [ Upstream commit e7397184 ]
      
      syzkaller found a zero-division error [0] in div_s64_rem() called from
      get_cycle_time_elapsed(), where sched->cycle_time is the divisor.
      
      We have tests in parse_taprio_schedule() so that cycle_time will never
      be 0, and actually cycle_time is not 0 in get_cycle_time_elapsed().
      
      The problem is that the divisor types differ: cycle_time is s64, but
      the divisor argument of div_s64_rem() is s32.

      syzkaller fed the input below, and 0x100000000 becomes 0 when cast to s32.
      
        @TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME={0xc, 0x8, 0x100000000}
      
      We use s64 for cycle_time to cast it to ktime_t, so let's keep it and
      set a maximum for cycle_time.
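
The truncation described above can be reproduced in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel code: `divisor_as_s32()` and `cycle_time_is_valid()` are hypothetical helper names, and the bound check merely illustrates the idea of capping cycle_time at INT_MAX.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: models handing an s64 cycle_time to a parameter
 * that is only 32 bits wide. Only the low 32 bits survive. */
static int32_t divisor_as_s32(int64_t cycle_time)
{
    return (int32_t)(uint32_t)cycle_time;
}

/* Hypothetical validation: accept only values that are positive and
 * still fit in an s32, so the truncated divisor can never become 0. */
static bool cycle_time_is_valid(int64_t cycle_time)
{
    return cycle_time > 0 && cycle_time <= INT_MAX;
}
```

With the syzkaller value, `divisor_as_s32(0x100000000LL)` yields 0 even though the 64-bit value is nonzero, which is why a nonzero check on the s64 alone is not enough.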
      
      While at it, we prevent overflow in setup_txtime() and add another
      test in parse_taprio_schedule() to check if cycle_time overflows.
      
      Also, we add a new tdc test case for this issue.
      
      [0]:
      divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
      CPU: 1 PID: 103 Comm: kworker/1:3 Not tainted 6.5.0-rc1-00330-g60cc1f7d0605 #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Workqueue: ipv6_addrconf addrconf_dad_work
      RIP: 0010:div_s64_rem include/linux/math64.h:42 [inline]
      RIP: 0010:get_cycle_time_elapsed net/sched/sch_taprio.c:223 [inline]
      RIP: 0010:find_entry_to_transmit+0x252/0x7e0 net/sched/sch_taprio.c:344
      Code: 3c 02 00 0f 85 5e 05 00 00 48 8b 4c 24 08 4d 8b bd 40 01 00 00 48 8b 7c 24 48 48 89 c8 4c 29 f8 48 63 f7 48 99 48 89 74 24 70 <48> f7 fe 48 29 d1 48 8d 04 0f 49 89 cc 48 89 44 24 20 49 8d 85 10
      RSP: 0018:ffffc90000acf260 EFLAGS: 00010206
      RAX: 177450e0347560cf RBX: 0000000000000000 RCX: 177450e0347560cf
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000100000000
      RBP: 0000000000000056 R08: 0000000000000000 R09: ffffed10020a0934
      R10: ffff8880105049a7 R11: ffff88806cf3a520 R12: ffff888010504800
      R13: ffff88800c00d800 R14: ffff8880105049a0 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f0edf84f0e8 CR3: 000000000d73c002 CR4: 0000000000770ee0
      PKRU: 55555554
      Call Trace:
       <TASK>
       get_packet_txtime net/sched/sch_taprio.c:508 [inline]
       taprio_enqueue_one+0x900/0xff0 net/sched/sch_taprio.c:577
       taprio_enqueue+0x378/0xae0 net/sched/sch_taprio.c:658
       dev_qdisc_enqueue+0x46/0x170 net/core/dev.c:3732
       __dev_xmit_skb net/core/dev.c:3821 [inline]
       __dev_queue_xmit+0x1b2f/0x3000 net/core/dev.c:4169
       dev_queue_xmit include/linux/netdevice.h:3088 [inline]
       neigh_resolve_output net/core/neighbour.c:1552 [inline]
       neigh_resolve_output+0x4a7/0x780 net/core/neighbour.c:1532
       neigh_output include/net/neighbour.h:544 [inline]
       ip6_finish_output2+0x924/0x17d0 net/ipv6/ip6_output.c:135
       __ip6_finish_output+0x620/0xaa0 net/ipv6/ip6_output.c:196
       ip6_finish_output net/ipv6/ip6_output.c:207 [inline]
       NF_HOOK_COND include/linux/netfilter.h:292 [inline]
       ip6_output+0x206/0x410 net/ipv6/ip6_output.c:228
       dst_output include/net/dst.h:458 [inline]
       NF_HOOK.constprop.0+0xea/0x260 include/linux/netfilter.h:303
       ndisc_send_skb+0x872/0xe80 net/ipv6/ndisc.c:508
       ndisc_send_ns+0xb5/0x130 net/ipv6/ndisc.c:666
       addrconf_dad_work+0xc14/0x13f0 net/ipv6/addrconf.c:4175
       process_one_work+0x92c/0x13a0 kernel/workqueue.c:2597
       worker_thread+0x60f/0x1240 kernel/workqueue.c:2748
       kthread+0x2fe/0x3f0 kernel/kthread.c:389
       ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
       </TASK>
      Modules linked in:
      
      Fixes: 4cfd5779 ("taprio: Add support for txtime-assist mode")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Co-developed-by: Eric Dumazet <edumazet@google.com>
      Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: annotate data-races around sk->sk_priority · 7e7c4fde
      Eric Dumazet authored
      [ Upstream commit 8bf43be7 ]
      
      sk_getsockopt() runs locklessly. This means sk->sk_priority
      can be read while other threads are changing its value.
      
      Other reads also happen without socket lock being held.
      
      Add missing annotations where needed.
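
As a rough illustration of what such an annotation looks like, the sketch below uses simplified userspace stand-ins for READ_ONCE() and WRITE_ONCE() (the kernel's own definitions are more elaborate). `struct mini_sock` and the two accessor functions are hypothetical and model only the one field discussed here.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel macros; they rely on the GNU C
 * __typeof__ extension and a volatile access. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical miniature socket with just the field of interest. */
struct mini_sock {
    uint32_t sk_priority;
};

/* Writer side: asks the compiler for exactly one store. */
static void mini_set_priority(struct mini_sock *sk, uint32_t prio)
{
    WRITE_ONCE(sk->sk_priority, prio);
}

/* Lockless reader: asks the compiler for exactly one load. */
static uint32_t mini_get_priority(struct mini_sock *sk)
{
    return READ_ONCE(sk->sk_priority);
}
```

The annotations do not add locking or ordering; they mark the access as intentionally racy and stop the compiler from splitting or repeating it.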
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: add missing data-race annotation for sk_ll_usec · 9ceaff15
      Eric Dumazet authored
      [ Upstream commit e5f0d2dd ]
      
      In a prior commit I forgot that sk_getsockopt() reads
      sk->sk_ll_usec without holding a lock.
      
      Fixes: 0dbffbb5 ("net: annotate data race around sk_ll_usec")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: add missing data-race annotations around sk->sk_peek_off · eb2604f0
      Eric Dumazet authored
      [ Upstream commit 11695c6e ]
      
      sk_getsockopt() runs locklessly, thus we need to annotate the read
      of sk->sk_peek_off.
      
      While we are at it, add corresponding annotations to sk_set_peek_off()
      and unix_set_peek_off().
      
      Fixes: b9bb53f3 ("sock: convert sk_peek_offset functions to WRITE_ONCE")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: annotate data-races around sk->sk_mark · b76d2fa6
      Eric Dumazet authored
      [ Upstream commit 3c5b4d69 ]
      
      sk->sk_mark is often read while another thread could change the value.
      
      Fixes: 4a19ec58 ("[NET]: Introducing socket mark socket option.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: add missing READ_ONCE(sk->sk_rcvbuf) annotation · ea47de09
      Eric Dumazet authored
      [ Upstream commit b4b55325 ]
      
      In a prior commit, I forgot to change sk_getsockopt()
      when reading sk->sk_rcvbuf locklessly.
      
      Fixes: ebb3b78d ("tcp: annotate sk->sk_rcvbuf lockless reads")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: add missing READ_ONCE(sk->sk_sndbuf) annotation · 4b5bda45
      Eric Dumazet authored
      [ Upstream commit 74bc0843 ]
      
      In a prior commit, I forgot to change sk_getsockopt()
      when reading sk->sk_sndbuf locklessly.
      
      Fixes: e292f05e ("tcp: annotate sk->sk_sndbuf lockless reads")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: add missing READ_ONCE(sk->sk_rcvlowat) annotation · 4685a86b
      Eric Dumazet authored
      [ Upstream commit e6d12bdb ]
      
      In a prior commit, I forgot to change sk_getsockopt()
      when reading sk->sk_rcvlowat locklessly.
      
      Fixes: eac66402 ("net: annotate sk->sk_rcvlowat lockless reads")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: annotate data-races around sk->sk_max_pacing_rate · 98ee7a0f
      Eric Dumazet authored
      [ Upstream commit ea7f45ef ]
      
      sk_getsockopt() runs locklessly. This means sk->sk_max_pacing_rate
      can be read while other threads are changing its value.
      
      Fixes: 62748f32 ("net: introduce SO_MAX_PACING_RATE")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: annotate data-race around sk->sk_txrehash · d0e273bc
      Eric Dumazet authored
      [ Upstream commit c76a0328 ]
      
      sk_getsockopt() runs locklessly. This means sk->sk_txrehash
      can be read while other threads are changing its value.
      
      Other locations were handled in commit cb6cd2ce
      ("tcp: Change SYN ACK retransmit behaviour to account for rehash")
      
      Fixes: 26859240 ("txhash: Add socket option to control TX hash rethink behavior")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: annotate data-races around sk->sk_reserved_mem · 6269d3ea
      Eric Dumazet authored
      [ Upstream commit fe11fdcb ]
      
      sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem
      can be read while other threads are changing its value.
      
      Add missing annotations where they are needed.
      
      Fixes: 2bb2f5fb ("net: add new socket option SO_RESERVE_MEM")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: gro: fix misuse of CB in udp socket lookup · 5ac34598
      Richard Gobert authored
      [ Upstream commit 7938cd15 ]
      
      This patch fixes a misuse of IP{6}CB(skb) in GRO when calling
      `udp6_lib_lookup2` while handling udp tunnels. `udp6_lib_lookup2` fetches the
      device from CB. The fix changes it to fetch the device from `skb->dev`.
      The l3mdev case requires special attention since it has a master and a slave
      device.
      
      Fixes: a6024562 ("udp: Add GRO functions to UDP socket")
      Reported-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: move gso declarations and functions to their own files · bbe07adb
      Eric Dumazet authored
      [ Upstream commit d457a0e3 ]
      
      Move declarations into include/net/gso.h and code into net/core/gso.c
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Stanislav Fomichev <sdf@google.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Stable-dep-of: 7938cd15 ("net: gro: fix misuse of CB in udp socket lookup")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • qed: Fix scheduling in a tasklet while getting stats · 3e0d2545
      Konstantin Khorenko authored
      [ Upstream commit e346e231 ]
      
      A tasklet called usleep_range() in the PTT acquire logic, which
      triggers the "scheduling while atomic" BUG():
      
        BUG: scheduling while atomic: swapper/24/0/0x00000100
      
         [<ffffffffb41c6199>] schedule+0x29/0x70
         [<ffffffffb41c5512>] schedule_hrtimeout_range_clock+0xb2/0x150
         [<ffffffffb41c55c3>] schedule_hrtimeout_range+0x13/0x20
         [<ffffffffb41c3bcf>] usleep_range+0x4f/0x70
         [<ffffffffc08d3e58>] qed_ptt_acquire+0x38/0x100 [qed]
         [<ffffffffc08eac48>] _qed_get_vport_stats+0x458/0x580 [qed]
         [<ffffffffc08ead8c>] qed_get_vport_stats+0x1c/0xd0 [qed]
         [<ffffffffc08dffd3>] qed_get_protocol_stats+0x93/0x100 [qed]
                              qed_mcp_send_protocol_stats
                  case MFW_DRV_MSG_GET_LAN_STATS:
                  case MFW_DRV_MSG_GET_FCOE_STATS:
                  case MFW_DRV_MSG_GET_ISCSI_STATS:
                  case MFW_DRV_MSG_GET_RDMA_STATS:
         [<ffffffffc08e36d8>] qed_mcp_handle_events+0x2d8/0x890 [qed]
                              qed_int_assertion
                              qed_int_attentions
         [<ffffffffc08d9490>] qed_int_sp_dpc+0xa50/0xdc0 [qed]
         [<ffffffffb3aa7623>] tasklet_action+0x83/0x140
         [<ffffffffb41d9125>] __do_softirq+0x125/0x2bb
         [<ffffffffb41d560c>] call_softirq+0x1c/0x30
         [<ffffffffb3a30645>] do_softirq+0x65/0xa0
         [<ffffffffb3aa78d5>] irq_exit+0x105/0x110
         [<ffffffffb41d8996>] do_IRQ+0x56/0xf0
      
      Fix this by making the caller tell the QED driver whether it may be
      running in atomic context when getting stats. Based on that context,
      the QED driver decides whether or not it may schedule out when
      acquiring the PTT BAR window.
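
The idea of letting the caller state its context can be sketched in userspace as follows. Everything here is hypothetical and only models the decision: `pick_wait()`, `struct fake_window`, and `acquire_window()` are illustrative names, not functions of the qed driver, and no real delay or sleep is performed.

```c
#include <stdbool.h>

enum wait_kind { WAIT_BUSY_DELAY, WAIT_SLEEP };

/* Hypothetical: choose a wait primitive that the context permits.
 * Sleeping is not allowed in atomic context (for example a tasklet),
 * so only a busy-wait delay is safe there. */
static enum wait_kind pick_wait(bool is_atomic_ctx)
{
    return is_atomic_ctx ? WAIT_BUSY_DELAY : WAIT_SLEEP;
}

/* Hypothetical resource that stays busy for a number of polls. */
struct fake_window {
    int busy_polls_left;
};

static bool try_acquire(struct fake_window *w)
{
    if (w->busy_polls_left > 0) {
        w->busy_polls_left--;
        return false;
    }
    return true;
}

/* Returns 0 once acquired, -1 if max_retries polls were not enough.
 * *used reports which wait primitive a real driver would call between
 * polls; this model does not actually wait. */
static int acquire_window(struct fake_window *w, bool is_atomic_ctx,
                          int max_retries, enum wait_kind *used)
{
    *used = pick_wait(is_atomic_ctx);
    for (int i = 0; i < max_retries; i++) {
        if (try_acquire(w))
            return 0;
    }
    return -1;
}
```

Passing the context down as a parameter keeps the decision at the one place that waits, instead of having every caller guess whether sleeping is allowed.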
      
      We hit the BUG_ON() while getting vport stats, but according to the
      code the same issue could happen for FCoE and iSCSI statistics as well,
      so fix them too.
      
      Fixes: 6c754246 ("qed: Add support for NCSI statistics.")
      Fixes: 1e128c81 ("qed: Add support for hardware offloaded FCoE.")
      Fixes: 2f2b2614 ("qed: Provide iSCSI statistics to management")
      Cc: Sudarsana Kalluru <skalluru@marvell.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Manish Chopra <manishc@marvell.com>

      Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: stmmac: tegra: Properly allocate clock bulk data · 3a234a48
      Thierry Reding authored
      [ Upstream commit a0b1b205 ]
      
      The clock data is an array of struct clk_bulk_data, so make sure to
      allocate enough memory.
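
A minimal userspace sketch of sizing an array allocation by element count follows. It is an illustration only: `struct bulk_entry` is a hypothetical stand-in for struct clk_bulk_data, the clock names are made up, and calloc() takes the place of the kernel's device-managed allocators.

```c
#include <stddef.h>
#include <stdlib.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for struct clk_bulk_data. */
struct bulk_entry {
    const char *id;
    void *clk;
};

/* Made-up clock names, used only to give the array a length. */
static const char *const clk_names[] = { "clk0", "clk1", "clk2", "clk3" };

/* Allocate one entry per clock name. The size is count * element size;
 * anything smaller would let the loop below write past the buffer. */
static struct bulk_entry *alloc_bulk(size_t *count)
{
    size_t n = ARRAY_SIZE(clk_names);
    struct bulk_entry *entries = calloc(n, sizeof(*entries));

    if (!entries)
        return NULL;
    for (size_t i = 0; i < n; i++)
        entries[i].id = clk_names[i];
    *count = n;
    return entries;
}
```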
      
      Fixes: d8ca1137 ("net: stmmac: tegra: Add MGBE support")
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mISDN: hfcpci: Fix potential deadlock on &hc->lock · ea496e48
      Chengfeng Ye authored
      [ Upstream commit 56c6be35 ]
      
      As &hc->lock is acquired by both the timer _hfcpci_softirq() and the hardirq
      hfcpci_int(), the timer should disable irqs before acquiring the lock,
      otherwise a deadlock could happen if the timer is preempted by the hard irq.
      
      Possible deadlock scenario:
      hfcpci_softirq() (timer)
          -> _hfcpci_softirq()
          -> spin_lock(&hc->lock);
              <irq interruption>
              -> hfcpci_int()
              -> spin_lock(&hc->lock); (deadlock here)
      
      This flaw was found by an experimental static analysis tool I am developing
      for irq-related deadlock.
      
      The tentative patch fixes the potential deadlock by spin_lock_irq()
      in timer.
      
      Fixes: b36b654a ("mISDN: Create /sys/class/mISDN")
      Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
      Link: https://lore.kernel.org/r/20230727085619.7419-1-dg573847474@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: sched: cls_u32: Fix match key mis-addressing · de14cff7
      Jamal Hadi Salim authored
      [ Upstream commit e68409db ]
      
      A match entry is uniquely identified with an "address" or "path" in the
      form of: hashtable ID(12b):bucketid(8b):nodeid(12b).
      
      When creating table match entries, the hash table id, bucket id and
      node (match entry) id must all either be specified by the user or
      filled in with reasonable in-kernel defaults. The in-kernel default for a table id is
      0x800 (the omnipresent root table); for bucketid it is 0x0. Prior to this fix there
      was none for a nodeid, i.e. the code assumed that the user passed the correct
      nodeid, and if the user passed a nodeid of 0 (as Mingi Cho did) then that is what
      was used. But a nodeid of 0 is reserved for identifying the table. This is not
      a problem until we dump. The dump code notices that the nodeid is zero,
      assumes it is referencing a table, and therefore references the table struct
      tc_u_hnode instead of what was created, i.e. the match entry struct tc_u_knode.
      
      Mingi does the equivalent of:
      tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \
      protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok
      
      This essentially specifies a table id of 0, a bucketid of 1 and a nodeid of zero.
      Tableid 0 is remapped to the default of 0x800.
      Bucketid 1 is ignored and defaults to 0x00.
      Nodeid was assumed to be what Mingi passed: 0x000.
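
The field split of a handle can be checked with a small decoder. This is a userspace sketch based only on the 12b:8b:12b layout stated above; the three function names are hypothetical and are not the kernel's own macros.

```c
#include <stdint.h>

/* Layout from the text: hashtable ID (12 bits) : bucket (8 bits) : node (12 bits). */
static uint32_t handle_htid(uint32_t h)
{
    return (h >> 20) & 0xFFF;
}

static uint32_t handle_bucket(uint32_t h)
{
    return (h >> 12) & 0xFF;
}

static uint32_t handle_node(uint32_t h)
{
    return h & 0xFFF;
}
```

For the handle 0x1000 from the command above, this gives table id 0, bucket 1 and node 0, matching the breakdown in the text.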
      
      dumping before fix shows:
      ~$ tc filter ls dev dummy0 parent 10:
      filter protocol ip pref 1 u32 chain 0
      filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
      filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591
      
      Note that the last line reports a table instead of a match entry
      (you can tell this because it says "ht divisor...").
      As a result of reporting the wrong data type (misinterpreting struct
      tc_u_knode as struct tc_u_hnode) the divisor is reported with a value
      of -30591. Mingi identified this as part of the heap address
      (physmap_base is 0xffff8880 (-30591 - 1)).
      
      The fix is to ensure that when table entry matches are added and no
      nodeid is specified (i.e nodeid == 0) then we get the next available
      nodeid from the table's pool.
      
      After the fix, this is what the dump shows:
      $ tc filter ls dev dummy0 parent 10:
      filter protocol ip pref 1 u32 chain 0
      filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
      filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw
        match 0a000001/ffffffff at 12
      	action order 1: gact action pass
      	 random type none pass val 0
      	 index 1 ref 1 bind 1
      
      Reported-by: Mingi Cho <mgcho.minic@gmail.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20230726135151.416917-1-jhs@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf test uprobe_from_different_cu: Skip if there is no gcc · 4034838a
      Georg Müller authored
      [ Upstream commit 98ce8e4a ]
      
      Without gcc, the test will fail.
      
      On cleanup, ignore probe removal errors. Otherwise, in case of an error
      adding the probe, the temporary directory is not removed.
      
      Fixes: 56cbeacf ("perf probe: Add test for regression introduced by switch to die_get_decl_file()")
      Signed-off-by: Georg Müller <georgmueller@gmx.net>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Georg Müller <georgmueller@gmx.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230728151812.454806-2-georgmueller@gmx.net
      Link: https://lore.kernel.org/r/CAP-5=fUP6UuLgRty3t2=fQsQi3k4hDMz415vWdp1x88QMvZ8ug@mail.gmail.com/
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: dsa: fix value check in bcm_sf2_sw_probe() · 76d0f82f
      Yuanjun Gong authored
      [ Upstream commit dadc5b86 ]
      
      In bcm_sf2_sw_probe(), check the return value of clk_prepare_enable()
      and return the error code if clk_prepare_enable() returns an
      unexpected value.
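
The pattern of checking and propagating the return value can be sketched as below. This is a userspace model, not driver code: `fake_clk_prepare_enable()` is a hypothetical stub that stands in for clk_prepare_enable(), and `fake_probe()` stands in for the probe function.

```c
#include <errno.h>

/* Hypothetical stub: returns 0 on success or a negative errno on failure. */
static int fake_clk_prepare_enable(int should_fail)
{
    return should_fail ? -EIO : 0;
}

/* Hypothetical probe: stops and reports the error instead of
 * continuing with a clock that was never enabled. */
static int fake_probe(int clk_should_fail)
{
    int ret = fake_clk_prepare_enable(clk_should_fail);

    if (ret)
        return ret;

    /* Remaining probe steps would run here. */
    return 0;
}
```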
      
      Fixes: e9ec5c3b ("net: dsa: bcm_sf2: request and handle clocks")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://lore.kernel.org/r/20230726170506.16547-1-ruc_gongyuanjun@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length · 00757f58
      Lin Ma authored
      [ Upstream commit d73ef2d6 ]
      
      There are 9 ndo_bridge_setlink handlers in total in the current kernel,
      which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3)
      i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5)
      ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7)
      nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink.
      
      By investigating the code, we find that 1-7 parse and use nlattr
      IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can
      lead to an out-of-attribute read and allow a malformed nlattr (e.g.,
      length 0) to be viewed as a 2 byte integer.
      
      To avoid such issues, including in any future ndo_bridge_setlink
      handlers, this patch adds the nla_len check in rtnl_bridge_setlink and
      returns an error early if the length mismatches. To make this work, the
      break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure
      that nla_for_each_nested iterates over every attribute.
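
A length check of this kind can be modeled in userspace as follows. The sketch assumes a simplified attribute header that mirrors the two 16-bit fields of a netlink attribute; `struct mini_nlattr`, `mini_nla_len()` and `mode_attr_is_valid()` are hypothetical names, not the kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a netlink attribute header. */
struct mini_nlattr {
    uint16_t nla_len;  /* header plus payload, in bytes */
    uint16_t nla_type;
};

#define MINI_NLA_HDRLEN ((int)sizeof(struct mini_nlattr))

/* Payload length; negative if nla_len is smaller than the header. */
static int mini_nla_len(const struct mini_nlattr *nla)
{
    return (int)nla->nla_len - MINI_NLA_HDRLEN;
}

/* A 2-byte mode value may only be read if the payload holds 2 bytes. */
static bool mode_attr_is_valid(const struct mini_nlattr *nla)
{
    return mini_nla_len(nla) >= (int)sizeof(uint16_t);
}
```

An attribute whose nla_len covers only the header has a payload length of 0, so reading a 2-byte value from it would touch memory that belongs to whatever follows the attribute.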
      
      Fixes: b1edc14a ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
      Fixes: 51616018 ("i40e: Add support for getlink, setlink ndo ops")
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing · 95b2e27b
      Lin Ma authored
      [ Upstream commit bcc29b7f ]
      
      The nla_for_each_nested parsing in function bpf_sk_storage_diag_alloc
      does not check the length of the nested attribute. This can lead to an
      out-of-attribute read and allow a malformed nlattr (e.g., length 0) to
      be viewed as a 4 byte integer.
      
      This patch adds an additional check when the nlattr is being counted.
      This makes sure the later nla_get_u32 can access the attributes with
      the correct length.
      
      Fixes: 1ed4d924 ("bpf: INET_DIAG support in bpf_sk_storage")
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/20230725023330.422856-1-linma@zju.edu.cn
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5: Unregister devlink params in case interface is down · 471f59b3
      Shay Drory authored
      [ Upstream commit 53d737df ]
      
      Currently, in case an interface is down, mlx5 driver doesn't
      unregister its devlink params, which leads to this WARN[1].
      Fix it by unregistering devlink params in that case as well.
      
      [1]
      [  295.244769 ] WARNING: CPU: 15 PID: 1 at net/core/devlink.c:9042 devlink_free+0x174/0x1fc
      [  295.488379 ] CPU: 15 PID: 1 Comm: shutdown Tainted: G S         OE 5.15.0-1017.19.3.g0677e61-bluefield #g0677e61
      [  295.509330 ] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.2.0.12761 Jun  6 2023
      [  295.543096 ] pc : devlink_free+0x174/0x1fc
      [  295.551104 ] lr : mlx5_devlink_free+0x18/0x2c [mlx5_core]
      [  295.561816 ] sp : ffff80000809b850
      [  295.711155 ] Call trace:
      [  295.716030 ]  devlink_free+0x174/0x1fc
      [  295.723346 ]  mlx5_devlink_free+0x18/0x2c [mlx5_core]
      [  295.733351 ]  mlx5_sf_dev_remove+0x98/0xb0 [mlx5_core]
      [  295.743534 ]  auxiliary_bus_remove+0x2c/0x50
      [  295.751893 ]  __device_release_driver+0x19c/0x280
      [  295.761120 ]  device_release_driver+0x34/0x50
      [  295.769649 ]  bus_remove_device+0xdc/0x170
      [  295.777656 ]  device_del+0x17c/0x3a4
      [  295.784620 ]  mlx5_sf_dev_remove+0x28/0xf0 [mlx5_core]
      [  295.794800 ]  mlx5_sf_dev_table_destroy+0x98/0x110 [mlx5_core]
      [  295.806375 ]  mlx5_unload+0x34/0xd0 [mlx5_core]
      [  295.815339 ]  mlx5_unload_one+0x70/0xe4 [mlx5_core]
      [  295.824998 ]  shutdown+0xb0/0xd8 [mlx5_core]
      [  295.833439 ]  pci_device_shutdown+0x3c/0xa0
      [  295.841651 ]  device_shutdown+0x170/0x340
      [  295.849486 ]  __do_sys_reboot+0x1f4/0x2a0
      [  295.857322 ]  __arm64_sys_reboot+0x2c/0x40
      [  295.865329 ]  invoke_syscall+0x78/0x100
      [  295.872817 ]  el0_svc_common.constprop.0+0x54/0x184
      [  295.882392 ]  do_el0_svc+0x30/0xac
      [  295.889008 ]  el0_svc+0x48/0x160
      [  295.895278 ]  el0t_64_sync_handler+0xa4/0x130
      [  295.903807 ]  el0t_64_sync+0x1a4/0x1a8
      [  295.911120 ] ---[ end trace 4f1d2381d00d9dce  ]---
      
      Fixes: fe578cbb ("net/mlx5: Move devlink registration before mlx5_load")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported · 3280f8a4
      Chris Mi authored
      [ Upstream commit 61eab651 ]
      
      The cited commit sets ft prio to fs_base_prio. But if
      ignore_flow_level is not supported, ft prio must be set based on
      tc filter prio. Otherwise, all the ft prios on the same chain are
      the same, which is invalid if ignore_flow_level is not supported.
      
      Fix it by setting ft prio based on tc filter prio and setting
      fs_base_prio to 0 for fdb.
      
      Fixes: 8e80e564 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3280f8a4
    • Jianbo Liu's avatar
      net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload · bd964343
      Jianbo Liu authored
      [ Upstream commit 3e4cf1dd ]
      
      There are DEK objects cached in DEK pool after kTLS is used, and they
      are freed only in mlx5e_ktls_cleanup().
      
      mlx5e_destroy_mdev_resources() is called in mlx5e_suspend() to
      free mdev resources, including protection domain (PD). However, PD is
      still referenced by the cached DEK objects in this case, because
      profile->cleanup() (and therefore mlx5e_ktls_cleanup()) is called
      after mlx5e_suspend() during devlink reload. So the following FW
      syndrome is generated:
      
       mlx5_cmd_out_err:803:(pid 12948): DEALLOC_PD(0x801) op_mod(0x0) failed,
          status bad resource state(0x9), syndrome (0xef0c8a), err(-22)
      
      To avoid this syndrome, move DEK pool destruction to
      mlx5e_ktls_cleanup_tx(), which is called by profile->cleanup_tx(). And
      move pool creation to mlx5e_ktls_init_tx() for symmetry.
      
      Fixes: f741db1a ("net/mlx5e: kTLS, Improve connection rate by using fast update encryption key")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      bd964343
    • Dragos Tatulea's avatar
      net/mlx5e: xsk: Fix crash on regular rq reactivation · 02a84eb2
      Dragos Tatulea authored
      [ Upstream commit 39646d9b ]
      
      When the regular rq is reactivated after the XSK socket is closed
      it could be reading stale cqes which eventually corrupts the rq.
      This leads to no more traffic being received on the regular rq and a
      crash on the next close or deactivation of the rq.
      
      Kal Cutter Conley reported this issue as a crash on the release
      path when the xdpsock sample program is stopped (killed) and restarted
      in sequence while traffic is running.
      
      This patch flushes all cqes during the rq flush. The cqe flushing
      is done in the reset state of the rq. mlx5e_rq_to_ready code is moved
      into the flush function to allow for this.
      
      Fixes: 082a9edf ("net/mlx5e: xsk: Flush RQ on XSK activation to save memory")
      Reported-by: Kal Cutter Conley <kal.conley@dectris.com>
      Closes: https://lore.kernel.org/xdp-newbies/CAHApi-nUAs4TeFWUDV915CZJo07XVg2Vp63-no7UDfj6wur9nQ@mail.gmail.com
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      02a84eb2
    • Dragos Tatulea's avatar
      net/mlx5e: xsk: Fix invalid buffer access for legacy rq · 58a113a3
      Dragos Tatulea authored
      [ Upstream commit e0f52298 ]
      
      The below crash can be encountered when using xdpsock in rx mode for
      legacy rq: the buffer gets released in the XDP_REDIRECT path, and then
      once again in the driver. This fix sets the flag to avoid releasing on
      the driver side.
      
      XSK handling of buffers for legacy rq was relying on the caller to set
      the skip release flag. But the referenced fix started using fragment
      counts for pages instead of the skip flag.
      
      Crash log:
       general protection fault, probably for non-canonical address 0xffff8881217e3a: 0000 [#1] SMP
       CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.5.0-rc1+ #31
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:bpf_prog_03b13f331978c78c+0xf/0x28
       Code:  ...
       RSP: 0018:ffff88810082fc98 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff888138404901 RCX: c0ffffc900027cbc
       RDX: ffffffffa000b514 RSI: 00ffff8881217e32 RDI: ffff888138404901
       RBP: ffff88810082fc98 R08: 0000000000091100 R09: 0000000000000006
       R10: 0000000000000800 R11: 0000000000000800 R12: ffffc9000027a000
       R13: ffff8881217e2dc0 R14: ffff8881217e2910 R15: ffff8881217e2f00
       FS:  0000000000000000(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000564cb2e2cde0 CR3: 000000010e603004 CR4: 0000000000370eb0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <TASK>
        ? die_addr+0x32/0x80
        ? exc_general_protection+0x192/0x390
        ? asm_exc_general_protection+0x22/0x30
        ? 0xffffffffa000b514
        ? bpf_prog_03b13f331978c78c+0xf/0x28
        mlx5e_xdp_handle+0x48/0x670 [mlx5_core]
        ? dev_gro_receive+0x3b5/0x6e0
        mlx5e_xsk_skb_from_cqe_linear+0x6e/0x90 [mlx5_core]
        mlx5e_handle_rx_cqe+0x55/0x100 [mlx5_core]
        mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core]
        mlx5e_napi_poll+0x45e/0x6b0 [mlx5_core]
        __napi_poll+0x25/0x1a0
        net_rx_action+0x28a/0x300
        __do_softirq+0xcd/0x279
        ? sort_range+0x20/0x20
        run_ksoftirqd+0x1a/0x20
        smpboot_thread_fn+0xa2/0x130
        kthread+0xc9/0xf0
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
        </TASK>
       Modules linked in: mlx5_ib mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay zram zsmalloc fuse [last unloaded: mlx5_core]
       ---[ end trace 0000000000000000 ]---
      
      Fixes: 7abd955a ("net/mlx5e: RX, Fix page_pool page fragment tracking for XDP")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      58a113a3
    • Jianbo Liu's avatar
      net/mlx5e: Move representor neigh cleanup to profile cleanup_tx · 36697c59
      Jianbo Liu authored
      [ Upstream commit d03b6e6f ]
      
      For IP tunnel encapsulation in ECMP (Equal-Cost Multipath) mode, as
      the flow is duplicated to the peer eswitch, the related neighbour
      information on the peer uplink representor is created as well.
      
      In the cited commit, eswitch devcom unpair is moved to uplink unload
      API, specifically the profile->cleanup_tx. If there is an encap rule
      offloaded in ECMP mode, when one eswitch does unpair (because of
      unloading the driver, for instance), and the peer rule from the peer
      eswitch is going to be deleted, the use-after-free error is triggered
      while accessing neigh info, as it is already cleaned up in uplink's
      profile->disable, which is before its profile->cleanup_tx.
      
      To fix this issue, move the neigh cleanup to the profile's cleanup_tx
      callback, after mlx5e_cleanup_uplink_rep_tx is called. The neigh
      init is moved to init_tx for symmetry.
      
      [ 2453.376299] BUG: KASAN: slab-use-after-free in mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.379125] Read of size 4 at addr ffff888127af9008 by task modprobe/2496
      
      [ 2453.381542] CPU: 7 PID: 2496 Comm: modprobe Tainted: G    B              6.4.0-rc7+ #15
      [ 2453.383386] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [ 2453.384335] Call Trace:
      [ 2453.384625]  <TASK>
      [ 2453.384891]  dump_stack_lvl+0x33/0x50
      [ 2453.385285]  print_report+0xc2/0x610
      [ 2453.385667]  ? __virt_addr_valid+0xb1/0x130
      [ 2453.386091]  ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.386757]  kasan_report+0xae/0xe0
      [ 2453.387123]  ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.387798]  mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.388465]  mlx5e_rep_encap_entry_detach+0xa6/0xe0 [mlx5_core]
      [ 2453.389111]  mlx5e_encap_dealloc+0xa7/0x100 [mlx5_core]
      [ 2453.389706]  mlx5e_tc_tun_encap_dests_unset+0x61/0xb0 [mlx5_core]
      [ 2453.390361]  mlx5_free_flow_attr_actions+0x11e/0x340 [mlx5_core]
      [ 2453.391015]  ? complete_all+0x43/0xd0
      [ 2453.391398]  ? free_flow_post_acts+0x38/0x120 [mlx5_core]
      [ 2453.392004]  mlx5e_tc_del_fdb_flow+0x4ae/0x690 [mlx5_core]
      [ 2453.392618]  mlx5e_tc_del_fdb_peers_flow+0x308/0x370 [mlx5_core]
      [ 2453.393276]  mlx5e_tc_clean_fdb_peer_flows+0xf5/0x140 [mlx5_core]
      [ 2453.393925]  mlx5_esw_offloads_unpair+0x86/0x540 [mlx5_core]
      [ 2453.394546]  ? mlx5_esw_offloads_set_ns_peer.isra.0+0x180/0x180 [mlx5_core]
      [ 2453.395268]  ? down_write+0xaa/0x100
      [ 2453.395652]  mlx5_esw_offloads_devcom_event+0x203/0x530 [mlx5_core]
      [ 2453.396317]  mlx5_devcom_send_event+0xbb/0x190 [mlx5_core]
      [ 2453.396917]  mlx5_esw_offloads_devcom_cleanup+0xb0/0xd0 [mlx5_core]
      [ 2453.397582]  mlx5e_tc_esw_cleanup+0x42/0x120 [mlx5_core]
      [ 2453.398182]  mlx5e_rep_tc_cleanup+0x15/0x30 [mlx5_core]
      [ 2453.398768]  mlx5e_cleanup_rep_tx+0x6c/0x80 [mlx5_core]
      [ 2453.399367]  mlx5e_detach_netdev+0xee/0x120 [mlx5_core]
      [ 2453.399957]  mlx5e_netdev_change_profile+0x84/0x170 [mlx5_core]
      [ 2453.400598]  mlx5e_vport_rep_unload+0xe0/0xf0 [mlx5_core]
      [ 2453.403781]  mlx5_eswitch_unregister_vport_reps+0x15e/0x190 [mlx5_core]
      [ 2453.404479]  ? mlx5_eswitch_register_vport_reps+0x200/0x200 [mlx5_core]
      [ 2453.405170]  ? up_write+0x39/0x60
      [ 2453.405529]  ? kernfs_remove_by_name_ns+0xb7/0xe0
      [ 2453.405985]  auxiliary_bus_remove+0x2e/0x40
      [ 2453.406405]  device_release_driver_internal+0x243/0x2d0
      [ 2453.406900]  ? kobject_put+0x42/0x2d0
      [ 2453.407284]  bus_remove_device+0x128/0x1d0
      [ 2453.407687]  device_del+0x240/0x550
      [ 2453.408053]  ? waiting_for_supplier_show+0xe0/0xe0
      [ 2453.408511]  ? kobject_put+0xfa/0x2d0
      [ 2453.408889]  ? __kmem_cache_free+0x14d/0x280
      [ 2453.409310]  mlx5_rescan_drivers_locked.part.0+0xcd/0x2b0 [mlx5_core]
      [ 2453.409973]  mlx5_unregister_device+0x40/0x50 [mlx5_core]
      [ 2453.410561]  mlx5_uninit_one+0x3d/0x110 [mlx5_core]
      [ 2453.411111]  remove_one+0x89/0x130 [mlx5_core]
      [ 2453.411628]  pci_device_remove+0x59/0xf0
      [ 2453.412026]  device_release_driver_internal+0x243/0x2d0
      [ 2453.412511]  ? parse_option_str+0x14/0x90
      [ 2453.412915]  driver_detach+0x7b/0xf0
      [ 2453.413289]  bus_remove_driver+0xb5/0x160
      [ 2453.413685]  pci_unregister_driver+0x3f/0xf0
      [ 2453.414104]  mlx5_cleanup+0xc/0x20 [mlx5_core]
      
      Fixes: 2be5bd42 ("net/mlx5: Handle pairing of E-switch via uplink un/load APIs")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      36697c59
    • Amir Tzin's avatar
      net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set · 90c226e4
      Amir Tzin authored
      [ Upstream commit 3ec43c1b ]
      
      Moving to switchdev mode with ntuple offload on causes the kernel to
      crash since fs->arfs is freed during nic profile cleanup flow.
      
      Ntuple offload is not supported in switchdev mode and it is already
      unset by mlx5 fix feature ndo in switchdev mode. Verify fs->arfs is
      valid before disabling it.
      
      trace:
      [] RIP: 0010:_raw_spin_lock_bh+0x17/0x30
      [] arfs_del_rules+0x44/0x1a0 [mlx5_core]
      [] mlx5e_arfs_disable+0xe/0x20 [mlx5_core]
      [] mlx5e_handle_feature+0x3d/0xb0 [mlx5_core]
      [] ? __rtnl_unlock+0x25/0x50
      [] mlx5e_set_features+0xfe/0x160 [mlx5_core]
      [] __netdev_update_features+0x278/0xa50
      [] ? netdev_run_todo+0x5e/0x2a0
      [] netdev_update_features+0x22/0x70
      [] ? _cond_resched+0x15/0x30
      [] mlx5e_attach_netdev+0x12a/0x1e0 [mlx5_core]
      [] mlx5e_netdev_attach_profile+0xa1/0xc0 [mlx5_core]
      [] mlx5e_netdev_change_profile+0x77/0xe0 [mlx5_core]
      [] mlx5e_vport_rep_load+0x1ed/0x290 [mlx5_core]
      [] mlx5_esw_offloads_rep_load+0x88/0xd0 [mlx5_core]
      [] esw_offloads_load_rep.part.38+0x31/0x50 [mlx5_core]
      [] esw_offloads_enable+0x6c5/0x710 [mlx5_core]
      [] mlx5_eswitch_enable_locked+0x1bb/0x290 [mlx5_core]
      [] mlx5_devlink_eswitch_mode_set+0x14f/0x320 [mlx5_core]
      [] devlink_nl_cmd_eswitch_set_doit+0x94/0x120
      [] genl_family_rcv_msg_doit.isra.17+0x113/0x150
      [] genl_family_rcv_msg+0xb7/0x170
      [] ? devlink_nl_cmd_port_split_doit+0x100/0x100
      [] genl_rcv_msg+0x47/0xa0
      [] ? genl_family_rcv_msg+0x170/0x170
      [] netlink_rcv_skb+0x4c/0x130
      [] genl_rcv+0x24/0x40
      [] netlink_unicast+0x19a/0x230
      [] netlink_sendmsg+0x204/0x3d0
      [] sock_sendmsg+0x50/0x60
      
      Fixes: 90b22b9b ("net/mlx5e: Disable Rx ntuple offload for uplink representor")
      Signed-off-by: Amir Tzin <amirtz@nvidia.com>
      Reviewed-by: Aya Levin <ayal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      90c226e4
    • Chris Mi's avatar
      net/mlx5e: Don't hold encap tbl lock if there is no encap action · 2e76da7b
      Chris Mi authored
      [ Upstream commit 93a33193 ]
      
      The cited commit holds encap tbl lock unconditionally when setting
      up dests. But it may cause the following deadlock:
      
       PID: 1063722  TASK: ffffa062ca5d0000  CPU: 13   COMMAND: "handler8"
        #0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91
        #1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb
        #2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528
        #3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb
        #4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb
        #5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core]
        #6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core]
        #7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core]
        #8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core]
        #9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core]
       #10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core]
       #11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core]
       #12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core]
       #13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8
       #14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower]
       #15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower]
       #16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0
      
      [1031218.028143]  wait_for_completion+0x24/0x30
      [1031218.028589]  mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core]
      [1031218.029256]  mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core]
      [1031218.029885]  process_one_work+0x24e/0x510
      
      There is no need to hold the encap tbl lock if there is no encap
      action. Fix it by checking whether an encap action exists before
      taking the encap tbl lock.
      
      Fixes: 37c3b9fa ("net/mlx5e: Prevent encap offload when neigh update is running")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2e76da7b
    • Shay Drory's avatar
      net/mlx5: Honor user input for migratable port fn attr · 0302414c
      Shay Drory authored
      [ Upstream commit 0507f2c8 ]
      
      Currently, whenever a user sets the migratable port fn attr, the
      driver always turns the migratable capability on.
      Fix it by honoring the user input.
      
      Fixes: e5b9642a ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0302414c
    • Yuanjun Gong's avatar
      net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer() · cc94d516
      Yuanjun Gong authored
      [ Upstream commit e5bcb756 ]
      
      mlx5e_ipsec_remove_trailer() should return an error code if function
      pskb_trim() returns an unexpected value.
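
A minimal userspace sketch of the pattern described above, not the driver code itself: `struct buf`, `buf_trim()` and `remove_trailer()` are hypothetical stand-ins (for an skb, pskb_trim() and the trailer-removal function), and the `simulate_failure` flag exists only to exercise the error path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the length matters here. */
struct buf {
    size_t len;
};

/* Hypothetical stand-in for pskb_trim(): 0 on success, negative errno on failure. */
static int buf_trim(struct buf *b, size_t new_len, int simulate_failure)
{
    if (simulate_failure)
        return -ENOMEM;
    if (new_len < b->len)
        b->len = new_len;
    return 0;
}

/* Remove a trailer of trailer_len bytes, propagating any trim error. */
static int remove_trailer(struct buf *b, size_t trailer_len, int simulate_failure)
{
    int ret;

    if (trailer_len > b->len)
        return -EINVAL;

    ret = buf_trim(b, b->len - trailer_len, simulate_failure);
    if (ret)
        return ret; /* do not carry on with an untrimmed buffer */

    return 0;
}
```

The point of the sketch is only that the caller sees the trim failure instead of a silent success.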
      
      Fixes: 2ac9cfe7 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      cc94d516
    • Zhengchao Shao's avatar
      net/mlx5: fix potential memory leak in mlx5e_init_rep_rx · c265d8c2
      Zhengchao Shao authored
      [ Upstream commit c6cf0b60 ]
      
      The memory pointed to by the priv->rx_res pointer is not freed in the error
      path of mlx5e_init_rep_rx, which can lead to a memory leak. Fix by freeing
      the memory in the error path, thereby making the error path identical to
      mlx5e_cleanup_rep_rx().
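
The error-path idea can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the upstream diff: `struct rep_priv`, `create_tables()`, `init_rep_rx()` and `cleanup_rep_rx()` are hypothetical names, and `should_fail` only simulates a failing later step.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical context; rx_res mirrors the pointer named in the text. */
struct rep_priv {
    int *rx_res;
};

/* Hypothetical later init step that can fail. */
static int create_tables(int should_fail)
{
    return should_fail ? -EIO : 0;
}

/* Regular teardown: release rx_res and clear the pointer. */
static void cleanup_rep_rx(struct rep_priv *priv)
{
    free(priv->rx_res);
    priv->rx_res = NULL;
}

static int init_rep_rx(struct rep_priv *priv, int should_fail)
{
    int err;

    priv->rx_res = calloc(1, sizeof(*priv->rx_res));
    if (!priv->rx_res)
        return -ENOMEM;

    err = create_tables(should_fail);
    if (err)
        goto err_free_rx_res;

    return 0;

err_free_rx_res:
    /* Same teardown as the regular cleanup path, so nothing leaks. */
    cleanup_rep_rx(priv);
    return err;
}
```

Routing the error path through the same teardown helper keeps the two paths from drifting apart.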
      
      Fixes: af8bbf73 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c265d8c2
    • Zhengchao Shao's avatar
      net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx · 622d71d9
      Zhengchao Shao authored
      [ Upstream commit 5dd77585 ]
      
      When mlx5_cmd_exec fails in mlx5dr_cmd_create_reformat_ctx, the memory
      pointed to by 'in' is not released, which causes a memory leak. Move the
      memory release to after mlx5_cmd_exec.
      
      Fixes: 1d918647 ("net/mlx5: DR, Add direct rule command utilities")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      622d71d9
    • Zhengchao Shao's avatar
      net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups · 957702c3
      Zhengchao Shao authored
      [ Upstream commit aeb66017 ]
      
      In function macsec_fs_tx_create_crypto_table_groups(), when the ft->g
      memory is successfully allocated but the 'in' memory fails to be
      allocated, the memory pointed to by ft->g is released once. And in function
      macsec_fs_tx_create(), macsec_fs_tx_destroy() is called to release the
      memory pointed to by ft->g again. This will cause a double free.
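
One common way to avoid this kind of double free is to clear the pointer after the inner free, so the outer destroy path becomes a no-op for it. The userspace sketch below shows that pattern; it is not claimed to be the exact upstream change, and `struct table`, `table_create_groups()`, `table_create()` and `table_destroy()` are hypothetical names, with `fail_second_alloc` simulating the failed 'in' allocation.

```c
#include <errno.h>
#include <stdlib.h>

struct table {
    int *g; /* group array, playing the role of ft->g */
};

static void table_destroy(struct table *ft)
{
    free(ft->g); /* free(NULL) is a no-op */
    ft->g = NULL;
}

static int table_create_groups(struct table *ft, int fail_second_alloc)
{
    int *in;

    ft->g = calloc(4, sizeof(*ft->g));
    if (!ft->g)
        return -ENOMEM;

    in = fail_second_alloc ? NULL : calloc(16, sizeof(*in));
    if (!in) {
        free(ft->g);
        ft->g = NULL; /* so the caller's destroy does not free it again */
        return -ENOMEM;
    }

    free(in);
    return 0;
}

static int table_create(struct table *ft, int fail_second_alloc)
{
    int err = table_create_groups(ft, fail_second_alloc);

    if (err)
        table_destroy(ft); /* safe: ft->g is NULL on this path */
    return err;
}
```

An alternative with the same effect is to drop the inner free entirely and let the single destroy function own the release.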
      
      Fixes: e467b283 ("net/mlx5e: Add MACsec TX steering rules")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      957702c3
    • Ilan Peer's avatar
      wifi: cfg80211: Fix return value in scan logic · 1d23e51c
      Ilan Peer authored
      [ Upstream commit fd7f08d9 ]
      
      The reporter noticed a warning when running iwlwifi:
      
      WARNING: CPU: 8 PID: 659 at mm/page_alloc.c:4453 __alloc_pages+0x329/0x340
      
      cfg80211_parse_colocated_ap() is not expected to return a negative
      value, so return 0 instead of a negative value if
      cfg80211_calc_short_ssid() fails.
      
      Fixes: c8cb5b85 ("nl80211/cfg80211: support 6 GHz scanning")
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217675
      Signed-off-by: Ilan Peer <ilan.peer@intel.com>
      Signed-off-by: Kalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20230723201043.3007430-1-ilan.peer@intel.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1d23e51c
    • Haixin Yu's avatar
      perf pmu arm64: Fix reading the PMU cpu slots in sysfs · 0ab6fac3
      Haixin Yu authored
      [ Upstream commit 9754353d ]
      
      Commit f8ad6018 ("perf pmu: Remove duplication around
      EVENT_SOURCE_DEVICE_PATH") uses sysfs__read_ull() to read a full sysfs
      path, which will never succeed because the path already contains the
      sysfs mount point, which sysfs__read_ull() will add again.
      
      Fix it by reading the file using filename__read_ull(), which does not
      add the sysfs mount point.
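
The double-prefix problem can be illustrated with a small userspace sketch. The two helpers below are hypothetical and only build path strings; they are not the perf helpers named above, the `/sys` mount point is assumed for illustration, and the PMU path in the usage note is an example.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed mount point for this sketch; a real tool discovers it at run time. */
#define SYSFS_MNT "/sys"

/* Mimics a helper that expects an entry relative to the sysfs mount point. */
static int build_sysfs_path(char *out, size_t len, const char *entry)
{
    return snprintf(out, len, "%s/%s", SYSFS_MNT, entry);
}

/* Mimics a helper that takes a complete filename and uses it unchanged. */
static int build_full_path(char *out, size_t len, const char *filename)
{
    return snprintf(out, len, "%s", filename);
}
```

Passing an already complete path such as `/sys/bus/event_source/devices/cpu/caps/slots` to `build_sysfs_path()` yields `/sys//sys/bus/event_source/devices/cpu/caps/slots`, which does not exist, while `build_full_path()` leaves it intact.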
      
      Fixes: f8ad6018 ("perf pmu: Remove duplication around EVENT_SOURCE_DEVICE_PATH")
      Signed-off-by: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
      Tested-by: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/ZL4G7rWXkfv-Ectq@B-Q60VQ05P-2326.local
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0ab6fac3
    • Gao Xiang's avatar
      erofs: fix wrong primary bvec selection on deduplicated extents · b845249a
      Gao Xiang authored
      [ Upstream commit 94c43de7 ]
      
      When handling deduplicated compressed data, there can be multiple
      decompressed extents pointing to the same compressed data in one shot.
      
      In such cases, the bvecs which belong to the longest extent will be
      selected as the primary bvecs for real decompressors to decode and the
      other duplicated bvecs will be directly copied from the primary bvecs.
      
      Previously, only relative offsets of the longest extent were checked to
      decompress the primary bvecs.  On rare occasions, it can be incorrect
      if there are several extents with the same start relative offset.
      As a result, some short bvecs could be selected for decompression and
      then cause data corruption.
      
      For example, as Shijie Sun reported off-list, considering the following
      extents of a file:
       117:   903345..  915250 |   11905 :     385024..    389120 |    4096
      ...
       119:   919729..  930323 |   10594 :     385024..    389120 |    4096
      ...
       124:   968881..  980786 |   11905 :     385024..    389120 |    4096
      
      The start relative offset is the same: 2225, but extent 119 (919729..
      930323) is shorter than the others.
      
      Let's restrict the bvec length in addition to the start offset if bvecs
      are not full.
      
      Reported-by: Shijie Sun <sunshijie@xiaomi.com>
      Fixes: 5c2a6425 ("erofs: introduce partial-referenced pclusters")
      Tested-by: Shijie Sun <sunshijie@xiaomi.com>
      Reviewed-by: Yue Hu <huyue2@coolpad.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20230719065459.60083-1-hsiangkao@linux.alibaba.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b845249a
    • Heiko Carstens's avatar
      KVM: s390: fix sthyi error handling · 53980121
      Heiko Carstens authored
      [ Upstream commit 0c02cc57 ]
      
      Commit 9fb6c9b3 ("s390/sthyi: add cache to store hypervisor info")
      added cache handling for store hypervisor info. This also changed the
      possible return code for sthyi_fill().
      
      Instead of only returning a condition code like the sthyi instruction would
      do, it can now also return a negative error value (-ENOMEM). handle_sthyi()
      was not changed accordingly. In case of an error, the negative error value
      would incorrectly be injected into the guest PSW.
      
      Add proper error handling to prevent this, and update the comment which
      describes the possible return values of sthyi_fill().
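
A minimal userspace sketch of the distinction drawn above, assuming a filler that returns either a non-negative condition code or a negative errno. `struct guest_state`, `fill_info()` and `handle_request()` are hypothetical stand-ins and do not reproduce the KVM code; the `simulate` argument just selects the filler's return value.

```c
#include <errno.h>

/* Hypothetical guest-visible state. */
struct guest_state {
    int cc; /* condition code seen by the guest, 0..3 */
};

/* Hypothetical filler: returns a condition code >= 0 or a negative errno. */
static int fill_info(int simulate)
{
    return simulate;
}

static int handle_request(struct guest_state *g, int simulate)
{
    int rc = fill_info(simulate);

    if (rc < 0)
        return rc; /* host-side error: report it, leave guest state alone */

    g->cc = rc; /* only real condition codes reach the guest */
    return 0;
}
```

With this split, a value such as -ENOMEM is returned to the host caller and never written into guest-visible state.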
      
      Fixes: 9fb6c9b3 ("s390/sthyi: add cache to store hypervisor info")
      Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230727182939.2050744-1-hca@linux.ibm.com
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      53980121
    • Sven Schnelle's avatar
      s390/vmem: split pages when debug pagealloc is enabled · 601e467e
      Sven Schnelle authored
      [ Upstream commit edc1e4b6 ]
      
      Since commit bb1520d5 ("s390/mm: start kernel with DAT enabled")
      the kernel crashes early during boot when debug pagealloc is enabled:
      
      mem auto-init: stack:off, heap alloc:off, heap free:off
      addressing exception: 0005 ilc:2 [#1] SMP DEBUG_PAGEALLOC
      Modules linked in:
      CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rc3-09759-gc5666c912155 #630
      [..]
      Krnl Code: 00000000001325f6: ec5600248064 cgrj %r5,%r6,8,000000000013263e
                 00000000001325fc: eb880002000c srlg %r8,%r8,2
                #0000000000132602: b2210051     ipte %r5,%r1,%r0,0
                >0000000000132606: b90400d1     lgr %r13,%r1
                 000000000013260a: 41605008     la %r6,8(%r5)
                 000000000013260e: a7db1000     aghi %r13,4096
                 0000000000132612: b221006d     ipte %r6,%r13,%r0,0
                 0000000000132616: e3d0d0000171 lay %r13,4096(%r13)
      
      Call Trace:
       __kernel_map_pages+0x14e/0x320
       __free_pages_ok+0x23a/0x5a8)
       free_low_memory_core_early+0x214/0x2c8
       memblock_free_all+0x28/0x58
       mem_init+0xb6/0x228
       mm_core_init+0xb6/0x3b0
       start_kernel+0x1d2/0x5a8
       startup_continue+0x36/0x40
      Kernel panic - not syncing: Fatal exception: panic_on_oops
      
      This is caused by using large mappings on machines with EDAT1/EDAT2. Add
      the code to split the mappings into 4k pages if debug pagealloc is enabled
      by CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc kernel
      command line option.
      
      Fixes: bb1520d5 ("s390/mm: start kernel with DAT enabled")
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      601e467e