- Aug 11, 2023
-
-
Dan Carpenter authored
[ Upstream commit ef45e840 ] Most kernel functions return negative error codes but some irq functions return zero on error. In this code irq_of_parse_and_map(), returns zero and platform_get_irq() returns negative error codes. We need to handle both cases appropriately. Fixes: 8425c41d ("net: ll_temac: Extend support to non-device-tree platforms") Signed-off-by:
Dan Carpenter <dan.carpenter@linaro.org> Acked-by:
Esben Haabendal <esben@geanix.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Reviewed-by:
Harini Katakam <harini.katakam@amd.com> Link: https://lore.kernel.org/r/3d0aef75-06e0-45a5-a2a6-2cc4738d4143@moroto.mountain Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Tomas Glozar authored
[ Upstream commit 13d2618b ] Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC allocation later in sk_psock_init_link on PREEMPT_RT kernels, since GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist patchset notes for details). This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to BUG (sleeping function called from invalid context) on RT kernels. preempt_disable was introduced together with lock_sk and rcu_read_lock in commit 99ba2b5a ("bpf: sockhash, disallow bpf_tcp_close and update in parallel"), probably to match disabled migration of BPF programs, and is no longer necessary. Remove preempt_disable to fix BUG in sock_map_update_common on RT. Signed-off-by:
Tomas Glozar <tglozar@redhat.com> Reviewed-by:
Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/ Fixes: 99ba2b5a ("bpf: sockhash, disallow bpf_tcp_close and update in parallel") Reviewed-by:
John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230728064411.305576-1-tglozar@redhat.com Signed-off-by:
Paolo Abeni <pabeni@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
valis authored
[ Upstream commit b80b829e ] When route4_change() is called on an existing filter, the whole tcf_result struct is always copied into the new instance of the filter. This causes a problem when updating a filter bound to a class, as tcf_unbind_filter() is always called on the old instance in the success path, decreasing filter_cnt of the still referenced class and allowing it to be deleted, leading to a use-after-free. Fix this by no longer copying the tcf_result struct from the old filter. Fixes: 1109c005 ("net: sched: RCU cls_route") Reported-by:
valis <sec@valis.email> Reported-by:
Bing-Jhong Billy Jheng <billy@starlabs.sg> Signed-off-by:
valis <sec@valis.email> Signed-off-by:
Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by:
Victor Nogueira <victor@mojatatu.com> Reviewed-by:
Pedro Tammela <pctammela@mojatatu.com> Reviewed-by:
M A Ramdhan <ramdhan@starlabs.sg> Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
valis authored
[ Upstream commit 76e42ae8 ] When fw_change() is called on an existing filter, the whole tcf_result struct is always copied into the new instance of the filter. This causes a problem when updating a filter bound to a class, as tcf_unbind_filter() is always called on the old instance in the success path, decreasing filter_cnt of the still referenced class and allowing it to be deleted, leading to a use-after-free. Fix this by no longer copying the tcf_result struct from the old filter. Fixes: e35a8ee5 ("net: sched: fw use RCU") Reported-by:
valis <sec@valis.email> Reported-by:
Bing-Jhong Billy Jheng <billy@starlabs.sg> Signed-off-by:
valis <sec@valis.email> Signed-off-by:
Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by:
Victor Nogueira <victor@mojatatu.com> Reviewed-by:
Pedro Tammela <pctammela@mojatatu.com> Reviewed-by:
M A Ramdhan <ramdhan@starlabs.sg> Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
valis authored
[ Upstream commit 3044b16e ] When u32_change() is called on an existing filter, the whole tcf_result struct is always copied into the new instance of the filter. This causes a problem when updating a filter bound to a class, as tcf_unbind_filter() is always called on the old instance in the success path, decreasing filter_cnt of the still referenced class and allowing it to be deleted, leading to a use-after-free. Fix this by no longer copying the tcf_result struct from the old filter. Fixes: de5df632 ("net: sched: cls_u32 changes to knode must appear atomic to readers") Reported-by:
valis <sec@valis.email> Reported-by:
M A Ramdhan <ramdhan@starlabs.sg> Signed-off-by:
valis <sec@valis.email> Signed-off-by:
Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by:
Victor Nogueira <victor@mojatatu.com> Reviewed-by:
Pedro Tammela <pctammela@mojatatu.com> Reviewed-by:
M A Ramdhan <ramdhan@starlabs.sg> Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Hou Tao authored
[ Upstream commit 7c62b75c ] The following warning was reported when running xdp_redirect_cpu with both skb-mode and stress-mode enabled: ------------[ cut here ]------------ Incorrect XDP memory type (-2128176192) usage WARNING: CPU: 7 PID: 1442 at net/core/xdp.c:405 Modules linked in: CPU: 7 PID: 1442 Comm: kworker/7:0 Tainted: G 6.5.0-rc2+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Workqueue: events __cpu_map_entry_free RIP: 0010:__xdp_return+0x1e4/0x4a0 ...... Call Trace: <TASK> ? show_regs+0x65/0x70 ? __warn+0xa5/0x240 ? __xdp_return+0x1e4/0x4a0 ...... xdp_return_frame+0x4d/0x150 __cpu_map_entry_free+0xf9/0x230 process_one_work+0x6b0/0xb80 worker_thread+0x96/0x720 kthread+0x1a5/0x1f0 ret_from_fork+0x3a/0x70 ret_from_fork_asm+0x1b/0x30 </TASK> The reason for the warning is twofold. One is due to the kthread cpu_map_kthread_run() is stopped prematurely. Another one is __cpu_map_ring_cleanup() doesn't handle skb mode and treats skbs in ptr_ring as XDP frames. Prematurely-stopped kthread will be fixed by the preceding patch and ptr_ring will be empty when __cpu_map_ring_cleanup() is called. But as the comments in __cpu_map_ring_cleanup() said, handling and freeing skbs in ptr_ring as well to "catch any broken behaviour gracefully". Fixes: 11941f8a ("bpf: cpumap: Implement generic cpumap") Signed-off-by:
Hou Tao <houtao1@huawei.com> Acked-by:
Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230729095107.1722450-3-houtao@huaweicloud.com Signed-off-by:
Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Hou Tao authored
[ Upstream commit 640a6045 ] The following warning was reported when running stress-mode enabled xdp_redirect_cpu with some RT threads: ------------[ cut here ]------------ WARNING: CPU: 4 PID: 65 at kernel/bpf/cpumap.c:135 CPU: 4 PID: 65 Comm: kworker/4:1 Not tainted 6.5.0-rc2+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Workqueue: events cpu_map_kthread_stop RIP: 0010:put_cpu_map_entry+0xda/0x220 ...... Call Trace: <TASK> ? show_regs+0x65/0x70 ? __warn+0xa5/0x240 ...... ? put_cpu_map_entry+0xda/0x220 cpu_map_kthread_stop+0x41/0x60 process_one_work+0x6b0/0xb80 worker_thread+0x96/0x720 kthread+0x1a5/0x1f0 ret_from_fork+0x3a/0x70 ret_from_fork_asm+0x1b/0x30 </TASK> The root cause is the same as commit 43690164 ("bpf: cpumap: Fix memory leak in cpu_map_update_elem"). The kthread is stopped prematurely by kthread_stop() in cpu_map_kthread_stop(), and kthread() doesn't call cpu_map_kthread_run() at all but XDP program has already queued some frames or skbs into ptr_ring. So when __cpu_map_ring_cleanup() checks the ptr_ring, it will find it was not emptied and report a warning. An alternative fix is to use __cpu_map_ring_cleanup() to drop these pending frames or skbs when kthread_stop() returns -EINTR, but it may confuse the user, because these frames or skbs have been handled correctly by XDP program. So instead of dropping these frames or skbs, just make sure the per-cpu kthread is running before __cpu_map_entry_alloc() returns. After apply the fix, the error handle for kthread_stop() will be unnecessary because it will always return 0, so just remove it. Fixes: 6710e112 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP") Signed-off-by:
Hou Tao <houtao1@huawei.com> Reviewed-by:
Pu Lehui <pulehui@huawei.com> Acked-by:
Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230729095107.1722450-2-houtao@huaweicloud.com Signed-off-by:
Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Andrii Nakryiko authored
[ Upstream commit 6c3eba1c ] This allows to do more centralized decisions later on, and generally makes it very explicit which maps are privileged and which are not (e.g., LRU_HASH and LRU_PERCPU_HASH, which are privileged HASH variants, as opposed to unprivileged HASH and HASH_PERCPU; now this is explicit and easy to verify). Signed-off-by:
Andrii Nakryiko <andrii@kernel.org> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Acked-by:
Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230613223533.3689589-4-andrii@kernel.org Stable-dep-of: 640a6045 ("bpf, cpumap: Make sure kthread is running before map update returns") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Andrii Nakryiko authored
[ Upstream commit 22db4122 ] Currently find_and_alloc_map() performs two separate functions: some argument sanity checking and partial map creation workflow hanling. Neither of those functions are self-sufficient and are augmented by further checks and initialization logic in the caller (map_create() function). So unify all the sanity checks, permission checks, and creation and initialization logic in one linear piece of code in map_create() instead. This also make it easier to further enhance permission checks and keep them located in one place. Signed-off-by:
Andrii Nakryiko <andrii@kernel.org> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Acked-by:
Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230613223533.3689589-3-andrii@kernel.org Stable-dep-of: 640a6045 ("bpf, cpumap: Make sure kthread is running before map update returns") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Andrii Nakryiko authored
[ Upstream commit 1d28635a ] Make each bpf() syscall command a bit more self-contained, making it easier to further enhance it. We move sysctl_unprivileged_bpf_disabled handling down to map_create() and bpf_prog_load(), two special commands in this regard. Also swap the order of checks, calling bpf_capable() only if sysctl_unprivileged_bpf_disabled is true, avoiding unnecessary audit messages. Signed-off-by:
Andrii Nakryiko <andrii@kernel.org> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Acked-by:
Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230613223533.3689589-2-andrii@kernel.org Stable-dep-of: 640a6045 ("bpf, cpumap: Make sure kthread is running before map update returns") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Michal Schmidt authored
[ Upstream commit 611e1b01 ] The two mbox-related mutexes are destroyed in octep_ctrl_mbox_uninit(), but the corresponding mutex_init calls were missing. A "DEBUG_LOCKS_WARN_ON(lock->magic != lock)" warning was emitted with CONFIG_DEBUG_MUTEXES on. Initialize the two mutexes in octep_ctrl_mbox_init(). Fixes: 577f0d1b ("octeon_ep: add separate mailbox command and response queues") Signed-off-by:
Michal Schmidt <mschmidt@redhat.com> Reviewed-by:
Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230729151516.24153-1-mschmidt@redhat.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jakub Kicinski authored
[ Upstream commit 37b61cda ] Similarly to other recently fixed drivers make sure we don't try to access XDP or page pool APIs when NAPI budget is 0. NAPI budget of 0 may mean that we are in netpoll. This may result in running software IRQs in hard IRQ context, leading to deadlocks or crashes. To make sure bnapi->tx_pkts don't get wiped without handling the events, move clearing the field into the handler itself. Remember to clear tx_pkts after reset (bnxt_enable_napi()) as it's technically possible that netpoll will accumulate some tx_pkts and then a reset will happen, leaving tx_pkts out of sync with reality. Fixes: 322b87ca ("bnxt_en: add page_pool support") Reviewed-by:
Andy Gospodarek <gospo@broadcom.com> Reviewed-by:
Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20230728205020.2784844-1-kuba@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Rafal Rogalski authored
[ Upstream commit 4b31fd4d ] During qdisc create/delete, it is necessary to rebuild the queue of VSIs. An error occurred because the VSIs created by RDMA were still active. Added check if RDMA is active. If yes, it disallows qdisc changes and writes a message in the system logs. Fixes: 348048e7 ("ice: Implement iidc operations") Signed-off-by:
Rafal Rogalski <rafalx.rogalski@intel.com> Signed-off-by:
Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by:
Kamil Maziarz <kamil.maziarz@intel.com> Tested-by:
Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by:
Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230728171243.2446101-1-anthony.l.nguyen@intel.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Duoming Zhou authored
[ Upstream commit 1e7417c1 ] The timer dev->stat_monitor can schedule the delayed work dev->wq and the delayed work dev->wq can also arm the dev->stat_monitor timer. When the device is detaching, the net_device will be deallocated. but the net_device private data could still be dereferenced in delayed work or timer handler. As a result, the UAF bugs will happen. One racy situation is shown below: (Thread 1) | (Thread 2) lan78xx_stat_monitor() | ... | lan78xx_disconnect() lan78xx_defer_kevent() | ... ... | cancel_delayed_work_sync(&dev->wq); schedule_delayed_work() | ... (wait some time) | free_netdev(net); //free net_device lan78xx_delayedwork() | //use net_device private data | dev-> //use | Although we use cancel_delayed_work_sync() to cancel the delayed work in lan78xx_disconnect(), it could still be scheduled in timer handler lan78xx_stat_monitor(). Another racy situation is shown below: (Thread 1) | (Thread 2) lan78xx_delayedwork | mod_timer() | lan78xx_disconnect() | cancel_delayed_work_sync() (wait some time) | if (timer_pending(&dev->stat_monitor)) | del_timer_sync(&dev->stat_monitor); lan78xx_stat_monitor() | ... lan78xx_defer_kevent() | free_netdev(net); //free //use net_device private data| dev-> //use | Although we use del_timer_sync() to delete the timer, the function timer_pending() returns 0 when the timer is activated. As a result, the del_timer_sync() will not be executed and the timer could be re-armed. In order to mitigate this bug, We use timer_shutdown_sync() to shutdown the timer and then use cancel_delayed_work_sync() to cancel the delayed work. As a result, the net_device could be deallocated safely. What's more, the dev->flags is set to EVENT_DEV_DISCONNECT in lan78xx_disconnect(). But it could still be set to EVENT_STAT_UPDATE in lan78xx_stat_monitor(). So this patch put the set_bit() behind timer_shutdown_sync(). Fixes: 77dfff5b ("lan78xx: Fix race condition in disconnect handling") Signed-off-by:
Duoming Zhou <duoming@zju.edu.cn> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Kuniyuki Iwashima authored
[ Upstream commit e7397184 ] syzkaller found zero division error [0] in div_s64_rem() called from get_cycle_time_elapsed(), where sched->cycle_time is the divisor. We have tests in parse_taprio_schedule() so that cycle_time will never be 0, and actually cycle_time is not 0 in get_cycle_time_elapsed(). The problem is that the types of divisor are different; cycle_time is s64, but the argument of div_s64_rem() is s32. syzkaller fed this input and 0x100000000 is cast to s32 to be 0. @TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME={0xc, 0x8, 0x100000000} We use s64 for cycle_time to cast it to ktime_t, so let's keep it and set max for cycle_time. While at it, we prevent overflow in setup_txtime() and add another test in parse_taprio_schedule() to check if cycle_time overflows. Also, we add a new tdc test case for this issue. [0]: divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 PID: 103 Comm: kworker/1:3 Not tainted 6.5.0-rc1-00330-g60cc1f7d0605 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:div_s64_rem include/linux/math64.h:42 [inline] RIP: 0010:get_cycle_time_elapsed net/sched/sch_taprio.c:223 [inline] RIP: 0010:find_entry_to_transmit+0x252/0x7e0 net/sched/sch_taprio.c:344 Code: 3c 02 00 0f 85 5e 05 00 00 48 8b 4c 24 08 4d 8b bd 40 01 00 00 48 8b 7c 24 48 48 89 c8 4c 29 f8 48 63 f7 48 99 48 89 74 24 70 <48> f7 fe 48 29 d1 48 8d 04 0f 49 89 cc 48 89 44 24 20 49 8d 85 10 RSP: 0018:ffffc90000acf260 EFLAGS: 00010206 RAX: 177450e0347560cf RBX: 0000000000000000 RCX: 177450e0347560cf RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000100000000 RBP: 0000000000000056 R08: 0000000000000000 R09: ffffed10020a0934 R10: ffff8880105049a7 R11: ffff88806cf3a520 R12: ffff888010504800 R13: ffff88800c00d800 R14: ffff8880105049a0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0edf84f0e8 CR3: 000000000d73c002 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> get_packet_txtime net/sched/sch_taprio.c:508 [inline] taprio_enqueue_one+0x900/0xff0 net/sched/sch_taprio.c:577 taprio_enqueue+0x378/0xae0 net/sched/sch_taprio.c:658 dev_qdisc_enqueue+0x46/0x170 net/core/dev.c:3732 __dev_xmit_skb net/core/dev.c:3821 [inline] __dev_queue_xmit+0x1b2f/0x3000 net/core/dev.c:4169 dev_queue_xmit include/linux/netdevice.h:3088 [inline] neigh_resolve_output net/core/neighbour.c:1552 [inline] neigh_resolve_output+0x4a7/0x780 net/core/neighbour.c:1532 neigh_output include/net/neighbour.h:544 [inline] ip6_finish_output2+0x924/0x17d0 net/ipv6/ip6_output.c:135 __ip6_finish_output+0x620/0xaa0 net/ipv6/ip6_output.c:196 ip6_finish_output net/ipv6/ip6_output.c:207 [inline] NF_HOOK_COND include/linux/netfilter.h:292 [inline] ip6_output+0x206/0x410 net/ipv6/ip6_output.c:228 dst_output include/net/dst.h:458 [inline] NF_HOOK.constprop.0+0xea/0x260 include/linux/netfilter.h:303 ndisc_send_skb+0x872/0xe80 net/ipv6/ndisc.c:508 ndisc_send_ns+0xb5/0x130 net/ipv6/ndisc.c:666 addrconf_dad_work+0xc14/0x13f0 net/ipv6/addrconf.c:4175 process_one_work+0x92c/0x13a0 kernel/workqueue.c:2597 worker_thread+0x60f/0x1240 kernel/workqueue.c:2748 kthread+0x2fe/0x3f0 kernel/kthread.c:389 ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308 </TASK> Modules linked in: Fixes: 4cfd5779 ("taprio: Add support for txtime-assist mode") Reported-by:
syzkaller <syzkaller@googlegroups.com> Signed-off-by:
Kuniyuki Iwashima <kuniyu@amazon.com> Co-developed-by:
Eric Dumazet <edumazet@google.com> Co-developed-by:
Pedro Tammela <pctammela@mojatatu.com> Acked-by:
Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit 8bf43be7 ] sk_getsockopt() runs locklessly. This means sk->sk_priority can be read while other threads are changing its value. Other reads also happen without socket lock being held. Add missing annotations where needed. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit e5f0d2dd ] In a prior commit I forgot that sk_getsockopt() reads sk->sk_ll_usec without holding a lock. Fixes: 0dbffbb5 ("net: annotate data race around sk_ll_usec") Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit 11695c6e ] sk_getsockopt() runs locklessly, thus we need to annotate the read of sk->sk_peek_off. While we are at it, add corresponding annotations to sk_set_peek_off() and unix_set_peek_off(). Fixes: b9bb53f3 ("sock: convert sk_peek_offset functions to WRITE_ONCE") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit 3c5b4d69 ] sk->sk_mark is often read while another thread could change the value. Fixes: 4a19ec58 ("[NET]: Introducing socket mark socket option.") Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit b4b55325 ] In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_rcvbuf locklessly. Fixes: ebb3b78d ("tcp: annotate sk->sk_rcvbuf lockless reads") Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit 74bc0843 ] In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_sndbuf locklessly. Fixes: e292f05e ("tcp: annotate sk->sk_sndbuf lockless reads") Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit e6d12bdb ] In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_rcvlowat locklessly. Fixes: eac66402 ("net: annotate sk->sk_rcvlowat lockless reads") Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit ea7f45ef ] sk_getsockopt() runs locklessly. This means sk->sk_max_pacing_rate can be read while other threads are changing its value. Fixes: 62748f32 ("net: introduce SO_MAX_PACING_RATE") Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit c76a0328 ] sk_getsockopt() runs locklessly. This means sk->sk_txrehash can be read while other threads are changing its value. Other locations were handled in commit cb6cd2ce ("tcp: Change SYN ACK retransmit behaviour to account for rehash") Fixes: 26859240 ("txhash: Add socket option to control TX hash rethink behavior") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Akhmat Karakotov <hmukos@yandex-team.ru> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit fe11fdcb ] sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem can be read while other threads are changing its value. Add missing annotations where they are needed. Fixes: 2bb2f5fb ("net: add new socket option SO_RESERVE_MEM") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Richard Gobert authored
[ Upstream commit 7938cd15 ] This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to `udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the device from CB. The fix changes it to fetch the device from `skb->dev`. l3mdev case requires special attention since it has a master and a slave device. Fixes: a6024562 ("udp: Add GRO functions to UDP socket") Reported-by:
Gal Pressman <gal@nvidia.com> Signed-off-by:
Richard Gobert <richardbgobert@gmail.com> Reviewed-by:
David Ahern <dsahern@kernel.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit d457a0e3 ] Move declarations into include/net/gso.h and code into net/core/gso.c Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Stanislav Fomichev <sdf@google.com> Reviewed-by:
Simon Horman <simon.horman@corigine.com> Reviewed-by:
David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 7938cd15 ("net: gro: fix misuse of CB in udp socket lookup") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Konstantin Khorenko authored
[ Upstream commit e346e231 ] Here we've got to a situation when tasklet called usleep_range() in PTT acquire logic, thus welcome to the "scheduling while atomic" BUG(). BUG: scheduling while atomic: swapper/24/0/0x00000100 [<ffffffffb41c6199>] schedule+0x29/0x70 [<ffffffffb41c5512>] schedule_hrtimeout_range_clock+0xb2/0x150 [<ffffffffb41c55c3>] schedule_hrtimeout_range+0x13/0x20 [<ffffffffb41c3bcf>] usleep_range+0x4f/0x70 [<ffffffffc08d3e58>] qed_ptt_acquire+0x38/0x100 [qed] [<ffffffffc08eac48>] _qed_get_vport_stats+0x458/0x580 [qed] [<ffffffffc08ead8c>] qed_get_vport_stats+0x1c/0xd0 [qed] [<ffffffffc08dffd3>] qed_get_protocol_stats+0x93/0x100 [qed] qed_mcp_send_protocol_stats case MFW_DRV_MSG_GET_LAN_STATS: case MFW_DRV_MSG_GET_FCOE_STATS: case MFW_DRV_MSG_GET_ISCSI_STATS: case MFW_DRV_MSG_GET_RDMA_STATS: [<ffffffffc08e36d8>] qed_mcp_handle_events+0x2d8/0x890 [qed] qed_int_assertion qed_int_attentions [<ffffffffc08d9490>] qed_int_sp_dpc+0xa50/0xdc0 [qed] [<ffffffffb3aa7623>] tasklet_action+0x83/0x140 [<ffffffffb41d9125>] __do_softirq+0x125/0x2bb [<ffffffffb41d560c>] call_softirq+0x1c/0x30 [<ffffffffb3a30645>] do_softirq+0x65/0xa0 [<ffffffffb3aa78d5>] irq_exit+0x105/0x110 [<ffffffffb41d8996>] do_IRQ+0x56/0xf0 Fix this by making caller to provide the context whether it could be in atomic context flow or not when getting stats from QED driver. QED driver based on the context provided decide to schedule out or not when acquiring the PTT BAR window. We faced the BUG_ON() while getting vport stats, but according to the code same issue could happen for fcoe and iscsi statistics as well, so fixing them too. Fixes: 6c754246 ("qed: Add support for NCSI statistics.") Fixes: 1e128c81 ("qed: Add support for hardware offloaded FCoE.") Fixes: 2f2b2614 ("qed: Provide iSCSI statistics to management") Cc: Sudarsana Kalluru <skalluru@marvell.com> Cc: David Miller <davem@davemloft.net> Cc: Manish Chopra <manishc@marvell.com> Signed-off-by:
Konstantin Khorenko <khorenko@virtuozzo.com> Reviewed-by:
Simon Horman <horms@kernel.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Thierry Reding authored
[ Upstream commit a0b1b205 ] The clock data is an array of struct clk_bulk_data, so make sure to allocate enough memory. Fixes: d8ca1137 ("net: stmmac: tegra: Add MGBE support") Signed-off-by:
Thierry Reding <treding@nvidia.com> Reviewed-by:
Simon Horman <simon.horman@corigine.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Chengfeng Ye authored
[ Upstream commit 56c6be35 ] As &hc->lock is acquired by both timer _hfcpci_softirq() and hardirq hfcpci_int(), the timer should disable irq before lock acquisition otherwise deadlock could happen if the timmer is preemtped by the hadr irq. Possible deadlock scenario: hfcpci_softirq() (timer) -> _hfcpci_softirq() -> spin_lock(&hc->lock); <irq interruption> -> hfcpci_int() -> spin_lock(&hc->lock); (deadlock here) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. The tentative patch fixes the potential deadlock by spin_lock_irq() in timer. Fixes: b36b654a ("mISDN: Create /sys/class/mISDN") Signed-off-by:
Chengfeng Ye <dg573847474@gmail.com> Link: https://lore.kernel.org/r/20230727085619.7419-1-dg573847474@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jamal Hadi Salim authored
[ Upstream commit e68409db ] A match entry is uniquely identified with an "address" or "path" in the form of: hashtable ID(12b):bucketid(8b):nodeid(12b). When creating table match entries all of hash table id, bucket id and node (match entry id) are needed to be either specified by the user or reasonable in-kernel defaults are used. The in-kernel default for a table id is 0x800(omnipresent root table); for bucketid it is 0x0. Prior to this fix there was none for a nodeid i.e. the code assumed that the user passed the correct nodeid and if the user passes a nodeid of 0 (as Mingi Cho did) then that is what was used. But nodeid of 0 is reserved for identifying the table. This is not a problem until we dump. The dump code notices that the nodeid is zero and assumes it is referencing a table and therefore references table struct tc_u_hnode instead of what was created i.e match entry struct tc_u_knode. Ming does an equivalent of: tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \ protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok Essentially specifying a table id 0, bucketid 1 and nodeid of zero Tableid 0 is remapped to the default of 0x800. Bucketid 1 is ignored and defaults to 0x00. Nodeid was assumed to be what Ming passed - 0x000 dumping before fix shows: ~$ tc filter ls dev dummy0 parent 10: filter protocol ip pref 1 u32 chain 0 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591 Note that the last line reports a table instead of a match entry (you can tell this because it says "ht divisor..."). As a result of reporting the wrong data type (misinterpretting of struct tc_u_knode as being struct tc_u_hnode) the divisor is reported with value of -30591. Ming identified this as part of the heap address (physmap_base is 0xffff8880 (-30591 - 1)). The fix is to ensure that when table entry matches are added and no nodeid is specified (i.e nodeid == 0) then we get the next available nodeid from the table's pool. After the fix, this is what the dump shows: $ tc filter ls dev dummy0 parent 10: filter protocol ip pref 1 u32 chain 0 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1 filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw match 0a000001/ffffffff at 12 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 Reported-by:
Mingi Cho <mgcho.minic@gmail.com> Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by:
Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20230726135151.416917-1-jhs@mojatatu.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Georg Müller authored
[ Upstream commit 98ce8e4a ] Without gcc, the test will fail. On cleanup, ignore probe removal errors. Otherwise, in case of an error adding the probe, the temporary directory is not removed. Fixes: 56cbeacf ("perf probe: Add test for regression introduced by switch to die_get_decl_file()") Signed-off-by:
Georg Müller <georgmueller@gmx.net> Acked-by:
Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Georg Müller <georgmueller@gmx.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230728151812.454806-2-georgmueller@gmx.net Link: https://lore.kernel.org/r/CAP-5=fUP6UuLgRty3t2=fQsQi3k4hDMz415vWdp1x88QMvZ8ug@mail.gmail.com/ Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Yuanjun Gong authored
[ Upstream commit dadc5b86 ] in bcm_sf2_sw_probe(), check the return value of clk_prepare_enable() and return the error code if clk_prepare_enable() returns an unexpected value. Fixes: e9ec5c3b ("net: dsa: bcm_sf2: request and handle clocks") Signed-off-by:
Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by:
Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20230726170506.16547-1-ruc_gongyuanjun@163.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Lin Ma authored
[ Upstream commit d73ef2d6 ] There are totally 9 ndo_bridge_setlink handlers in the current kernel, which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3) i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5) ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7) nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink. By investigating the code, we find that 1-7 parse and use nlattr IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as a 2 byte integer. To avoid such issues, also for other ndo_bridge_setlink handlers in the future. This patch adds the nla_len check in rtnl_bridge_setlink and does an early error return if length mismatches. To make it works, the break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure this nla_for_each_nested iterates every attribute. Fixes: b1edc14a ("ice: Implement ice_bridge_getlink and ice_bridge_setlink") Fixes: 51616018 ("i40e: Add support for getlink, setlink ndo ops") Suggested-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Lin Ma <linma@zju.edu.cn> Acked-by:
Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by:
Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Lin Ma authored
[ Upstream commit bcc29b7f ] The nla_for_each_nested parsing in function bpf_sk_storage_diag_alloc does not check the length of the nested attribute. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as a 4 byte integer. This patch adds an additional check when the nlattr is getting counted. This makes sure the latter nla_get_u32 can access the attributes with the correct length. Fixes: 1ed4d924 ("bpf: INET_DIAG support in bpf_sk_storage") Suggested-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Lin Ma <linma@zju.edu.cn> Reviewed-by:
Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230725023330.422856-1-linma@zju.edu.cn Signed-off-by:
Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Shay Drory authored
[ Upstream commit 53d737df ] Currently, in case an interface is down, mlx5 driver doesn't unregister its devlink params, which leads to this WARN[1]. Fix it by unregistering devlink params in that case as well. [1] [ 295.244769 ] WARNING: CPU: 15 PID: 1 at net/core/devlink.c:9042 devlink_free+0x174/0x1fc [ 295.488379 ] CPU: 15 PID: 1 Comm: shutdown Tainted: G S OE 5.15.0-1017.19.3.g0677e61-bluefield #g0677e61 [ 295.509330 ] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.2.0.12761 Jun 6 2023 [ 295.543096 ] pc : devlink_free+0x174/0x1fc [ 295.551104 ] lr : mlx5_devlink_free+0x18/0x2c [mlx5_core] [ 295.561816 ] sp : ffff80000809b850 [ 295.711155 ] Call trace: [ 295.716030 ] devlink_free+0x174/0x1fc [ 295.723346 ] mlx5_devlink_free+0x18/0x2c [mlx5_core] [ 295.733351 ] mlx5_sf_dev_remove+0x98/0xb0 [mlx5_core] [ 295.743534 ] auxiliary_bus_remove+0x2c/0x50 [ 295.751893 ] __device_release_driver+0x19c/0x280 [ 295.761120 ] device_release_driver+0x34/0x50 [ 295.769649 ] bus_remove_device+0xdc/0x170 [ 295.777656 ] device_del+0x17c/0x3a4 [ 295.784620 ] mlx5_sf_dev_remove+0x28/0xf0 [mlx5_core] [ 295.794800 ] mlx5_sf_dev_table_destroy+0x98/0x110 [mlx5_core] [ 295.806375 ] mlx5_unload+0x34/0xd0 [mlx5_core] [ 295.815339 ] mlx5_unload_one+0x70/0xe4 [mlx5_core] [ 295.824998 ] shutdown+0xb0/0xd8 [mlx5_core] [ 295.833439 ] pci_device_shutdown+0x3c/0xa0 [ 295.841651 ] device_shutdown+0x170/0x340 [ 295.849486 ] __do_sys_reboot+0x1f4/0x2a0 [ 295.857322 ] __arm64_sys_reboot+0x2c/0x40 [ 295.865329 ] invoke_syscall+0x78/0x100 [ 295.872817 ] el0_svc_common.constprop.0+0x54/0x184 [ 295.882392 ] do_el0_svc+0x30/0xac [ 295.889008 ] el0_svc+0x48/0x160 [ 295.895278 ] el0t_64_sync_handler+0xa4/0x130 [ 295.903807 ] el0t_64_sync+0x1a4/0x1a8 [ 295.911120 ] ---[ end trace 4f1d2381d00d9dce ]--- Fixes: fe578cbb ("net/mlx5: Move devlink registration before mlx5_load") Signed-off-by:
Shay Drory <shayd@nvidia.com> Reviewed-by:
Maher Sanalla <msanalla@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Chris Mi authored
[ Upstream commit 61eab651 ] The cited commit sets ft prio to fs_base_prio. But if ignore_flow_level it not supported, ft prio must be set based on tc filter prio. Otherwise, all the ft prio are the same on the same chain. It is invalid if ignore_flow_level is not supported. Fix it by setting ft prio based on tc filter prio and setting fs_base_prio to 0 for fdb. Fixes: 8e80e564 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage") Signed-off-by:
Chris Mi <cmi@nvidia.com> Reviewed-by:
Paul Blakey <paulb@nvidia.com> Reviewed-by:
Roi Dayan <roid@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jianbo Liu authored
[ Upstream commit 3e4cf1dd ] There are DEK objects cached in DEK pool after kTLS is used, and they are freed only in mlx5e_ktls_cleanup(). mlx5e_destroy_mdev_resources() is called in mlx5e_suspend() to free mdev resources, including protection domain (PD). However, PD is still referenced by the cached DEK objects in this case, because profile->cleanup() (and therefore mlx5e_ktls_cleanup()) is called after mlx5e_suspend() during devlink reload. So the following FW syndrome is generated: mlx5_cmd_out_err:803:(pid 12948): DEALLOC_PD(0x801) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xef0c8a), err(-22) To avoid this syndrome, move DEK pool destruction to mlx5e_ktls_cleanup_tx(), which is called by profile->cleanup_tx(). And move pool creation to mlx5e_ktls_init_tx() for symmetry. Fixes: f741db1a ("net/mlx5e: kTLS, Improve connection rate by using fast update encryption key") Signed-off-by:
Jianbo Liu <jianbol@nvidia.com> Reviewed-by:
Tariq Toukan <tariqt@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Dragos Tatulea authored
[ Upstream commit 39646d9b ] When the regular rq is reactivated after the XSK socket is closed it could be reading stale cqes which eventually corrupts the rq. This leads to no more traffic being received on the regular rq and a crash on the next close or deactivation of the rq. Kal Cuttler Conely reported this issue as a crash on the release path when the xdpsock sample program is stopped (killed) and restarted in sequence while traffic is running. This patch flushes all cqes when during the rq flush. The cqe flushing is done in the reset state of the rq. mlx5e_rq_to_ready code is moved into the flush function to allow for this. Fixes: 082a9edf ("net/mlx5e: xsk: Flush RQ on XSK activation to save memory") Reported-by:
Kal Cutter Conley <kal.conley@dectris.com> Closes: https://lore.kernel.org/xdp-newbies/CAHApi-nUAs4TeFWUDV915CZJo07XVg2Vp63-no7UDfj6wur9nQ@mail.gmail.com Signed-off-by:
Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by:
Tariq Toukan <tariqt@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Dragos Tatulea authored
[ Upstream commit e0f52298 ] The below crash can be encountered when using xdpsock in rx mode for legacy rq: the buffer gets released in the XDP_REDIRECT path, and then once again in the driver. This fix sets the flag to avoid releasing on the driver side. XSK handling of buffers for legacy rq was relying on the caller to set the skip release flag. But the referenced fix started using fragment counts for pages instead of the skip flag. Crash log: general protection fault, probably for non-canonical address 0xffff8881217e3a: 0000 [#1] SMP CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.5.0-rc1+ #31 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:bpf_prog_03b13f331978c78c+0xf/0x28 Code: ... RSP: 0018:ffff88810082fc98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888138404901 RCX: c0ffffc900027cbc RDX: ffffffffa000b514 RSI: 00ffff8881217e32 RDI: ffff888138404901 RBP: ffff88810082fc98 R08: 0000000000091100 R09: 0000000000000006 R10: 0000000000000800 R11: 0000000000000800 R12: ffffc9000027a000 R13: ffff8881217e2dc0 R14: ffff8881217e2910 R15: ffff8881217e2f00 FS: 0000000000000000(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000564cb2e2cde0 CR3: 000000010e603004 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? die_addr+0x32/0x80 ? exc_general_protection+0x192/0x390 ? asm_exc_general_protection+0x22/0x30 ? 0xffffffffa000b514 ? bpf_prog_03b13f331978c78c+0xf/0x28 mlx5e_xdp_handle+0x48/0x670 [mlx5_core] ? dev_gro_receive+0x3b5/0x6e0 mlx5e_xsk_skb_from_cqe_linear+0x6e/0x90 [mlx5_core] mlx5e_handle_rx_cqe+0x55/0x100 [mlx5_core] mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core] mlx5e_napi_poll+0x45e/0x6b0 [mlx5_core] __napi_poll+0x25/0x1a0 net_rx_action+0x28a/0x300 __do_softirq+0xcd/0x279 ? sort_range+0x20/0x20 run_ksoftirqd+0x1a/0x20 smpboot_thread_fn+0xa2/0x130 kthread+0xc9/0xf0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Modules linked in: mlx5_ib mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay zram zsmalloc fuse [last unloaded: mlx5_core] ---[ end trace 0000000000000000 ]--- Fixes: 7abd955a ("net/mlx5e: RX, Fix page_pool page fragment tracking for XDP") Signed-off-by:
Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by:
Tariq Toukan <tariqt@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-