- Mar 15, 2024
-
-
Florian Kauer authored
[ Upstream commit ef27f655 ] When a frame can not be transmitted in XDP_REDIRECT (e.g. due to a full queue), it is necessary to free it by calling xdp_return_frame_rx_napi. However, this is the responsibility of the caller of the ndo_xdp_xmit (see for example bq_xmit_all in kernel/bpf/devmap.c) and thus calling it inside igc_xdp_xmit (which is the ndo_xdp_xmit of the igc driver) as well will lead to memory corruption. In fact, bq_xmit_all expects that it can return all frames after the last successfully transmitted one. Therefore, break for the first not transmitted frame, but do not call xdp_return_frame_rx_napi in igc_xdp_xmit. This is equally implemented in other Intel drivers such as the igb. There are two alternatives to this that were rejected: 1. Return num_frames as all the frames would have been transmitted and release them inside igc_xdp_xmit. While it might work technically, it is not what the return value is meant to represent (i.e. the number of SUCCESSFULLY transmitted packets). 2. Rework kernel/bpf/devmap.c and all drivers to support non-consecutively dropped packets. Besides being complex, it likely has a negative performance impact without a significant gain since it is anyway unlikely that the next frame can be transmitted if the previous one was dropped. The memory corruption can be reproduced with the following script which leads to a kernel panic after a few seconds. It basically generates more traffic than a i225 NIC can transmit and pushes it via XDP_REDIRECT from a virtual interface to the physical interface where frames get dropped. #!/bin/bash INTERFACE=enp4s0 INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex` sudo ip link add dev veth1 type veth peer name veth2 sudo ip link set up $INTERFACE sudo ip link set up veth1 sudo ip link set up veth2 cat << EOF > redirect.bpf.c SEC("prog") int redirect(struct xdp_md *ctx) { return bpf_redirect($INTERFACE_IDX, 0); } char _license[] SEC("license") = "GPL"; EOF clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o sudo ip link set veth2 xdp obj redirect.bpf.o cat << EOF > pass.bpf.c SEC("prog") int pass(struct xdp_md *ctx) { return XDP_PASS; } char _license[] SEC("license") = "GPL"; EOF clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o sudo ip link set $INTERFACE xdp obj pass.bpf.o cat << EOF > trafgen.cfg { /* Ethernet Header */ 0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, const16(ETH_P_IP), /* IPv4 Header */ 0b01000101, 0, # IPv4 version, IHL, TOS const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header)) const16(2), # IPv4 ident 0b01000000, 0, # IPv4 flags, fragmentation off 64, # IPv4 TTL 17, # Protocol UDP csumip(14, 33), # IPv4 checksum /* UDP Header */ 10, 0, 1, 1, # IP Src - adapt as needed 10, 0, 1, 2, # IP Dest - adapt as needed const16(6666), # UDP Src Port const16(6666), # UDP Dest Port const16(1008), # UDP length (UDP header 8 bytes + payload length) csumudp(14, 34), # UDP checksum /* Payload */ fill('W', 1000), } EOF sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp Fixes: 4ff32036 ("igc: Add support for XDP_REDIRECT action") Signed-off-by:
Florian Kauer <florian.kauer@linutronix.de> Reviewed-by:
Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by:
Naama Meir <naamax.meir@linux.intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Rand Deeb authored
[ Upstream commit 06e456a0 ] The function ice_bridge_setlink() may encounter a NULL pointer dereference if nlmsg_find_attr() returns NULL and br_spec is dereferenced subsequently in nla_for_each_nested(). To address this issue, add a check to ensure that br_spec is not NULL before proceeding with the nested attribute iteration. Fixes: b1edc14a ("ice: Implement ice_bridge_getlink and ice_bridge_setlink") Signed-off-by:
Rand Deeb <rand.sec96@gmail.com> Reviewed-by:
Simon Horman <horms@kernel.org> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jacob Keller authored
[ Upstream commit 2652b99e ] The E800 series hardware uses the same iAVF driver as older devices, including the virtchnl negotiation scheme. This negotiation scheme includes a mechanism to determine what type of RSS should be supported, including RSS over PF virtchnl messages, RSS over firmware AdminQ messages, and RSS via direct register access. The PF driver will always prefer VIRTCHNL_VF_OFFLOAD_RSS_PF if its supported by the VF driver. However, if an older VF driver is loaded, it may request only VIRTCHNL_VF_OFFLOAD_RSS_REG or VIRTCHNL_VF_OFFLOAD_RSS_AQ. The ice driver happily agrees to support these methods. Unfortunately, the underlying hardware does not support these mechanisms. The E800 series VFs don't have the appropriate registers for RSS_REG. The mailbox queue used by VFs for VF to PF communication blocks messages which do not have the VF-to-PF opcode. Stop lying to the VF that it could support RSS over AdminQ or registers, as these interfaces do not work when the hardware is operating on an E800 series device. In practice this is unlikely to be hit by any normal user. The iAVF driver has supported RSS over PF virtchnl commands since 2016, and always defaults to using RSS_PF if possible. In principle, nothing actually stops the existing VF from attempting to access the registers or send an AQ command. However a properly coded VF will check the capability flags and will report a more useful error if it detects a case where the driver does not support the RSS offloads that it does. Fixes: 1071a835 ("ice: Implement virtchnl commands for AVF support") Signed-off-by:
Jacob Keller <jacob.e.keller@intel.com> Reviewed-by:
Alan Brady <alan.brady@intel.com> Tested-by:
Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Horatiu Vultur authored
[ Upstream commit 89d72d41 ] Based on the static analyzis of the code it looks like when an entry from the MAC table was removed, the entry was still used after being freed. More precise the vid of the mac_entry was used after calling devm_kfree on the mac_entry. The fix consists in first using the vid of the mac_entry to delete the entry from the HW and after that to free it. Fixes: b37a1bae ("net: sparx5: add mactable support") Signed-off-by:
Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by:
Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240301080608.3053468-1-horatiu.vultur@microchip.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Eric Dumazet authored
[ Upstream commit 1ca1ba46 ] syzbot triggered a bug in geneve_rx() [1] Issue is similar to the one I fixed in commit 8d975c15 ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()") We have to save skb->network_header in a temporary variable in order to be able to recompute the network_header pointer after a pskb_inet_may_pull() call. pskb_inet_may_pull() makes sure the needed headers are in skb->head. [1] BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline] BUG: KMSAN: uninit-value in geneve_rx drivers/net/geneve.c:279 [inline] BUG: KMSAN: uninit-value in geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391 IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline] geneve_rx drivers/net/geneve.c:279 [inline] geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391 udp_queue_rcv_one_skb+0x1d39/0x1f20 net/ipv4/udp.c:2108 udp_queue_rcv_skb+0x6ae/0x6e0 net/ipv4/udp.c:2186 udp_unicast_rcv_skb+0x184/0x4b0 net/ipv4/udp.c:2346 __udp4_lib_rcv+0x1c6b/0x3010 net/ipv4/udp.c:2422 udp_rcv+0x7d/0xa0 net/ipv4/udp.c:2604 ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:461 [inline] ip_rcv_finish net/ipv4/ip_input.c:449 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5534 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648 process_backlog+0x480/0x8b0 net/core/dev.c:5976 __napi_poll+0xe3/0x980 net/core/dev.c:6576 napi_poll net/core/dev.c:6645 [inline] net_rx_action+0x8b8/0x1870 net/core/dev.c:6778 __do_softirq+0x1b7/0x7c5 kernel/softirq.c:553 do_softirq+0x9a/0xf0 kernel/softirq.c:454 __local_bh_enable_ip+0x9b/0xa0 kernel/softirq.c:381 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline] __dev_queue_xmit+0x2768/0x51c0 net/core/dev.c:4378 dev_queue_xmit include/linux/netdevice.h:3171 [inline] packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3081 [inline] packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] __sys_sendto+0x735/0xa10 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Uninit was created at: slab_post_alloc_hook mm/slub.c:3819 [inline] slab_alloc_node mm/slub.c:3860 [inline] kmem_cache_alloc_node+0x5cb/0xbc0 mm/slub.c:3903 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560 __alloc_skb+0x352/0x790 net/core/skbuff.c:651 alloc_skb include/linux/skbuff.h:1296 [inline] alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6394 sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2783 packet_alloc_skb net/packet/af_packet.c:2930 [inline] packet_snd net/packet/af_packet.c:3024 [inline] packet_sendmsg+0x70c2/0x9f10 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] __sys_sendto+0x735/0xa10 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Fixes: 2d07dc79 ("geneve: add initial netdev driver for GENEVE tunnels") Reported-and-tested-by:
<syzbot+6a1423ff3f97159aae64@syzkaller.appspotmail.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Reviewed-by:
Jiri Pirko <jiri@nvidia.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Steven Rostedt (Google) authored
[ Upstream commit 51270d57 ] I'm updating __assign_str() and will be removing the second parameter. To make sure that it does not break anything, I make sure that it matches the __string() field, as that is where the string is actually going to be saved in. To make sure there's nothing that breaks, I added a WARN_ON() to make sure that what was used in __string() is the same that is used in __assign_str(). In doing this change, an error was triggered as __assign_str() now expects the string passed in to be a char * value. I instead had the following warning: include/trace/events/qdisc.h: In function ‘trace_event_raw_event_qdisc_reset’: include/trace/events/qdisc.h:91:35: error: passing argument 1 of 'strcmp' from incompatible pointer type [-Werror=incompatible-pointer-types] 91 | __assign_str(dev, qdisc_dev(q)); That's because the qdisc_enqueue() and qdisc_reset() pass in qdisc_dev(q) to __assign_str() and to __string(). But that function returns a pointer to struct net_device and not a string. It appears that these events are just saving the pointer as a string and then reading it as a string as well. Use qdisc_dev(q)->name to save the device instead. Fixes: a34dac0b ("net_sched: add tracepoints for qdisc_reset() and qdisc_destroy()") Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by:
Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Rahul Rameshbabu authored
[ Upstream commit 90502d43 ] The NAPI poll context is a softirq context. Do not use normal spinlock API in this context to prevent concurrency issues. Fixes: 3178308a ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs") Signed-off-by:
Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> CC: Vadim Fedorenko <vadfed@meta.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Rahul Rameshbabu authored
net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map [ Upstream commit b7cf0758 ] Just simply reordering the functions mlx5e_ptp_metadata_map_put and mlx5e_ptpsq_track_metadata in the mlx5e_txwqe_complete context is not good enough since both the compiler and CPU are free to reorder these two functions. If reordering does occur, the issue that was supposedly fixed by 7e3f3ba9 ("net/mlx5e: Track xmit submission to PTP WQ after populating metadata map") will be seen. This will lead to NULL pointer dereferences in mlx5e_ptpsq_mark_ts_cqes_undelivered in the NAPI polling context due to the tracking list being populated before the metadata map. Fixes: 7e3f3ba9 ("net/mlx5e: Track xmit submission to PTP WQ after populating metadata map") Signed-off-by:
Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> CC: Vadim Fedorenko <vadfed@meta.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Emeel Hakim authored
[ Upstream commit a71f2147 ] The packet number attribute of the SA is incremented by the device rather than the software stack when enabling hardware offload. Because the packet number attribute is managed by the hardware, the software has no insight into the value of the packet number attribute actually written by the device. Previously when MACsec offload was enabled, the hardware object for handling the offload was destroyed when the SA was disabled. Re-enabling the SA would lead to a new hardware object being instantiated. This new hardware object would not have any recollection of the correct packet number for the SA. Instead, destroy the flow steering rule when deactivating the SA and recreate it upon reactivation, preserving the original hardware object. Fixes: 8ff0ac5b ("net/mlx5: Add MACsec offload Tx command support") Signed-off-by:
Emeel Hakim <ehakim@nvidia.com> Signed-off-by:
Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by:
Gal Pressman <gal@nvidia.com> Reviewed-by:
Tariq Toukan <tariqt@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jianbo Liu authored
[ Upstream commit dd238b70 ] Downgrade the print from mlx5_core_warn() to mlx5_core_dbg(), as it is just a statement of fact that firmware doesn't support ignore flow level. And change the wording to "firmware flow level support is missing", to make it more accurate. Fixes: ae2ee3be ("net/mlx5: CT: Remove warning of ignore_flow_level support for VFs") Signed-off-by:
Jianbo Liu <jianbol@nvidia.com> Suggested-by:
Elliott, Robert (Servers) <elliott@hpe.com> Reviewed-by:
Roi Dayan <roid@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Moshe Shemesh authored
[ Upstream commit 5e6107b4 ] Functions which can't access MFRL (Management Firmware Reset Level) register, have no use of fw_reset structures or events. Remove fw_reset structures allocation and registration for fw reset events notifications for these functions. Having the devlink param enable_remote_dev_reset on functions that don't have this capability is misleading as these functions are not allowed to influence the reset flow. Hence, this patch removes this parameter for such functions. In addition, return not supported on devlink reload action fw_activate for these functions. Fixes: 38b9f903 ("net/mlx5: Handle sync reset request event") Signed-off-by:
Moshe Shemesh <moshe@nvidia.com> Reviewed-by:
Aya Levin <ayal@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jianbo Liu authored
[ Upstream commit 85ea2c5c ] The checking in the cited commit is not accurate. In the common case, VF destination is internal, and uplink destination is external. However, uplink destination with packet reformat is considered as internal because firmware uses LB+hairpin to support it. Update the checking so header rewrite rules with both internal and external destinations are not allowed. Fixes: e0e22d59 ("net/mlx5: E-switch, Add checking for flow rule destinations") Signed-off-by:
Jianbo Liu <jianbol@nvidia.com> Reviewed-by:
Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Saeed Mahameed authored
[ Upstream commit b7bbd698 ] This reverts commit 4e25b661. This Commit was mistakenly applied by pulling the wrong tag, remove it. Fixes: 4e25b661 ("net/mlx5e: Check the number of elements before walk TC rhashtable") Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Gavin Li authored
[ Upstream commit 8deeefb2 ] This reverts commit 662404b2. The revert is required due to the suspicion it is not good for anything and cause crash. Fixes: 662404b2 ("net/mlx5e: Block entering switchdev mode with ns inconsistency") Signed-off-by:
Gavin Li <gavinl@nvidia.com> Reviewed-by:
Jiri Pirko <jiri@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Maciej Fijalkowski authored
[ Upstream commit 99099c6b ] ice_qp_dis() currently does things in very mixed way. Tx is stopped before disabling IRQ on related queue vector, then it takes care of disabling Rx and finally NAPI is disabled. Let us start with disabling IRQs in the first place followed by turning off NAPI. Then it is safe to handle queues. One subtle change on top of that is that even though ice_qp_ena() looks more sane, clear ICE_CFG_BUSY as the last thing there. Fixes: 2d4238f5 ("ice: Add support for AF_XDP") Signed-off-by:
Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Acked-by:
Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Maciej Fijalkowski authored
[ Upstream commit d562b11c ] Disable NAPI before shutting down queues that this particular NAPI contains so that the order of actions in i40e_queue_pair_disable() mirrors what we do in i40e_queue_pair_enable(). Fixes: 123cecd4 ("i40e: added queue pair disable/enable functions") Signed-off-by:
Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Acked-by:
Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Maciej Fijalkowski authored
[ Upstream commit cbf996f5 ] Currently routines that are supposed to toggle state of ring pair do not take care of associated interrupt with queue vector that these rings belong to. This causes funky issues such as dead interface due to irq misconfiguration, as per Pavel's report from Closes: tag. Add a function responsible for disabling single IRQ in EIMC register and call this as a very first thing when disabling ring pair during xsk_pool setup. For enable let's reuse ixgbe_irq_enable_queues(). Besides this, disable/enable NAPI as first/last thing when dealing with closing or opening ring pair that xsk_pool is being configured on. Reported-by:
Pavel Vazharov <pavel@x3me.net> Closes: https://lore.kernel.org/netdev/CAJEV1ijxNyPTwASJER1bcZzS9nMoZJqfR86nu_3jFFVXzZQ4NA@mail.gmail.com/ Fixes: 024aa580 ("ixgbe: added Rx/Tx ring disable/enable functions") Signed-off-by:
Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by:
Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Oleksij Rempel authored
[ Upstream commit 1eecc7ab ] Current driver has some asymmetry in the runtime PM calls. On lan78xx_open() it will call usb_autopm_get() and unconditionally usb_autopm_put(). And on lan78xx_stop() it will call only usb_autopm_put(). So far, it was working only because this driver do not activate autosuspend by default, so it was visible only by warning "Runtime PM usage count underflow!". Since, with current driver, we can't use runtime PM with active link, execute lan78xx_open()->usb_autopm_put() only in error case. Otherwise, keep ref counting high as long as interface is open. Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by:
Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by:
Jiri Pirko <jiri@nvidia.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Leon Romanovsky authored
[ Upstream commit 983a73da ] In addition to citied commit in Fixes line, allow UDP encapsulation in TX path too. Fixes: 89edf402 ("xfrm: Support UDP encapsulation in packet offload mode") CC: Steffen Klassert <steffen.klassert@secunet.com> Reported-by:
Mike Yu <yumike@google.com> Signed-off-by:
Leon Romanovsky <leonro@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Byungchul Park authored
[ Upstream commit 2774f256 ] With numa balancing on, when a numa system is running where a numa node doesn't have its local memory so it has no managed zones, the following oops has been observed. It's because wakeup_kswapd() is called with a wrong zone index, -1. Fixed it by checking the index before calling wakeup_kswapd(). > BUG: unable to handle page fault for address: 00000000000033f3 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 0 P4D 0 > Oops: 0000 [#1] PREEMPT SMP NOPTI > CPU: 2 PID: 895 Comm: masim Not tainted 6.6.0-dirty #255 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812) > Code: (omitted) > RSP: 0000:ffffc90004257d58 EFLAGS: 00010286 > RAX: ffffffffffffffff RBX: ffff88883fff0480 RCX: 0000000000000003 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88883fff0480 > RBP: ffffffffffffffff R08: ff0003ffffffffff R09: ffffffffffffffff > R10: ffff888106c95540 R11: 0000000055555554 R12: 0000000000000003 > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88883fff0940 > FS: 00007fc4b8124740(0000) GS:ffff888827c00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00000000000033f3 CR3: 000000026cc08004 CR4: 0000000000770ee0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > PKRU: 55555554 > Call Trace: > <TASK> > ? __die > ? page_fault_oops > ? __pte_offset_map_lock > ? exc_page_fault > ? asm_exc_page_fault > ? wakeup_kswapd > migrate_misplaced_page > __handle_mm_fault > handle_mm_fault > do_user_addr_fault > exc_page_fault > asm_exc_page_fault > RIP: 0033:0x55b897ba0808 > Code: (omitted) > RSP: 002b:00007ffeefa821a0 EFLAGS: 00010287 > RAX: 000055b89983acd0 RBX: 00007ffeefa823f8 RCX: 000055b89983acd0 > RDX: 00007fc2f8122010 RSI: 0000000000020000 RDI: 000055b89983acd0 > RBP: 00007ffeefa821a0 R08: 0000000000000037 R09: 0000000000000075 > R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 > R13: 00007ffeefa82410 R14: 000055b897ba5dd8 R15: 00007fc4b8340000 > </TASK> Link: https://lkml.kernel.org/r/20240216111502.79759-1-byungchul@sk.com Signed-off-by:
Byungchul Park <byungchul@sk.com> Reported-by:
Hyeongtak Ji <hyeongtak.ji@sk.com> Fixes: c574bbe9 ("NUMA balancing: optimize page placement for memory tiering system") Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Xiubo Li authored
[ Upstream commit 51d31149 ] The addition of bal_rank_mask with encoding version 17 was merged into ceph.git in Oct 2022 and made it into v18.2.0 release normally. A few months later, the much delayed addition of max_xattr_size got merged, also with encoding version 17, placed before bal_rank_mask in the encoding -- but it didn't make v18.2.0 release. The way this ended up being resolved on the MDS side is that bal_rank_mask will continue to be encoded in version 17 while max_xattr_size is now encoded in version 18. This does mean that older kernels will misdecode version 17, but this is also true for v18.2.0 and v18.2.1 clients in userspace. The best we can do is backport this adjustment -- see ceph.git commit 78abfeaff27fee343fb664db633de5b221699a73 for details. [ idryomov: changelog ] Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/64440 Fixes: d93231a6 ("ceph: prevent a client from exceeding the MDS maximum xattr size") Signed-off-by:
Xiubo Li <xiubli@redhat.com> Reviewed-by:
Patrick Donnelly <pdonnell@ibm.com> Reviewed-by:
Venky Shankar <vshankar@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Frank Li authored
[ Upstream commit a79f949a ] Correcting the previous setting of 0x3fff to the actual value of 0x7fff. Introduced new macro 'EDMA_TCD_ITER_MASK' for improved code clarity and utilization of FIELD_GET to obtain the accurate maximum value. Cc: stable@vger.kernel.org Fixes: e0674853 ("dmaengine: fsl-edma: support edma memcpy") Signed-off-by:
Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240207194733.2112870-1-Frank.Li@nxp.com Signed-off-by:
Vinod Koul <vkoul@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Frank Li authored
[ Upstream commit d0e217b7 ] Refactor the code to use the common dt-binding header file, fsl-edma.h. Renaming ARGS* to FSL_EDMA*, ensuring no functional changes. Signed-off-by:
Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20231114154824.3617255-4-Frank.Li@nxp.com Signed-off-by:
Vinod Koul <vkoul@kernel.org> Stable-dep-of: a79f949a ("dmaengine: fsl-edma: correct max_segment_size setting") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Frank Li authored
[ Upstream commit 1e9b0525 ] Introduce a common dt-bindings header file, fsl-edma.h, shared between the driver and dts files. This addition aims to eliminate hardcoded values in dts files, promoting maintainability and consistency. DTS header file not support BIT() macro yet. Directly use 2^n number. Signed-off-by:
Frank Li <Frank.Li@nxp.com> Reviewed-by:
Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20231114154824.3617255-3-Frank.Li@nxp.com Signed-off-by:
Vinod Koul <vkoul@kernel.org> Stable-dep-of: a79f949a ("dmaengine: fsl-edma: correct max_segment_size setting") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Mar 06, 2024
-
-
Greg Kroah-Hartman authored
Link: https://lore.kernel.org/r/20240304211549.876981797@linuxfoundation.org Tested-by:
SeongJae Park <sj@kernel.org> Tested-by:
Luna Jernberg <droidbittin@gmail.com> Tested-by:
Bagas Sanjaya <bagasdotme@gmail.com> Tested-by:
Ron Economos <re@w6rz.net> Tested-by:
Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by:
Jon Hunter <jonathanh@nvidia.com> Tested-by:
Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Tested-by:
Shuah Khan <skhan@linuxfoundation.org> Tested-by:
Florian Fainelli <florian.fainelli@broadcom.com> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Danilo Krummrich authored
This bug is present in v6.7 only, since the scheduler design has been re-worked in v6.8. Client scheduler entities must be flushed before an associated GPU scheduler is teared down. Otherwise the entitiy might still hold a pointer to the scheduler's runqueue which is freed at scheduler tear down already. [ 305.224293] ================================================================== [ 305.224297] BUG: KASAN: slab-use-after-free in drm_sched_entity_flush+0x6c4/0x7b0 [gpu_sched] [ 305.224310] Read of size 8 at addr ffff8881440a8f48 by task rmmod/4436 [ 305.224317] CPU: 10 PID: 4436 Comm: rmmod Tainted: G U 6.7.6-100.fc38.x86_64+debug #1 [ 305.224321] Hardware name: Dell Inc. Precision 7550/01PXFR, BIOS 1.27.0 11/08/2023 [ 305.224324] Call Trace: [ 305.224327] <TASK> [ 305.224329] dump_stack_lvl+0x76/0xd0 [ 305.224336] print_report+0xcf/0x670 [ 305.224342] ? drm_sched_entity_flush+0x6c4/0x7b0 [gpu_sched] [ 305.224352] ? __virt_addr_valid+0x215/0x410 [ 305.224359] ? drm_sched_entity_flush+0x6c4/0x7b0 [gpu_sched] [ 305.224368] kasan_report+0xa6/0xe0 [ 305.224373] ? drm_sched_entity_flush+0x6c4/0x7b0 [gpu_sched] [ 305.224385] drm_sched_entity_flush+0x6c4/0x7b0 [gpu_sched] [ 305.224395] ? __pfx_drm_sched_entity_flush+0x10/0x10 [gpu_sched] [ 305.224406] ? rcu_is_watching+0x15/0xb0 [ 305.224413] drm_sched_entity_destroy+0x17/0x20 [gpu_sched] [ 305.224422] nouveau_cli_fini+0x6c/0x120 [nouveau] [ 305.224658] nouveau_drm_device_fini+0x2ac/0x490 [nouveau] [ 305.224871] nouveau_drm_remove+0x18e/0x220 [nouveau] [ 305.225082] ? __pfx_nouveau_drm_remove+0x10/0x10 [nouveau] [ 305.225290] ? rcu_is_watching+0x15/0xb0 [ 305.225295] ? _raw_spin_unlock_irqrestore+0x66/0x80 [ 305.225299] ? trace_hardirqs_on+0x16/0x100 [ 305.225304] ? _raw_spin_unlock_irqrestore+0x4f/0x80 [ 305.225310] pci_device_remove+0xa3/0x1d0 [ 305.225316] device_release_driver_internal+0x379/0x540 [ 305.225322] driver_detach+0xc5/0x180 [ 305.225327] bus_remove_driver+0x11e/0x2a0 [ 305.225333] pci_unregister_driver+0x2a/0x250 [ 305.225339] nouveau_drm_exit+0x1f/0x970 [nouveau] [ 305.225548] __do_sys_delete_module+0x350/0x580 [ 305.225554] ? __pfx___do_sys_delete_module+0x10/0x10 [ 305.225562] ? syscall_enter_from_user_mode+0x26/0x90 [ 305.225567] ? rcu_is_watching+0x15/0xb0 [ 305.225571] ? syscall_enter_from_user_mode+0x26/0x90 [ 305.225575] ? trace_hardirqs_on+0x16/0x100 [ 305.225580] do_syscall_64+0x61/0xe0 [ 305.225584] ? rcu_is_watching+0x15/0xb0 [ 305.225587] ? syscall_exit_to_user_mode+0x1f/0x50 [ 305.225592] ? trace_hardirqs_on_prepare+0xe3/0x100 [ 305.225596] ? do_syscall_64+0x70/0xe0 [ 305.225600] ? trace_hardirqs_on_prepare+0xe3/0x100 [ 305.225604] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 305.225609] RIP: 0033:0x7f6148f3592b [ 305.225650] Code: 73 01 c3 48 8b 0d dd 04 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ad 04 0c 00 f7 d8 64 89 01 48 [ 305.225653] RSP: 002b:00007ffe89986f08 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 305.225659] RAX: ffffffffffffffda RBX: 000055cbb036e900 RCX: 00007f6148f3592b [ 305.225662] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055cbb036e968 [ 305.225664] RBP: 00007ffe89986f30 R08: 1999999999999999 R09: 0000000000000000 [ 305.225667] R10: 00007f6148fa6ac0 R11: 0000000000000206 R12: 0000000000000000 [ 305.225670] R13: 00007ffe89987190 R14: 000055cbb036e900 R15: 0000000000000000 [ 305.225678] </TASK> [ 305.225683] Allocated by task 484: [ 305.225685] kasan_save_stack+0x33/0x60 [ 305.225690] kasan_set_track+0x25/0x30 [ 305.225693] __kasan_kmalloc+0x8f/0xa0 [ 305.225696] drm_sched_init+0x3c7/0xce0 [gpu_sched] [ 305.225705] nouveau_sched_init+0xd2/0x110 [nouveau] [ 305.225913] nouveau_drm_device_init+0x130/0x3290 [nouveau] [ 305.226121] nouveau_drm_probe+0x1ab/0x6b0 [nouveau] [ 305.226329] local_pci_probe+0xda/0x190 [ 305.226333] pci_device_probe+0x23a/0x780 [ 305.226337] really_probe+0x3df/0xb80 [ 305.226341] __driver_probe_device+0x18c/0x450 [ 305.226345] driver_probe_device+0x4a/0x120 [ 305.226348] __driver_attach+0x1e5/0x4a0 [ 305.226351] bus_for_each_dev+0x106/0x190 [ 305.226355] bus_add_driver+0x2a1/0x570 [ 305.226358] driver_register+0x134/0x460 [ 305.226361] do_one_initcall+0xd3/0x430 [ 305.226366] do_init_module+0x238/0x770 [ 305.226370] load_module+0x5581/0x6f10 [ 305.226374] __do_sys_init_module+0x1f2/0x220 [ 305.226377] do_syscall_64+0x61/0xe0 [ 305.226381] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 305.226387] Freed by task 4436: [ 305.226389] kasan_save_stack+0x33/0x60 [ 305.226392] kasan_set_track+0x25/0x30 [ 305.226396] kasan_save_free_info+0x2b/0x50 [ 305.226399] __kasan_slab_free+0x10b/0x1a0 [ 305.226402] slab_free_freelist_hook+0x12b/0x1e0 [ 305.226406] __kmem_cache_free+0xd4/0x1d0 [ 305.226410] drm_sched_fini+0x178/0x320 [gpu_sched] [ 305.226418] nouveau_drm_device_fini+0x2a0/0x490 [nouveau] [ 305.226624] nouveau_drm_remove+0x18e/0x220 [nouveau] [ 305.226832] pci_device_remove+0xa3/0x1d0 [ 305.226836] device_release_driver_internal+0x379/0x540 [ 305.226840] driver_detach+0xc5/0x180 [ 305.226843] bus_remove_driver+0x11e/0x2a0 [ 305.226847] pci_unregister_driver+0x2a/0x250 [ 305.226850] nouveau_drm_exit+0x1f/0x970 [nouveau] [ 305.227056] __do_sys_delete_module+0x350/0x580 [ 305.227060] do_syscall_64+0x61/0xe0 [ 305.227064] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 305.227070] The buggy address belongs to the object at ffff8881440a8f00 which belongs to the cache kmalloc-128 of size 128 [ 305.227073] The buggy address is located 72 bytes inside of freed 128-byte region [ffff8881440a8f00, ffff8881440a8f80) [ 305.227078] The buggy address belongs to the physical page: [ 305.227081] page:00000000627efa0a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1440a8 [ 305.227085] head:00000000627efa0a order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 305.227088] flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff) [ 305.227093] page_type: 0xffffffff() [ 305.227097] raw: 0017ffffc0000840 ffff8881000428c0 ffffea0005b33500 dead000000000002 [ 305.227100] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ 305.227102] page dumped because: kasan: bad access detected [ 305.227106] Memory state around the buggy address: [ 305.227109] ffff8881440a8e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 305.227112] ffff8881440a8e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 305.227114] >ffff8881440a8f00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 305.227117] ^ [ 305.227120] ffff8881440a8f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 305.227122] ffff8881440a9000: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc [ 305.227125] ================================================================== Cc: <stable@vger.kernel.org> # v6.7 only Reported-by:
Karol Herbst <kherbst@redhat.com> Closes: https://gist.githubusercontent.com/karolherbst/a20eb0f937a06ed6aabe2ac2ca3d11b5/raw/9cd8b1dc5894872d0eeebbee3dd0fdd28bb576bc/gistfile1.txt Fixes: b88baab8 ("drm/nouveau: implement new VM_BIND uAPI") Signed-off-by:
Danilo Krummrich <dakr@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geliang Tang authored
commit 7092dbee upstream. Now both a v4 address and a v4-mapped address are supported when destroying a userspace pm subflow, this patch adds a second subflow to "userspace pm add & remove address" test, and two subflows could be removed two different ways, one with the v4mapped and one with v4. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387 Fixes: 48d73f60 ("selftests: mptcp: update userspace pm addr tests") Cc: stable@vger.kernel.org Signed-off-by:
Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by:
Mat Martineau <martineau@kernel.org> Reviewed-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-2-162e87e48497@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geliang Tang authored
commit b850f2c7 upstream. To avoid duplicated code in different MPTCP selftests, we can add and use helpers defined in mptcp_lib.sh. is_v6() helper is defined in mptcp_connect.sh, mptcp_join.sh and mptcp_sockopt.sh, so export it into mptcp_lib.sh and rename it as mptcp_lib_is_v6(). Use this new helper in all scripts. Reviewed-by:
Matthieu Baerts <matttbe@kernel.org> Signed-off-by:
Geliang Tang <geliang.tang@suse.com> Signed-off-by:
Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-10-8d6b94150f6b@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geliang Tang authored
commit 757c828c upstream. This patch adds a new argument namespace to userspace_pm_add_addr() and userspace_pm_add_sf() to make these two helper more versatile. Add two more versatile helpers for userspace pm remove subflow or address: userspace_pm_rm_addr() and userspace_pm_rm_sf(). The original test helpers userspace_pm_rm_sf_addr_ns1() and userspace_pm_rm_sf_addr_ns2() can be replaced by these new helpers. Reviewed-by:
Matthieu Baerts <matttbe@kernel.org> Signed-off-by:
Geliang Tang <geliang.tang@suse.com> Signed-off-by:
Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-4-8d6b94150f6b@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geliang Tang authored
commit 80775412 upstream. This patch adds a new helper chk_subflows_total(), in it use the newly added counter mptcpi_subflows_total to get the "correct" amount of subflows, including the initial one. To be compatible with old 'ss' or kernel versions not supporting this counter, get the total subflows by listing TCP connections that are MPTCP subflows: ss -ti state state established state syn-sent state syn-recv | grep -c tcp-ulp-mptcp. Reviewed-by:
Matthieu Baerts <matttbe@kernel.org> Signed-off-by:
Geliang Tang <geliang.tang@suse.com> Signed-off-by:
Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-3-8d6b94150f6b@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geliang Tang authored
commit 06848c0f upstream. This patch adds a new helper get_info_value(), using 'sed' command to parse the value of the given item name in the line with the given keyword, to make chk_mptcp_info() and pedit_action_pkts() more readable. Also add another helper evts_get_info() to use get_info_value() to parse the output of 'pm_nl_ctl events' command, to make all the userspace pm selftests more readable, both in mptcp_join.sh and userspace_pm.sh. Reviewed-by:
Matthieu Baerts <matttbe@kernel.org> Signed-off-by:
Geliang Tang <geliang.tang@suse.com> Signed-off-by:
Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-2-8d6b94150f6b@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
commit 43fb862d upstream. During VMentry VERW is executed to mitigate MDS. After VERW, any memory access like register push onto stack may put host data in MDS affected CPU buffers. A guest can then use MDS to sample host data. Although likelihood of secrets surviving in registers at current VERW callsite is less, but it can't be ruled out. Harden the MDS mitigation by moving the VERW mitigation late in VMentry path. Note that VERW for MMIO Stale Data mitigation is unchanged because of the complexity of per-guest conditional VERW which is not easy to handle that late in asm with no GPRs available. If the CPU is also affected by MDS, VERW is unconditionally executed late in asm regardless of guest having MMIO access. Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Acked-by:
Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-6-a6216d83edb7%40linux.intel.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
From: Sean Christopherson <seanjc@google.com> commit 706a189d upstream. Use EFLAGS.CF instead of EFLAGS.ZF to track whether to use VMRESUME versus VMLAUNCH. Freeing up EFLAGS.ZF will allow doing VERW, which clobbers ZF, for MDS mitigations as late as possible without needing to duplicate VERW for both paths. Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by:
Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-5-a6216d83edb7%40linux.intel.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
commit 6613d82e upstream. The VERW mitigation at exit-to-user is enabled via a static branch mds_user_clear. This static branch is never toggled after boot, and can be safely replaced with an ALTERNATIVE() which is convenient to use in asm. Switch to ALTERNATIVE() to use the VERW mitigation late in exit-to-user path. Also remove the now redundant VERW in exc_nmi() and arch_exit_to_user_mode(). Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-4-a6216d83edb7%40linux.intel.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
commit a0e2dab4 upstream. As done for entry_64, add support for executing VERW late in exit to user path for 32-bit mode. Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-3-a6216d83edb7%40linux.intel.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
commit 3c750172 upstream. Mitigation for MDS is to use VERW instruction to clear any secrets in CPU Buffers. Any memory accesses after VERW execution can still remain in CPU buffers. It is safer to execute VERW late in return to user path to minimize the window in which kernel data can end up in CPU buffers. There are not many kernel secrets to be had after SWITCH_TO_USER_CR3. Add support for deploying VERW mitigation after user register state is restored. This helps minimize the chances of kernel data ending up into CPU buffers after executing VERW. Note that the mitigation at the new location is not yet enabled. Corner case not handled ======================= Interrupts returning to kernel don't clear CPUs buffers since the exit-to-user path is expected to do that anyways. But, there could be a case when an NMI is generated in kernel after the exit-to-user path has cleared the buffers. This case is not handled and NMI returning to kernel don't clear CPU buffers because: 1. It is rare to get an NMI after VERW, but before returning to userspace. 2. For an unprivileged user, there is no known way to make that NMI less rare or target it. 3. It would take a large number of these precisely-timed NMIs to mount an actual attack. There's presumably not enough bandwidth. 4. The NMI in question occurs after a VERW, i.e. when user state is restored and most interesting data is already scrubbed. Whats left is only the data that NMI touches, and that may or may not be of any interest. [ pawan: resolved conflict for hunk swapgs_restore_regs_and_return_to_usermode in backport ] Suggested-by:
Dave Hansen <dave.hansen@intel.com> Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-2-a6216d83edb7%40linux.intel.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ming Lei authored
[ Upstream commit 7838b465 ] In commit 19416123 ("block: define 'struct bvec_iter' as packed"), what we need is to save the 4byte padding, and avoid `bio` to spread on one extra cache line. It is enough to define it as '__packed __aligned(4)', as '__packed' alone means byte aligned, and can cause compiler to generate horrible code on architectures that don't support unaligned access in case that bvec_iter is embedded in other structures. Cc: Mikulas Patocka <mpatocka@redhat.com> Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Fixes: 19416123 ("block: define 'struct bvec_iter' as packed") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Bartosz Golaszewski authored
[ Upstream commit ec5c54a9 ] Hogs are added *after* ACPI so should be removed *before* in error path. Fixes: a411e81e ("gpiolib: add hogs support for machine code") Signed-off-by:
Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Andy Shevchenko authored
[ Upstream commit e4aec4da ] After shuffling the code, error path wasn't updated correctly. Fix it here. Fixes: 2f4133bb ("gpiolib: No need to call gpiochip_remove_pin_ranges() twice") Signed-off-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by:
Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Arturas Moskvinas authored
[ Upstream commit 530b1dbd ] Chip outputs are enabled[1] before actual reset is performed[2] which might cause pin output value to flip flop if previous pin value was set to 1. Fix that behavior by making sure chip is fully reset before all outputs are enabled. Flip-flop can be noticed when module is removed and inserted again and one of the pins was changed to 1 before removal. 100 microsecond flipping is noticeable on oscilloscope (100khz SPI bus). For a properly reset chip - output is enabled around 100 microseconds (on 100khz SPI bus) later during probing process hence should be irrelevant behavioral change. Fixes: 7ebc194d (gpio: 74x164: Introduce 'enable-gpios' property) Link: https://elixir.bootlin.com/linux/v6.7.4/source/drivers/gpio/gpio-74x164.c#L130 [1] Link: https://elixir.bootlin.com/linux/v6.7.4/source/drivers/gpio/gpio-74x164.c#L150 [2] Signed-off-by:
Arturas Moskvinas <arturas.moskvinas@gmail.com> Signed-off-by:
Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-