- May 18, 2022
-
-
Vladimir Oltean authored
[ Upstream commit 93a84170 ] Given the following order of operations: (1) we add filter A using tc-flower (2) we send a packet that matches it (3) we read the filter's statistics to find a hit count of 1 (4) we add a second filter B with a higher preference than A, and A moves one position to the right to make room in the TCAM for it (5) we send another packet, and this matches the second filter B (6) we read the filter statistics again. When this happens, the hit count of filter A is 2 and of filter B is 1, despite a single packet having matched each filter. Furthermore, in an alternate history, reading the filter stats a second time between steps (3) and (4) makes the hit count of filter A remain at 1 after step (6), as expected. The reason why this happens has to do with the filter->stats.pkts field, which is written to hardware through the call path below: vcap_entry_set / | \ / | \ / | \ / | \ es0_entry_set is1_entry_set is2_entry_set \ | / \ | / \ | / vcap_data_set(data.counter, ...) The primary role of filter->stats.pkts is to transport the filter hit counters from the last readout all the way from vcap_entry_get() -> ocelot_vcap_filter_stats_update() -> ocelot_cls_flower_stats(). The reason why vcap_entry_set() writes it to hardware is so that the counters (saturating and having a limited bit width) are cleared after each user space readout. The writing of filter->stats.pkts to hardware during the TCAM entry movement procedure is an unintentional consequence of the code design, because the hit count isn't up to date at this point. So at step (4), when filter A is moved by ocelot_vcap_filter_add() to make room for filter B, the hardware hit count is 0 (no packet matched on it in the meantime), but filter->stats.pkts is 1, because the last readout saw the earlier packet. The movement procedure programs the old hit count back to hardware, so this creates the impression to user space that more packets have been matched than they really were. The bug can be seen when running the gact_drop_and_ok_test() from the tc_actions.sh selftest. Fix the issue by reading back the hit count to tmp->stats.pkts before migrating the VCAP filter. Sure, this is a best-effort technique, since the packets that hit the rule between vcap_entry_get() and vcap_entry_set() won't be counted, but at least it allows the counters to be reliably used for selftests where the traffic is under control. The vcap_entry_get() name is a bit unintuitive, but it only reads back the counter portion of the TCAM entry, not the entire entry. The index from which we retrieve the counter is also a bit unintuitive (i - 1 during add, i + 1 during del), but this is the way in which TCAM entry movement works. The "entry index" isn't a stored integer for a TCAM filter, instead it is dynamically computed by ocelot_vcap_block_get_filter_index() based on the entry's position in the &block->rules list. That position (as well as block->count) is automatically updated by ocelot_vcap_filter_add_to_block() on add, and by ocelot_vcap_block_remove_filter() on del. So "i" is the new filter index, and "i - 1" or "i + 1" respectively are the old addresses of that TCAM entry (we only support installing/deleting one filter at a time). Fixes: b5962294 ("net: mscc: ocelot: Add support for tcam") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vladimir Oltean authored
[ Upstream commit 477d2b91 ] Once the CPU port was added to the destination port mask of a packet, it can never be cleared, so even packets marked as dropped by the MASK_MODE of a VCAP IS2 filter will still reach it. This is why we need the OCELOT_POLICER_DISCARD to "kill dropped packets dead" and make software stop seeing them. We disallow policer rules from being put on any other chain than the one for the first lookup, but we don't do this for "drop" rules, although we should. This change is merely ascertaining that the rules dont't (completely) work and letting the user know. The blamed commit is the one that introduced the multi-chain architecture in ocelot. Prior to that, we should have always offloaded the filters to VCAP IS2 lookup 0, where they did work. Fixes: 1397a2eb ("net: mscc: ocelot: create TCAM skeleton from tc filter chains") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vladimir Oltean authored
[ Upstream commit 6741e118 ] The VCAP IS2 TCAM is looked up twice per packet, and each filter can be configured to only match during the first, second lookup, or both, or none. The blamed commit wrote the code for making VCAP IS2 filters match only on the given lookup. But right below that code, there was another line that explicitly made the lookup a "don't care", and this is overwriting the lookup we've selected. So the code had no effect. Some of the more noticeable effects of having filters match on both lookups: - in "tc -s filter show dev swp0 ingress", we see each packet matching a VCAP IS2 filter counted twice. This throws off scripts such as tools/testing/selftests/net/forwarding/tc_actions.sh and makes them fail. - a "tc-drop" action offloaded to VCAP IS2 needs a policer as well, because once the CPU port becomes a member of the destination port mask of a packet, nothing removes it, not even a PERMIT/DENY mask mode with a port mask of 0. But VCAP IS2 rules with the POLICE_ENA bit in the action vector can only appear in the first lookup. What happens when a filter matches both lookups is that the action vector is combined, and this makes the POLICE_ENA bit ineffective, since the last lookup in which it has appeared is the second one. In other words, "tc-drop" actions do not drop packets for the CPU port, dropped packets are still seen by software unless there was an FDB entry that directed those packets to some other place different from the CPU. The last bit used to work, because in the initial commit b5962294 ("net: mscc: ocelot: Add support for tcam"), we were writing the FIRST field of the VCAP IS2 half key with a 1, not with a "don't care". The change to "don't care" was made inadvertently by me in commit c1c3993e ("net: mscc: ocelot: generalize existing code for VCAP"), which I just realized, and which needs a separate fix from this one, for "stable" kernels that lack the commit blamed below. Fixes: 226e9cd8 ("net: mscc: ocelot: only install TCAM entries into a specific lookup and PAG") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Vladimir Oltean authored
[ Upstream commit 16bbebd3 ] ocelot_vcap_filter_del() works by moving the next filters over the current one, and then deleting the last filter by calling vcap_entry_set() with a del_filter which was specially created by memsetting its memory to zeroes. vcap_entry_set() then programs this to the TCAM and action RAM via the cache registers. The problem is that vcap_entry_set() is a dispatch function which looks at del_filter->block_id. But since del_filter is zeroized memory, the block_id is 0, or otherwise said, VCAP_ES0. So practically, what we do is delete the entry at the same TCAM index from VCAP ES0 instead of IS1 or IS2. The code was not always like this. vcap_entry_set() used to simply be is2_entry_set(), and then, the logic used to work. Restore the functionality by populating the block_id of the del_filter based on the VCAP block of the filter that we're deleting. This makes vcap_entry_set() know what to do. Fixes: 1397a2eb ("net: mscc: ocelot: create TCAM skeleton from tc filter chains") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Tariq Toukan authored
[ Upstream commit 85db6352 ] The find_next_netdev_feature() macro gets the "remaining length", not bit index. Passing "bit - 1" for the following iteration is wrong as it skips the adjacent bit. Pass "bit" instead. Fixes: 3b89ea9c ("net: Fix for_each_netdev_feature on Big endian") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://lore.kernel.org/r/20220504080914.1918-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Manikanta Pubbisetty authored
[ Upstream commit 86af062f ] Currently MBSSID parameters in struct ieee80211_bss_conf are not reset upon connection. This could be problematic with some drivers in a scenario where the device first connects to a non-transmit BSS and then connects to a transmit BSS of a Multi BSS AP. The MBSSID parameters which are set after connecting to a non-transmit BSS will not be reset and the same parameters will be passed on to the driver during the subsequent connection to a transmit BSS of a Multi BSS AP. For example, firmware running on the ath11k device uses the Multi BSS data for tracking the beacon of a non-transmit BSS and reports the driver when there is a beacon miss. If we do not reset the MBSSID parameters during the subsequent connection to a transmit BSS, then the driver would have wrong MBSSID data and FW would be looking for an incorrect BSSID in the MBSSID beacon of a Multi BSS AP and reports beacon loss leading to an unstable connection. Reset the MBSSID parameters upon every connection to solve this problem. Fixes: 78ac51f8 ("mac80211: support multi-bssid") Signed-off-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> Link: https://lore.kernel.org/r/20220428052744.27040-1-quic_mpubbise@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Camel Guo authored
[ Upstream commit 3481551f ] This driver doesn't have of_match_table. This makes the kernel module tmp401.ko lack alias patterns (e.g: of:N*T*Cti,tmp411) to match DT node of the supported devices hence this kernel module will not be automatically loaded. After adding of_match_table to this driver, the folllowing alias will be added into tmp401.ko. $ modinfo drivers/hwmon/tmp401.ko filename: drivers/hwmon/tmp401.ko ...... author: Hans de Goede <hdegoede@redhat.com> alias: of:N*T*Cti,tmp435C* alias: of:N*T*Cti,tmp435 alias: of:N*T*Cti,tmp432C* alias: of:N*T*Cti,tmp432 alias: of:N*T*Cti,tmp431C* alias: of:N*T*Cti,tmp431 alias: of:N*T*Cti,tmp411C* alias: of:N*T*Cti,tmp411 alias: of:N*T*Cti,tmp401C* alias: of:N*T*Cti,tmp401 ...... Fixes: af503716 ("i2c: core: report OF style module alias for devices registered via OF") Signed-off-by: Camel Guo <camel.guo@axis.com> Link: https://lore.kernel.org/r/20220503114333.456476-1-camel.guo@axis.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Guenter Roeck authored
[ Upstream commit 7635a1ad ] In Chrome OS, a large number of crashes is observed due to corrupted timer lists. Steven Rostedt pointed out that this usually happens when a timer is freed while still active, and that the problem is often triggered by code calling del_timer() instead of del_timer_sync() just before freeing. Steven also identified the iwlwifi driver as one of the possible culprits since it does exactly that. Reported-by: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Gregory Greenman <gregory.greenman@intel.com> Fixes: 60e8abd9 ("iwlwifi: dbg_ini: add periodic trigger new API support") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Gregory Greenman <gregory.greenman@intel.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # Linux v5.17.3-rc1 and Debian LLVM-14 Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220411154210.1870008-1-linux@roeck-us.net Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Sven Eckelmann authored
[ Upstream commit a063f2fb ] The receiving interface might have used GRO to receive more fragments than MAX_SKB_FRAGS fragments. In this case, these will not be stored in skb_shinfo(skb)->frags but merged into the frag list. batman-adv relies on the function skb_split to split packets up into multiple smaller packets which are not larger than the MTU on the outgoing interface. But this function cannot handle frag_list entries and is only operating on skb_shinfo(skb)->frags. If it is still trying to split such an skb and xmit'ing it on an interface without support for NETIF_F_FRAGLIST, then validate_xmit_skb() will try to linearize it. But this fails due to inconsistent information. And __pskb_pull_tail will trigger a BUG_ON after skb_copy_bits() returns an error. In case of entries in frag_list, just linearize the skb before operating on it with skb_split(). Reported-by: Felix Kaechele <felix@kaechele.ca> Fixes: c6c8fea2 ("net: Add batman-adv meshing protocol") Signed-off-by: Sven Eckelmann <sven@narfation.org> Tested-by: Felix Kaechele <felix@kaechele.ca> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
- May 16, 2022
-
-
Greg Kroah-Hartman authored
Link: https://lore.kernel.org/r/20220513142229.874949670@linuxfoundation.org Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Fox Chen <foxhlchen@gmail.com> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Xu authored
commit 7196040e upstream. Patch series "mm/gup: some cleanups", v5. This patch (of 5): Alex reported invalid page pointer returned with pin_user_pages_remote() from vfio after upstream commit 4b6c33b3 ("vfio/type1: Prepare for batched pinning with struct vfio_batch"). It turns out that it's not the fault of the vfio commit; however after vfio switches to a full page buffer to store the page pointers it starts to expose the problem easier. The problem is for VM_PFNMAP vmas we should normally fail with an -EFAULT then vfio will carry on to handle the MMIO regions. However when the bug triggered, follow_page_mask() returned -EEXIST for such a page, which will jump over the current page, leaving that entry in **pages untouched. However the caller is not aware of it, hence the caller will reference the page as usual even if the pointer data can be anything. We had that -EEXIST logic since commit 1027e443 ("mm: make GUP handle pfn mapping unless FOLL_GET is requested") which seems very reasonable. It could be that when we reworked GUP with FOLL_PIN we could have overlooked that special path in commit 3faa52c0 ("mm/gup: track FOLL_PIN pages"), even if that commit rightfully touched up follow_devmap_pud() on checking FOLL_PIN when it needs to return an -EEXIST. Attaching the Fixes to the FOLL_PIN rework commit, as it happened later than 1027e443. [jhubbard@nvidia.com: added some tags, removed a reference to an out of tree module.] Link: https://lkml.kernel.org/r/20220207062213.235127-1-jhubbard@nvidia.com Link: https://lkml.kernel.org/r/20220204020010.68930-1-jhubbard@nvidia.com Link: https://lkml.kernel.org/r/20220204020010.68930-2-jhubbard@nvidia.com Fixes: 3faa52c0 ("mm/gup: track FOLL_PIN pages") Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reported-by: Alex Williamson <alex.williamson@redhat.com> Debugged-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: David Hildenbrand <david@redhat.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Miaohe Lin authored
commit 5c2a956c upstream. user_shm_lock forgets to set allowed to 0 when get_ucounts fails. So the later user_shm_unlock might do the extra dec_rlimit_ucounts. Fix this by resetting allowed to 0. Link: https://lkml.kernel.org/r/20220310132417.41189-1-linmiaohe@huawei.com Fixes: d7c9e99a ("Reimplement RLIMIT_MEMLOCK on top of ucounts") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Hugh Dickins <hughd@google.com> Cc: Herbert van den Bergh <herbert.van.den.bergh@oracle.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Naoya Horiguchi authored
commit 046545a6 upstream. When an uncorrected memory error is consumed there is a race between the CMCI from the memory controller reporting an uncorrected error with a UCNA signature, and the core reporting and SRAR signature machine check when the data is about to be consumed. If the CMCI wins that race, the page is marked poisoned when uc_decode_notifier() calls memory_failure() and the machine check processing code finds the page already poisoned. It calls kill_accessing_process() to make sure a SIGBUS is sent. But returns the wrong error code. Console log looks like this: mce: Uncorrected hardware memory error in user-access at 3710b3400 Memory failure: 0x3710b3: recovery action for dirty LRU page: Recovered Memory failure: 0x3710b3: already hardware poisoned Memory failure: 0x3710b3: Sending SIGBUS to einj_mem_uc:361438 due to hardware memory corruption mce: Memory error not recovered kill_accessing_process() is supposed to return -EHWPOISON to notify that SIGBUS is already set to the process and kill_me_maybe() doesn't have to send it again. But current code simply fails to do this, so fix it to make sure to work as intended. This change avoids the noise message "Memory error not recovered" and skips duplicate SIGBUSs. [tony.luck@intel.com: reword some parts of commit message] Link: https://lkml.kernel.org/r/20220113231117.1021405-1-naoya.horiguchi@linux.dev Fixes: a3f5d80e ("mm,hwpoison: send SIGBUS with error virutal address") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Youquan Song <youquan.song@intel.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Muchun Song authored
commit 7c25a0b8 upstream. userfaultfd calls mcopy_atomic_pte() and __mcopy_atomic() which do not do any cache flushing for the target page. Then the target page will be mapped to the user space with a different address (user address), which might have an alias issue with the kernel address used to copy the data from the user to. Fix this by insert flush_dcache_page() after copy_from_user() succeeds. Link: https://lkml.kernel.org/r/20220210123058.79206-7-songmuchun@bytedance.com Fixes: b6ebaedb ("userfaultfd: avoid mmap_sem read recursion in mcopy_atomic") Fixes: c1a4de99 ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lars Persson <lars.persson@axis.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Muchun Song authored
commit 19b482c2 upstream. userfaultfd calls shmem_mfill_atomic_pte() which does not do any cache flushing for the target page. Then the target page will be mapped to the user space with a different address (user address), which might have an alias issue with the kernel address used to copy the data from the user to. Insert flush_dcache_page() in non-zero-page case. And replace clear_highpage() with clear_user_highpage() which already considers the cache maintenance. Link: https://lkml.kernel.org/r/20220210123058.79206-6-songmuchun@bytedance.com Fixes: 8d103963 ("userfaultfd: shmem: add shmem_mfill_zeropage_pte for userfaultfd support") Fixes: 4c27fe4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lars Persson <lars.persson@axis.com> Cc: Peter Xu <peterx@redhat.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Muchun Song authored
commit e763243c upstream. userfaultfd calls copy_huge_page_from_user() which does not do any cache flushing for the target page. Then the target page will be mapped to the user space with a different address (user address), which might have an alias issue with the kernel address used to copy the data from the user to. Fix this issue by flushing dcache in copy_huge_page_from_user(). Link: https://lkml.kernel.org/r/20220210123058.79206-4-songmuchun@bytedance.com Fixes: fa4d75c1 ("userfaultfd: hugetlbfs: add copy_huge_page_from_user for hugetlb userfaultfd support") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lars Persson <lars.persson@axis.com> Cc: Peter Xu <peterx@redhat.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Muchun Song authored
commit 2771739a upstream. The D-cache maintenance inside move_to_new_page() only consider one page, there is still D-cache maintenance issue for tail pages of compound page (e.g. THP or HugeTLB). THP migration is only enabled on x86_64, ARM64 and powerpc, while powerpc and arm64 need to maintain the consistency between I-Cache and D-Cache, which depends on flush_dcache_page() to maintain the consistency between I-Cache and D-Cache. But there is no issues on arm64 and powerpc since they already considers the compound page cache flushing in their icache flush function. HugeTLB migration is enabled on arm, arm64, mips, parisc, powerpc, riscv, s390 and sh, while arm has handled the compound page cache flush in flush_dcache_page(), but most others do not. In theory, the issue exists on many architectures. Fix this by not using flush_dcache_folio() since it is not backportable. Link: https://lkml.kernel.org/r/20220210123058.79206-3-songmuchun@bytedance.com Fixes: 290408d4 ("hugetlb: hugepage migration core") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lars Persson <lars.persson@axis.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit c1ad35dd upstream. udf_write_fi() uses lengthOfImpUse of the entry it is writing to. However this field has not yet been initialized so it either contains completely bogus value or value from last directory entry at that place. In either case this is wrong and can lead to filesystem corruption or kernel crashes. Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> CC: stable@vger.kernel.org Fixes: 979a6e28 ("udf: Get rid of 0-length arrays in struct fileIdentDesc") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gleb Fotengauer-Malinovskiy authored
commit a36e07df upstream. The definition of RFKILL_IOCTL_MAX_SIZE introduced by commit 54f586a9 ("rfkill: make new event layout opt-in") is unusable since it is based on RFKILL_IOC_EXT_SIZE which has not been defined. Fix that by replacing the undefined constant with the constant which is intended to be used in this definition. Fixes: 54f586a9 ("rfkill: make new event layout opt-in") Cc: stable@vger.kernel.org # 5.11+ Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Link: https://lore.kernel.org/r/20220506172454.120319-1-glebfm@altlinux.org [add commit message provided later by Dmitry] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Itay Iellin authored
commit 103a2f32 upstream. Set a size limit of 8 bytes of the written buffer to "hdev->name" including the terminating null byte, as the size of "hdev->name" is 8 bytes. If an id value which is greater than 9999 is allocated, then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)" function call would lead to a truncation of the id value in decimal notation. Set an explicit maximum id parameter in the id allocation function call. The id allocation function defines the maximum allocated id value as the maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined as 10000. Signed-off-by: Itay Iellin <ieitayie@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
[ Upstream commit 7a53f408 ] Since not all compilers have a function attribute to disable KCOV instrumentation, objtool can rewrite KCOV instrumentation in noinstr functions as per commit: f56dae88 ("objtool: Handle __sanitize_cov*() tail calls") However, this has subtle interaction with the SLS validation from commit: 1cc1e4c8 ("objtool: Add straight-line-speculation validation") In that when a tail-call instrucion is replaced with a RET an additional INT3 instruction is also written, but is not represented in the decoded instruction stream. This then leads to false positive missing INT3 objtool warnings in noinstr code. Instead of adding additional struct instruction objects, mark the RET instruction with retpoline_safe to suppress the warning (since we know there really is an INT3). Fixes: 1cc1e4c8 ("objtool: Add straight-line-speculation validation") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220323230712.GA8939@worktop.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
[ Upstream commit 7ed7aa4d ] Due to being a perl generated asm file, it got missed by the mass convertion script. arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_init_x86_64()+0x3a: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_x86_64()+0xf2: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_emit_x86_64()+0x37: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: __poly1305_block()+0x6d: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: __poly1305_init_avx()+0x1e8: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx()+0x18a: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx()+0xaf8: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_emit_avx()+0x99: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx2()+0x18a: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx2()+0x776: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx512()+0x18a: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx512()+0x796: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx512()+0x10bd: missing int3 after ret Fixes: f94909ce ("x86: Prepare asm files for straight-line-speculation") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov authored
[ Upstream commit fe83f5ea ] The commit in Fixes started adding INT3 after RETs as a mitigation against straight-line speculation. The fastop SETcc implementation in kvm's insn emulator uses macro magic to generate all possible SETcc functions and to jump to them when emulating the respective instruction. However, it hardcodes the size and alignment of those functions to 4: a three-byte SETcc insn and a single-byte RET. BUT, with SLS, there's an INT3 that gets slapped after the RET, which brings the whole scheme out of alignment: 15: 0f 90 c0 seto %al 18: c3 ret 19: cc int3 1a: 0f 1f 00 nopl (%rax) 1d: 0f 91 c0 setno %al 20: c3 ret 21: cc int3 22: 0f 1f 00 nopl (%rax) 25: 0f 92 c0 setb %al 28: c3 ret 29: cc int3 and this explodes like this: int3: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 2435 Comm: qemu-system-x86 Not tainted 5.17.0-rc8-sls #1 Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A14 04/30/2012 RIP: 0010:setc+0x5/0x8 [kvm] Code: 00 00 0f 1f 00 0f b6 05 43 24 06 00 c3 cc 0f 1f 80 00 00 00 00 0f 90 c0 c3 cc 0f \ 1f 00 0f 91 c0 c3 cc 0f 1f 00 0f 92 c0 c3 cc <0f> 1f 00 0f 93 c0 c3 cc 0f 1f 00 \ 0f 94 c0 c3 cc 0f 1f 00 0f 95 c0 Call Trace: <TASK> ? x86_emulate_insn [kvm] ? x86_emulate_instruction [kvm] ? vmx_handle_exit [kvm_intel] ? kvm_arch_vcpu_ioctl_run [kvm] ? kvm_vcpu_ioctl [kvm] ? __x64_sys_ioctl ? do_syscall_64 ? entry_SYSCALL_64_after_hwframe </TASK> Raise the alignment value when SLS is enabled and use a macro for that instead of hard-coding naked numbers. Fixes: e463a09a ("x86: Add straight-line-speculation mitigation") Reported-by: Jamie Heilman <jamie@audible.transient.net> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Jamie Heilman <jamie@audible.transient.net> Link: https://lore.kernel.org/r/YjGzJwjrvxg5YZ0Z@audible.transient.net [Add a comment and a bit of safety checking, since this is going to be changed again for IBT support. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Arnaldo Carvalho de Melo authored
[ Upstream commit 35cb8c71 ] To bring in the change made in this cset: f94909ce ("x86: Prepare asm files for straight-line-speculation") It silences these perf tools build warnings, no change in the tools: Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S' diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S' diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S The code generated was checked before and after using 'objdump -d /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o', no changes. Cc: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
[ Upstream commit e463a09a ] Make use of an upcoming GCC feature to mitigate straight-line-speculation for x86: https://gcc.gnu.org/g:53a643f8568067d7700a9f2facc8ba39974973d3 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952 https://bugs.llvm.org/show_bug.cgi?id=52323 It's built tested on x86_64-allyesconfig using GCC-12 and GCC-11. Maintenance overhead of this should be fairly low due to objtool validation. Size overhead of all these additional int3 instructions comes to: text data bss dec hex filename 22267751 6933356 2011368 31212475 1dc43bb defconfig-build/vmlinux 22804126 6933356 1470696 31208178 1dc32f2 defconfig-build/vmlinux.sls Or roughly 2.4% additional text. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134908.140103474@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Masahiro Yamada authored
[ Upstream commit 8f0c32c7 ] Commit b1a1a1a0 ("kbuild: lto: postpone objtool") moved objtool_args to Makefile.lib, so the arguments can be used in Makefile.modfinal as well as Makefile.build. With commit 850ded46 ("kbuild: Fix TRIM_UNUSED_KSYMS with LTO_CLANG"), module LTO linking came back to scripts/Makefile.build again. So, there is no more reason to keep objtool_args in a separate file. Get it back to the original place, close to the objtool command. Remove the stale comment too. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
[ Upstream commit 26c44b77 ] Currently, text_poke_bp() is very strict to only allow patching a single instruction; however with straight-line-speculation it will be required to patch: ret; int3, which is two instructions. As such, relax the constraints a little to allow int3 padding for all instructions that do not imply the execution of the next instruction, ie: RET, JMP.d8 and JMP.d32. While there, rename the text_poke_loc::rel32 field to ::disp. Note: this fills up the text_poke_loc structure which is now a round 16 bytes big. [ bp: Put comments ontop instead of on the side. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134908.082342723@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
[ Upstream commit 1cc1e4c8 ] Teach objtool to validate the straight-line-speculation constraints: - speculation trap after indirect calls - speculation trap after RET Notable: when an instruction is annotated RETPOLINE_SAFE, indicating speculation isn't a problem, also don't care about sls for that instruction. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134908.023037659@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
[ Upstream commit b17c2baa ] Replace all ret/retq instructions with ASM_RET in preparation of making it more than a single instruction. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134907.964635458@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
[ Upstream commit f94909ce ] Replace all ret/retq instructions with RET in preparation of making RET a macro. Since AS is case insensitive it's a big no-op without RET defined. find arch/x86/ -name \*.S | while read file do sed -i 's/\<ret[q]*\>/RET/' $file done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134907.905503893@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
[ Upstream commit 22da5a07 ] Principally, in order to get rid of #define RET in this code to make place for a new RET, but also to clarify the code, rename a bunch of things: s/UNLOCK/IRQ_RESTORE/ s/LOCK/IRQ_SAVE/ s/BEGIN/BEGIN_IRQ_SAVE/ s/\<RET\>/RET_IRQ_RESTORE/ s/RET_ENDP/\tRET_IRQ_RESTORE\rENDP/ which then leaves RET unused so it can be removed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134907.841623970@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- May 12, 2022
-
-
Greg Kroah-Hartman authored
Link: https://lore.kernel.org/r/20220510130740.392653815@linuxfoundation.org Tested-by: Fox Chen <foxhlchen@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Slade Watkins <slade@sladewatkins.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marek Behún authored
commit 92f4ffec upstream. Update the comment about what happens when link goes down after we have checked for link-up. If a PIO request is done while link-down, we have a serious problem. Link: https://lore.kernel.org/r/20220110015018.26359-23-kabel@kernel.org Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marek Behún authored
commit 0c36ab43 upstream. This function is now always used in driver remove method, drop the __maybe_unused attribute. Link: https://lore.kernel.org/r/20220110015018.26359-22-kabel@kernel.org Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pali Rohár authored
commit befa7100 upstream. By default, all Legacy INTx interrupts are masked, so there is no need to mask this interrupt during irq_map() callback. Link: https://lore.kernel.org/r/20220110015018.26359-21-kabel@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pali Rohár authored
commit b08e5b53 upstream. Callback for irq_mask_ack() is the same as for irq_mask(). As there is no special handling for irq_ack(), there is no need to define irq_mask_ack() too. Link: https://lore.kernel.org/r/20220110015018.26359-20-kabel@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pali Rohár authored
commit 815bc313 upstream. Emulated root bridge currently provides only one Legacy INTA interrupt which is used for reporting PCIe PME and ERR events and handled by kernel PCIe PME and AER drivers. Aardvark HW reports these PME and ERR events separately, so there is no need to mix real INTA interrupt and emulated INTA interrupt for PCIe PME and AER drivers. Register a new advk-RP (as in Root Port) irq chip and a new irq domain for emulated root bridge and use this new separate irq domain for providing INTA interrupt from emulated root bridge for PME and ERR events. The real INTA interrupt from real devices is now separate. A custom map_irq callback function on PCI host bridge structure is used to allocate IRQ mapping for emulated root bridge from new irq domain. Original callback of_irq_parse_and_map_pci() is used for all other devices as before. Link: https://lore.kernel.org/r/20220110015018.26359-19-kabel@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pali Rohár authored
commit 273ddd86 upstream. Enable aardvark PME interrupt unconditionally by unmasking it and read PME requester ID to emulated bridge config space immediately after receiving interrupt. PME requester ID is stored in the PCIE_MSG_LOG_REG register, which contains the last inbound message. So when new inbound message is received by HW (including non-PM), the content in PCIE_MSG_LOG_REG register is replaced by a new value. PCIe specification mandates that subsequent PMEs are kept pending until the PME Status Register bit is cleared by software by writing a 1b. Support for masking/unmasking PME interrupt on emulated bridge via PCI_EXP_RTCTL_PMEIE bit is now implemented only in emulated bridge config space, to ensure that we do not miss any aardvark PME interrupt. Reading of PCI_EXP_RTCAP and PCI_EXP_RTSTA registers is simplified as final value is now always stored into emulated bridge config space by the interrupt handler, so there is no need to implement support for these registers in read_pcie callback. Clearing of W1C bit PCI_EXP_RTSTA_PME is now also simplified as it is done by pci-bridge-emul.c code for emulated bridge config space. So there is no need to implement support for clearing this bit in write_pcie callback. Link: https://lore.kernel.org/r/20220110015018.26359-18-kabel@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pali Rohár authored
commit 0fc75d87 upstream. Currently enabling PCI_EXP_RTSTA_PME bit in PCI_EXP_RTCTL register does nothing. This is because PCIe PME driver expects to receive PCIe interrupt defined in PCI_EXP_FLAGS_IRQ register, but aardvark hardware does not trigger PCIe INTx/MSI interrupt for PME event, rather it triggers custom aardvark interrupt which this driver is not processing yet. Fix this issue by handling PME interrupt in advk_pcie_handle_int() and chaining it to PCIe interrupt 0 with generic_handle_domain_irq() (since aardvark sets PCI_EXP_FLAGS_IRQ to zero). With this change PCIe PME driver finally starts receiving PME interrupt. Link: https://lore.kernel.org/r/20220110015018.26359-17-kabel@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pali Rohár authored
commit 7122bcb3 upstream. To optimize advk_pci_bridge_emul_pcie_conf_write() code, touch PCIE_ISR0_REG and PCIE_ISR0_MASK_REG registers only when it is really needed, when processing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME bits. Link: https://lore.kernel.org/r/20220110015018.26359-16-kabel@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-