  1. Aug 30, 2023
    • batman-adv: Hold rtnl lock during MTU update via netlink · 82bb5f8a
      Sven Eckelmann authored
      commit 987aae75 upstream.
      
      The automatic recalculation of the maximum allowed MTU is usually triggered
      by code sections which are already rtnl lock protected by callers outside
      of batman-adv. But when the fragmentation setting is changed via
      batman-adv's own batadv genl family, then the rtnl lock is not yet taken.
      
      But dev_set_mtu requires that the caller holds the rtnl lock because it
      uses netdevice notifiers. And this code will then fail the check for this
      lock:
      
        RTNL: assertion failed at net/core/dev.c (1953)
      
      Cc: stable@vger.kernel.org
      Reported-by: <syzbot+f8812454d9b3ac00d282@syzkaller.appspotmail.com>
      Fixes: c6a953cc ("batman-adv: Trigger events for auto adjusted MTU")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7bfe861e@narfation.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • batman-adv: Fix batadv_v_ogm_aggr_send memory leak · cb1f73e6
      Remi Pommarel authored
      commit 421d467d upstream.
      
      When batadv_v_ogm_aggr_send is called for an inactive interface, the skb
      is silently dropped by batadv_v_ogm_send_to_if() but never freed causing
      the following memory leak:
      
        unreferenced object 0xffff00000c164800 (size 512):
          comm "kworker/u8:1", pid 2648, jiffies 4295122303 (age 97.656s)
          hex dump (first 32 bytes):
            00 80 af 09 00 00 ff ff e1 09 00 00 75 01 60 83  ............u.`.
            1f 00 00 00 b8 00 00 00 15 00 05 00 da e3 d3 64  ...............d
          backtrace:
            [<0000000007ad20f6>] __kmalloc_track_caller+0x1a8/0x310
            [<00000000d1029e55>] kmalloc_reserve.constprop.0+0x70/0x13c
            [<000000008b9d4183>] __alloc_skb+0xec/0x1fc
            [<00000000c7af5051>] __netdev_alloc_skb+0x48/0x23c
            [<00000000642ee5f5>] batadv_v_ogm_aggr_send+0x50/0x36c
            [<0000000088660bd7>] batadv_v_ogm_aggr_work+0x24/0x40
            [<0000000042fc2606>] process_one_work+0x3b0/0x610
            [<000000002f2a0b1c>] worker_thread+0xa0/0x690
            [<0000000059fae5d4>] kthread+0x1fc/0x210
            [<000000000c587d3a>] ret_from_fork+0x10/0x20
      
      Free the skb in that case to fix this leak.
      
      Cc: stable@vger.kernel.org
      Fixes: 0da00359 ("batman-adv: OGMv2 - add basic infrastructure")
      Signed-off-by: Remi Pommarel <repk@triplefau.lt>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
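The ownership rule behind this fix can be illustrated with a minimal userspace sketch. It is not the batman-adv code: `struct buf`, `struct iface`, `buf_alloc()`, `buf_free()` and `send_to_iface()` are hypothetical stand-ins, and `live_bufs` exists only to make a leak observable. The point is that a function which takes ownership of a buffer has to release it on every path, including the early "interface inactive" return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the batman-adv or sk_buff types. */
struct buf { unsigned char data[64]; };
struct iface { bool active; };

/* Number of buffers allocated and not yet freed. */
static int live_bufs;

static struct buf *buf_alloc(void)
{
    struct buf *b = calloc(1, sizeof(*b));

    if (b)
        live_bufs++;
    return b;
}

static void buf_free(struct buf *b)
{
    if (!b)
        return;
    live_bufs--;
    free(b);
}

/* Takes ownership of b on every path, including the early return. */
static bool send_to_iface(struct buf *b, const struct iface *ifc)
{
    if (!ifc->active) {
        buf_free(b); /* the release that is easy to forget on a drop path */
        return false;
    }

    /* A real sender would hand the buffer to a transmit path here. */
    buf_free(b);
    return true;
}
```

Calling `send_to_iface()` with an inactive interface should leave `live_bufs` at zero; dropping the `buf_free()` in the early return reproduces the kind of leak described above.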
    • batman-adv: Fix TT global entry leak when client roamed back · f1bead97
      Remi Pommarel authored
      commit d25ddb7e upstream.
      
      When a client roams back to a node before the node has had time to destroy
      the pending local entry (i.e. within the same originator interval), the old
      global entry is removed directly from the hash table and left as such.
      
      But because this entry had an extra reference taken at lookup (i.e. using
      batadv_tt_global_hash_find), its memory is never reclaimed, causing the
      following memory leak:
      
        unreferenced object 0xffff0000073c8000 (size 18560):
          comm "softirq", pid 0, jiffies 4294907738 (age 228.644s)
          hex dump (first 32 bytes):
            06 31 ac 12 c7 7a 05 00 01 00 00 00 00 00 00 00  .1...z..........
            2c ad be 08 00 80 ff ff 6c b6 be 08 00 80 ff ff  ,.......l.......
          backtrace:
            [<00000000ee6e0ffa>] kmem_cache_alloc+0x1b4/0x300
            [<000000000ff2fdbc>] batadv_tt_global_add+0x700/0xe20
            [<00000000443897c7>] _batadv_tt_update_changes+0x21c/0x790
            [<000000005dd90463>] batadv_tt_update_changes+0x3c/0x110
            [<00000000a2d7fc57>] batadv_tt_tvlv_unicast_handler_v1+0xafc/0xe10
            [<0000000011793f2a>] batadv_tvlv_containers_process+0x168/0x2b0
            [<00000000b7cbe2ef>] batadv_recv_unicast_tvlv+0xec/0x1f4
            [<0000000042aef1d8>] batadv_batman_skb_recv+0x25c/0x3a0
            [<00000000bbd8b0a2>] __netif_receive_skb_core.isra.0+0x7a8/0xe90
            [<000000004033d428>] __netif_receive_skb_one_core+0x64/0x74
            [<000000000f39a009>] __netif_receive_skb+0x48/0xe0
            [<00000000f2cd8888>] process_backlog+0x174/0x344
            [<00000000507d6564>] __napi_poll+0x58/0x1f4
            [<00000000b64ef9eb>] net_rx_action+0x504/0x590
            [<00000000056fa5e4>] _stext+0x1b8/0x418
            [<00000000878879d6>] run_ksoftirqd+0x74/0xa4
        unreferenced object 0xffff00000bae1a80 (size 56):
          comm "softirq", pid 0, jiffies 4294910888 (age 216.092s)
          hex dump (first 32 bytes):
            00 78 b1 0b 00 00 ff ff 0d 50 00 00 00 00 00 00  .x.......P......
            00 00 00 00 00 00 00 00 50 c8 3c 07 00 00 ff ff  ........P.<.....
          backtrace:
            [<00000000ee6e0ffa>] kmem_cache_alloc+0x1b4/0x300
            [<00000000d9aaa49e>] batadv_tt_global_add+0x53c/0xe20
            [<00000000443897c7>] _batadv_tt_update_changes+0x21c/0x790
            [<000000005dd90463>] batadv_tt_update_changes+0x3c/0x110
            [<00000000a2d7fc57>] batadv_tt_tvlv_unicast_handler_v1+0xafc/0xe10
            [<0000000011793f2a>] batadv_tvlv_containers_process+0x168/0x2b0
            [<00000000b7cbe2ef>] batadv_recv_unicast_tvlv+0xec/0x1f4
            [<0000000042aef1d8>] batadv_batman_skb_recv+0x25c/0x3a0
            [<00000000bbd8b0a2>] __netif_receive_skb_core.isra.0+0x7a8/0xe90
            [<000000004033d428>] __netif_receive_skb_one_core+0x64/0x74
            [<000000000f39a009>] __netif_receive_skb+0x48/0xe0
            [<00000000f2cd8888>] process_backlog+0x174/0x344
            [<00000000507d6564>] __napi_poll+0x58/0x1f4
            [<00000000b64ef9eb>] net_rx_action+0x504/0x590
            [<00000000056fa5e4>] _stext+0x1b8/0x418
            [<00000000878879d6>] run_ksoftirqd+0x74/0xa4
      
      Releasing the extra reference from batadv_tt_global_hash_find even at
      roam back when batadv_tt_global_free is called fixes this memory leak.
      
      Cc: stable@vger.kernel.org
      Fixes: 068ee6e2 ("batman-adv: roaming handling mechanism redesign")
      Signed-off-by: Remi Pommarel <repk@triplefau.lt>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
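The reference-counting pattern described in this commit can be modelled in a few lines of userspace C. Everything here is an assumption made for illustration: `struct entry`, `table_find()`, `table_remove()` and `roam_back()` are hypothetical names, and a plain `int` stands in for the kernel's reference counter. The sketch shows that removing an entry from the table drops only the table's reference, so the reference taken by the lookup has to be released separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a refcounted table entry. */
struct entry {
    int refcount;
    bool hashed; /* true while the table still references the entry */
};

/* Counts entries whose memory was actually released. */
static int freed_entries;

static struct entry *entry_new(void)
{
    struct entry *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    e->refcount = 1; /* the table's own reference */
    e->hashed = true;
    return e;
}

static void entry_put(struct entry *e)
{
    if (--e->refcount == 0) {
        freed_entries++;
        free(e);
    }
}

/* A lookup hands out an extra reference that the caller must drop. */
static struct entry *table_find(struct entry *slot)
{
    if (!slot || !slot->hashed)
        return NULL;
    slot->refcount++;
    return slot;
}

/* Unhashing drops only the table's reference. */
static void table_remove(struct entry *e)
{
    if (e->hashed) {
        e->hashed = false;
        entry_put(e);
    }
}

static void roam_back(struct entry *slot)
{
    struct entry *e = table_find(slot);

    if (!e)
        return;
    table_remove(e);
    entry_put(e); /* without this put, the lookup reference leaks the entry */
}
```

With both puts in place, one call to `roam_back()` on a freshly created entry brings the count to zero and frees it.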
    • batman-adv: Do not get eth header before batadv_check_management_packet · fc9b87d8
      Remi Pommarel authored
      commit eac27a41 upstream.
      
      If received skb in batadv_v_elp_packet_recv or batadv_v_ogm_packet_recv
      is either cloned or non linearized then its data buffer will be
      reallocated by batadv_check_management_packet when skb_cow or
      skb_linearize get called. Thus getting ethernet header address inside
      skb data buffer before batadv_check_management_packet had any chance to
      reallocate it could lead to the following kernel panic:
      
        Unable to handle kernel paging request at virtual address ffffff8020ab069a
        Mem abort info:
          ESR = 0x96000007
          EC = 0x25: DABT (current EL), IL = 32 bits
          SET = 0, FnV = 0
          EA = 0, S1PTW = 0
          FSC = 0x07: level 3 translation fault
        Data abort info:
          ISV = 0, ISS = 0x00000007
          CM = 0, WnR = 0
        swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000040f45000
        [ffffff8020ab069a] pgd=180000007fffa003, p4d=180000007fffa003, pud=180000007fffa003, pmd=180000007fefe003, pte=0068000020ab0706
        Internal error: Oops: 96000007 [#1] SMP
        Modules linked in: ahci_mvebu libahci_platform libahci dvb_usb_af9035 dvb_usb_dib0700 dib0070 dib7000m dibx000_common ath11k_pci ath10k_pci ath10k_core mwl8k_new nf_nat_sip nf_conntrack_sip xhci_plat_hcd xhci_hcd nf_nat_pptp nf_conntrack_pptp at24 sbsa_gwdt
        CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.15.42-00066-g3242268d425c-dirty #550
        Hardware name: A8k (DT)
        pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
        pc : batadv_is_my_mac+0x60/0xc0
        lr : batadv_v_ogm_packet_recv+0x98/0x5d0
        sp : ffffff8000183820
        x29: ffffff8000183820 x28: 0000000000000001 x27: ffffff8014f9af00
        x26: 0000000000000000 x25: 0000000000000543 x24: 0000000000000003
        x23: ffffff8020ab0580 x22: 0000000000000110 x21: ffffff80168ae880
        x20: 0000000000000000 x19: ffffff800b561000 x18: 0000000000000000
        x17: 0000000000000000 x16: 0000000000000000 x15: 00dc098924ae0032
        x14: 0f0405433e0054b0 x13: ffffffff00000080 x12: 0000004000000001
        x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
        x8 : 0000000000000000 x7 : ffffffc076dae000 x6 : ffffff8000183700
        x5 : ffffffc00955e698 x4 : ffffff80168ae000 x3 : ffffff80059cf000
        x2 : ffffff800b561000 x1 : ffffff8020ab0696 x0 : ffffff80168ae880
        Call trace:
         batadv_is_my_mac+0x60/0xc0
         batadv_v_ogm_packet_recv+0x98/0x5d0
         batadv_batman_skb_recv+0x1b8/0x244
         __netif_receive_skb_core.isra.0+0x440/0xc74
         __netif_receive_skb_one_core+0x14/0x20
         netif_receive_skb+0x68/0x140
         br_pass_frame_up+0x70/0x80
         br_handle_frame_finish+0x108/0x284
         br_handle_frame+0x190/0x250
         __netif_receive_skb_core.isra.0+0x240/0xc74
         __netif_receive_skb_list_core+0x6c/0x90
         netif_receive_skb_list_internal+0x1f4/0x310
         napi_complete_done+0x64/0x1d0
         gro_cell_poll+0x7c/0xa0
         __napi_poll+0x34/0x174
         net_rx_action+0xf8/0x2a0
         _stext+0x12c/0x2ac
         run_ksoftirqd+0x4c/0x7c
         smpboot_thread_fn+0x120/0x210
         kthread+0x140/0x150
         ret_from_fork+0x10/0x20
        Code: f9403844 eb03009f 54fffee1 f94
      
      Thus ethernet header address should only be fetched after
      batadv_check_management_packet has been called.
      
      Fixes: 0da00359 ("batman-adv: OGMv2 - add basic infrastructure")
      Cc: stable@vger.kernel.org
      Signed-off-by: Remi Pommarel <repk@triplefau.lt>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
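The ordering problem can be reproduced in miniature outside the kernel. In the sketch below, `struct pkt`, `pkt_check()` and `pkt_first_byte()` are hypothetical names; `pkt_check()` only mimics the idea of a validation step that may replace the data buffer, as the commit describes for a cloned or non-linear skb. Any pointer into the old buffer taken before the check would dangle afterwards, so the header pointer is fetched only once the check has returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet model: "shared" means the data must be copied first. */
struct pkt {
    unsigned char *data;
    size_t len;
    bool shared;
};

/* Validation step that may replace the data buffer with a private copy. */
static bool pkt_check(struct pkt *p, size_t hdr_len)
{
    if (p->len < hdr_len)
        return false;

    if (p->shared) {
        unsigned char *copy = malloc(p->len);

        if (!copy)
            return false;
        memcpy(copy, p->data, p->len);
        free(p->data); /* pointers into the old buffer now dangle */
        p->data = copy;
        p->shared = false;
    }
    return true;
}

/* Returns the first header byte, or -1 if the packet is rejected. */
static int pkt_first_byte(struct pkt *p)
{
    const unsigned char *hdr;

    if (!pkt_check(p, 1))
        return -1;

    hdr = p->data; /* fetched only after the check may have reallocated */
    return hdr[0];
}
```

Moving the `hdr = p->data` assignment above the `pkt_check()` call would turn the final read into a use after free for shared packets, which is the userspace analogue of the fault shown in the trace.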
    • batman-adv: Don't increase MTU when set by user · ed1eb198
      Sven Eckelmann authored
      commit d8e42a2b upstream.
      
      If the user set an MTU value, it usually means that there are special
      requirements for the MTU. But if an interface got activated, the MTU was
      always recalculated and then the user set value was overwritten.
      
      The only reason why this user set value has to be overwritten, is when the
      MTU has to be decreased because batman-adv is not able to transfer packets
      with the user specified size.
      
      Fixes: c6c8fea2 ("net: Add batman-adv meshing protocol")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
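The rule stated in this commit message can be written as a small pure function. This is a sketch of the described policy only, not the patch itself: `next_mtu()` and its parameters are hypothetical names, and `max_supported_mtu` stands for whatever maximum the recalculation arrives at.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide the MTU after a recalculation.
 * A user-set value is kept unless it exceeds what can be transferred,
 * in which case it is decreased to the supported maximum.
 */
static int next_mtu(int current_mtu, int max_supported_mtu, bool set_by_user)
{
    if (set_by_user && current_mtu < max_supported_mtu)
        return current_mtu; /* never raise a value the user chose */

    return max_supported_mtu;
}
```

For example, a user-set 1400 stays at 1400 when 1500 would be possible, while a user-set 1500 drops to 1400 when only 1400 can be carried.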
    • batman-adv: Trigger events for auto adjusted MTU · efef746c
      Sven Eckelmann authored
      commit c6a953cc upstream.
      
      If an interface changes the MTU, it is expected that NETDEV_PRECHANGEMTU
      and NETDEV_CHANGEMTU notification events are triggered. This worked fine for
      .ndo_change_mtu based changes because core networking code took care of it.
      But for auto-adjustments after hard-interfaces changes, these events were
      simply missing.
      
      Due to this problem, non-batman-adv components weren't aware of MTU changes
      and thus couldn't perform their own tasks correctly.
      
      Fixes: c6c8fea2 ("net: Add batman-adv meshing protocol")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • selinux: set next pointer before attaching to list · d6b64d71
      Christian Göttsche authored
      commit 70d91dc9 upstream.
      
      Set the next pointer in filename_trans_read_helper() before attaching
      the new node under construction to the list, otherwise garbage would be
      dereferenced on subsequent failure during cleanup in the out goto label.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 43005902 ("selinux: implement new format of filename transitions")
      Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
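A small userspace list shows why the order of the two assignments matters. The types and functions below (`struct node`, `struct list`, `list_append()`, `list_destroy()`) are hypothetical and unrelated to the SELinux policy structures; the `fail_after_attach` flag simulates an error that occurs after the node is already reachable. Because `malloc()` leaves `next` uninitialized, the cleanup walk is only safe if `next` was set before the node was attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

struct list {
    struct node *head;
    struct node **tail; /* where the next node gets attached */
};

static void list_init(struct list *l)
{
    l->head = NULL;
    l->tail = &l->head;
}

/* Appends a node; fail_after_attach simulates a later error in the same step. */
static int list_append(struct list *l, int value, bool fail_after_attach)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return -1;

    n->value = 0;
    n->next = NULL; /* terminate first, while the node is still private */
    *l->tail = n;   /* only now make it reachable from the list */

    if (fail_after_attach)
        return -1;  /* the caller's cleanup walks the whole list */

    n->value = value;
    l->tail = &n->next;
    return 0;
}

/* Error-path cleanup: frees every reachable node and returns the count. */
static int list_destroy(struct list *l)
{
    int freed = 0;
    struct node *n = l->head;

    while (n) {
        struct node *next = n->next;

        free(n);
        n = next;
        freed++;
    }
    list_init(l);
    return freed;
}
```

If the `n->next = NULL` line were moved below the attach, a failure in between would leave `list_destroy()` following an uninitialized pointer.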
    • nfsd: Fix race to FREE_STATEID and cl_revoked · 36c5aecc
      Benjamin Coddington authored
      commit 3b816601 upstream.
      
      We have some reports of linux NFS clients that cannot satisfy a linux knfsd
      server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
      those clients repeatedly walk all their known state using TEST_STATEID and
      receive NFS4_OK for all.
      
      It's possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
      nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
      FREE_STATEID.  Afterward, revoke_delegation() moves the same delegation to
      cl_revoked.  This would produce the observed client/server effect.
      
      Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
      and move to cl_revoked happens within the same cl_lock.  This will allow
      nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Cc: stable@vger.kernel.org # v4.17+
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFS: Fix a use after free in nfs_direct_join_group() · 96fb46ef
      Trond Myklebust authored
      commit be2fd156 upstream.
      
      Be more careful when tearing down the subrequests of an O_DIRECT write
      as part of a retransmission.
      
      Reported-by: Chris Mason <clm@fb.com>
      Fixes: ed5d588f ("NFS: Try to join page groups before an O_DIRECT retransmission")
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: memory-failure: fix unexpected return value in soft_offline_page() · bdc544a8
      Miaohe Lin authored
      commit e2c1ab07 upstream.
      
      When page_handle_poison() fails to handle the hugepage or free page in
      retry path, soft_offline_page() will return 0 while -EBUSY is expected in
      this case.
      
      Consequently the user will think soft_offline_page succeeds while it in
      fact failed.  So the user will not try again later in this case.
      
      Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com
      Fixes: b94e0282 ("mm,hwpoison: try to narrow window race for free pages")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: add a call to flush_cache_vmap() in vmap_pfn() · 07fad410
      Alexandre Ghiti authored
      commit a50420c7 upstream.
      
      flush_cache_vmap() must be called after new vmalloc mappings are installed
      in the page table in order to allow architectures to make sure the new
      mapping is visible.
      
      Omitting the call could lead to a panic, since on some architectures (like
      powerpc) the page table walker could see the wrong pte value and trigger a
      spurious page fault that cannot be resolved (see commit f1cb8f9b
      ("powerpc/64s/radix: avoid ptesync after set_pte and
      ptep_set_access_flags")).
      
      But actually the patch is aiming at riscv: the riscv specification
      allows the caching of invalid entries in the TLB, and since we recently
      removed the vmalloc page fault handling, we now need to emit a tlb
      shootdown whenever a new vmalloc mapping is emitted
      (https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/).
      That's a temporary solution, there are ways to avoid that :)
      
      Link: https://lkml.kernel.org/r/20230809164633.1556126-1-alexghiti@rivosinc.com
      Fixes: 3e9a9e25 ("mm: add a vmap_pfn function")
      Reported-by: Dylan Jhong <dylan@andestech.com>
      Closes: https://lore.kernel.org/linux-riscv/ZMytNY2J8iyjbPPy@atctrx.andestech.com/
      Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
      Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
      Reviewed-by: Dylan Jhong <dylan@andestech.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast · a8a60bc8
      David Hildenbrand authored
      commit 5805192c upstream.
      
      In contrast to most other GUP code, GUP-fast common page table walking
      code like gup_pte_range() also handles hugetlb pages.  But in contrast to
      other hugetlb page table walking code, it does not look at the hugetlb PTE
      abstraction whereby we have only a single logical hugetlb PTE per hugetlb
      page, even when using multiple cont-PTEs underneath -- which is for
      example what huge_ptep_get() abstracts.
      
      So when we have a hugetlb page that is mapped via cont-PTEs, GUP-fast
      might stumble over a PTE that does not map the head page of a hugetlb page
      -- not the first "head" PTE of such a cont mapping.
      
      Logically, the whole hugetlb page is mapped (entire_mapcount == 1), but we
      might end up calling gup_must_unshare() with a tail page of a hugetlb
      page.
      
      We only maintain a single PageAnonExclusive flag per hugetlb page (as
      hugetlb pages cannot get partially COW-shared), stored for the head page.
      That flag is clear for all tail pages.
      
      So when gup_must_unshare() ends up calling PageAnonExclusive() with a tail
      page of a hugetlb page:
      
      1) With CONFIG_DEBUG_VM_PGFLAGS
      
      Stumbles over the:
      
      	VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
      
      For example, when executing the COW selftests with 64k hugetlb pages on
      arm64:
      
        [   61.082187] page:00000000829819ff refcount:3 mapcount:1 mapping:0000000000000000 index:0x1 pfn:0x11ee11
        [   61.082842] head:0000000080f79bf7 order:4 entire_mapcount:1 nr_pages_mapped:0 pincount:2
        [   61.083384] anon flags: 0x17ffff80003000e(referenced|uptodate|dirty|head|mappedtodisk|node=0|zone=2|lastcpupid=0xfffff)
        [   61.084101] page_type: 0xffffffff()
        [   61.084332] raw: 017ffff800000000 fffffc00037b8401 0000000000000402 0000000200000000
        [   61.084840] raw: 0000000000000010 0000000000000000 00000000ffffffff 0000000000000000
        [   61.085359] head: 017ffff80003000e ffffd9e95b09b788 ffffd9e95b09b788 ffff0007ff63cf71
        [   61.085885] head: 0000000000000000 0000000000000002 00000003ffffffff 0000000000000000
        [   61.086415] page dumped because: VM_BUG_ON_PAGE(PageHuge(page) && !PageHead(page))
        [   61.086914] ------------[ cut here ]------------
        [   61.087220] kernel BUG at include/linux/page-flags.h:990!
        [   61.087591] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
        [   61.087999] Modules linked in: ...
        [   61.089404] CPU: 0 PID: 4612 Comm: cow Kdump: loaded Not tainted 6.5.0-rc4+ #3
        [   61.089917] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
        [   61.090409] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
        [   61.090897] pc : gup_must_unshare.part.0+0x64/0x98
        [   61.091242] lr : gup_must_unshare.part.0+0x64/0x98
        [   61.091592] sp : ffff8000825eb940
        [   61.091826] x29: ffff8000825eb940 x28: 0000000000000000 x27: fffffc00037b8440
        [   61.092329] x26: 0400000000000001 x25: 0000000000080101 x24: 0000000000080000
        [   61.092835] x23: 0000000000080100 x22: ffff0000cffb9588 x21: ffff0000c8ec6b58
        [   61.093341] x20: 0000ffffad6b1000 x19: fffffc00037b8440 x18: ffffffffffffffff
        [   61.093850] x17: 2864616548656761 x16: 5021202626202965 x15: 6761702865677548
        [   61.094358] x14: 6567615028454741 x13: 2929656761702864 x12: 6165486567615021
        [   61.094858] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffd9e958b7a1c0
        [   61.095359] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 00000000002bffa8
        [   61.095873] x5 : ffff0008bb19e708 x4 : 0000000000000000 x3 : 0000000000000000
        [   61.096380] x2 : 0000000000000000 x1 : ffff0000cf6636c0 x0 : 0000000000000046
        [   61.096894] Call trace:
        [   61.097080]  gup_must_unshare.part.0+0x64/0x98
        [   61.097392]  gup_pte_range+0x3a8/0x3f0
        [   61.097662]  gup_pgd_range+0x1ec/0x280
        [   61.097942]  lockless_pages_from_mm+0x64/0x1a0
        [   61.098258]  internal_get_user_pages_fast+0xe4/0x1d0
        [   61.098612]  pin_user_pages_fast+0x58/0x78
        [   61.098917]  pin_longterm_test_start+0xf4/0x2b8
        [   61.099243]  gup_test_ioctl+0x170/0x3b0
        [   61.099528]  __arm64_sys_ioctl+0xa8/0xf0
        [   61.099822]  invoke_syscall.constprop.0+0x7c/0xd0
        [   61.100160]  el0_svc_common.constprop.0+0xe8/0x100
        [   61.100500]  do_el0_svc+0x38/0xa0
        [   61.100736]  el0_svc+0x3c/0x198
        [   61.100971]  el0t_64_sync_handler+0x134/0x150
        [   61.101280]  el0t_64_sync+0x17c/0x180
        [   61.101543] Code: aa1303e0 f00074c1 912b0021 97fffeb2 (d4210000)
      
      2) Without CONFIG_DEBUG_VM_PGFLAGS
      
      Always detects "not exclusive" for passed tail pages and refuses to PIN
      the tail pages R/O, as gup_must_unshare() == true.  GUP-fast will fall back
      to ordinary GUP.  As ordinary GUP properly considers the logical hugetlb
      PTE abstraction in hugetlb_follow_page_mask(), pinning the page will
      succeed when looking at the PageAnonExclusive on the head page only.
      
      So the only real effect of this is that with cont-PTE hugetlb pages, we'll
      always fall back from GUP-fast to ordinary GUP when not working on the head
      page, which ends up checking the head page and doing the right thing.
      
      Consequently, the cow selftests pass with cont-PTE hugetlb pages as well
      without CONFIG_DEBUG_VM_PGFLAGS.
      
      Note that this only applies to anon hugetlb pages that are mapped using
      cont-PTEs: for example 64k hugetlb pages on a 4k arm64 kernel.
      
      ... and only when R/O-pinning (FOLL_PIN) such pages that are mapped into
      the page table R/O using GUP-fast.
      
      On production kernels (and even most debug kernels, that don't set
      CONFIG_DEBUG_VM_PGFLAGS) this patch should theoretically not be required
      to be backported.  But of course, it does not hurt.
      
      Link: https://lkml.kernel.org/r/20230805101256.87306-1-david@redhat.com
      Fixes: a7f22660 ("mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
      Tested-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: ymfpci: Fix the missing snd_card_free() call at probe error · d4e11b85
      Takashi Iwai authored
      commit 1d0eb614 upstream.
      
      Like a few other drivers, YMFPCI driver needs to clean up with
      snd_card_free() call at an error path of the probe; otherwise the
      other devres resources are released before the card and it results in
      the UAF.
      
      This patch uses the helper for handling the probe error gracefully.
      
      Fixes: f33fc157 ("ALSA: ymfpci: Create card with device-managed snd_devm_card_new()")
      Cc: <stable@vger.kernel.org>
      Reported-and-tested-by: Takashi Yano <takashi.yano@nifty.ne.jp>
      Closes: https://lore.kernel.org/r/20230823135846.1812-1-takashi.yano@nifty.ne.jp
      Link: https://lore.kernel.org/r/20230823161625.5807-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • shmem: fix smaps BUG sleeping while atomic · d13f3a63
      Hugh Dickins authored
      commit e5548f85 upstream.
      
      smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page
      table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if
      need_resched(): "BUG: sleeping function called from invalid context".
      
      Since shmem_partial_swap_usage() is designed to count across a range, but
      smaps_pte_hole_lookup() only calls it for a single page slot, just break
      out of the loop on the last or only page, before checking need_resched().
      
      Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com
      Fixes: 23010032 ("mm/smaps: simplify shmem handling of pte holes")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>	[5.16+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
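The loop shape described in this commit ("break out of the loop on the last or only page, before checking need_resched()") can be sketched in userspace as follows. The names are hypothetical: `count_set()` walks an inclusive index range, and `maybe_yield()` merely counts how often the yield point is reached, standing in for a check that must not run in the single-slot case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts how often the (possibly sleeping) yield point was reached. */
static int yield_checks;

static void maybe_yield(void)
{
    yield_checks++;
}

/* Counts set slots in the inclusive range [first, last]. */
static size_t count_set(const bool *slots, size_t first, size_t last)
{
    size_t n = 0;

    for (size_t i = first; i <= last; i++) {
        if (slots[i])
            n++;
        if (i == last)
            break;     /* last or only slot: leave before the yield point */
        maybe_yield();
    }
    return n;
}
```

A single-slot call never reaches `maybe_yield()`, so a caller that cannot sleep can still use the function for one slot, while longer ranges keep their yield points between slots.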
    • mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer · 091591f6
      Rik van Riel authored
      commit f0362a25 upstream.
      
      The code calling ima_free_kexec_buffer runs long after the memblock
      allocator has already been torn down, potentially resulting in a use
      after free in memblock_isolate_range.
      
      With KASAN or KFENCE, this use after free will result in a BUG
      from the idle task, and a subsequent kernel panic.
      
      Switch ima_free_kexec_buffer over to memblock_free_late to avoid
      that issue.
      
      Fixes: fee3ff99 ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c")
      Cc: stable@kernel.org
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Suggested-by: Mike Rappoport <rppt@kernel.org>
      Link: https://lore.kernel.org/r/20230817135759.0888e5ef@imladris.surriel.com
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clk: Fix slab-out-of-bounds error in devm_clk_release() · a7d17225
      Andrey Skvortsov authored
      commit 66fbfb35 upstream.
      
      Problem can be reproduced by unloading snd_soc_simple_card, because in
      devm_get_clk_from_child() devres data is allocated as `struct clk`, but
      devm_clk_release() expects devres data to be `struct devm_clk_state`.
      
      KASAN report:
       ==================================================================
       BUG: KASAN: slab-out-of-bounds in devm_clk_release+0x20/0x54
       Read of size 8 at addr ffffff800ee09688 by task (udev-worker)/287
      
       Call trace:
        dump_backtrace+0xe8/0x11c
        show_stack+0x1c/0x30
        dump_stack_lvl+0x60/0x78
        print_report+0x150/0x450
        kasan_report+0xa8/0xf0
        __asan_load8+0x78/0xa0
        devm_clk_release+0x20/0x54
        release_nodes+0x84/0x120
        devres_release_all+0x144/0x210
        device_unbind_cleanup+0x1c/0xac
        really_probe+0x2f0/0x5b0
        __driver_probe_device+0xc0/0x1f0
        driver_probe_device+0x68/0x120
        __driver_attach+0x140/0x294
        bus_for_each_dev+0xec/0x160
        driver_attach+0x38/0x44
        bus_add_driver+0x24c/0x300
        driver_register+0xf0/0x210
        __platform_driver_register+0x48/0x54
        asoc_simple_card_init+0x24/0x1000 [snd_soc_simple_card]
        do_one_initcall+0xac/0x340
        do_init_module+0xd0/0x300
        load_module+0x2ba4/0x3100
        __do_sys_init_module+0x2c8/0x300
        __arm64_sys_init_module+0x48/0x5c
        invoke_syscall+0x64/0x190
        el0_svc_common.constprop.0+0x124/0x154
        do_el0_svc+0x44/0xdc
        el0_svc+0x14/0x50
        el0t_64_sync_handler+0xec/0x11c
        el0t_64_sync+0x14c/0x150
      
       Allocated by task 287:
        kasan_save_stack+0x38/0x60
        kasan_set_track+0x28/0x40
        kasan_save_alloc_info+0x20/0x30
        __kasan_kmalloc+0xac/0xb0
        __kmalloc_node_track_caller+0x6c/0x1c4
        __devres_alloc_node+0x44/0xb4
        devm_get_clk_from_child+0x44/0xa0
        asoc_simple_parse_clk+0x1b8/0x1dc [snd_soc_simple_card_utils]
        simple_parse_node.isra.0+0x1ec/0x230 [snd_soc_simple_card]
        simple_dai_link_of+0x1bc/0x334 [snd_soc_simple_card]
        __simple_for_each_link+0x2ec/0x320 [snd_soc_simple_card]
        asoc_simple_probe+0x468/0x4dc [snd_soc_simple_card]
        platform_probe+0x90/0xf0
        really_probe+0x118/0x5b0
        __driver_probe_device+0xc0/0x1f0
        driver_probe_device+0x68/0x120
        __driver_attach+0x140/0x294
        bus_for_each_dev+0xec/0x160
        driver_attach+0x38/0x44
        bus_add_driver+0x24c/0x300
        driver_register+0xf0/0x210
        __platform_driver_register+0x48/0x54
        asoc_simple_card_init+0x24/0x1000 [snd_soc_simple_card]
        do_one_initcall+0xac/0x340
        do_init_module+0xd0/0x300
        load_module+0x2ba4/0x3100
        __do_sys_init_module+0x2c8/0x300
        __arm64_sys_init_module+0x48/0x5c
        invoke_syscall+0x64/0x190
        el0_svc_common.constprop.0+0x124/0x154
        do_el0_svc+0x44/0xdc
        el0_svc+0x14/0x50
        el0t_64_sync_handler+0xec/0x11c
        el0t_64_sync+0x14c/0x150
      
       The buggy address belongs to the object at ffffff800ee09600
        which belongs to the cache kmalloc-256 of size 256
       The buggy address is located 136 bytes inside of
        256-byte region [ffffff800ee09600, ffffff800ee09700)
      
       The buggy address belongs to the physical page:
       page:000000002d97303b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ee08
       head:000000002d97303b order:1 compound_mapcount:0 compound_pincount:0
       flags: 0x10200(slab|head|zone=0)
       raw: 0000000000010200 0000000000000000 dead000000000122 ffffff8002c02480
       raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffffff800ee09580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffffff800ee09600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       >ffffff800ee09680: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                             ^
        ffffff800ee09700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffffff800ee09780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ==================================================================
      
      Fixes: abae8e57 ("clk: generalize devm_clk_get() a bit")
      Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
      Link: https://lore.kernel.org/r/20230805084847.3110586-1-andrej.skvortzov@gmail.com
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7d17225
    • Benjamin Coddington's avatar
      NFSv4: Fix dropped lock for racing OPEN and delegation return · 14904f4d
      Benjamin Coddington authored
      commit 1cbc11aa upstream.
      
      Commit f5ea1613 ("NFSv4: Retry LOCK on OLD_STATEID during delegation
      return") attempted to solve this problem by using nfs4's generic async error
      handling, but introduced a regression where v4.0 lock recovery would hang.
      The additional complexity introduced by overloading that error handling is
      not necessary for this case.  This patch expects that commit to be
      reverted.
      
      The problem as originally explained in the above commit is:
      
          There's a small window where a LOCK sent during a delegation return can
          race with another OPEN on client, but the open stateid has not yet been
          updated.  In this case, the client doesn't handle the OLD_STATEID error
          from the server and will lose this lock, emitting:
          "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".
      
      Fix this by using the old_stateid refresh helpers if the server replies
      with OLD_STATEID.
      
      Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      14904f4d
    • André Apitzsch's avatar
      platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL · ac467d74
      André Apitzsch authored
      commit a260f7d7 upstream.
      
      The Lenovo ThinkBook 14s Yoga ITL has 4 new symbols/shortcuts on its
      F9-F11 and PrtSc keys:
      
      F9:    Has a symbol of a head with a headset, the manual says "Service key"
      F10:   Has a symbol of a telephone handset which has been picked up from
             the cradle, the manual says: "Answer incoming calls"
      F11:   Has a symbol of a telephone handset which is resting on the cradle,
             the manual says: "Reject incoming calls"
      PrtSc: Has a symbol of scissors and a dashed ellipse, the manual says:
             "Open the Windows 'Snipping' Tool app"
      
      This commit adds support for these 4 new hkey events.
      
      Signed-off-by: André Apitzsch <git@apitzsch.eu>
      Link: https://lore.kernel.org/r/20230819-lenovo_keys-v1-1-9d34eac88e0a@apitzsch.eu
      Reviewed-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac467d74
    • Ping-Ke Shih's avatar
      wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning · e6a60ecc
      Ping-Ke Shih authored
      commit b98c1610 upstream.
      
      Commit 06470f74 ("mac80211: add API to allow filtering frames in BA sessions")
      added reorder_buf_filtered to mark frames filtered by firmware, and it
      can only work correctly if hw.max_rx_aggregation_subframes <= 64 since
      it stores the bitmap in a u64 variable.
      
      However, new HE or EHT devices can support up to 256 or 1024 BlockAck
      subframes, and using a subframe index above 63 leads to a UBSAN warning:
      
       UBSAN: shift-out-of-bounds in net/mac80211/rx.c:1129:39
       shift exponent 215 is too large for 64-bit type 'long long unsigned int'
       Call Trace:
        <IRQ>
        dump_stack_lvl+0x48/0x70
        dump_stack+0x10/0x20
        __ubsan_handle_shift_out_of_bounds+0x1ac/0x360
        ieee80211_release_reorder_frame.constprop.0.cold+0x64/0x69 [mac80211]
        ieee80211_sta_reorder_release+0x9c/0x400 [mac80211]
        ieee80211_prepare_and_rx_handle+0x1234/0x1420 [mac80211]
        ieee80211_rx_list+0xaef/0xf60 [mac80211]
        ieee80211_rx_napi+0x53/0xd0 [mac80211]
      
      Since only old hardware that supports <=64 BlockAck subframes uses
      ieee80211_mark_rx_ba_filtered_frames(), keep that restriction: add a
      WARN_ONCE() and a comment noting that this function must not be used
      when the hardware capability exceeds that limit.
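The constraint can be sketched in plain C. This is a user-space illustration only: `struct reorder_state`, `mark_filtered()` and `is_filtered()` are hypothetical names, and the real mac80211 code warns with WARN_ONCE() rather than returning a status.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-session reorder state. */
struct reorder_state {
	uint64_t filtered;	/* one bit per subframe index, so at most 64 */
};

/* Mark a subframe as filtered; refuse indexes that do not fit the bitmap. */
static bool mark_filtered(struct reorder_state *st, unsigned int index)
{
	if (index >= 64)
		return false;	/* shifting by >= 64 would be undefined */

	st->filtered |= UINT64_C(1) << index;
	return true;
}

/* Query a subframe; out-of-range indexes are never reported as filtered. */
static bool is_filtered(const struct reorder_state *st, unsigned int index)
{
	return index < 64 && (st->filtered & (UINT64_C(1) << index)) != 0;
}
```

Checking the index before the shift is what keeps the exponent from the report above (215) from ever reaching the shift operator.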
      
      Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
      Link: https://lore.kernel.org/r/20230818014004.16177-1-pkshih@realtek.com
      [edit commit message]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6a60ecc
    • Michael Ellerman's avatar
      ibmveth: Use dcbf rather than dcbfl · b8b7243a
      Michael Ellerman authored
      commit bfedba3b upstream.
      
      When building for power4, newer binutils don't recognise the "dcbfl"
      extended mnemonic.
      
      dcbfl RA, RB is equivalent to dcbf RA, RB, 1.
      
      Switch to "dcbf" to avoid the build error.
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8b7243a
    • Charles Keepax's avatar
      ASoC: cs35l41: Correct amp_gain_tlv values · 85607ef3
      Charles Keepax authored
      commit 1613781d upstream.
      
      The current analog gain TLV seems to have completely incorrect values in
      it. The gain starts at 0.5dB, proceeds in 1dB steps, and has no mute
      value; correct the control to match.
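The gain mapping described above is simple arithmetic, sketched here in units of 0.01 dB (the unit ALSA dB-scale TLVs use). The helper name `amp_gain_centi_db()` is hypothetical, and the sketch does not model the upper limit of the control.

```c
/*
 * Gain in 0.01 dB units for a given control step:
 * step 0 is 0.5 dB and every further step adds 1 dB. No step means "mute".
 */
static int amp_gain_centi_db(unsigned int step)
{
	const int min_centi_db = 50;	/* 0.5 dB */
	const int step_centi_db = 100;	/* 1 dB */

	return min_centi_db + (int)step * step_centi_db;
}
```

For example, step 3 maps to 350, i.e. 3.5 dB.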
      
      Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
      Link: https://lore.kernel.org/r/20230823085308.753572-1-ckeepax@opensource.cirrus.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      85607ef3
    • BrenoRCBrito's avatar
      ASoC: amd: yc: Add VivoBook Pro 15 to quirks list for acp6x · 014fec55
      BrenoRCBrito authored
      commit 3b1f0883 upstream.
      
      The VivoBook Pro 15 Ryzen Edition uses a Ryzen 6800H processor; adding
      it to the quirks list for acp6x enables the internal mic.
      
      Signed-off-by: BrenoRCBrito <brenorcbrito@gmail.com>
      Link: https://lore.kernel.org/r/20230818211417.32167-1-brenorcbrito@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      014fec55
    • Jens Axboe's avatar
      io_uring/msg_ring: fix missing lock on overflow for IOPOLL · 22a406b3
      Jens Axboe authored
      Commit e12d7a46 upstream.
      
      If the target ring is configured with IOPOLL, then we always need to hold
      the target ring uring_lock before posting CQEs. We could just grab it
      unconditionally, but since we don't expect many target rings to be of this
      type, make grabbing the uring_lock conditional on the ring type.
      
      Link: https://lore.kernel.org/io-uring/Y8krlYa52%2F0YGqkg@ip-172-31-85-199.ec2.internal/
      Reported-by: Xingyuan Mo <hdthky0@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      22a406b3
    • Jens Axboe's avatar
      io_uring/msg_ring: move double lock/unlock helpers higher up · 816c7cec
      Jens Axboe authored
      Commit 423d5081 upstream.
      
      In preparation for needing them somewhere else, move them and get rid of
      the unused 'issue_flags' for the unlock side.
      
      No functional changes in this patch.
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      816c7cec
    • Pavel Begunkov's avatar
      io_uring: extract a io_msg_install_complete helper · 4f593752
      Pavel Begunkov authored
      Commit 17211310 upstream.
      
      Extract a helper called io_msg_install_complete() from io_msg_send_fd();
      it will be used later.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/1500ca1054cc4286a3ee1c60aacead57fcdfa02a.1670384893.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f593752
    • Pavel Begunkov's avatar
      io_uring: get rid of double locking · 0d617fb6
      Pavel Begunkov authored
      Commit 11373026 upstream.
      
      We don't need to take both uring_locks at once; msg_ring can be split
      into two parts: first getting a file from the filetable of the first
      ring, then installing it into the second one.
      
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/a80ecc2bc99c3b3f2cf20015d618b7c51419a797.1670384893.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d617fb6
    • Sean Christopherson's avatar
      KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs · 82d811ff
      Sean Christopherson authored
      Upstream commit ba6e3fe2 ("KVM: x86/mmu: Grab mmu_invalidate_seq in
      kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
      how KVM tracks the sequence counter snapshot.
      
      Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
      when checking to see if a page fault is stale, as the sequence count is
      stored as an "unsigned long" everywhere else in KVM.  This fixes a bug
      where KVM will effectively hang vCPUs due to always thinking page faults
      are stale, which results in KVM refusing to "fix" faults.
      
      mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
      KVM is handling page faults to detect if userspace mappings relevant to
      the guest were invalidated between snapshotting the counter and acquiring
      mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
      resolve the page fault is fresh.  If KVM sees that the counter has
      changed, KVM simply resumes the guest without fixing the fault.
      
      What _should_ happen is that the source of the mmu_notifier invalidations
      eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
      again fix guest page fault(s).
      
      But for a long-lived VM and/or a VM that the host just doesn't particularly
      like, it's possible for a VM to be on the receiving end of 2 billion (with
      a B) mmu_notifier invalidations.  When that happens, bit 31 will be set in
      mmu_invalidate_seq.  This causes the value to be turned into a 32-bit
      negative value when implicitly cast to an "int" by is_page_fault_stale(),
      and then sign-extended into a 64-bit unsigned when the signed "int" is
      implicitly cast back to an "unsigned long" on the call to
      mmu_invalidate_retry_hva().
      
      As a result of the casting and sign-extension, given a sequence counter of
      e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
      
      	if (0x8002dc25 != 0xffffffff8002dc25)
      
      and signals that the page fault is stale and needs to be retried even
      though the sequence counter is stable, and KVM effectively hangs any vCPU
      that takes a page fault (EPT violation or #NPF when TDP is enabled).
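The casting chain can be reproduced in user space. This is a minimal sketch under stated assumptions: `uint64_t` stands in for "unsigned long" on a 64-bit kernel, and the helpers `snapshot_via_int()`, `is_stale_buggy()` and `is_stale_fixed()` are hypothetical names, not KVM functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy path: the snapshot passes through a 32-bit signed integer. */
static uint64_t snapshot_via_int(uint64_t seq)
{
	/*
	 * Narrowing an out-of-range value is implementation-defined in C;
	 * common compilers (gcc, clang) wrap, so bit 31 becomes the sign bit.
	 */
	int32_t narrowed = (int32_t)seq;

	/* Widening a negative value sign-extends into the upper 32 bits. */
	return (uint64_t)(int64_t)narrowed;
}

/* Compares the live counter with a snapshot that went through "int". */
static bool is_stale_buggy(uint64_t current_seq)
{
	return current_seq != snapshot_via_int(current_seq);
}

/* Fixed path: the snapshot keeps the full width of the counter. */
static bool is_stale_fixed(uint64_t current_seq)
{
	uint64_t snapshot = current_seq;

	return current_seq != snapshot;
}
```

With a stable counter of 0x8002dc25, the buggy comparison reports "stale" on every attempt, while counters below 0x80000000 behave correctly in both variants.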
      
      Reported-by: Brian Rak <brak@vultr.com>
      Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
      Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
      Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
      Fixes: a955cad8 ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      82d811ff
    • Sean Christopherson's avatar
      KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated · 2800385f
      Sean Christopherson authored
      commit edbdb43f upstream.
      
      Preserve TDP MMU roots until they are explicitly invalidated by gifting
      the TDP MMU itself a reference to a root when it is allocated.  Keeping a
      reference in the TDP MMU fixes a flaw where the TDP MMU exhibits terrible
      performance, and can potentially even soft-hang a vCPU, if a vCPU
      frequently unloads its roots, e.g. when KVM is emulating SMI+RSM.
      
      When KVM emulates something that invalidates _all_ TLB entries, e.g. SMI
      and RSM, KVM unloads all of the vCPU's roots (KVM keeps a small per-vCPU
      cache of previous roots).  Unloading roots is a simple way to ensure KVM
      flushes and synchronizes all roots for the vCPU, as KVM flushes and syncs
      when allocating a "new" root (from the vCPU's perspective).
      
      In the shadow MMU, KVM keeps track of all shadow pages, roots included, in
      a per-VM hash table.  Unloading a shadow MMU root just wipes it from the
      per-vCPU cache; the root is still tracked in the per-VM hash table.  When
      KVM loads a "new" root for the vCPU, KVM will find the old, unloaded root
      in the per-VM hash table.
      
      Unlike the shadow MMU, the TDP MMU doesn't track "inactive" roots in a
      per-VM structure, where "active" in this case means a root is either
      in-use or cached as a previous root by at least one vCPU.  When a TDP MMU
      root becomes inactive, i.e. the last vCPU reference to the root is put,
      KVM immediately frees the root (asterisk on "immediately" as the actual
      freeing may be done by a worker, but for all intents and purposes the root
      is gone).
      
      The TDP MMU behavior is especially problematic for 1-vCPU setups, as
      unloading all roots effectively frees all roots.  The issue is mitigated
      to some degree in multi-vCPU setups as a different vCPU usually holds a
      reference to an unloaded root and thus keeps the root alive, allowing the
      vCPU to reuse its old root after unloading (with a flush+sync).
      
      The TDP MMU flaw has been known for some time, as until very recently,
      KVM's handling of CR0.WP also triggered unloading of all roots.  The
      CR0.WP toggling scenario was eventually addressed by not unloading roots
      when _only_ CR0.WP is toggled, but such an approach doesn't Just Work
      for emulating SMM as KVM must emulate a full TLB flush on entry and exit
      to/from SMM.  Given that the shadow MMU plays nice with unloading roots
      at will, teaching the TDP MMU to do the same is far less complex than
      modifying KVM to track which roots need to be flushed before reuse.
      
      Note, preserving all possible TDP MMU roots is not a concern with respect
      to memory consumption.  Now that the role for direct MMUs doesn't include
      information about the guest, e.g. CR0.PG, CR0.WP, CR4.SMEP, etc., there
      are _at most_ six possible roots (where "guest_mode" here means L2):
      
        1. 4-level !SMM !guest_mode
        2. 4-level  SMM !guest_mode
        3. 5-level !SMM !guest_mode
        4. 5-level  SMM !guest_mode
        5. 4-level !SMM guest_mode
        6. 5-level !SMM guest_mode
      
      And because each vCPU can track 4 valid roots, a VM can already have all
      6 root combinations live at any given time.  Not to mention that, in
      practice, no sane VMM will advertise different guest.MAXPHYADDR values
      across vCPUs, i.e. KVM won't ever use both 4-level and 5-level roots for
      a single VM.  Furthermore, the vast majority of modern hypervisors will
      utilize EPT/NPT when available, thus the guest_mode=%true cases are also
      unlikely to be utilized.
      
      Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
      Link: https://lore.kernel.org/all/959c5bce-beb5-b463-7158-33fc4a4f910c@linux.microsoft.com
      Link: https://lkml.kernel.org/r/20220209170020.1775368-1-pbonzini%40redhat.com
      Link: https://lore.kernel.org/all/20230322013731.102955-1-minipli@grsecurity.net
      Link: https://lore.kernel.org/all/000000000000a0bc2b05f9dd7fab@google.com
      Link: https://lore.kernel.org/all/000000000000eca0b905fa0f7756@google.com
      Cc: Ben Gardon <bgardon@google.com>
      Cc: David Matlack <dmatlack@google.com>
      Cc: stable@vger.kernel.org
      Tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
      Link: https://lore.kernel.org/r/20230426220323.3079789-1-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2800385f
    • Hangbin Liu's avatar
      bonding: fix macvlan over alb bond support · a0559fd0
      Hangbin Liu authored
      [ Upstream commit e74216b8 ]
      
      The commit 14af9963 ("bonding: Support macvlans on top of tlb/rlb mode
      bonds") aims to enable the use of macvlans on top of rlb bond mode. However,
      the current rlb bond mode only handles ARP packets to update remote neighbor
      entries. This causes an issue when a macvlan is on top of the bond, and
      remote devices send packets to the macvlan using the bond's MAC address
      as the destination. After the packets are delivered to the macvlan, the
      macvlan rejects them because the MAC address is incorrect. Consequently,
      commit 14af9963 makes macvlan over bond non-functional.
      
      To address this problem, one potential solution is to check for the presence
      of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev)
      and return NULL in the rlb_arp_xmit() function. However, this approach
      doesn't fully resolve the situation when a VLAN exists between the bond and
      macvlan.
      
      So just do a partial revert of commit 14af9963 in rlb_arp_xmit().
      As the comment there says: don't modify or load balance ARPs that do not
      originate locally.
      
      Fixes: 14af9963 ("bonding: Support macvlans on top of tlb/rlb mode bonds")
      Reported-by: <susan.zheng@veritas.com>
      Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a0559fd0
    • Ido Schimmel's avatar
      rtnetlink: Reject negative ifindexes in RTM_NEWLINK · b15dea3d
      Ido Schimmel authored
      [ Upstream commit 30188bd7 ]
      
      Negative ifindexes are illegal, but the kernel does not validate the
      ifindex in the ancillary header of RTM_NEWLINK messages, resulting in
      the kernel generating a warning [1] when such an ifindex is specified.
      
      Fix by rejecting negative ifindexes.
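The check itself is tiny. The sketch below shows its shape in user-space C; `validate_ifindex()` is a hypothetical helper, and returning -EINVAL is an assumption about the error code rather than something stated in this message.

```c
#include <errno.h>

/*
 * Hypothetical validation helper: reject negative interface indexes
 * before any device registration is attempted.
 * Returns 0 on success or a negative errno value.
 */
static int validate_ifindex(int ifindex)
{
	if (ifindex < 0)
		return -EINVAL;

	return 0;
}
```

Rejecting the value at request-parsing time means the device registration path from the trace is never entered with an invalid index.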
      
      [1]
      WARNING: CPU: 0 PID: 5031 at net/core/dev.c:9593 dev_index_reserve+0x1a2/0x1c0 net/core/dev.c:9593
      [...]
      Call Trace:
       <TASK>
       register_netdevice+0x69a/0x1490 net/core/dev.c:10081
       br_dev_newlink+0x27/0x110 net/bridge/br_netlink.c:1552
       rtnl_newlink_create net/core/rtnetlink.c:3471 [inline]
       __rtnl_newlink+0x115e/0x18c0 net/core/rtnetlink.c:3688
       rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3701
       rtnetlink_rcv_msg+0x439/0xd30 net/core/rtnetlink.c:6427
       netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545
       netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
       netlink_unicast+0x536/0x810 net/netlink/af_netlink.c:1368
       netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910
       sock_sendmsg_nosec net/socket.c:728 [inline]
       sock_sendmsg+0xd9/0x180 net/socket.c:751
       ____sys_sendmsg+0x6ac/0x940 net/socket.c:2538
       ___sys_sendmsg+0x135/0x1d0 net/socket.c:2592
       __sys_sendmsg+0x117/0x1e0 net/socket.c:2621
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 38f7b870 ("[RTNETLINK]: Link creation API")
      Reported-by: <syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/20230823064348.2252280-1-idosch@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b15dea3d
    • Florian Westphal's avatar
      netfilter: nf_tables: fix out of memory error handling · ed3fe5f9
      Florian Westphal authored
      [ Upstream commit 5e1be4cd ]
      
      Several instances of pipapo_resize() don't propagate allocation failures;
      this causes a crash when fault injection is enabled for GFP_KERNEL slabs.
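The pattern being fixed is generic: a resize helper reports -ENOMEM and every caller must pass that on instead of carrying on with the old buffer. A user-space sketch follows; `struct table`, `table_resize()`, `table_insert()` and the `fail_next_alloc` switch are hypothetical and only mimic fault injection.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical growable table, standing in for a set's lookup data. */
struct table {
	unsigned long *buckets;
	size_t n;
};

static int fail_next_alloc;	/* crude stand-in for fault injection */

/* Resize the table; returns 0 or -ENOMEM and leaves the table intact on error. */
static int table_resize(struct table *t, size_t new_n)
{
	unsigned long *p;

	if (fail_next_alloc) {
		fail_next_alloc = 0;
		return -ENOMEM;
	}

	p = realloc(t->buckets, new_n * sizeof(*p));
	if (!p)
		return -ENOMEM;

	t->buckets = p;
	t->n = new_n;
	return 0;
}

/* Caller propagates the failure instead of writing past the old buffer. */
static int table_insert(struct table *t, unsigned long value)
{
	int err = table_resize(t, t->n + 1);

	if (err)
		return err;

	t->buckets[t->n - 1] = value;
	return 0;
}
```

Without the `if (err) return err;` step, the store that follows would index one element beyond the old allocation, which is the kind of crash fault injection exposes.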
      
      Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ed3fe5f9
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: flush pending destroy work before netlink notifier · 41841b58
      Pablo Neira Ayuso authored
      [ Upstream commit 2c9f0293 ]
      
      Destroy work waits for the RCU grace period, then it releases the objects
      with no mutex held. All released objects follow this path for
      transactions; therefore, order is guaranteed and references to top-level
      objects in the hierarchy remain valid.
      
      However, the netlink notifier might interfere with pending destroy work.
      rcu_barrier() is not correct because objects are not released via an RCU
      callback. Flush destroy work before releasing objects from the netlink
      notifier path.
      
      Fixes: d4bc8271 ("netfilter: nf_tables: netlink notifier might race to release objects")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      41841b58
    • Andrii Staikov's avatar
      i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters() · 13686195
      Andrii Staikov authored
      [ Upstream commit 9525a3c3 ]
      
      Add a check that pf->vf is not NULL before dereferencing
      pf->vf[vsi->vf_id] when updating the VSI filter sync.
      Add a similar check before dereferencing pf->vf[vsi->vf_id].trusted
      in the condition for clearing the promisc mode bit.
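The guard can be sketched as follows. The structures are reduced, hypothetical stand-ins for the driver's types and `vf_is_trusted()` is an illustrative helper, not a function from the i40e driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical stand-ins for the driver structures. */
struct vf_info {
	bool trusted;
};

struct pf_info {
	struct vf_info *vf;	/* NULL when no VFs are allocated */
};

/* Only index the VF array once it is known to exist. */
static bool vf_is_trusted(const struct pf_info *pf, int vf_id)
{
	return pf->vf != NULL && pf->vf[vf_id].trusted;
}
```

Because `&&` short-circuits, the array is never indexed when the pointer is NULL.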
      
      Fixes: c87c938f ("i40e: Add VF VLAN pruning")
      Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
      Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      13686195
    • Jamal Hadi Salim's avatar
      net/sched: fix a qdisc modification with ambiguous command request · 58166889
      Jamal Hadi Salim authored
      [ Upstream commit da71714e ]
      
      When replacing an existing root qdisc with one of the same kind, the
      request boils down to a parameterization change, i.e. not one that
      requires allocation and grafting of a new qdisc. syzbot was able to create a
      scenario in which a taprio qdisc replaced an existing taprio qdisc
      with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL, leading to
      a create and graft scenario.
      The fix ensures that a create and graft is allowed only when the qdisc
      kinds are different; otherwise the request goes into the "change" codepath.
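Reduced to its core, the rule compares the two qdisc kinds. The sketch below is a deliberate simplification that ignores the netlink flag handling in the real code; `enum qdisc_action` and `replace_action()` are hypothetical names.

```c
#include <string.h>

/* Hypothetical outcome of a "replace" request on an existing root qdisc. */
enum qdisc_action {
	QDISC_CHANGE,		/* same kind: only parameters change */
	QDISC_CREATE_AND_GRAFT,	/* different kind: allocate and graft */
};

static enum qdisc_action replace_action(const char *old_kind,
					const char *new_kind)
{
	if (strcmp(old_kind, new_kind) == 0)
		return QDISC_CHANGE;

	return QDISC_CREATE_AND_GRAFT;
}
```

Under this rule, replacing taprio with taprio takes the "change" path regardless of which creation flags accompany the request.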
      
      While at it, fix the code and comments to improve readability.
      
      While syzbot was able to create the issue, it did not zero in on the root cause.
      Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down.
      
      v1->v2 changes:
      - remove "inline" function definition (Vladimir)
      - remove extraneous braces in branches (Vladimir)
      - change inline function names (Pedro)
      - Run tdc tests (Victor)
      v2->v3 changes:
      - don't break else/if (Simon)
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: <syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Victor Nogueira <victor@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      58166889
    • Sasha Neftin's avatar
      igc: Fix the typo in the PTM Control macro · f94f30e2
      Sasha Neftin authored
      [ Upstream commit de439757 ]
      
      IGC_PTM_CTRL_SHRT_CYC defines the time between two consecutive PTM
      requests. The field is six bits wide, but bit five was missing from
      the mask. This patch corrects the typo in the IGC_PTM_CTRL_SHRT_CYC
      macro.
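The effect of a mask with a hole in it can be shown with a small sketch. The concrete values are assumptions for illustration: a faulty mask of 0x2f, a corrected mask of 0x3f, and a field shift of 2. Neither the helper names nor these constants are taken from this message.

```c
#include <stdint.h>

#define SHRT_CYC_SHIFT	2	/* assumed position of the field */

/* Faulty variant: 0x2f covers six bit positions but leaves one bit out. */
static uint32_t shrt_cyc_faulty(uint32_t usec)
{
	return (usec & 0x2f) << SHRT_CYC_SHIFT;
}

/* Corrected variant: 0x3f keeps all six bits of the field. */
static uint32_t shrt_cyc_fixed(uint32_t usec)
{
	return (usec & 0x3f) << SHRT_CYC_SHIFT;
}
```

With the faulty mask, a requested value of 16 collapses to 0 and 63 becomes 47, so any value that needs the missing bit is silently shortened.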
      
      Fixes: a90ec848 ("igc: Add support for PTP getcrosststamp()")
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f94f30e2
    • Alessio Igor Bogani's avatar
      igb: Avoid starting unnecessary workqueues · 9b7fd6be
      Alessio Igor Bogani authored
      [ Upstream commit b888c510 ]
      
      If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting
      PTP related workqueues.
      
      This fixes the following crash:
       BUG: unable to handle page fault for address: ffffc9000440b6f8
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0
       Oops: 0000 [#1] PREEMPT SMP
       [...]
       Workqueue: events igb_ptp_overflow_check
       RIP: 0010:igb_rd32+0x1f/0x60
       [...]
       Call Trace:
        igb_ptp_read_82580+0x20/0x50
        timecounter_read+0x15/0x60
        igb_ptp_overflow_check+0x1a/0x50
        process_one_work+0x1cb/0x3c0
        worker_thread+0x53/0x3f0
        ? rescuer_thread+0x370/0x370
        kthread+0x142/0x160
        ? kthread_associate_blkcg+0xc0/0xc0
        ret_from_fork+0x1f/0x30
      
      Fixes: 1f6e8178 ("igb: Prevent dropped Tx timestamps via work items and interrupts.")
      Fixes: d339b133 ("igb: add PTP Hardware Clock code")
      Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
      Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230821171927.2203644-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9b7fd6be
    • can: isotp: fix support for transmission of SF without flow control · 39d43b9c
      Oliver Hartkopp authored
      [ Upstream commit 0bfe7115 ]
      
      The original implementation handled single frame transmissions very
      simply: it just sent the single frame without any timeout handling.

      With the new echo frame handling, the echo frame was also introduced for
      single frames, but the former exception ('simple without timers') was
      kept by accident. This leads to a 1 second timeout when closing the
      socket and to an -ECOMM error when CAN_ISOTP_WAIT_TX_DONE is selected.

      As the echo handling is always active (also for single frames), remove
      the wrong extra condition for single frames.
      
      Fixes: 9f39d365 ("can: isotp: add support for transmission without flow control")
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/r/20230821144547.6658-2-socketcan@hartkopp.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      39d43b9c
    • selftests: bonding: do not set port down before adding to bond · f41781b9
      Hangbin Liu authored
      [ Upstream commit be809424 ]
      
      Before adding a port to a bond, it needs to be set down first. In the
      lacpdu test the author set the port down explicitly. But commit
      a4abfa62 ("net: rtnetlink: Enslave device before bringing it up")
      changed the order of operations: the kernel now sets the port down
      _after_ adding it to the bond. So all the ports end up down and the
      test fails.

      In fact, the veth interfaces are already inactive when added. This
      means there's no need to set them down again before adding to the bond.
      Let's just remove the link down operation.
      
      Fixes: a4abfa62 ("net: rtnetlink: Enslave device before bringing it up")
      Reported-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Closes: https://lore.kernel.org/netdev/a0ef07c7-91b0-94bd-240d-944a330fcabd@huawei.com/
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20230817082459.1685972-1-liuhangbin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f41781b9
    • ice: Fix NULL pointer deref during VF reset · 850e2322
      Petr Oros authored
      [ Upstream commit 67f6317d ]
      
      During a stress test that attached and detached a VF from KVM while
      simultaneously changing the VF's spoofcheck and trust settings, there
      was a NULL pointer dereference in ice_reset_vf() because the VF's VSI
      was NULL.

      More than one instance of ice_reset_vf() can be running at a given
      time. When we rebuild the VSI in ice_reset_vf(), another reset can be
      triggered from ice_service_task. In this case we can access the
      currently uninitialized VSI and cause a panic. The window for this race
      condition has been around for a long time, but it's much worse after
      commit 227bf450 ("ice: move VSI delete outside deconfig") because
      the reset runs faster. ice_reset_vf() uses vf->cfg_lock, and taking
      this lock before accessing the VF VSI fixes the BUG in all cases.
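
The ordering described above can be sketched as a small single-threaded model in C. It is not the ice driver code: fake_vf, fake_vsi, fake_trylock(), fake_unlock() and fake_reset_vf() are hypothetical names, and the lock is modelled as a plain flag with try-lock semantics (an assumption made so the sketch can run without threads). It shows only that the VSI pointer is read and used after the lock is taken, so a concurrent rebuild cannot hand the reset path a NULL or half-built VSI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the ice driver's structures. */
struct fake_vsi { int rx_rings; };

struct fake_vf {
    bool cfg_locked;          /* models vf->cfg_lock as a simple flag */
    struct fake_vsi *vsi;     /* NULL while a rebuild is in progress */
};

static bool fake_trylock(struct fake_vf *vf)
{
    if (vf->cfg_locked)
        return false;
    vf->cfg_locked = true;
    return true;
}

static void fake_unlock(struct fake_vf *vf)
{
    assert(vf->cfg_locked);
    vf->cfg_locked = false;
}

/* Returns 0 on success, -1 if another reset holds the lock,
 * -2 if there is no VSI to operate on. */
static int fake_reset_vf(struct fake_vf *vf)
{
    struct fake_vsi *vsi;
    int ret = 0;

    if (!fake_trylock(vf))
        return -1;

    /* The VSI pointer is read only while the lock is held. */
    vsi = vf->vsi;
    if (!vsi) {
        ret = -2;
        goto out;
    }

    vsi->rx_rings = 0;        /* stands in for stopping the Rx rings */
out:
    fake_unlock(vf);
    return ret;
}
```

A real driver would block on a mutex rather than return when the lock is busy; the try-lock here only keeps the model deterministic.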
      
      The panic occurs sometimes in ice_vsi_is_rx_queue_active() and
      sometimes in ice_vsi_stop_all_rx_rings().

      With our reproducer, we can hit the BUG:
      ~8h before commit 227bf450 ("ice: move VSI delete outside deconfig").
      ~20m after commit 227bf450 ("ice: move VSI delete outside deconfig").
      With this fix we are not able to reproduce it even after ~48h.

      Commit cf90b743 ("ice: Fix call trace with null VSI during
      VF reset") also tried to fix this issue, but it only partially
      resolved it and the bug still exists.
      
      [ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [ 6420.665382] #PF: supervisor read access in kernel mode
      [ 6420.670521] #PF: error_code(0x0000) - not-present page
      [ 6420.675659] PGD 0
      [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
      [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
      [ 6420.698729] Workqueue: ice ice_service_task [ice]
      [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
      [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
      [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
      [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
      [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
      [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
      [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
      [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
      [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
      [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
      [ 6420.783742] FS:  0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
      [ 6420.791829] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
      [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
      [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 6420.811841] PKRU: 55555554
      [ 6420.811842] Call Trace:
      [ 6420.811843]  <TASK>
      [ 6420.811844]  ice_reset_vf+0x9a/0x450 [ice]
      [ 6420.811876]  ice_process_vflr_event+0x8f/0xc0 [ice]
      [ 6420.841343]  ice_service_task+0x23b/0x600 [ice]
      [ 6420.845884]  ? __schedule+0x212/0x550
      [ 6420.849550]  process_one_work+0x1e2/0x3b0
      [ 6420.853563]  ? rescuer_thread+0x390/0x390
      [ 6420.857577]  worker_thread+0x50/0x3a0
      [ 6420.861242]  ? rescuer_thread+0x390/0x390
      [ 6420.865253]  kthread+0xdd/0x100
      [ 6420.868400]  ? kthread_complete_and_exit+0x20/0x20
      [ 6420.873194]  ret_from_fork+0x1f/0x30
      [ 6420.876774]  </TASK>
      [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspkr
      
      Fixes: efe41860 ("ice: Fix memory corruption in VF driver")
      Fixes: f23df522 ("ice: Fix spurious interrupt during removal of trusted VF")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      850e2322
    • Revert "ice: Fix ice VF reset during iavf initialization" · 7cddaed2
      Petr Oros authored
      [ Upstream commit 0ecff05e ]
      
      This reverts commit 7255355a.
      
      After that commit we are not able to attach a VF to a VM:
      virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
      error: Failed to attach interface
      error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable
      
      ice_check_vf_ready_for_cfg() already waits for the reset to finish.
      The new condition in ice_check_vf_ready_for_reset() only causes problems.
      
      Fixes: 7255355a ("ice: Fix ice VF reset during iavf initialization")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7cddaed2