  1. Jun 23, 2021
    • net: bridge: fix vlan tunnel dst refcnt when egressing · fc7fdd8c
      Nikolay Aleksandrov authored
      commit cfc579f9 upstream.
      
      The egress tunnel code uses dst_clone() and directly sets the result,
      which is wrong because the entry might have a 0 refcnt or already be
      deleted, causing a number of problems. It also triggers the WARN_ON() in
      dst_hold()[1] when a refcnt couldn't be taken. Fix it by using
      dst_hold_safe() and checking whether a reference was actually taken
      before setting the dst.
      
      [1] dmesg WARN_ON log and following refcnt errors
       WARNING: CPU: 5 PID: 38 at include/net/dst.h:230 br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge]
       Modules linked in: 8021q garp mrp bridge stp llc bonding ipv6 virtio_net
       CPU: 5 PID: 38 Comm: ksoftirqd/5 Kdump: loaded Tainted: G        W         5.13.0-rc3+ #360
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
       RIP: 0010:br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge]
       Code: e8 85 bc 01 e1 45 84 f6 74 90 45 31 f6 85 db 48 c7 c7 a0 02 19 a0 41 0f 94 c6 31 c9 31 d2 44 89 f6 e8 64 bc 01 e1 85 db 75 02 <0f> 0b 31 c9 31 d2 44 89 f6 48 c7 c7 70 02 19 a0 e8 4b bc 01 e1 49
       RSP: 0018:ffff8881003d39e8 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa01902a0
       RBP: ffff8881040c6700 R08: 0000000000000000 R09: 0000000000000001
       R10: 2ce93d0054fe0d00 R11: 54fe0d00000e0000 R12: ffff888109515000
       R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000401
       FS:  0000000000000000(0000) GS:ffff88822bf40000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f42ba70f030 CR3: 0000000109926000 CR4: 00000000000006e0
       Call Trace:
        br_handle_vlan+0xbc/0xca [bridge]
        __br_forward+0x23/0x164 [bridge]
        deliver_clone+0x41/0x48 [bridge]
        br_handle_frame_finish+0x36f/0x3aa [bridge]
        ? skb_dst+0x2e/0x38 [bridge]
        ? br_handle_ingress_vlan_tunnel+0x3e/0x1c8 [bridge]
        ? br_handle_frame_finish+0x3aa/0x3aa [bridge]
        br_handle_frame+0x2c3/0x377 [bridge]
        ? __skb_pull+0x33/0x51
        ? vlan_do_receive+0x4f/0x36a
        ? br_handle_frame_finish+0x3aa/0x3aa [bridge]
        __netif_receive_skb_core+0x539/0x7c6
        ? __list_del_entry_valid+0x16e/0x1c2
        __netif_receive_skb_list_core+0x6d/0xd6
        netif_receive_skb_list_internal+0x1d9/0x1fa
        gro_normal_list+0x22/0x3e
        dev_gro_receive+0x55b/0x600
        ? detach_buf_split+0x58/0x140
        napi_gro_receive+0x94/0x12e
        virtnet_poll+0x15d/0x315 [virtio_net]
        __napi_poll+0x2c/0x1c9
        net_rx_action+0xe6/0x1fb
        __do_softirq+0x115/0x2d8
        run_ksoftirqd+0x18/0x20
        smpboot_thread_fn+0x183/0x19c
        ? smpboot_unregister_percpu_thread+0x66/0x66
        kthread+0x10a/0x10f
        ? kthread_mod_delayed_work+0xb6/0xb6
        ret_from_fork+0x22/0x30
       ---[ end trace 49f61b07f775fd2b ]---
       dst_release: dst:00000000c02d677a refcnt:-1
       dst_release underflow
      
      Cc: stable@vger.kernel.org
      Fixes: 11538d03 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
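      As an illustration only, here is a user-space sketch of the "take a
      reference only if the count is still non-zero" rule that dst_hold_safe()
      provides, written with C11 atomics. struct obj, struct pkt,
      obj_hold_safe() and pkt_set_dst() are hypothetical names; this is not the
      bridge code or the kernel's dst implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a dst entry. */
struct obj {
    atomic_int refcnt;
};

/* Hypothetical packet that may carry a pointer to an obj. */
struct pkt {
    struct obj *dst;
};

/* Increment refcnt only while it is still > 0. An object whose count has
 * already reached zero is being torn down and must not be revived. */
static bool obj_hold_safe(struct obj *o)
{
    int old = atomic_load(&o->refcnt);

    while (old > 0) {
        /* On failure the CAS writes the current value back into 'old'. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Store the pointer only when a reference was actually taken. */
static void pkt_set_dst(struct pkt *p, struct obj *o)
{
    if (obj_hold_safe(o))
        p->dst = o;
}
```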
    • net: bridge: fix vlan tunnel dst null pointer dereference · fe0448a3
      Nikolay Aleksandrov authored
      commit 58e20717 upstream.
      
      This patch fixes a tunnel_dst NULL pointer dereference caused by lockless
      access in the tunnel egress path. When a vlan tunnel is deleted, the
      tunnel_dst pointer is set to NULL without waiting for a grace period
      (i.e. while it is still in use), and egressing packets dereference it
      without checking. Use READ/WRITE_ONCE to annotate the lockless use of
      tunnel_id, and use RCU to access tunnel_dst. The dst is already properly
      RCU protected, so nothing fancier is needed than making sure tunnel_id
      and tunnel_dst are read only once and checked in the egress path.
      
      Cc: stable@vger.kernel.org
      Fixes: 11538d03 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
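      A user-space sketch of the "read once, then check" pattern described
      above, under stated assumptions: C11 atomic loads and stores stand in for
      READ_ONCE()/WRITE_ONCE() and rcu_dereference(), and all type and function
      names are hypothetical. It does not model RCU grace periods; in the
      kernel, the old dst may only be freed after one.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunnel destination object. */
struct tun_dst {
    uint32_t key;
};

/* Hypothetical per-VLAN tunnel state shared by egress and delete paths. */
struct vlan_tunnel {
    _Atomic uint32_t tunnel_id;
    struct tun_dst *_Atomic tunnel_dst;
};

/* Egress: load each shared field exactly once into a local and test the
 * local. Loading tunnel_dst once for the NULL check and again for the use
 * would let a concurrent delete slip a NULL in between. */
static int egress_key(struct vlan_tunnel *t, uint32_t *key)
{
    uint32_t id = atomic_load_explicit(&t->tunnel_id, memory_order_relaxed);
    struct tun_dst *dst =
        atomic_load_explicit(&t->tunnel_dst, memory_order_acquire);

    if (!id || !dst)
        return -1;  /* tunnel is being deleted: caller treats it as absent */
    *key = dst->key;
    return 0;
}

/* Delete: unpublish. Freeing the old tun_dst must be deferred until no
 * reader can still hold it (RCU in the kernel; not modeled here). */
static void tunnel_delete(struct vlan_tunnel *t)
{
    atomic_store_explicit(&t->tunnel_id, 0, memory_order_relaxed);
    atomic_store_explicit(&t->tunnel_dst, NULL, memory_order_release);
}
```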
    • net: ll_temac: Fix TX BD buffer overwrite · cfe403f2
      Esben Haabendal authored
      commit c364df24 upstream.
      
      As in the initial check, we need to ensure that num_frag+1 buffers are
      available, as that is the number of buffers we are going to use.
      
      This fixes a buffer overflow, which might be seen during heavy network
      load. A complete lockup of TEMAC was reproducible within about 10 minutes
      under a particular load.
      
      Fixes: 84823ff8 ("net: ll_temac: Fix race condition causing TX hang")
      Cc: stable@vger.kernel.org # v5.4+
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
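      A minimal sketch of the availability check, assuming a hypothetical ring
      that simply counts BDs in use; the driver's own bookkeeping is not
      reproduced here, so treat struct tx_ring and tx_ring_has_room() as
      illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TX ring bookkeeping: 'size' BDs in total, 'used' of them
 * currently handed to the DMA engine. */
struct tx_ring {
    size_t size;
    size_t used;
};

/* A frame with num_frag page fragments needs one BD for the linear part of
 * the skb plus one BD per fragment, i.e. num_frag + 1. Checking only for
 * num_frag would let the last BD overwrite one that is still in flight. */
static bool tx_ring_has_room(const struct tx_ring *r, size_t num_frag)
{
    return r->size - r->used >= num_frag + 1;
}
```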
    • net: ll_temac: Make sure to free skb when it is completely used · 019ab7d0
      Esben Haabendal authored
      commit 6aa32217 upstream.
      
      With the skb pointer piggy-backed on the TX BD, we have a simple and
      efficient way to free the skb buffer when the frame has been transmitted.
      But in order to avoid freeing the skb while there are still fragments from
      the skb in use, we need to piggy-back on the last TX BD of the skb, not
      the first.
      
      Without this, we are doing use-after-free on the DMA side when the first
      BD of a multi-BD packet is seen as completed in xmit_done while the
      remaining BDs are still being processed.
      
      Cc: stable@vger.kernel.org # v5.4+
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
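      The ownership rule can be sketched in user-space C. Assumptions: a
      hypothetical fixed-size ring, a `done` flag standing in for the hardware
      completion status, and a `frame` field standing in for the skb pointer
      piggy-backed on a BD. The point is only where the pointer lives: on the
      last BD of the frame, so completion of an earlier BD never releases it.

```c
#include <stddef.h>

#define RING_SIZE 8  /* hypothetical ring length */

/* Hypothetical TX buffer descriptor. */
struct bd {
    void *frame;  /* set on exactly one BD per frame: the last one */
    int done;     /* set when the DMA engine has completed this BD */
};

/* Fill nbd consecutive BDs starting at 'start' for one frame and attach
 * the frame pointer to the LAST of them. Returns the next free index. */
static size_t queue_frame(struct bd ring[RING_SIZE], size_t start,
                          size_t nbd, void *frame)
{
    size_t i = start;

    for (size_t n = 0; n < nbd; n++) {
        ring[i].frame = (n == nbd - 1) ? frame : NULL;
        ring[i].done = 0;
        i = (i + 1) % RING_SIZE;
    }
    return i;
}

/* Walk completed BDs from *head. A frame is released only when the BD
 * carrying its pointer completes, i.e. after all of its BDs are done.
 * Returns the number of frames released (a driver would free them here). */
static int reclaim(struct bd ring[RING_SIZE], size_t *head)
{
    int released = 0;

    while (ring[*head].done) {
        if (ring[*head].frame) {
            ring[*head].frame = NULL;
            released++;
        }
        ring[*head].done = 0;
        *head = (*head + 1) % RING_SIZE;
    }
    return released;
}
```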
    • drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue. · 41984d4f
      Yifan Zhang authored
      commit 4cbbe348 upstream.
      
      If GC has entered CGPG, ringing a doorbell beyond the first page doesn't
      wake up GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to work around this issue.
      
      Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell. · bc58ec30
      Yifan Zhang authored
      commit 1c0b0efd upstream.
      
      If GC has entered CGPG, ringing a doorbell beyond the first page doesn't
      wake up GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to work around this issue.
      
      Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cfg80211: avoid double free of PMSR request · 96b4126f
      Avraham Stern authored
      commit 0288e5e1 upstream.
      
      cfg80211_pmsr_process_abort() moves all the PMSR requests that need to
      be freed into a local list before aborting and freeing them. As a
      result, it is possible that cfg80211_pmsr_complete() will run in
      parallel and free the same PMSR request.
      
      Fix it by freeing the request in cfg80211_pmsr_complete() only if it
      is still in the original pmsr list.
      
      Cc: stable@vger.kernel.org
      Fixes: 9bb7e0f2 ("cfg80211: add peer measurement with FTM initiator API")
      Signed-off-by: Avraham Stern <avraham.stern@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Link: https://lore.kernel.org/r/iwlwifi.20210618133832.1fbef57e269a.I00294bebdb0680b892f8d1d5c871fd9dbe785a5e@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
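      A single-threaded sketch of the ownership rule, with hypothetical list and
      function names. In cfg80211 both paths run under a lock, which is omitted
      here; the sketch only shows that the completion path frees a request
      solely when it is still on the original list, leaving requests already
      moved to the abort path's local list to that path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request node on a singly linked list. */
struct req {
    struct req *next;
};

/* Hypothetical owner holding the original list of pending requests. */
struct owner {
    struct req *pending;
};

/* Unlink r from o->pending if it is still there. */
static bool unlink_if_pending(struct owner *o, struct req *r)
{
    for (struct req **pp = &o->pending; *pp; pp = &(*pp)->next) {
        if (*pp == r) {
            *pp = r->next;
            return true;
        }
    }
    return false;
}

/* Completion path: free r only if it is still on the original list. */
static bool complete_req(struct owner *o, struct req *r)
{
    if (!unlink_if_pending(o, r))
        return false;  /* the abort path owns it now */
    free(r);
    return true;
}

/* Abort path, step 1: move every pending request to a local list. */
static struct req *abort_detach(struct owner *o)
{
    struct req *local = o->pending;

    o->pending = NULL;
    return local;
}

/* Abort path, step 2: free the detached requests; returns how many. */
static int free_list(struct req *r)
{
    int n = 0;

    while (r) {
        struct req *next = r->next;
        free(r);
        r = next;
        n++;
    }
    return n;
}
```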
    • cfg80211: make certificate generation more robust · 5493b0c2
      Johannes Berg authored
      commit b5642479 upstream.
      
      If all net/wireless/certs/*.hex files are deleted, the build
      will hang at this point since the 'cat' command will have no
      arguments. Do "echo | cat - ..." so that even if the "..."
      part is empty, the whole thing won't hang.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Link: https://lore.kernel.org/r/iwlwifi.20210618133832.c989056c3664.Ic3b77531d00b30b26dcd69c64e55ae2f60c3f31e@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mac80211: Fix NULL ptr deref for injected rate info · f74df6e0
      Mathy Vanhoef authored
      commit bddc0c41 upstream.
      
      Commit cb17ed29 ("mac80211: parse radiotap header when selecting Tx
      queue") moved the code to validate the radiotap header from
      ieee80211_monitor_start_xmit to ieee80211_parse_tx_radiotap. This made it
      possible to share more code with the new Tx queue selection code for
      injected frames. But at the same time, it now required a call to
      ieee80211_parse_tx_radiotap at the beginning of functions that wanted to
      handle the radiotap header, and this broke the rate parsing in the
      radiotap header parser.
      
      Most of the time, the radiotap parser for rates operates only on the
      data in the actual radiotap header. But for the 802.11a/b/g rates, it must
      also know the selected band from the chandef information. This
      information is only written to the ieee80211_tx_info at the end of
      ieee80211_monitor_start_xmit, long after ieee80211_parse_tx_radiotap was
      already called. The info->band information was therefore always 0
      (NL80211_BAND_2GHZ) when the parser code tried to access it.
      
      For a 5GHz only device, injecting a frame with 802.11a rates would cause a
      NULL pointer dereference because local->hw.wiphy->bands[NL80211_BAND_2GHZ]
      would most likely have been NULL when the radiotap parser searched for the
      correct rate index of the driver.
      
      Cc: stable@vger.kernel.org
      Reported-by: Ben Greear <greearb@candelatech.com>
      Fixes: cb17ed29 ("mac80211: parse radiotap header when selecting Tx queue")
      Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
      [sven@narfation.org: added commit message]
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Link: https://lore.kernel.org/r/20210530133226.40587-1-sven@narfation.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dmaengine: pl330: fix wrong usage of spinlock flags in dma_cyclc · df203c1f
      Bumyong Lee authored
      commit 4ad5dd2d upstream.
      
      The flags variable, which is an input parameter of
      pl330_prep_dma_cyclic(), should not be used by the
      spin_lock_irqsave()/spin_unlock_irqrestore() functions.
      
      Signed-off-by: Jongho Park <jongho7.park@samsung.com>
      Signed-off-by: Bumyong Lee <bumyong.lee@samsung.com>
      Signed-off-by: Chanho Park <chanho61.park@samsung.com>
      Link: https://lore.kernel.org/r/20210507063647.111209-1-chanho61.park@samsung.com
      Fixes: f6f2421c ("dmaengine: pl330: Merge dma_pl330_dmac and pl330_dmac structs")
      Cc: stable@vger.kernel.org
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
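      The bug pattern can be reproduced in plain C, assuming a hypothetical
      fake_lock_irqsave() macro that, like the kernel's spin_lock_irqsave(),
      assigns the saved IRQ state to its flags argument. 0xdead is an arbitrary
      placeholder for that saved state; this is not the pl330 code.

```c
/* Hypothetical stand-ins for spin_lock_irqsave()/spin_unlock_irqrestore().
 * The lock macro WRITES the saved IRQ state into its flags argument. */
#define fake_lock_irqsave(f)      ((f) = 0xdeadUL)
#define fake_unlock_irqrestore(f) ((void)(f))

/* Buggy shape: the DMA 'flags' parameter doubles as the lock's save slot,
 * so whatever the caller passed in is gone once the lock is taken. */
static unsigned long prep_buggy(unsigned long flags)
{
    fake_lock_irqsave(flags);
    /* ... descriptor setup that would later use 'flags' ... */
    fake_unlock_irqrestore(flags);
    return flags;
}

/* Fixed shape: a separate local holds the saved IRQ state. */
static unsigned long prep_fixed(unsigned long flags)
{
    unsigned long irqflags;

    fake_lock_irqsave(irqflags);
    fake_unlock_irqrestore(irqflags);
    return flags;
}
```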
    • crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo · b842b568
      Pingfan Liu authored
      commit 4f5aecdf upstream.
      
      As mentioned in kernel commit 1d50e5d0 ("crash_core, vmcoreinfo:
      Append 'MAX_PHYSMEM_BITS' to vmcoreinfo"), SECTION_SIZE_BITS is used in
      the formula:
      
          #define SECTIONS_SHIFT    (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
      
      Besides SECTIONS_SHIFT, SECTION_SIZE_BITS is also used to calculate
      PAGES_PER_SECTION in makedumpfile, just as in the kernel.
      
      Unfortunately, this arch-dependent macro SECTION_SIZE_BITS changes, e.g.
      recently in kernel commit f0b13ee2 ("arm64/sparsemem: reduce
      SECTION_SIZE_BITS"). But user space wants a stable interface to get
      this info, and it cannot be deduced from a crashdump vmcore. Hence
      append SECTION_SIZE_BITS to vmcoreinfo.
      
      Link: https://lkml.kernel.org/r/20210608103359.84907-1-kernelfans@gmail.com
      Link: http://lists.infradead.org/pipermail/kexec/2021-June/022676.html
      Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
      Acked-by: Baoquan He <bhe@redha...>
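      How a dump tool might consume the new field: a sketch assuming the value
      is exported as a `NUMBER(SECTION_SIZE_BITS)=<n>` line, the format that
      VMCOREINFO_NUMBER() emits, and using the kernel relations
      SECTIONS_SHIFT = MAX_PHYSMEM_BITS - SECTION_SIZE_BITS and
      PAGES_PER_SECTION = 1 << (SECTION_SIZE_BITS - PAGE_SHIFT). The function
      names are hypothetical; this is not makedumpfile's code.

```c
#include <stdio.h>
#include <string.h>

/* Parse one vmcoreinfo line of the form "NUMBER(<name>)=<value>".
 * Returns 0 and stores the value if the line is for 'name', else -1. */
static int parse_number(const char *line, const char *name, long *out)
{
    char key[64];
    long val;

    if (sscanf(line, "NUMBER(%63[^)])=%ld", key, &val) != 2)
        return -1;
    if (strcmp(key, name) != 0)
        return -1;
    *out = val;
    return 0;
}

/* SECTIONS_SHIFT = MAX_PHYSMEM_BITS - SECTION_SIZE_BITS */
static long sections_shift(long max_physmem_bits, long section_size_bits)
{
    return max_physmem_bits - section_size_bits;
}

/* PAGES_PER_SECTION = 1 << (SECTION_SIZE_BITS - PAGE_SHIFT) */
static unsigned long pages_per_section(long section_size_bits, long page_shift)
{
    return 1UL << (section_size_bits - page_shift);
}
```

      For example, with SECTION_SIZE_BITS = 27 and PAGE_SHIFT = 12,
      pages_per_section() gives 1 << 15 = 32768.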
    • x86/fpu: Reset state for all signal restore failures · 63ba8356
      Thomas Gleixner authored
      commit efa16550 upstream.
      
      If access_ok() or fpregs_soft_set() fails in __fpu__restore_sig() then the
      function just returns but does not clear the FPU state as it does for all
      other fatal failures.
      
      Clear the FPU state for these failures as well.
      
      Fixes: 72a671ce ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/87mtryyhhz.ffs@nanos.tec.linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer · a7748e02
      Andy Lutomirski authored
      commit d8778e39 upstream.
      
      Both Intel and AMD consider it to be architecturally valid for XRSTOR to
      fail with #PF but nonetheless change the register state.  The actual
      conditions under which this might occur are unclear [1], but it seems
      plausible that this might be triggered if one sibling thread unmaps a page
      and invalidates the shared TLB while another sibling thread is executing
      XRSTOR on the page in question.
      
      __fpu__restore_sig() can execute XRSTOR while the hardware registers
      are preserved on behalf of a different victim task (using the
      fpu_fpregs_owner_ctx mechanism), and, in theory, XRSTOR could fail but
      modify the registers.
      
      If this happens, then there is a window in which __fpu__restore_sig()
      could schedule out and the victim task could schedule back in without
      reloading its own FPU registers. This would result in part of the FPU
      state that __fpu__restore_sig() was attempting to load leaking into the
      victim task's user-visible state.
      
      Invalidate preserved FPU registers on XRSTOR failure to prevent this
      situation from corrupting any state.
      
      [1] Frequent readers of the errata lists might imagine "complex
          microarchitectural conditions".
      
      Fixes: 1d731e73 ("x86/fpu: Add a fastpath to __fpu__restore_sig()")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.758116583@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
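      A toy single-CPU model of the invalidation, assuming a global owner
      pointer in the spirit of fpu_fpregs_owner_ctx; struct task and the
      function names are hypothetical. A task may skip reloading its registers
      on switch-in only if it is still the recorded owner, so clearing the
      owner on a failed restore forces the victim task to reload.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task with saved FPU state. */
struct task {
    int id;
};

/* Whose state the (single, modeled) CPU's FPU registers currently hold. */
static struct task *fpregs_owner;

/* On switch-in, a task may skip reloading only if it still owns the regs. */
static bool fpregs_state_valid_for(const struct task *t)
{
    return fpregs_owner == t;
}

/* Model of an XRSTOR from a user buffer on behalf of 'cur'. 'ok' selects
 * success or a fault; after a fault the registers may be partially
 * overwritten, so no task's cached state can be trusted. */
static int restore_from_user_buffer(struct task *cur, bool ok)
{
    if (!ok) {
        fpregs_owner = NULL;  /* invalidate: previous owner must reload */
        return -1;
    }
    fpregs_owner = cur;       /* registers now hold cur's state */
    return 0;
}
```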
    • x86/fpu: Prevent state corruption in __fpu__restore_sig() · 076f732b
      Thomas Gleixner authored
      commit 484cea4f upstream.
      
      The non-compacted slowpath uses __copy_from_user() and copies the entire
      user buffer into the kernel buffer, verbatim.  This means that the kernel
      buffer may now contain entirely invalid state on which XRSTOR will #GP.
      validate_user_xstate_header() can detect some of that corruption, but that
      leaves the onus on callers to clear the buffer.
      
      Prior to XSAVES support, it was possible just to reinitialize the buffer
      completely, but with supervisor states that is no longer possible, as the
      buffer clearing code split got it backwards. Fixing that is possible, but
      not corrupting the state in the first place is more robust.
      
      Avoid corruption of the kernel XSAVE buffer by using copy_user_to_xstate()
      which validates the XSAVE header contents before copying the actual states
      to the kernel. copy_user_to_xstate() was previously only called for
      compacted-format kernel buffers, but it works for both compacted and
      non-compacted forms.
      
      Using it for the non-compacted form is slower because of multiple
      __copy_from_user() operations, but that cost is less important than robust
      code in an already slow path.
      
      [ Changelog polished by Dave Hansen ]
      
      Fixes: b860eb8d ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
      Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.611833074@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
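      A sketch of "validate the header, then copy", using a hypothetical buffer
      layout with an XSAVE-like header (xfeatures, xcomp_bv, six reserved
      words). The header checks are modeled loosely and are assumptions;
      memcpy() stands in for the user copy, and unlike copy_user_to_xstate()
      this copies the whole buffer at once rather than per component.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified XSAVE-like header. */
struct xhdr {
    uint64_t xfeatures;
    uint64_t xcomp_bv;
    uint64_t reserved[6];
};

/* Hypothetical buffer: header plus a small payload. */
struct xbuf {
    struct xhdr hdr;
    unsigned char payload[64];
};

/* Reject headers that enable unsupported features, claim the compacted
 * format, or have non-zero reserved words. */
static int hdr_valid(const struct xhdr *h, uint64_t supported)
{
    if (h->xfeatures & ~supported)
        return 0;
    if (h->xcomp_bv)
        return 0;
    for (size_t i = 0; i < 6; i++)
        if (h->reserved[i])
            return 0;
    return 1;
}

/* Validate first, then copy: on rejection the kernel-side buffer is left
 * untouched instead of holding state that XRSTOR would #GP on. */
static int copy_validated(struct xbuf *kbuf, const struct xbuf *ubuf,
                          uint64_t supported)
{
    struct xhdr h;

    memcpy(&h, &ubuf->hdr, sizeof(h));  /* stands in for the user copy */
    if (!hdr_valid(&h, supported))
        return -1;
    memcpy(kbuf, ubuf, sizeof(*kbuf));
    return 0;
}
```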
    • x86/pkru: Write hardware init value to PKRU when xstate is init · abc790bd
      Thomas Gleixner authored
      commit 510b80a6 upstream.
      
      When user space brings PKRU into init state, then the kernel handling is
      broken:
      
        T1 user space
           xsave(state)
           state.header.xfeatures &= ~XFEATURE_MASK_PKRU;
           xrstor(state)
      
        T1 -> kernel
           schedule()
             XSAVE(S) -> T1->xsave.header.xfeatures[PKRU] == 0
             T1->flags |= TIF_NEED_FPU_LOAD;
      
             wrpkru();
      
           schedule()
             ...
             pk = get_xsave_addr(&T1->fpu->state.xsave, XFEATURE_PKRU);
             if (pk)
               wrpkru(pk->pkru);
             else
               wrpkru(DEFAULT_PKRU);
      
      Because the xfeatures bit is 0 and therefore the value in the xsave
      storage is not valid, get_xsave_addr() returns NULL and switch_to()
      writes the default PKRU. -> FAIL #1!
      
      So that wrecks any copy_to/from_user() on the way back to user space
      which hits memory which is protected by the default PKRU value.
      
      Assuming that this does not fail (pure luck), T1 goes back to user
      space and, because TIF_NEED_FPU_LOAD is set, it ends up in
      
        switch_fpu_return()
            __fpregs_load_activate()
              if (!fpregs_state_valid()) {
                load_XSTATE_from_task();
              }
      
      But if nothing touched the FPU between T1 scheduling out and back in,
      then the fpregs_state is still valid which means switch_fpu_return()
      does nothing and just clears TIF_NEED_FPU_LOAD. Back to user space with
      DEFAULT_PKRU loaded. -> FAIL #2!
      
      The fix is simple: if get_xsave_addr() returns NULL then set the
      PKRU value to 0 instead of the restrictive default PKRU value in
      init_pkru_value.
      
       [ bp: Massage in minor nitpicks from folks. ]
      
      Fixes: 0cecca9d ("x86/fpu: Eager switch PKRU state")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Tested-by: Babu Moger <babu.moger@amd.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144346.045616965@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
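      The switch-in decision described above reduces to a small pure function.
      In this sketch struct pkru_state and both function names are
      hypothetical, and HYP_DEFAULT_PKRU is an illustrative stand-in for the
      restrictive init_pkru_value (assumed here to be 0x55555554,
      access-disable for keys 1 to 15).

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the restrictive default PKRU value. */
#define HYP_DEFAULT_PKRU 0x55555554u

/* Hypothetical view of the PKRU component in a task's xsave buffer. */
struct pkru_state {
    uint32_t pkru;
};

/* Old choice: a NULL component (PKRU in init state) loads the default. */
static uint32_t pkru_on_switch_buggy(const struct pkru_state *pk)
{
    return pk ? pk->pkru : HYP_DEFAULT_PKRU;
}

/* Fixed choice: init state means the hardware init value, which is 0. */
static uint32_t pkru_on_switch(const struct pkru_state *pk)
{
    return pk ? pk->pkru : 0;
}
```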
    • x86/ioremap: Map EFI-reserved memory as encrypted for SEV · 208bb686
      Tom Lendacky authored
      commit 8d651ee9 upstream.
      
      Some drivers require memory that is marked as EFI boot services
      data. In order for this memory to not be re-used by the kernel
      after ExitBootServices(), efi_mem_reserve() is used to preserve it
      by inserting a new EFI memory descriptor and marking it with the
      EFI_MEMORY_RUNTIME attribute.
      
      Under SEV, memory marked with the EFI_MEMORY_RUNTIME attribute needs to
      be mapped encrypted by Linux, otherwise the kernel might crash at boot
      like below:
      
        EFI Variables Facility v0.08 2004-May-17
        general protection fault, probably for non-canonical address 0x3597688770a868b2: 0000 [#1] SMP NOPTI
        CPU: 13 PID: 1 Comm: swapper/0 Not tainted 5.12.4-2-default #1 openSUSE Tumbleweed
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:efi_mokvar_entry_next
        [...]
        Call Trace:
         efi_mokvar_sysfs_init
         ? efi_mokvar_table_init
         do_one_initcall
         ? __kmalloc
         kernel_init_freeable
         ? rest_init
         kernel_init
         ret_from_fork
      
      Expand the __ioremap_check_other() function to additionally check for
      this other type of boot data reserved at runtime and indicate that it
      should be mapped encrypted for an SEV guest.
      
       [ bp: Massage commit message. ]
      
      Fixes: 58c90902 ("efi: Support for MOK variable config table")
      Reported-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Joerg Roedel <jroedel@suse.de>
      Cc: <stable@vger.kernel.org> # 5.10+
      Link: https://lkml.kernel.org/r/20210608095439.12668-2-joro@8bytes.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/process: Check PF_KTHREAD and not current->mm for kernel threads · 75a55bc2
      Thomas Gleixner authored
      commit 12f7764a upstream.
      
      switch_fpu_finish() checks current->mm as an indicator for kernel threads.
      That's wrong because kernel threads can temporarily use the mm of a user
      process via kthread_use_mm().
      
      Check the task flags for PF_KTHREAD instead.
      
      Fixes: 0cecca9d ("x86/fpu: Eager switch PKRU state")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.912645927@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
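      A minimal sketch of the two tests, using a hypothetical flag bit
      HYP_TASK_KTHREAD in place of PF_KTHREAD and a simplified struct task; it
      only shows why a borrowed mm defeats the mm-based check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for PF_KTHREAD. */
#define HYP_TASK_KTHREAD 0x1u

/* Hypothetical, simplified address-space object. */
struct mm {
    int unused;
};

/* Hypothetical, simplified task. */
struct task {
    unsigned int flags;
    struct mm *mm;
};

/* Wrong test: a kernel thread inside kthread_use_mm() has a non-NULL mm. */
static bool is_kthread_by_mm(const struct task *t)
{
    return t->mm == NULL;
}

/* Right test: the task flag does not change when a kthread borrows an mm. */
static bool is_kthread(const struct task *t)
{
    return (t->flags & HYP_TASK_KTHREAD) != 0;
}
```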
    • x86/mm: Avoid truncating memblocks for SGX memory · ddaaf38e
      Fan Du authored
      commit 28e5e44a upstream.
      
      tl;dr:
      
      Several SGX users reported seeing the following message on NUMA systems:
      
        sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
      
      This turned out to be the memblock code mistakenly throwing away SGX
      memory.
      
      === Full Changelog ===
      
      The 'max_pfn' variable represents the highest known RAM address.  It can
      be used, for instance, to quickly determine for which physical addresses
      there is mem_map[] space allocated.  The numa_meminfo code makes an
      effort to throw out ("trim") all memory blocks which are above 'max_pfn'.
      
      SGX memory is not considered RAM (it is marked as "Reserved" in the
      e820) and is not taken into account by max_pfn. Despite this, SGX memory
      areas have NUMA affinity and are enumerated in the ACPI SRAT table. The
      existing SGX code uses the numa_meminfo mechanism to look up the NUMA
      affinity for its memory areas.
      
      In cases where SGX memory was above max_pfn (usually just the one EPC
      section in the highest NUMA node), the numa_memblock is truncated at
      'max_pfn', which is below the SGX memory. When the SGX code tries to
      look up the affinity of this memory, it fails and produces an error message:
      
        sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
      
      and assigns the memory to NUMA node 0.
      
      Instead of silently truncating the memory block at 'max_pfn' and
      dropping the SGX memory, add the truncated portion to
      'numa_reserved_meminfo'.  This allows the SGX code to later determine
      the NUMA affinity of its 'Reserved' area.
      
      Before, numa_meminfo looked like this (from 'crash'):
      
        blk = { start =          0x0, end = 0x2080000000, nid = 0x0 }
              { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
      
      numa_reserved_meminfo is empty.
      
      With this, numa_meminfo looks like this:
      
        blk = { start =          0x0, end = 0x2080000000, nid = 0x0 }
              { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
      
      and numa_reserved_meminfo has an entry for node 1's SGX memory:
      
        blk =  { start = 0x4000000000, end = 0x4080000000, nid = 0x1 }
      
       [ daveh: completely rewrote/reworked changelog ]
      
      Fixes: 5d30f92e ("x86/NUMA: Provide a range-to-target_node lookup facility")
      Reported-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Fan Du <fan.du@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20210617194657.0A99CB22@viggo.jf.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
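      The trimming change can be sketched as a pure function over a
      hypothetical simplified block type. The test reuses the addresses from
      the changelog, assuming node 1's original range was
      0x2080000000-0x4080000000 and max_pfn corresponds to 0x4000000000; both
      are inferred from the before/after dumps rather than stated outright.

```c
#include <stdint.h>

/* Hypothetical simplified NUMA memory block: [start, end) on node nid. */
struct blk {
    uint64_t start, end;
    int nid;
};

/* Clip b at 'limit' (max_pfn expressed as a byte address). Instead of
 * discarding the part above the limit, return it in *tail so it can be
 * recorded in a separate reserved list. Returns 1 if a tail was produced. */
static int trim_keep_tail(struct blk *b, uint64_t limit, struct blk *tail)
{
    if (b->end <= limit)
        return 0;  /* nothing above the limit */

    tail->start = b->start > limit ? b->start : limit;
    tail->end = b->end;
    tail->nid = b->nid;
    b->end = b->start > limit ? b->start : limit;  /* may become empty */
    return 1;
}
```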
    • ARCv2: save ABI registers across signal handling · f6bcb1a6
      Vineet Gupta authored
      commit 96f1b001 upstream.
      
      ARCv2 has some configuration-dependent registers (r30, r58, r59) which
      could be targeted by the compiler. To keep the ABI stable, these were
      unconditionally part of the glibc ABI
      (sysdeps/unix/sysv/linux/arc/sys/ucontext.h:mcontext_t); however, we
      missed populating them (by saving/restoring them across signal
      handling).
      
      This patch fixes the issue by
       - adding arcv2 ABI regs to kernel struct sigcontext
       - populating them during signal handling
      
      A change to struct sigcontext might seem like a glibc ABI change (although
      glibc primarily uses ucontext_t:mcontext_t), but the fact is that
       - it has only been extended (existing fields are not touched)
       - the old sigcontext was ABI-incomplete to begin with anyway
      
      Fixes: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/53
      Cc: <stable@vger.kernel.org>
      Tested-by: kernel test robot <lkp@intel.com>
      Reported-by: Vladimir Isaev <isaev@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/ap: Fix hanging ioctl caused by wrong msg counter · b516daed
      Harald Freudenberger authored
      commit e73a99f3 upstream.
      
      When an AP queue is switched to soft offline, all pending
      requests are purged out of the pending requests list and
      'received' by the upper layer like zcrypt device drivers.
      This is also done for requests which are already enqueued
      into the firmware queue. A request in a firmware queue
      may eventually produce a response message, but there is
      no waiting process any more. However, the response was
      counted with the queue_counter, and as this counter was
      reset to 0 with the offline switch, the pending response
      caused the queue_counter to go negative. The next request
      increased this counter to 0 (instead of 1), which caused
      the ap code to assume there was nothing to receive, so it
      never tried to fetch the response for this valid request
      from the firmware queue.
      
      All this caused a queue to not work properly after a
      switch offline/online and, in the end, processes to hang
      forever when trying to send a crypto request after a
      queue offline/online switch cycle.
      
      Fixed by a) making sure the counter does not drop below 0
      and b) making sure it has a value of at least 1 after a
      successful enqueue of a message.
      
      Additionally, a warning is emitted when a reply can't be
      assigned to a waiting process. This may be normal operation
      (the process timed out or has been killed) but may give a
      hint that something unexpected happened (like the odd
      behavior described above).
      
      Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Alexander Gordeev's avatar
      s390/mcck: fix calculation of SIE critical section size · 7c003dab
      Alexander Gordeev authored
      commit 5bcbe328 upstream.
      
      The size of the SIE critical section is calculated wrongly
      as a result of a missed subtraction in commit 0b0ed657
      ("s390: remove critical section cleanup from entry.S").
      
      Fixes: 0b0ed657 ("s390: remove critical section cleanup from entry.S")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Wanpeng Li's avatar
      KVM: X86: Fix x86_emulator slab cache leak · 3a9934d6
      Wanpeng Li authored
      commit dfdc0a71 upstream.
      
      Commit c9b8b07c (KVM: x86: Dynamically allocate per-vCPU emulation context)
      allocates the per-vCPU emulation context dynamically; however, the
      x86_emulator slab cache still exists after the VM is destroyed and the
      kvm module is unloaded, as shown below.
      
      grep x86_emulator /proc/slabinfo
      x86_emulator          36     36   2672   12    8 : tunables    0    0    0 : slabdata      3      3      0
      
      This patch fixes this slab cache leak by destroying the x86_emulator slab cache
      when the kvm module is unloaded.
      
      Fixes: c9b8b07c (KVM: x86: Dynamically allocate per-vCPU emulation context)
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1623387573-5969-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Sean Christopherson's avatar
      KVM: x86/mmu: Calculate and check "full" mmu_role for nested MMU · 18eca69f
      Sean Christopherson authored
      commit 654430ef upstream.
      
      Calculate and check the full mmu_role when initializing the MMU context
      for the nested MMU, where "full" means the bits and pieces of the role
      that aren't handled by kvm_calc_mmu_role_common().  While the nested MMU
      isn't used for shadow paging, things like the number of levels in the
      guest's page tables are surprisingly important when walking the guest
      page tables.  Failure to reinitialize the nested MMU context if L2's
      paging mode changes can result in unexpected and/or missed page faults,
      and likely other explosions.
      
      E.g. if an L1 vCPU is running both a 32-bit PAE L2 and a 64-bit L2, the
      "common" role calculation will yield the same role for both L2s.  If the
      64-bit L2 is run after the 32-bit PAE L2, L0 will fail to reinitialize
      the nested MMU context, ultimately resulting in a bad walk of L2's page
      tables as the MMU will still have a guest root_level of PT32E_ROOT_LEVEL.
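      
      To see why the "common" role is not enough, here is a toy comparison.
      The struct and its two fields are hypothetical simplifications, not
      KVM's actual role layout; only the idea that the guest paging level
      lives outside the common bits is taken from the text. A root level of
      3 corresponds to PT32E_ROOT_LEVEL (32-bit PAE), 4 to 4-level 64-bit
      paging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified MMU role. */
struct mmu_role_sketch {
	uint32_t common;	/* bits produced by the "common" calculation */
	uint32_t root_level;	/* guest paging levels: 3 = PAE, 4 = 64-bit */
};

/* Pre-fix idea: reinitialize only when the common bits differ. */
static bool needs_reinit_common(struct mmu_role_sketch cur,
				struct mmu_role_sketch next)
{
	return cur.common != next.common;
}

/* Post-fix idea: compare the full role, paging level included. */
static bool needs_reinit_full(struct mmu_role_sketch cur,
			      struct mmu_role_sketch next)
{
	return cur.common != next.common ||
	       cur.root_level != next.root_level;
}
```

      For a PAE L2 followed by a 64-bit L2 with identical common bits,
      needs_reinit_common() returns false, so the stale root level of 3
      would be kept; needs_reinit_full() returns true.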
      
        WARNING: CPU: 4 PID: 167334 at arch/x86/kvm/vmx/vmx.c:3075 ept_save_pdptrs+0x15/0xe0 [kvm_intel]
        Modules linked in: kvm_intel]
        CPU: 4 PID: 167334 Comm: CPU 3/KVM Not tainted 5.13.0-rc1-d849817d5673-reqs #185
        Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
        RIP: 0010:ept_save_pdptrs+0x15/0xe0 [kvm_intel]
        Code: <0f> 0b c3 f6 87 d8 02 00f
        RSP: 0018:ffffbba702dbba00 EFLAGS: 00010202
        RAX: 0000000000000011 RBX: 0000000000000002 RCX: ffffffff810a2c08
        RDX: ffff91d7bc30acc0 RSI: 0000000000000011 RDI: ffff91d7bc30a600
        RBP: ffff91d7bc30a600 R08: 0000000000000010 R09: 0000000000000007
        R10: 0000000000000000 R11: 0000000000000000 R12: ffff91d7bc30a600
        R13: ffff91d7bc30acc0 R14: ffff91d67c123460 R15: 0000000115d7e005
        FS:  00007fe8e9ffb700(0000) GS:ffff91d90fb00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000029f15a001 CR4: 00000000001726e0
        Call Trace:
         kvm_pdptr_read+0x3a/0x40 [kvm]
         paging64_walk_addr_generic+0x327/0x6a0 [kvm]
         paging64_gva_to_gpa_nested+0x3f/0xb0 [kvm]
         kvm_fetch_guest_virt+0x4c/0xb0 [kvm]
         __do_insn_fetch_bytes+0x11a/0x1f0 [kvm]
         x86_decode_insn+0x787/0x1490 [kvm]
         x86_decode_emulated_instruction+0x58/0x1e0 [kvm]
         x86_emulate_instruction+0x122/0x4f0 [kvm]
         vmx_handle_exit+0x120/0x660 [kvm_intel]
         kvm_arch_vcpu_ioctl_run+0xe25/0x1cb0 [kvm]
         kvm_vcpu_ioctl+0x211/0x5a0 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x40/0xb0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: bf627a92 ("x86/kvm/mmu: check if MMU reconfiguration is needed in init_kvm_nested_mmu()")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210610220026.1364486-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Sean Christopherson's avatar
      KVM: x86: Immediately reset the MMU context when the SMM flag is cleared · 669a8866
      Sean Christopherson authored
      commit 78fcb2c9 upstream.
      
      Immediately reset the MMU context when the vCPU's SMM flag is cleared so
      that the SMM flag in the MMU role is always synchronized with the vCPU's
      flag.  If RSM fails (which isn't correctly emulated), KVM will bail
      without calling post_leave_smm() and leave the MMU in a bad state.
      
      The bad MMU role can lead to a NULL pointer dereference when grabbing a
      shadow page's rmap for a page fault as the initial lookups for the gfn
      will happen with the vCPU's SMM flag (=0), whereas the rmap lookup will
      use the shadow page's SMM flag, which comes from the MMU (=1).  SMM has
      an entirely different set of memslots, and so the initial lookup can find
      a memslot (SMM=0) and then explode on the rmap memslot lookup (SMM=1).
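      
      The mismatch can be modelled with two hypothetical memslot tables, one
      per address space. The gfn ranges and all names below are made up for
      illustration; in the scenario above, the first lookup uses the vCPU's
      SMM flag (0) and the rmap lookup uses the stale MMU role's flag (1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memslot: a contiguous range of guest frame numbers. */
struct memslot_sketch {
	unsigned long base_gfn;
	unsigned long npages;
};

/* One table per address space: [0] normal, [1] SMM. The ranges are
 * invented; the point is only that they differ. */
static const struct memslot_sketch slots[2] = {
	{ .base_gfn = 0x10,  .npages = 4 },
	{ .base_gfn = 0x100, .npages = 4 },
};

/* Find the slot covering gfn in the address space selected by smm. */
static const struct memslot_sketch *lookup_slot(int smm, unsigned long gfn)
{
	const struct memslot_sketch *s;

	assert(smm == 0 || smm == 1);
	s = &slots[smm];
	if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
		return s;
	return NULL;	/* dereferencing this is the NULL-pointer crash */
}
```

      Resetting the MMU context as soon as the vCPU's SMM flag is cleared
      keeps both flags equal, so both lookups select the same table.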
      
        general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
        CPU: 1 PID: 8410 Comm: syz-executor382 Not tainted 5.13.0-rc5-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:__gfn_to_rmap arch/x86/kvm/mmu/mmu.c:935 [inline]
        RIP: 0010:gfn_to_rmap+0x2b0/0x4d0 arch/x86/kvm/mmu/mmu.c:947
        Code: <42> 80 3c 20 00 74 08 4c 89 ff e8 f1 79 a9 00 4c 89 fb 4d 8b 37 44
        RSP: 0018:ffffc90000ffef98 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff888015b9f414 RCX: ffff888019669c40
        RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
        RBP: 0000000000000001 R08: ffffffff811d9cdb R09: ffffed10065a6002
        R10: ffffed10065a6002 R11: 0000000000000000 R12: dffffc0000000000
        R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
        FS:  000000000124b300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000028e31000 CR4: 00000000001526e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         rmap_add arch/x86/kvm/mmu/mmu.c:965 [inline]
         mmu_set_spte+0x862/0xe60 arch/x86/kvm/mmu/mmu.c:2604
         __direct_map arch/x86/kvm/mmu/mmu.c:2862 [inline]
         direct_page_fault+0x1f74/0x2b70 arch/x86/kvm/mmu/mmu.c:3769
         kvm_mmu_do_page_fault arch/x86/kvm/mmu.h:124 [inline]
         kvm_mmu_page_fault+0x199/0x1440 arch/x86/kvm/mmu/mmu.c:5065
         vmx_handle_exit+0x26/0x160 arch/x86/kvm/vmx/vmx.c:6122
         vcpu_enter_guest+0x3bdd/0x9630 arch/x86/kvm/x86.c:9428
         vcpu_run+0x416/0xc20 arch/x86/kvm/x86.c:9494
         kvm_arch_vcpu_ioctl_run+0x4e8/0xa40 arch/x86/kvm/x86.c:9722
         kvm_vcpu_ioctl+0x70f/0xbb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3460
         vfs_ioctl fs/ioctl.c:51 [inline]
         __do_sys_ioctl fs/ioctl.c:1069 [inline]
         __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:1055
         do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x440ce9
      
      Cc: stable@vger.kernel.org
      Reported-by: <syzbot+fb0b6a7e8713aeb0319c@syzkaller.appspotmail.com>
      Fixes: 9ec19493 ("KVM: x86: clear SMM flags before loading state while leaving SMM")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210609185619.992058-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Chiqijun's avatar
      PCI: Work around Huawei Intelligent NIC VF FLR erratum · 077cb894
      Chiqijun authored
      commit ce00322c upstream.
      
      pcie_flr() starts a Function Level Reset (FLR), waits 100ms (the maximum
      time allowed for FLR completion by PCIe r5.0, sec 6.6.2), and waits for the
      FLR to complete.  It assumes the FLR is complete when a config read returns
      valid data.
      
      When we do an FLR on several Huawei Intelligent NIC VFs at the same time,
      firmware on the NIC processes them serially.  The VF may respond to config
      reads before the firmware has completed its reset processing.  If we bind a
      driver to the VF (e.g., by assigning the VF to a virtual machine) in the
      interval between the successful config read and completion of the firmware
      reset processing, the NIC VF driver may fail to load.
      
      Prevent this driver failure by waiting for the NIC firmware to complete its
      reset processing.  Not all NIC firmware supports this feature.
      
      [bhelgaas: commit log]
      Link: https://support.huawei.com/enterprise/en/doc/EDOC1100063073/87950645/vm-oss-occasionally-fail-to-load-the-in200-driver-when-the-vf-performs-flr
      Link: https://lore.kernel.org/r/20210414132301.1793-1-chiqijun@huawei.com
      Signed-off-by: Chiqijun <chiqijun@huawei.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Sriharsha Basavapatna's avatar
      PCI: Add ACS quirk for Broadcom BCM57414 NIC · ee1a9cfe
      Sriharsha Basavapatna authored
      commit db2f77e2 upstream.
      
      The Broadcom BCM57414 NIC may be a multi-function device.  While it does
      not advertise an ACS capability, peer-to-peer transactions are not possible
      between the individual functions, so it is safe to treat them as fully
      isolated.
      
      Add an ACS quirk for this device so the functions can be in independent
      IOMMU groups and attached individually to userspace applications using
      VFIO.
      
      [bhelgaas: commit log]
      Link: https://lore.kernel.org/r/1621645997-16251-1-git-send-email-michael.chan@broadcom.com
      Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Pali Rohár's avatar
      PCI: aardvark: Fix kernel panic during PIO transfer · 1a1dbc44
      Pali Rohár authored
      commit f1813996 upstream.
      
      Trying to start a new PIO transfer by writing value 0 to the PIO_START
      register when the previous transfer has not yet completed (which is
      indicated by value 1 in PIO_START) causes an External Abort on the CPU,
      which results in a kernel panic:
      
          SError Interrupt on CPU0, code 0xbf000002 -- SError
          Kernel panic - not syncing: Asynchronous SError Interrupt
      
      To prevent the kernel panic, a new PIO transfer must be rejected while
      the previous one has not finished yet.
      
      If the previous PIO transfer has not finished yet, the kernel may issue
      a new PIO request only if the previous PIO transfer timed out.
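      
      The rejection rule can be sketched as a tiny state model. The struct,
      the function names and the -EBUSY return value are hypothetical, and
      the sketch assumes the busy indication in PIO_START clears once the
      previous transfer completes or times out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical controller model: pio_start mirrors PIO_START, which
 * reads 1 while a transfer is still in flight. */
struct pio_sketch {
	bool pio_start;
};

/* Refuse to start a new transfer while the previous one is running,
 * instead of writing PIO_START and triggering the SError. */
static int pio_try_start(struct pio_sketch *p)
{
	if (p->pio_start)
		return -EBUSY;
	p->pio_start = true;
	return 0;
}

/* Assumed hardware behaviour: the busy bit clears once the transfer
 * completes or times out (up to about 1.44s on link problems). */
static void pio_hw_done(struct pio_sketch *p)
{
	assert(p->pio_start);
	p->pio_start = false;
}
```

      A second start attempt while the first transfer is outstanding returns
      an error instead of touching the register; once the hardware reports
      completion or timeout, a new transfer may start.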
      
      In the past, the root cause of this issue was incorrectly identified (as it
      often happens during link retraining or after a link down event), and a
      special hack was implemented in Trusted Firmware to catch all SError events
      in EL3, ignore errors with code 0xbf000002, and, instead of forwarding any
      other errors to the kernel, throw a panic from the EL3 Trusted Firmware
      handler.
      
      Links to discussion and patches about this issue:
      https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50
      https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/
      https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen.fr/
      https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541
      
      But the real cause was the fact that during link retraining or after a link
      down event, the PIO transfer may take longer, up to 1.44s until it
      times out. This increased the probability that a new PIO transfer would be
      issued by the kernel while the previous one had not finished yet.
      
      After applying this change to the kernel, it is possible to revert the
      mentioned TF-A hack, and SError events no longer have to be caught in TF-A EL3.
      
      Link: https://lore.kernel.org/r/20210608203655.31228-1-pali@kernel.org
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Cc: stable@vger.kernel.org # 7fbcb5da ("PCI: aardvark: Don't rely on jiffies while holding spinlock")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Shanker Donthineni's avatar
      PCI: Mark some NVIDIA GPUs to avoid bus reset · dac77a14
      Shanker Donthineni authored
      commit 4c207e71 upstream.
      
      Some NVIDIA GPU devices do not work with SBR.  Triggering SBR leaves the
      device inoperable for the current system boot, and a hard reboot is
      required to return the GPU to normal operating condition. For the
      affected devices, enable the NO_BUS_RESET quirk to avoid the issue.
      
      This issue will be fixed in the next generation of hardware.
      
      Link: https://lore.kernel.org/r/20210608054857.18963-8-ameynarkhede03@gmail.com
      Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Sinan Kaya <okaya@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Antti Järvinen's avatar
      PCI: Mark TI C667X to avoid bus reset · 1e460ddf
      Antti Järvinen authored
      commit b5cf198e upstream.
      
      Some TI KeyStone C667X devices do not support bus/hot reset.  The PCIESS
      automatically disables LTSSM when Secondary Bus Reset is received, and the
      device stops working.  Prevent bus reset for these devices.  With this
      change, the device can be assigned to VMs with VFIO, but it will leak state
      between VMs.
      
      Reference: https://e2e.ti.com/support/processors/f/791/t/954382
      Link: https://lore.kernel.org/r/20210315102606.17153-1-antti.jarvinen@gmail.com
      Signed-off-by: Antti Järvinen <antti.jarvinen@gmail.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Steven Rostedt (VMware)'s avatar
      tracing: Do no increment trace_clock_global() by one · c9fd0ab3
      Steven Rostedt (VMware) authored
      commit 89529d8b upstream.
      
      trace_clock_global() tries to make sure the events between CPUs are
      somewhat in order. A global value is used and updated by the latest read
      of a clock. If one CPU is ahead by a little, and is read by another CPU, a
      lock is taken, and if the timestamp of the other CPU is behind, it will
      simply use the other CPU's timestamp.
      
      The lock is also only taken with a "trylock" due to tracing, and strange
      recursions can happen. The lock is not taken at all in NMI context.
      
      In the case where the lock cannot be taken, the non-synced
      timestamp is returned. But it will not be less than the saved global
      timestamp.
      
      The problem arises because when the time goes "backwards" the time
      returned is the saved timestamp plus 1. If the lock is not taken, and the
      plus one to the timestamp is returned, there's a small race that can cause
      the time to go backwards!
      
      	CPU0				CPU1
      	----				----
      				trace_clock_global() {
      				    ts = clock() [ 1000 ]
      				    trylock(clock_lock) [ success ]
      				    global_ts = ts; [ 1000 ]
      
      				    <interrupted by NMI>
       trace_clock_global() {
          ts = clock() [ 999 ]
          if (ts < global_ts)
      	ts = global_ts + 1 [ 1001 ]
      
          trylock(clock_lock) [ fail ]
      
          return ts [ 1001]
       }
      				    unlock(clock_lock);
      				    return ts; [ 1000 ]
      				}
      
       trace_clock_global() {
          ts = clock() [ 1000 ]
          if (ts < global_ts) [ false 1000 == 1000 ]
      
          trylock(clock_lock) [ success ]
          global_ts = ts; [ 1000 ]
          unlock(clock_lock)
      
          return ts; [ 1000 ]
       }
      
      The above case shows two reads of trace_clock_global() on the same CPU, but
      the second read returns one less than the first read. That is, time went
      backwards, which trace_clock_global() must not allow.
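      
      The commit title points at the fix: clamp a lagging timestamp to the
      saved global value rather than to the global value plus one. The two
      helpers below are a minimal, single-threaded sketch of that clamp
      (hypothetical names, no locking or NMI handling), used to replay
      CPU0's two reads from the diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: a timestamp behind the saved global one is bumped
 * past it by one. */
static uint64_t clamp_old(uint64_t ts, uint64_t global_ts)
{
	return ts < global_ts ? global_ts + 1 : ts;
}

/* Behaviour implied by the commit title: reuse the global value
 * as-is instead of adding one. */
static uint64_t clamp_new(uint64_t ts, uint64_t global_ts)
{
	return ts < global_ts ? global_ts : ts;
}
```

      With global_ts at 1000, the old clamp makes CPU0 return 1001 and then
      1000; the new clamp returns 1000 twice, which does not go backwards.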
      
      This was triggered by heavy tracing and the ring buffer checker that tests
      for the clock going backwards:
      
       Ring buffer clock went backwards: 20613921464 -> 20613921463
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 0 at kernel/trace/ring_buffer.c:3412 check_buffer+0x1b9/0x1c0
       Modules linked in:
       [..]
       [CPU: 2]TIME DOES NOT MATCH expected:20620711698 actual:20620711697 delta:6790234 before:20613921463 after:20613921463
         [20613915818] PAGE TIME STAMP
         [20613915818] delta:0
         [20613915819] delta:1
         [20613916035] delta:216
         [20613916465] delta:430
         [20613916575] delta:110
         [20613916749] delta:174
         [20613917248] delta:499
         [20613917333] delta:85
         [20613917775] delta:442
         [20613917921] delta:146
         [20613918321] delta:400
         [20613918568] delta:247
         [20613918768] delta:200
         [20613919306] delta:538
         [20613919353] delta:47
         [20613919980] delta:627
         [20613920296] delta:316
         [20613920571] delta:275
         [20613920862] delta:291
         [20613921152] delta:290
         [20613921464] delta:312
         [20613921464] delta:0 TIME EXTEND
         [20613921464] delta:0
      
      This happened more than once, and always for an off by one result. It also
      started happening after commit aafe104a was added.
      
      Cc: stable@vger.kernel.org
      Fixes: aafe104a ("tracing: Restructure trace_clock_global() to never block")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Steven Rostedt (VMware)'s avatar
      tracing: Do not stop recording comms if the trace file is being read · b313bd94
      Steven Rostedt (VMware) authored
      commit 4fdd595e upstream.
      
      A while ago, when the "trace" file was opened, tracing was stopped, and
      code was added to stop recording the comms to saved_cmdlines, which maps
      pids to task names.
      
      Code has since been added that only records the comm if a trace event
      occurred, so there's no reason not to record it while the trace file is open.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ffbd48d ("tracing: Cache comms only after an event occurred")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Steven Rostedt (VMware)'s avatar
      tracing: Do not stop recording cmdlines when tracing is off · adb3849e
      Steven Rostedt (VMware) authored
      commit 85550c83 upstream.
      
      The saved_cmdlines is used to map pids to the task name, such that the
      output of the tracing does not just show pids, but also gives a human
      readable name for the task.
      
      If the name is not mapped, the output looks like this:
      
          <...>-1316          [005] ...2   132.044039: ...
      
      Instead of this:
      
          gnome-shell-1316    [005] ...2   132.044039: ...
      
      The names are updated when tracing is running, but are skipped if tracing
      is stopped. Unfortunately, this stops the recording of names whenever the
      top-level tracer is stopped, even if other tracers are still active.
      
      A name is only recorded when a new event is written into a ring buffer,
      so there is no need to test whether tracing is on. If tracing is off, no
      event is written, and so no name is recorded anyway.
      
      Remove the check, as it hides the names of tasks for events in the
      instance buffers.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ffbd48d ("tracing: Cache comms only after an event occurred")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Breno Lima's avatar
      usb: chipidea: imx: Fix Battery Charger 1.2 CDP detection · 1a91fafa
      Breno Lima authored
      commit c6d580d9 upstream.
      
      i.MX8MM cannot detect certain CDP USB HUBs. The usbmisc_imx.c driver does
      not follow the CDP timing requirements defined by the USB BC 1.2
      specification, section 3.2.4 Detection Timing CDP.
      
      During Primary Detection the i.MX device should turn on VDP_SRC and
      IDM_SINK for a minimum of 40ms (TVDPSRC_ON). After a time of TVDPSRC_ON,
      the i.MX is allowed to check the status of the D- line. The current
      implementation waits between 1ms and 2ms, so certain BC 1.2
      compliant USB HUBs cannot be detected. Increase the delay to 40ms, allowing
      enough time for primary detection.
      
      During secondary detection the i.MX is required to disable VDP_SRC and
      IDM_SNK, and enable VDM_SRC and IDP_SINK for at least 40ms (TVDMSRC_ON).
      
      The current implementation does not disable VDP_SRC and IDM_SNK; introduce
      the disable sequence in the imx7d_charger_secondary_detection() function.
      
      VDM_SRC and IDP_SINK should be enabled for at least 40ms (TVDMSRC_ON).
      Increase the delay to allow enough time for detection.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 746f316b ("usb: chipidea: introduce imx7d USB charger detection")
      Signed-off-by: Breno Lima <breno.lima@nxp.com>
      Signed-off-by: Jun Li <jun.li@nxp.com>
      Link: https://lore.kernel.org/r/20210614175013.495808-1-breno.lima@nxp.com
      Signed-off-by: Peter Chen <peter.chen@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Andrew Lunn's avatar
      usb: core: hub: Disable autosuspend for Cypress CY7C65632 · 576996b6
      Andrew Lunn authored
      commit a7d8d1c7 upstream.
      
      The Cypress CY7C65632 appears to have an issue with auto suspend and
      detecting devices, not too dissimilar to the SMSC 5534B hub. It is
      easiest to reproduce by connecting multiple mass storage devices to
      the hub at the same time. On a Lenovo Yoga, around 1 in 3 attempts
      result in the devices not being detected. It is however possible to
      make them appear using lsusb -v.
      
      Disabling autosuspend for this hub resolves the issue.
      
      Fixes: 1208f9e1 ("USB: hub: Fix the broken detection of USB3 device in SMSC hub")
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20210614155524.2228800-1-andrew@lunn.ch
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Pavel Skripkin's avatar
      can: mcba_usb: fix memory leak in mcba_usb · 6bd3d80d
      Pavel Skripkin authored
      commit 91c02557 upstream.
      
      Syzbot reported a memory leak in the SocketCAN driver for the Microchip
      CAN BUS Analyzer Tool. The problem was unfreed usb_coherent buffers.
      
      In mcba_usb_start(), 20 coherent buffers are allocated, and nothing
      frees them:
      
      1) In the callback function, the urb is resubmitted, and that's all.
      2) In the disconnect function, the urbs are simply killed, but
         URB_FREE_BUFFER is not set (see mcba_usb_start), and this flag
         cannot be used with coherent buffers.
      
      Fail log:
      | [ 1354.053291][ T8413] mcba_usb 1-1:0.0 can0: device disconnected
      | [ 1367.059384][ T8420] kmemleak: 20 new suspected memory leaks (see /sys/kernel/debug/kmem)
      
      So, all allocated buffers should be freed explicitly with
      usb_free_coherent().
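      
      The ownership rule (remember every buffer at start, free every one at
      disconnect) can be sketched in user space. Here malloc()/free() stand
      in for usb_alloc_coherent()/usb_free_coherent(); the struct, the
      function names and the SKETCH_MAX_RX_URBS constant are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_RX_URBS 20	/* the commit mentions 20 buffers */

/* Hypothetical device state: keep every buffer pointer, because
 * killing the URBs alone frees nothing. */
struct mcba_sketch {
	void *rxbuf[SKETCH_MAX_RX_URBS];
	int nbufs;		/* number of buffers successfully allocated */
};

static int sketch_start(struct mcba_sketch *dev, size_t len)
{
	for (dev->nbufs = 0; dev->nbufs < SKETCH_MAX_RX_URBS; dev->nbufs++) {
		/* stands in for usb_alloc_coherent() */
		dev->rxbuf[dev->nbufs] = malloc(len);
		if (!dev->rxbuf[dev->nbufs])
			return -1;	/* disconnect still frees the earlier ones */
	}
	return 0;
}

static void sketch_disconnect(struct mcba_sketch *dev)
{
	/* stands in for usb_free_coherent() on each remembered buffer */
	for (int i = 0; i < dev->nbufs; i++) {
		free(dev->rxbuf[i]);
		dev->rxbuf[i] = NULL;
	}
	dev->nbufs = 0;
}
```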
      
      NOTE:
      The same pattern for allocating and freeing coherent buffers
      is used in drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c
      
      Fixes: 51f3baad ("can: mcba_usb: Add support for Microchip CAN BUS Analyzer")
      Link: https://lore.kernel.org/r/20210609215833.30393-1-paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Reported-and-tested-by: <syzbot+57281c762a3922e14dfe@syzkaller.appspotmail.com>
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Oleksij Rempel's avatar
      can: j1939: fix Use-after-Free, hold skb ref while in use · 509ab6bf
      Oleksij Rempel authored
      commit 2030043e upstream.
      
      This patch fixes a Use-after-Free found by syzbot.
      
      The problem is that an skb is taken from the per-session skb queue
      without incrementing its ref count. This leads to a Use-after-Free if
      the skb is taken concurrently from the session queue due to a CTS.
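      
      The general pattern, taking your own reference before using a buffer
      that another path may release, can be shown with a hypothetical
      refcounted object. In the kernel, skb_get() and kfree_skb() play the
      get/put roles for an skb; the names below are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted buffer standing in for an skb. */
struct buf_sketch {
	int users;
};

/* Take an extra reference; only legal while someone still holds one. */
static struct buf_sketch *buf_get(struct buf_sketch *b)
{
	assert(b->users > 0);
	b->users++;
	return b;
}

/* Drop a reference; returns 1 if it was the last one and b was freed. */
static int buf_put(struct buf_sketch *b)
{
	assert(b->users > 0);
	if (--b->users == 0) {
		free(b);
		return 1;
	}
	return 0;
}
```

      If the sender takes its own reference when it picks the buffer from
      the session queue, the concurrent CTS path dropping the queue's
      reference no longer frees memory that is still in use.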
      
      Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
      Link: https://lore.kernel.org/r/20210521115720.7533-1-o.rempel@pengutronix.de
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: linux-stable <stable@vger.kernel.org>
      Reported-by: <syzbot+220c1a29987a9a490903@syzkaller.appspotmail.com>
      Reported-by: <syzbot+45199c1b73b4013525cf@syzkaller.appspotmail.com>
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Tetsuo Handa's avatar
      can: bcm/raw/isotp: use per module netdevice notifier · 0cf4b377
      Tetsuo Handa authored
      commit 8d0caedb upstream.
      
      syzbot is reporting hung tasks at register_netdevice_notifier() [1] and
      unregister_netdevice_notifier() [2], because cleanup_net() might perform
      time-consuming operations while the CAN raw/bcm/isotp modules are
      calling {register,unregister}_netdevice_notifier() for each socket.
      
      Change the raw/bcm/isotp modules to call register_netdevice_notifier() from
      the module's __init function and unregister_netdevice_notifier() from the
      module's __exit function, as the gw/j1939 modules already do.
      
      Link: https://syzkaller.appspot.com/bug?id=391b9498827788b3cc6830226d4ff5be87107c30 [1]
      Link: https://syzkaller.appspot.com/bug?id=1724d278c83ca6e6df100a2e320c10d991cf2bce [2]
      Link: https://lore.kernel.org/r/54a5f451-05ed-f977-8534-79e7aa2bcc8f@i-love.sakura.ne.jp
      Cc: linux-stable <stable@vger.kernel.org>
      Reported-by: syzbot <syzbot+355f8edb2ff45d5f95fa@syzkaller.appspotmail.com>
      Reported-by: syzbot <syzbot+0f1827363a305f74996f@syzkaller.appspotmail.com>
      Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Tested-by: syzbot <syzbot+355f8edb2ff45d5f95fa@syzkaller.appspotmail.com>
      Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Norbert Slusarek's avatar
      can: bcm: fix infoleak in struct bcm_msg_head · acb755be
      Norbert Slusarek authored
      commit 5e87ddbe upstream.
      
      On 64-bit systems, struct bcm_msg_head has an added padding of 4 bytes between
      struct members count and ival1. Even though all struct members are initialized,
      the 4-byte hole will contain data from the kernel stack. This patch zeroes out
      struct bcm_msg_head before usage, preventing infoleaks to userspace.
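      
      The hole can be demonstrated with a hypothetical mirror of the layout
      described above (three 32-bit members followed by two timevals made of
      longs); the real definition lives in <linux/can/bcm.h> and also ends
      in a frames array, which is omitted here. On an LP64 system the
      timevals are 8-byte aligned, leaving 4 padding bytes after count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of the structures, for illustration only. */
struct bcm_timeval_sketch {
	long tv_sec;
	long tv_usec;
};

struct bcm_msg_head_sketch {
	uint32_t opcode;
	uint32_t flags;
	uint32_t count;
	/* on LP64 the compiler inserts a 4-byte hole here */
	struct bcm_timeval_sketch ival1, ival2;
	uint32_t can_id;
	uint32_t nframes;
};

/* Size of the padding between count and ival1. */
static size_t hole_before_ival1(void)
{
	return offsetof(struct bcm_msg_head_sketch, ival1) -
	       (offsetof(struct bcm_msg_head_sketch, count) + sizeof(uint32_t));
}

/* Zero the whole struct, padding included, before setting members,
 * so no stale stack bytes can be copied to userspace. */
static void fill_head(struct bcm_msg_head_sketch *head, uint32_t opcode,
		      uint32_t count)
{
	memset(head, 0, sizeof(*head));
	head->opcode = opcode;
	head->count = count;
}
```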
      
      Fixes: ffd980f9 ("[CAN]: Add broadcast manager (bcm) protocol")
      Link: https://lore.kernel.org/r/trinity-7c1b2e82-e34f-4885-8060-2cd7a13769ce-1623532166177@3c-app-gmx-bs52
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: Do not mark insn as seen under speculative path verification · 8c82c52d
      Daniel Borkmann authored
      [ Upstream commit fe9a5ca7 ]
      
      ... in such circumstances, we do not want to mark the instruction as seen,
      given that the goal is still to jmp-1 rewrite/sanitize dead code if it is
      not reachable from the non-speculative path verification. We do, however,
      want to verify it for safety regardless.
      
      With the patch as-is, all the insns that were marked as seen before the
      patch will also be marked as seen after the patch (just with a potentially
      different non-zero count). An upcoming patch will also verify paths that are
      unreachable in the non-speculative domain, hence this extension is needed.
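The effect on the "seen" marking can be pictured with a simplified model. This is a sketch under stated assumptions, not the verifier's code: `struct demo_insn_aux`, `demo_mark_verified()`, `demo_sanitize_dead_code()` and the `DEMO_OP_JA_MINUS_1` encoding are all hypothetical. A speculative-only visit is still verified by the caller but leaves `seen` at zero, so the instruction stays a candidate for the jmp-1 rewrite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-instruction verifier state. */
struct demo_insn_aux {
    unsigned int seen; /* 0 = never reached on a non-speculative path */
};

/* Hypothetical encoding of a "jmp -1" (jump-to-self) instruction. */
#define DEMO_OP_JA_MINUS_1 (-1)

/* Record that instruction idx was verified in pass pass_cnt. A visit made
 * only under speculative execution does not count as "seen". */
static void demo_mark_verified(struct demo_insn_aux *aux, size_t idx,
                               unsigned int pass_cnt, bool speculative)
{
    assert(aux != NULL);
    if (!speculative)
        aux[idx].seen = pass_cnt;
}

/* Rewrite every instruction never seen on a non-speculative path into a
 * jump-to-self. Returns the number of instructions rewritten. */
static size_t demo_sanitize_dead_code(int *insns, const struct demo_insn_aux *aux,
                                      size_t n)
{
    size_t rewritten = 0;

    assert(insns != NULL && aux != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!aux[i].seen) {
            insns[i] = DEMO_OP_JA_MINUS_1;
            rewritten++;
        }
    }
    return rewritten;
}
```

For example, if instructions 0 and 1 are reached normally, instruction 2 only under speculation, and instruction 3 never, sanitization in this model rewrites instructions 2 and 3.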
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: Inherit expanded/patched seen count from old aux data · e9d27173
      Daniel Borkmann authored
      [ Upstream commit d203b0fd ]
      
      Instead of relying on the current env->pass_cnt, use the seen count from the
      old aux data in adjust_insn_aux_data(), and expand it to the new range of
      patched instructions. This change is valid given that we always expand 1:n
      with n>=1, so what applies to the old/original instruction needs to apply
      to the replacement as well.
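A rough userspace model of the 1:n expansion described above, assuming a hypothetical `struct demo_aux` that carries only a `seen` field (the real aux data holds much more). `demo_expand_aux()` is illustrative, not the kernel's adjust_insn_aux_data(): it copies the entries before the patch site, gives all `cnt` replacement slots the original instruction's `seen` value, and shifts the remaining entries right by `cnt - 1`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical aux entry holding only the seen count. */
struct demo_aux {
    unsigned int seen;
};

/* Return a newly allocated array of old_len + cnt - 1 entries in which the
 * single instruction at off has been expanded into cnt >= 1 instructions.
 * Every replacement slot inherits old[off].seen rather than a pass counter.
 * Returns NULL on bad arguments or allocation failure; caller frees. */
static struct demo_aux *demo_expand_aux(const struct demo_aux *old, size_t old_len,
                                        size_t off, size_t cnt)
{
    assert(old != NULL);
    if (cnt == 0 || off >= old_len)
        return NULL;

    size_t new_len = old_len + cnt - 1;
    struct demo_aux *new_aux = malloc(new_len * sizeof(*new_aux));
    if (new_aux == NULL)
        return NULL;

    /* Entries before the patch site are unchanged. */
    memcpy(new_aux, old, off * sizeof(*old));
    /* All cnt replacement instructions inherit the original seen count. */
    for (size_t i = off; i < off + cnt; i++)
        new_aux[i].seen = old[off].seen;
    /* Entries after the patch site shift right by cnt - 1. */
    memcpy(new_aux + off + cnt, old + off + 1,
           (old_len - off - 1) * sizeof(*old));
    return new_aux;
}
```

In this model, expanding instruction 1 of `{1, 2, 0}` into three instructions yields `{1, 2, 2, 2, 0}`: the patched range keeps the original's seen count regardless of which verification pass is current.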
      
      Not relying on env->pass_cnt is a prerequisite for a later change where we
      want to avoid marking an instruction seen when verified under speculative
      execution path.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>