  1. Jun 23, 2021
    • Mathy Vanhoef
      mac80211: Fix NULL ptr deref for injected rate info · f74df6e0
      Mathy Vanhoef authored
      commit bddc0c41 upstream.
      
      The commit cb17ed29 ("mac80211: parse radiotap header when selecting Tx
      queue") moved the code to validate the radiotap header from
      ieee80211_monitor_start_xmit to ieee80211_parse_tx_radiotap. This made it
      possible to share more code with the new Tx queue selection code for
      injected frames. But at the same time, it now required the call of
      ieee80211_parse_tx_radiotap at the beginning of functions which wanted to
      handle the radiotap header. And this broke the rate parsing in the
      radiotap header parser.
      
      For most rates, the radiotap rate parser operates only on the data in the
      radiotap header itself. For the 802.11a/b/g rates, however, it must also
      know the selected band from the chandef information, and this information
      is only written to the ieee80211_tx_info at the end of
      ieee80211_monitor_start_xmit, long after ieee80211_parse_tx_radiotap has
      already been called. info->band was therefore always 0
      (NL80211_BAND_2GHZ) when the parser code tried to access it.
      
      For a 5GHz only device, injecting a frame with 802.11a rates would cause a
      NULL pointer dereference because local->hw.wiphy->bands[NL80211_BAND_2GHZ]
      would most likely have been NULL when the radiotap parser searched for the
      correct rate index of the driver.
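      A minimal, hypothetical C model of the failure mode follows. The names
      model_wiphy, model_sband and find_rate_idx() are invented for this sketch
      and are not mac80211 API. The model shows why indexing bands[] with a
      stale band value of 0 hits a NULL entry on a 5 GHz-only device. It also
      adds a defensive NULL check. It is not the upstream patch.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical, simplified band indices (not the nl80211 enum). */
      enum { MODEL_BAND_2GHZ = 0, MODEL_BAND_5GHZ = 1, MODEL_NUM_BANDS = 2 };

      struct model_sband {
          const int *bitrates;   /* rates in units of 100 kbit/s */
          int n_bitrates;
      };

      struct model_wiphy {
          /* NULL entry: the device does not support that band */
          const struct model_sband *bands[MODEL_NUM_BANDS];
      };

      /*
       * Return the driver rate index for 'rate', or -1 if none matches.
       * Without the NULL check, band == MODEL_BAND_2GHZ (the stale default
       * of info->band) on a 5 GHz-only wiphy would dereference NULL.
       */
      int find_rate_idx(const struct model_wiphy *wiphy, int band, int rate)
      {
          const struct model_sband *sband;
          int i;

          if (band < 0 || band >= MODEL_NUM_BANDS)
              return -1;
          sband = wiphy->bands[band];
          if (!sband)
              return -1;          /* band not supported by this device */
          for (i = 0; i < sband->n_bitrates; i++)
              if (sband->bitrates[i] == rate)
                  return i;
          return -1;
      }
      ```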
      
      Cc: stable@vger.kernel.org
      Reported-by: Ben Greear <greearb@candelatech.com>
      Fixes: cb17ed29 ("mac80211: parse radiotap header when selecting Tx queue")
      Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
      [sven@narfation.org: added commit message]
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Link: https://lore.kernel.org/r/20210530133226.40587-1-sven@narfation.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Bumyong Lee
      dmaengine: pl330: fix wrong usage of spinlock flags in dma_cyclic · df203c1f
      Bumyong Lee authored
      commit 4ad5dd2d upstream.
      
      The flags variable, which is an input parameter of pl330_prep_dma_cyclic(),
      must not be used as storage by spin_lock_irqsave()/spin_unlock_irqrestore().
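      The bug pattern can be modeled in user space. The model assumes
      hypothetical model_irqsave()/model_irqrestore() helpers that, like
      spin_lock_irqsave(), write the saved interrupt state into the variable
      passed to them. Passing the function's flags parameter overwrites the
      value the caller supplied. The fixed variant keeps the saved IRQ state in
      a separate local variable.

      ```c
      #include <assert.h>

      /* Hypothetical stand-ins for spin_lock_irqsave()/spin_unlock_irqrestore():
       * "save" writes the current interrupt state into *saved. */
      static unsigned long model_irq_state = 0x200;   /* arbitrary example value */

      static void model_irqsave(unsigned long *saved)   { *saved = model_irq_state; }
      static void model_irqrestore(unsigned long saved) { model_irq_state = saved; }

      /* Buggy pattern: the caller's DMA 'flags' doubles as irqsave storage. */
      unsigned long prep_cyclic_buggy(unsigned long flags)
      {
          model_irqsave(&flags);       /* clobbers the caller's flags */
          /* ... manipulate descriptor lists under the lock ... */
          model_irqrestore(flags);
          return flags;                /* what the model would hand to the descriptor */
      }

      /* Fixed pattern: a dedicated local holds the saved IRQ state. */
      unsigned long prep_cyclic_fixed(unsigned long flags)
      {
          unsigned long irqflags;

          model_irqsave(&irqflags);
          /* ... manipulate descriptor lists under the lock ... */
          model_irqrestore(irqflags);
          return flags;                /* caller's flags are preserved */
      }
      ```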
      
      Signed-off-by: Jongho Park <jongho7.park@samsung.com>
      Signed-off-by: Bumyong Lee <bumyong.lee@samsung.com>
      Signed-off-by: Chanho Park <chanho61.park@samsung.com>
      Link: https://lore.kernel.org/r/20210507063647.111209-1-chanho61.park@samsung.com
      Fixes: f6f2421c ("dmaengine: pl330: Merge dma_pl330_dmac and pl330_dmac structs")
      Cc: stable@vger.kernel.org
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Pingfan Liu
      crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo · b842b568
      Pingfan Liu authored
      commit 4f5aecdf upstream.
      
      As mentioned in kernel commit 1d50e5d0 ("crash_core, vmcoreinfo:
      Append 'MAX_PHYSMEM_BITS' to vmcoreinfo"), makedumpfile uses
      SECTION_SIZE_BITS in the formula:
      
          #define SECTIONS_SHIFT    (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
      
      Besides SECTIONS_SHIFT, makedumpfile also uses SECTION_SIZE_BITS to
      calculate PAGES_PER_SECTION, just as the kernel does.
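      Both derived values can be written as plain C. The inputs below are
      illustrative and do not come from any particular architecture's
      headers. The PAGES_PER_SECTION form assumes the usual sparsemem
      definition, 1 << (SECTION_SIZE_BITS - PAGE_SHIFT).

      ```c
      #include <assert.h>

      /* SECTIONS_SHIFT, per the formula quoted above. */
      unsigned int sections_shift(unsigned int max_physmem_bits,
                                  unsigned int section_size_bits)
      {
          return max_physmem_bits - section_size_bits;
      }

      /* PAGES_PER_SECTION = 1 << (SECTION_SIZE_BITS - PAGE_SHIFT). */
      unsigned long pages_per_section(unsigned int section_size_bits,
                                      unsigned int page_shift)
      {
          return 1UL << (section_size_bits - page_shift);
      }
      ```

      Take MAX_PHYSMEM_BITS = 48 and PAGE_SHIFT = 12. A SECTION_SIZE_BITS of
      30 yields SECTIONS_SHIFT = 18 and 262144 pages per section. A value of 27
      yields 21 and 32768. A tool that guesses the wrong value therefore
      decodes the memory map incorrectly.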
      
      Unfortunately, this arch-dependent macro SECTION_SIZE_BITS changes, e.g.
      recently in kernel commit f0b13ee2 ("arm64/sparsemem: reduce
      SECTION_SIZE_BITS"). But user space needs a stable interface to obtain
      this information, and it cannot be deduced from a crashdump vmcore.
      Hence append SECTION_SIZE_BITS to vmcoreinfo.
      
      Link: https://lkml.kernel.org/r/20210608103359.84907-1-kernelfans@gmail.com
      Link: http://lists.infradead.org/pipermail/kexec/2021-June/022676.html
      Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
      Acked-by: Baoquan He <bhe@redha...>
    • Thomas Gleixner
      x86/fpu: Reset state for all signal restore failures · 63ba8356
      Thomas Gleixner authored
      commit efa16550 upstream.
      
      If access_ok() or fpregs_soft_set() fails in __fpu__restore_sig() then the
      function just returns but does not clear the FPU state as it does for all
      other fatal failures.
      
      Clear the FPU state for these failures as well.
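      The shape of the fix can be sketched with a single error label that every
      failure path jumps to, so no early return can skip the reset.
      struct model_fpu, model_fpu_clear() and model_restore_sig() are
      hypothetical simplifications, not the kernel's __fpu__restore_sig().

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <string.h>

      /* Hypothetical, simplified FPU state; not the kernel's struct fpu. */
      struct model_fpu {
          unsigned char regs[64];
          bool initialized;
      };

      static void model_fpu_clear(struct model_fpu *fpu)
      {
          memset(fpu->regs, 0, sizeof(fpu->regs));
          fpu->initialized = false;
      }

      /* All failures funnel through one label, so each one resets the state. */
      int model_restore_sig(struct model_fpu *fpu, bool access_ok,
                            bool soft_set_ok, const unsigned char *buf)
      {
          if (!access_ok)
              goto fail;
          if (!soft_set_ok)
              goto fail;
          memcpy(fpu->regs, buf, sizeof(fpu->regs));
          fpu->initialized = true;
          return 0;
      fail:
          model_fpu_clear(fpu);
          return -1;                   /* simplified error code */
      }
      ```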
      
      Fixes: 72a671ce ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/87mtryyhhz.ffs@nanos.tec.linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Andy Lutomirski
      x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer · a7748e02
      Andy Lutomirski authored
      commit d8778e39 upstream.
      
      Both Intel and AMD consider it to be architecturally valid for XRSTOR to
      fail with #PF but nonetheless change the register state.  The actual
      conditions under which this might occur are unclear [1], but it seems
      plausible that this might be triggered if one sibling thread unmaps a page
      and invalidates the shared TLB while another sibling thread is executing
      XRSTOR on the page in question.
      
      __fpu__restore_sig() can execute XRSTOR while the hardware registers
      are preserved on behalf of a different victim task (using the
      fpu_fpregs_owner_ctx mechanism), and, in theory, XRSTOR could fail but
      modify the registers.
      
      If this happens, then there is a window in which __fpu__restore_sig()
      could schedule out and the victim task could schedule back in without
      reloading its own FPU registers. This would result in part of the FPU
      state that __fpu__restore_sig() was attempting to load leaking into the
      victim task's user-visible state.
      
      Invalidate preserved FPU registers on XRSTOR failure to prevent this
      situation from corrupting any state.
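      The following is a simplified single-CPU model of the ownership check,
      loosely following the fpu_fpregs_owner_ctx/last_cpu scheme named above.
      All model_* names are hypothetical. Clearing the owner pointer after a
      faulting XRSTOR makes the victim task's next validity check fail. The
      task then reloads its own state from memory instead of trusting
      possibly modified registers.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical, reduced per-task FPU context. */
      struct model_fpu {
          int last_cpu;   /* CPU whose registers last held this task's state */
      };

      /* Per-CPU in the kernel; a single global in this one-CPU model. */
      static struct model_fpu *fpregs_owner_ctx;

      /* Registers may be reused only if this task still owns them. */
      bool model_fpregs_state_valid(const struct model_fpu *fpu, int cpu)
      {
          return fpu == fpregs_owner_ctx && cpu == fpu->last_cpu;
      }

      /* XRSTOR from a user buffer faulted: the registers may have been
       * partially modified, so nobody may trust them any more. */
      void model_xrstor_failed(void)
      {
          fpregs_owner_ctx = NULL;   /* forces the next user to reload */
      }
      ```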
      
      [1] Frequent readers of the errata lists might imagine "complex
          microarchitectural conditions".
      
      Fixes: 1d731e73 ("x86/fpu: Add a fastpath to __fpu__restore_sig()")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.758116583@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Thomas Gleixner
      x86/fpu: Prevent state corruption in __fpu__restore_sig() · 076f732b
      Thomas Gleixner authored
      commit 484cea4f upstream.
      
      The non-compacted slowpath uses __copy_from_user() and copies the entire
      user buffer into the kernel buffer, verbatim.  This means that the kernel
      buffer may now contain entirely invalid state on which XRSTOR will #GP.
      validate_user_xstate_header() can detect some of that corruption, but that
      leaves the onus on callers to clear the buffer.
      
      Prior to XSAVES support, it was possible just to reinitialize the buffer
      completely, but with supervisor states that is no longer possible, as the
      buffer-clearing code split got it backwards. Fixing that is possible, but
      not corrupting the state in the first place is more robust.
      
      Avoid corruption of the kernel XSAVE buffer by using copy_user_to_xstate()
      which validates the XSAVE header contents before copying the actual states
      to the kernel. copy_user_to_xstate() was previously only called for
      compacted-format kernel buffers, but it works for both compacted and
      non-compacted forms.
      
      Using it for the non-compacted form is slower because of multiple
      __copy_from_user() operations, but that cost is less important than robust
      code in an already slow path.
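      The validate-before-copy idea can be sketched as follows.
      struct model_xsave_hdr loosely mirrors an XSAVE header (xfeatures,
      xcomp_bv, reserved words). The checks and the model_* helpers are
      illustrative assumptions, not copy_user_to_xstate() itself. The point is
      that the kernel-side buffer is written only after the header has passed
      validation.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      /* Hypothetical, simplified XSAVE header. */
      struct model_xsave_hdr {
          uint64_t xfeatures;
          uint64_t xcomp_bv;
          uint64_t reserved[6];
      };

      static bool model_validate_hdr(const struct model_xsave_hdr *h,
                                     uint64_t user_mask)
      {
          size_t i;

          if (h->xfeatures & ~user_mask)
              return false;        /* unknown or non-user features */
          if (h->xcomp_bv)
              return false;        /* user buffers use the standard format */
          for (i = 0; i < 6; i++)
              if (h->reserved[i])
                  return false;    /* reserved words must be zero */
          return true;
      }

      /* Validate a private copy first; kbuf is touched only on success. */
      int model_copy_user_to_xstate(struct model_xsave_hdr *kbuf,
                                    const struct model_xsave_hdr *ubuf,
                                    uint64_t user_mask)
      {
          struct model_xsave_hdr tmp;

          memcpy(&tmp, ubuf, sizeof(tmp));   /* stands in for copy_from_user() */
          if (!model_validate_hdr(&tmp, user_mask))
              return -1;                     /* kbuf left intact */
          memcpy(kbuf, &tmp, sizeof(tmp));
          return 0;
      }
      ```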
      
      [ Changelog polished by Dave Hansen ]
      
      Fixes: b860eb8d ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
      Reported-by: <syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.611833074@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Thomas Gleixner
      x86/pkru: Write hardware init value to PKRU when xstate is init · abc790bd
      Thomas Gleixner authored
      commit 510b80a6 upstream.
      
      When user space brings PKRU into init state, then the kernel handling is
      broken:
      
        T1 user space
           xsave(state)
           state.header.xfeatures &= ~XFEATURE_MASK_PKRU;
           xrstor(state)
      
        T1 -> kernel
           schedule()
             XSAVE(S) -> T1->xsave.header.xfeatures[PKRU] == 0
             T1->flags |= TIF_NEED_FPU_LOAD;
      
             wrpkru();
      
           schedule()
             ...
             pk = get_xsave_addr(&T1->fpu->state.xsave, XFEATURE_PKRU);
             if (pk)
                 wrpkru(pk->pkru);
             else
                 wrpkru(DEFAULT_PKRU);
      
      Because the xfeatures bit is 0 and therefore the value in the xsave
      storage is not valid, get_xsave_addr() returns NULL and switch_to()
      writes the default PKRU. -> FAIL #1!
      
      That breaks any copy_to/from_user() on the way back to user space which
      touches memory protected by the default PKRU value.

      Assuming that this does not fail (pure luck), T1 goes back to user
      space and, because TIF_NEED_FPU_LOAD is set, it ends up in
      
        switch_fpu_return()
            __fpregs_load_activate()
              if (!fpregs_state_valid()) {
                 load_XSTATE_from_task();
              }
      
      But if nothing touched the FPU between T1 scheduling out and back in,
      then the fpregs_state is still valid which means switch_fpu_return()
      does nothing and just clears TIF_NEED_FPU_LOAD. Back to user space with
      DEFAULT_PKRU loaded. -> FAIL #2!
      
      The fix is simple: if get_xsave_addr() returns NULL then set the
      PKRU value to 0 instead of the restrictive default PKRU value in
      init_pkru_value.
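      The selection rule after the fix can be sketched in a few lines.
      struct model_pkru_state and model_pkru_to_load() are hypothetical
      stand-ins for the get_xsave_addr() result and the switch_to() logic. A
      NULL component means the PKRU bit in xfeatures is clear. The hardware
      init value 0 is then loaded instead of the restrictive default.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical stand-in for the PKRU component of an XSAVE buffer. */
      struct model_pkru_state {
          uint32_t pkru;
      };

      /*
       * Value to write to PKRU on context switch. NULL means the component
       * is in its init state, whose architectural value is 0.
       */
      uint32_t model_pkru_to_load(const struct model_pkru_state *pk)
      {
          return pk ? pk->pkru : 0;
      }
      ```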
      
       [ bp: Massage in minor nitpicks from folks. ]
      
      Fixes: 0cecca9d ("x86/fpu: Eager switch PKRU state")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Tested-by: Babu Moger <babu.moger@amd.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144346.045616965@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Tom Lendacky
      x86/ioremap: Map EFI-reserved memory as encrypted for SEV · 208bb686
      Tom Lendacky authored
      commit 8d651ee9 upstream.
      
      Some drivers require memory that is marked as EFI boot services
      data. In order for this memory to not be re-used by the kernel
      after ExitBootServices(), efi_mem_reserve() is used to preserve it
      by inserting a new EFI memory descriptor and marking it with the
      EFI_MEMORY_RUNTIME attribute.
      
      Under SEV, memory marked with the EFI_MEMORY_RUNTIME attribute needs to
      be mapped encrypted by Linux, otherwise the kernel might crash at boot
      like below:
      
        EFI Variables Facility v0.08 2004-May-17
        general protection fault, probably for non-canonical address 0x3597688770a868b2: 0000 [#1] SMP NOPTI
        CPU: 13 PID: 1 Comm: swapper/0 Not tainted 5.12.4-2-default #1 openSUSE Tumbleweed
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:efi_mokvar_entry_next
        [...]
        Call Trace:
         efi_mokvar_sysfs_init
         ? efi_mokvar_table_init
         do_one_initcall
         ? __kmalloc
         kernel_init_freeable
         ? rest_init
         kernel_init
         ret_from_fork
      
      Expand the __ioremap_check_other() function to additionally check for
      this other type of boot data reserved at runtime and indicate that it
      should be mapped encrypted for an SEV guest.
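      The mapping decision can be sketched as a small predicate. The enum, the
      MODEL_EFI_MEMORY_RUNTIME bit and model_map_encrypted() are hypothetical.
      That EFI runtime services data was already mapped encrypted is an
      assumption about the pre-existing logic in __ioremap_check_other(). The
      new case is boot services data that carries the runtime attribute.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical, reduced EFI memory types for this sketch. */
      enum model_efi_type {
          MODEL_EFI_CONVENTIONAL,
          MODEL_EFI_BOOT_SERVICES_DATA,
          MODEL_EFI_RUNTIME_SERVICES_DATA,
      };

      /* Hypothetical attribute bit standing in for EFI_MEMORY_RUNTIME. */
      #define MODEL_EFI_MEMORY_RUNTIME (1ULL << 63)

      /* Should an ioremap()ed range be mapped encrypted in an SEV guest? */
      bool model_map_encrypted(bool sev_active, enum model_efi_type type,
                               uint64_t attr)
      {
          if (!sev_active)
              return false;
          switch (type) {
          case MODEL_EFI_RUNTIME_SERVICES_DATA:
              return true;                                  /* assumed existing case */
          case MODEL_EFI_BOOT_SERVICES_DATA:
              return (attr & MODEL_EFI_MEMORY_RUNTIME) != 0; /* reserved via efi_mem_reserve() */
          default:
              return false;
          }
      }
      ```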
      
       [ bp: Massage commit message. ]
      
      Fixes: 58c90902 ("efi: Support for MOK variable config table")
      Reported-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Joerg Roedel <jroedel@suse.de>
      Cc: <stable@vger.kernel.org> # 5.10+
      Link: https://lkml.kernel.org/r/20210608095439.12668-2-joro@8bytes.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Thomas Gleixner
      x86/process: Check PF_KTHREAD and not current->mm for kernel threads · 75a55bc2
      Thomas Gleixner authored
      commit 12f7764a upstream.
      
      switch_fpu_finish() uses current->mm as an indicator for kernel threads.
      That's wrong because kernel threads can temporarily use the mm of a user
      process via kthread_use_mm().
      
      Check the task flags for PF_KTHREAD instead.
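      A reduced model makes the difference between the two checks visible. The
      flag value and struct layouts below are hypothetical. Only the fact that
      a kernel thread may carry a borrowed mm comes from the text above.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      #define MODEL_PF_KTHREAD 0x00200000u   /* hypothetical flag value */

      struct model_mm { int users; };

      struct model_task {
          unsigned int flags;
          struct model_mm *mm;   /* may be borrowed via kthread_use_mm() */
      };

      /* Old heuristic: "no mm means kernel thread". */
      bool is_kthread_by_mm(const struct model_task *t)
      {
          return t->mm == NULL;
      }

      /* Fixed check: the task flag is authoritative. */
      bool is_kthread_by_flag(const struct model_task *t)
      {
          return (t->flags & MODEL_PF_KTHREAD) != 0;
      }
      ```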
      
      Fixes: 0cecca9d ("x86/fpu: Eager switch PKRU state")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.912645927@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Fan Du
      Fan Du authored
      commit 28e5e44a upstream.
      
      tl;dr:
      
      Several SGX users reported seeing the following message on NUMA systems:
      
        sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
      
      This turned out to be the memblock code mistakenly throwing away SGX
      memory.
      
      === Full Changelog ===
      
      The 'max_pfn' variable represents the highest known RAM address.  It can
      be used, for instance, to quickly determine for which physical addresses
      there is mem_map[] space allocated.  The numa_meminfo code makes an
      effort to throw out ("trim") all memory blocks which are above 'max_pfn'.
      
      SGX memory is not considered RAM (it is marked as "Reserved" in the
      e820) and is not taken into account by max_pfn. Despite this, SGX memory
      areas have NUMA affinity and are enumerated in the ACPI SRAT table. The
      existing SGX code uses the numa_meminfo mechanism to look up the NUMA
      affinity for its memory areas.
      
      In cases where SGX memory was above max_pfn (usually just the one EPC
      section in the last highest NUMA node), the numa_memblock is truncated
      at 'max_pfn', which is below the SGX memory.  When the SGX code tries to
      look up the affinity of this memory, it fails and produces an error message:
      
        sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
      
      and assigns the memory to NUMA node 0.
      
      Instead of silently truncating the memory block at 'max_pfn' and
      dropping the SGX memory, add the truncated portion to
      'numa_reserved_meminfo'.  This allows the SGX code to later determine
      the NUMA affinity of its 'Reserved' area.
      
      Before, numa_meminfo looked like this (from 'crash'):
      
        blk = { start =          0x0, end = 0x2080000000, nid = 0x0 }
              { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
      
      numa_reserved_meminfo is empty.
      
      With this, numa_meminfo looks like this:
      
        blk = { start =          0x0, end = 0x2080000000, nid = 0x0 }
              { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
      
      and numa_reserved_meminfo has an entry for node 1's SGX memory:
      
        blk =  { start = 0x4000000000, end = 0x4080000000, nid = 0x1 }
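      The trimming change can be sketched as splitting a block at the max_pfn
      boundary instead of discarding the part above it. struct model_blk and
      model_trim_blk() are hypothetical. The test values are inferred from the
      node 1 example above. They assume the original block spanned
      0x2080000000-0x4080000000 and that max_pfn corresponds to 0x4000000000.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical, simplified memory block: [start, end) in bytes. */
      struct model_blk {
          uint64_t start, end;
          int nid;
      };

      /*
       * Split 'blk' at 'limit' (max_pfn as an address). The part below stays
       * in blk; the part above is returned in *reserved instead of being
       * dropped. Returns 1 if *reserved was filled, 0 otherwise.
       */
      int model_trim_blk(struct model_blk *blk, uint64_t limit,
                         struct model_blk *reserved)
      {
          if (blk->end <= limit)
              return 0;                      /* nothing above the limit */
          reserved->nid = blk->nid;
          reserved->start = blk->start > limit ? blk->start : limit;
          reserved->end = blk->end;
          blk->end = reserved->start;        /* may leave blk empty */
          return 1;
      }
      ```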
      
       [ daveh: completely rewrote/reworked changelog ]
      
      Fixes: 5d30f92e ("x86/NUMA: Provide a range-to-target_node lookup facility")
      Reported-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Fan Du <fan.du@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20210617194657.0A99CB22@viggo.jf.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Vineet Gupta
      ARCv2: save ABI registers across signal handling · f6bcb1a6
      Vineet Gupta authored
      commit 96f1b001 upstream.
      
      ARCv2 has some configuration-dependent registers (r30, r58, r59) which
      could be targeted by the compiler. To keep the ABI stable, these were
      unconditionally made part of the glibc ABI
      (sysdeps/unix/sysv/linux/arc/sys/ucontext.h:mcontext_t); however, we
      missed populating them (by saving/restoring them across signal
      handling).
      
      This patch fixes the issue by
       - adding arcv2 ABI regs to kernel struct sigcontext
       - populating them during signal handling
      
      The change to struct sigcontext might look like a glibc ABI change
      (although glibc primarily uses ucontext_t:mcontext_t), but in fact
       - it has only been extended (existing fields are not touched)
       - the old sigcontext was ABI-incomplete to begin with anyway
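      offsetof() can confirm why an append-only extension keeps existing
      users working. The layouts below are hypothetical and do not reproduce
      the real ARC struct sigcontext. They only show that appending r30, r58
      and r59 after the existing fields leaves every existing offset
      unchanged.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical "before" layout. */
      struct model_sigcontext_v1 {
          unsigned long regs[32];          /* pre-existing register block */
      };

      /* Hypothetical "after" layout: same prefix, new fields appended. */
      struct model_sigcontext_v2 {
          unsigned long regs[32];          /* unchanged prefix */
          unsigned long r30, r58, r59;     /* ARCv2 ABI regs added at the end */
      };
      ```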
      
      Fixes: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/53
      Cc: <stable@vger.kernel.org>
      Tested-by: kernel test robot <lkp@intel.com>
      Reported-by: Vladimir Isaev <isaev@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Harald Freudenberger
      s390/ap: Fix hanging ioctl caused by wrong msg counter · b516daed
      Harald Freudenberger authored
      commit e73a99f3 upstream.
      
      When an AP queue is switched to soft offline, all pending
      requests are purged out of the pending requests list and
      'received' by the upper layer, such as the zcrypt device drivers.
      This is also done for requests which are already enqueued
      into the firmware queue. A request in a firmware queue
      may eventually produce a response message, but there is
      no waiting process any more. However, the response was
      counted with the queue_counter, and as this counter was
      reset to 0 with the offline switch, the pending response
      caused the queue_counter to become negative. The next request
      increased this counter to 0 (instead of 1), which caused
      the AP code to assume there was nothing to receive, and so
      it never tried to fetch the response for this valid request
      from the firmware queue.

      As a result, a queue did not work properly after an
      offline/online switch, and processes hung forever when
      trying to send a crypto request after such a queue
      offline/online switch cycle.

      Fixed by a) making sure the counter does not drop below 0
      and b) making sure it has a value of at least 1 after a
      message has been successfully enqueued.
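      The two counter rules can be modeled in a few lines. struct model_apq
      and the model_apq_* functions are hypothetical, and the model ignores
      locking. The test replays the sequence described above: a reset on
      offline, a stale reply, then a new request.

      ```c
      #include <assert.h>

      /* Hypothetical model of the AP queue's outstanding-reply counter. */
      struct model_apq {
          int queue_count;   /* replies still expected from firmware */
      };

      /* Soft offline: pending requests are purged and the counter reset. */
      void model_apq_offline(struct model_apq *q)
      {
          q->queue_count = 0;
      }

      /* Successful enqueue: at least one reply is now outstanding. */
      void model_apq_enqueued(struct model_apq *q)
      {
          q->queue_count++;
          if (q->queue_count < 1)
              q->queue_count = 1;
      }

      /* A reply arrived, possibly a stale one from before going offline. */
      void model_apq_reply(struct model_apq *q)
      {
          if (q->queue_count > 0)
              q->queue_count--;      /* never drop below 0 */
      }
      ```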
      
      Additionally, a warning is emitted when a reply cannot be
      assigned to a waiting process. This may be normal operation
      (the process timed out or has been killed), but it may also
      hint that something unexpected happened (like the odd
      behavior described above).
      
      Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Alexander Gordeev
      s390/mcck: fix calculation of SIE critical section size · 7c003dab
      Alexander Gordeev authored
      commit 5bcbe328 upstream.
      
      The size of the SIE critical section is calculated incorrectly
      as a result of a missed subtraction in commit 0b0ed657
      ("s390: remove critical section cleanup from entry.S").
      
      Fixes: 0b0ed657 ("s390: remove critical section cleanup from entry.S")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Wanpeng Li
      KVM: X86: Fix x86_emulator slab cache leak · 3a9934d6
      Wanpeng Li authored
      commit dfdc0a71 upstream.
      
      Commit c9b8b07c (KVM: x86: Dynamically allocate per-vCPU emulation context)
      allocates the per-vCPU emulation context dynamically; however, the
      x86_emulator slab cache still exists after the VM is destroyed and the
      kvm module is unloaded, as shown below:
      
      grep x86_emulator /proc/slabinfo
      x86_emulator          36     36   2672   12    8 : tunables    0    0    0 : slabdata      3      3      0
      
      This patch fixes this slab cache leak by destroying the x86_emulator slab cache
      when the kvm module is unloaded.
      
      Fixes: c9b8b07c (KVM: x86: Dynamically allocate per-vCPU emulation context)
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1623387573-5969-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Sean Christopherson
      KVM: x86/mmu: Calculate and check "full" mmu_role for nested MMU · 18eca69f
      Sean Christopherson authored
      commit 654430ef upstream.
      
      Calculate and check the full mmu_role when initializing the MMU context
      for the nested MMU, where "full" means the bits and pieces of the role
      that aren't handled by kvm_calc_mmu_role_common().  While the nested MMU
      isn't used for shadow paging, things like the number of levels in the
      guest's page tables are surprisingly important when walking the guest
      page tables.  Failure to reinitialize the nested MMU context if L2's
      paging mode changes can result in unexpected and/or missed page faults,
      and likely other explosions.
      
      E.g. if an L1 vCPU is running both a 32-bit PAE L2 and a 64-bit L2, the
      "common" role calculation will yield the same role for both L2s.  If the
      64-bit L2 is run after the 32-bit PAE L2, L0 will fail to reinitialize
      the nested MMU context, ultimately resulting in a bad walk of L2's page
      tables as the MMU will still have a guest root_level of PT32E_ROOT_LEVEL.
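      The role comparison can be shown in a reduced model with hypothetical
      field names. Two roles agree on the "common" bits but differ in
      root_level. The values 3 and 4 stand for a 3-level PAE guest and a
      4-level 64-bit guest; both are assumptions of the sketch. Comparing only
      the common bits misses the change, while comparing the full role
      triggers a reinitialization.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical, reduced MMU role. */
      struct model_role {
          bool cr0_wp;       /* "common" bit in this model */
          bool smm;          /* "common" bit in this model */
          int  root_level;   /* "full" bit: number of guest page-table levels */
      };

      /* Old check: only the common bits are compared. */
      bool common_role_equal(const struct model_role *a, const struct model_role *b)
      {
          return a->cr0_wp == b->cr0_wp && a->smm == b->smm;
      }

      /* Fixed check: compare the full role, including root_level. */
      bool full_role_equal(const struct model_role *a, const struct model_role *b)
      {
          return common_role_equal(a, b) && a->root_level == b->root_level;
      }

      /* Reinitialize the nested MMU context whenever the full role changes. */
      bool need_reinit(const struct model_role *cur, const struct model_role *next)
      {
          return !full_role_equal(cur, next);
      }
      ```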
      
        WARNING: CPU: 4 PID: 167334 at arch/x86/kvm/vmx/vmx.c:3075 ept_save_pdptrs+0x15/0xe0 [kvm_intel]
        Modules linked in: kvm_intel]
        CPU: 4 PID: 167334 Comm: CPU 3/KVM Not tainted 5.13.0-rc1-d849817d5673-reqs #185
        Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
        RIP: 0010:ept_save_pdptrs+0x15/0xe0 [kvm_intel]
        Code: <0f> 0b c3 f6 87 d8 02 00f
        RSP: 0018:ffffbba702dbba00 EFLAGS: 00010202
        RAX: 0000000000000011 RBX: 0000000000000002 RCX: ffffffff810a2c08
        RDX: ffff91d7bc30acc0 RSI: 0000000000000011 RDI: ffff91d7bc30a600
        RBP: ffff91d7bc30a600 R08: 0000000000000010 R09: 0000000000000007
        R10: 0000000000000000 R11: 0000000000000000 R12: ffff91d7bc30a600
        R13: ffff91d7bc30acc0 R14: ffff91d67c123460 R15: 0000000115d7e005
        FS:  00007fe8e9ffb700(0000) GS:ffff91d90fb00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000029f15a001 CR4: 00000000001726e0
        Call Trace:
         kvm_pdptr_read+0x3a/0x40 [kvm]
         paging64_walk_addr_generic+0x327/0x6a0 [kvm]
         paging64_gva_to_gpa_nested+0x3f/0xb0 [kvm]
         kvm_fetch_guest_virt+0x4c/0xb0 [kvm]
         __do_insn_fetch_bytes+0x11a/0x1f0 [kvm]
         x86_decode_insn+0x787/0x1490 [kvm]
         x86_decode_emulated_instruction+0x58/0x1e0 [kvm]
         x86_emulate_instruction+0x122/0x4f0 [kvm]
         vmx_handle_exit+0x120/0x660 [kvm_intel]
         kvm_arch_vcpu_ioctl_run+0xe25/0x1cb0 [kvm]
         kvm_vcpu_ioctl+0x211/0x5a0 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x40/0xb0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: bf627a92 ("x86/kvm/mmu: check if MMU reconfiguration is needed in init_kvm_nested_mmu()")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210610220026.1364486-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Sean Christopherson
      KVM: x86: Immediately reset the MMU context when the SMM flag is cleared · 669a8866
      Sean Christopherson authored
      commit 78fcb2c9 upstream.
      
      Immediately reset the MMU context when the vCPU's SMM flag is cleared so
      that the SMM flag in the MMU role is always synchronized with the vCPU's
      flag.  If RSM fails (which isn't correctly emulated), KVM will bail
      without calling post_leave_smm() and leave the MMU in a bad state.
      
      The bad MMU role can lead to a NULL pointer dereference when grabbing a
      shadow page's rmap for a page fault as the initial lookups for the gfn
      will happen with the vCPU's SMM flag (=0), whereas the rmap lookup will
      use the shadow page's SMM flag, which comes from the MMU (=1).  SMM has
      an entirely different set of memslots, and so the initial lookup can find
      a memslot (SMM=0) and then explode on the rmap memslot lookup (SMM=1).
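      The mismatch can be illustrated with a hypothetical model of two address
      spaces in which only the normal (SMM=0) space has a memslot covering the
      gfn. All model_* names are invented for this sketch. A lookup with the
      vCPU's flag finds the slot, but the same lookup with the stale MMU-role
      flag finds nothing. The model returns NULL at the point where code that
      assumes the slot exists would dereference it.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical, reduced memslot. */
      struct model_slot {
          unsigned long base_gfn;
          unsigned long npages;
          unsigned long *rmap;           /* one entry per page in the slot */
      };

      struct model_vm {
          struct model_slot *slots[2];   /* [0]: normal, [1]: SMM address space */
      };

      struct model_slot *model_find_slot(struct model_vm *vm, int smm,
                                         unsigned long gfn)
      {
          struct model_slot *s = vm->slots[smm];

          if (s && gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
              return s;
          return NULL;
      }

      /* rmap lookup keyed by the shadow page's (i.e. the MMU role's) SMM flag. */
      unsigned long *model_gfn_to_rmap(struct model_vm *vm, int sp_smm,
                                       unsigned long gfn)
      {
          struct model_slot *s = model_find_slot(vm, sp_smm, gfn);

          return s ? &s->rmap[gfn - s->base_gfn] : NULL;
      }
      ```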
      
        general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
        CPU: 1 PID: 8410 Comm: syz-executor382 Not tainted 5.13.0-rc5-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:__gfn_to_rmap arch/x86/kvm/mmu/mmu.c:935 [inline]
        RIP: 0010:gfn_to_rmap+0x2b0/0x4d0 arch/x86/kvm/mmu/mmu.c:947
        Code: <42> 80 3c 20 00 74 08 4c 89 ff e8 f1 79 a9 00 4c 89 fb 4d 8b 37 44
        RSP: 0018:ffffc90000ffef98 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff888015b9f414 RCX: ffff888019669c40
        RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
        RBP: 0000000000000001 R08: ffffffff811d9cdb R09: ffffed10065a6002
        R10: ffffed10065a6002 R11: 0000000000000000 R12: dffffc0000000000
        R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
        FS:  000000000124b300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000028e31000 CR4: 00000000001526e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         rmap_add arch/x86/kvm/mmu/mmu.c:965 [inline]
         mmu_set_spte+0x862/0xe60 arch/x86/kvm/mmu/mmu.c:2604
         __direct_map arch/x86/kvm/mmu/mmu.c:2862 [inline]
         direct_page_fault+0x1f74/0x2b70 arch/x86/kvm/mmu/mmu.c:3769
         kvm_mmu_do_page_fault arch/x86/kvm/mmu.h:124 [inline]
         kvm_mmu_page_fault+0x199/0x1440 arch/x86/kvm/mmu/mmu.c:5065
         vmx_handle_exit+0x26/0x160 arch/x86/kvm/vmx/vmx.c:6122
         vcpu_enter_guest+0x3bdd/0x9630 arch/x86/kvm/x86.c:9428
         vcpu_run+0x416/0xc20 arch/x86/kvm/x86.c:9494
         kvm_arch_vcpu_ioctl_run+0x4e8/0xa40 arch/x86/kvm/x86.c:9722
         kvm_vcpu_ioctl+0x70f/0xbb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3460
         vfs_ioctl fs/ioctl.c:51 [inline]
         __do_sys_ioctl fs/ioctl.c:1069 [inline]
         __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:1055
         do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x440ce9
      
      Cc: stable@vger.kernel.org
      Reported-by: <syzbot+fb0b6a7e8713aeb0319c@syzkaller.appspotmail.com>
      Fixes: 9ec19493 ("KVM: x86: clear SMM flags before loading state while leaving SMM")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210609185619.992058-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Chiqijun
      PCI: Work around Huawei Intelligent NIC VF FLR erratum · 077cb894
      Chiqijun authored
      commit ce00322c upstream.
      
      pcie_flr() starts a Function Level Reset (FLR), waits 100ms (the maximum
      time allowed for FLR completion by PCIe r5.0, sec 6.6.2), and waits for the
      FLR to complete.  It assumes the FLR is complete when a config read returns
      valid data.
      
      When we do an FLR on several Huawei Intelligent NIC VFs at the same time,
      firmware on the NIC processes them serially.  The VF may respond to config
      reads before the firmware has completed its reset processing.  If we bind a
      driver to the VF (e.g., by assigning the VF to a virtual machine) in the
      interval between the successful config read and completion of the firmware
      reset processing, the NIC VF driver may fail to load.
      
      Prevent this driver failure by waiting for the NIC firmware to complete its
      reset processing.  Not all NIC firmware supports this feature.
      
      [bhelgaas: commit log]
      Link: https://support.huawei.com/enterprise/en/doc/EDOC1100063073/87950645/vm-oss-occasionally-fail-to-load-the-in200-driver-when-the-vf-performs-flr
      Link: https://lore.kernel.org/r/20210414132301.1793-1-chiqijun@huawei.com
      Signed-off-by: Chiqijun <chiqijun@huawei.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Sriharsha Basavapatna
      PCI: Add ACS quirk for Broadcom BCM57414 NIC · ee1a9cfe
      Sriharsha Basavapatna authored
      commit db2f77e2 upstream.
      
      The Broadcom BCM57414 NIC may be a multi-function device.  While it does
      not advertise an ACS capability, peer-to-peer transactions are not possible
      between the individual functions, so it is safe to treat them as fully
      isolated.
      
      Add an ACS quirk for this device so the functions can be in independent
      IOMMU groups and attached individually to userspace applications using
      VFIO.
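      A sketch of how such an ACS quirk can answer "is this function
      isolated?", modeled loosely on the kernel's multi-function endpoint ACS
      helper. The exact set of flags the BCM57414 entry reports is an
      assumption here; the bit values follow the PCIe ACS Control register.

```c
#include <assert.h>
#include <stdint.h>

/* ACS control bits (PCIe ACS capability, Control register) */
#define PCI_ACS_SV 0x0001 /* Source Validation */
#define PCI_ACS_TB 0x0002 /* Translation Blocking */
#define PCI_ACS_RR 0x0004 /* P2P Request Redirect */
#define PCI_ACS_CR 0x0008 /* P2P Completion Redirect */
#define PCI_ACS_UF 0x0010 /* Upstream Forwarding */

/* 1 if every requested control is in the "effectively enabled" set. */
static int acs_ctrl_enabled(uint16_t acs_ctrl_req, uint16_t acs_ctrl_ena)
{
	return (acs_ctrl_req & acs_ctrl_ena) == acs_ctrl_req;
}

/*
 * Quirk for a multi-function device whose functions cannot talk peer to
 * peer: behave as if SV/RR/CR/UF were enabled even without an ACS capability.
 */
static int mf_isolated_acs_quirk(uint16_t acs_flags)
{
	return acs_ctrl_enabled(acs_flags,
				PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF);
}
```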
      
      [bhelgaas: commit log]
      Link: https://lore.kernel.org/r/1621645997-16251-1-git-send-email-michael.chan@broadcom.com
      Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Pali Rohár
      PCI: aardvark: Fix kernel panic during PIO transfer · 1a1dbc44
      Pali Rohár authored
      commit f1813996 upstream.
      
      Trying to start a new PIO transfer by writing value 0 in PIO_START register
      when previous transfer has not yet completed (which is indicated by value 1
      in PIO_START) causes an External Abort on CPU, which results in kernel
      panic:
      
          SError Interrupt on CPU0, code 0xbf000002 -- SError
          Kernel panic - not syncing: Asynchronous SError Interrupt
      
      To prevent the kernel panic, a new PIO transfer must be rejected while
      the previous one has not finished yet.

      The kernel can issue a new PIO request while the previous PIO transfer
      is still running only if that previous transfer timed out.
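      A minimal model of the rule, assuming a hypothetical struct pio_ctrl
      whose pio_start flag mirrors the PIO_START register. Because config
      accesses are serialized, a second request can only arrive while the
      first is still running after the driver gave up waiting on it; the model
      rejects that request instead of touching the hardware. Returning -EBUSY
      is this sketch's choice, not necessarily the driver's error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical controller model; pio_start mirrors the PIO_START register. */
struct pio_ctrl {
	bool pio_start; /* true while the hardware is still running a transfer */
	int started;    /* transfers actually handed to the hardware */
};

/*
 * Start a PIO transfer only if the previous one has finished. Writing
 * PIO_START while a transfer is in flight would raise an SError on the CPU.
 */
static int pio_try_start(struct pio_ctrl *c)
{
	if (c->pio_start)
		return -EBUSY;
	c->pio_start = true;
	c->started++;
	return 0;
}

/* The hardware finishes, possibly long (up to ~1.44s) after a driver timeout. */
static void pio_hw_complete(struct pio_ctrl *c)
{
	c->pio_start = false;
}
```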
      
      In the past, the root cause of this issue was misidentified (because it
      often happens during link retraining or after a link-down event), and a
      special hack was implemented in Trusted Firmware to catch all SError
      events in EL3, ignore errors with code 0xbf000002, and, instead of
      forwarding any other errors to the kernel, panic from the EL3 Trusted
      Firmware handler.
      
      Links to discussion and patches about this issue:
      https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50
      https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/
      https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen.fr/
      https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541
      
      But the real cause was that during link retraining or after a link-down
      event, the PIO transfer may take longer, up to 1.44s, until it times out.
      This increased the probability that the kernel would issue a new PIO
      transfer while the previous one had not finished yet.

      After applying this change to the kernel, it is possible to revert the
      mentioned TF-A hack, and SError events no longer have to be caught in
      TF-A EL3.
      
      Link: https://lore.kernel.org/r/20210608203655.31228-1-pali@kernel.org
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Cc: stable@vger.kernel.org # 7fbcb5da ("PCI: aardvark: Don't rely on jiffies while holding spinlock")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Shanker Donthineni
      PCI: Mark some NVIDIA GPUs to avoid bus reset · dac77a14
      Shanker Donthineni authored
      commit 4c207e71 upstream.
      
      Some NVIDIA GPU devices do not work with Secondary Bus Reset (SBR).
      Triggering SBR leaves the device inoperable for the rest of the current
      boot, and a hard reboot of the system is required to return the GPU to
      normal operation. For the affected devices, enable the NO_BUS_RESET
      quirk to avoid the issue.
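      In the kernel, the quirk sets PCI_DEV_FLAGS_NO_BUS_RESET on the device
      and the bus-reset paths check that flag. The sketch below models only
      that check, with a hypothetical flag bit and device struct; returning
      -ENOTTY mirrors how reset helpers typically report an unavailable
      method, but treat the exact code as an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bit standing in for PCI_DEV_FLAGS_NO_BUS_RESET. */
#define DEV_FLAG_NO_BUS_RESET (1u << 0)

struct dev_model {
	unsigned int dev_flags;
	int sbr_count; /* how many secondary bus resets were issued */
};

/* Skip SBR for quirked devices so the caller can try another reset method. */
static int try_bus_reset(struct dev_model *d)
{
	if (d->dev_flags & DEV_FLAG_NO_BUS_RESET)
		return -ENOTTY;
	d->sbr_count++; /* the real code would pulse the bridge's SBR bit here */
	return 0;
}
```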
      
      This issue will be fixed in the next generation of hardware.
      
      Link: https://lore.kernel.org/r/20210608054857.18963-8-ameynarkhede03@gmail.com
      Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Sinan Kaya <okaya@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Antti Järvinen
      PCI: Mark TI C667X to avoid bus reset · 1e460ddf
      Antti Järvinen authored
      commit b5cf198e upstream.
      
      Some TI KeyStone C667X devices do not support bus/hot reset.  The PCIESS
      automatically disables the LTSSM when a Secondary Bus Reset is received,
      and the device stops working.  Prevent bus reset for these devices.  With
      this change, the device can be assigned to VMs with VFIO, but it will
      leak state between VMs.
      
      Reference: https://e2e.ti.com/support/processors/f/791/t/954382
      Link: https://lore.kernel.org/r/20210315102606.17153-1-antti.jarvinen@gmail.com
      Signed-off-by: Antti Järvinen <antti.jarvinen@gmail.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Steven Rostedt (VMware)
      tracing: Do no increment trace_clock_global() by one · c9fd0ab3
      Steven Rostedt (VMware) authored
      commit 89529d8b upstream.
      
      trace_clock_global() tries to keep events across CPUs roughly in order.
      A global value is used and updated by the latest read of a clock. If one
      CPU is ahead by a little and is read by another CPU, a lock is taken, and
      if the timestamp of the other CPU is behind, it simply uses the other
      CPU's timestamp.

      The lock is also only taken with a "trylock", because tracing can cause
      strange recursions. The lock is not taken at all in NMI context.

      In the case where the lock cannot be taken, the non-synced timestamp is
      returned, but it will not be less than the saved global timestamp.
      
      The problem arises because when the time goes "backwards" the time
      returned is the saved timestamp plus 1. If the lock is not taken, and the
      plus one to the timestamp is returned, there's a small race that can cause
      the time to go backwards!
      
      	CPU0				CPU1
      	----				----
      				trace_clock_global() {
      				    ts = clock() [ 1000 ]
      				    trylock(clock_lock) [ success ]
      				    global_ts = ts; [ 1000 ]
      
      				    <interrupted by NMI>
       trace_clock_global() {
          ts = clock() [ 999 ]
          if (ts < global_ts)
      	ts = global_ts + 1 [ 1001 ]
      
          trylock(clock_lock) [ fail ]
      
          return ts [ 1001]
       }
      				    unlock(clock_lock);
      				    return ts; [ 1000 ]
      				}
      
       trace_clock_global() {
          ts = clock() [ 1000 ]
          if (ts < global_ts) [ false 1000 == 1000 ]
      
          trylock(clock_lock) [ success ]
          global_ts = ts; [ 1000 ]
          unlock(clock_lock)
      
          return ts; [ 1000 ]
       }
      
      The above case shows two reads of trace_clock_global() on the same CPU,
      where the second read returns one less than the first. That is, time
      went backwards, which trace_clock_global() must not allow.
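      A single-threaded model of the fixed clamp, assuming raw is this CPU's
      clock reading and got_lock says whether the trylock succeeded. Clamping
      to the saved value instead of returning it plus one keeps the two reads
      in the trace above monotonic; the names are hypothetical, not the
      kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t global_ts; /* last timestamp published under the lock */

/*
 * Single-threaded model of the fixed clamp: raw is this CPU's clock reading,
 * got_lock says whether the trylock succeeded.
 */
static uint64_t clock_global_model(uint64_t raw, bool got_lock)
{
	uint64_t ts = raw;

	/* Never go below the saved value, but do not add 1 to it either. */
	if ((int64_t)(ts - global_ts) < 0)
		ts = global_ts;

	/* Only a CPU holding the lock publishes its timestamp. */
	if (got_lock)
		global_ts = ts;
	return ts;
}
```

      Replaying the CPU0 column with global_ts already at 1000, a read of 999
      now returns 1000, and the following read of 1000 also returns 1000, so
      time no longer goes backwards.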
      
      This was triggered by heavy tracing and the ring buffer checker that tests
      for the clock going backwards:
      
       Ring buffer clock went backwards: 20613921464 -> 20613921463
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 0 at kernel/trace/ring_buffer.c:3412 check_buffer+0x1b9/0x1c0
       Modules linked in:
       [..]
       [CPU: 2]TIME DOES NOT MATCH expected:20620711698 actual:20620711697 delta:6790234 before:20613921463 after:20613921463
         [20613915818] PAGE TIME STAMP
         [20613915818] delta:0
         [20613915819] delta:1
         [20613916035] delta:216
         [20613916465] delta:430
         [20613916575] delta:110
         [20613916749] delta:174
         [20613917248] delta:499
         [20613917333] delta:85
         [20613917775] delta:442
         [20613917921] delta:146
         [20613918321] delta:400
         [20613918568] delta:247
         [20613918768] delta:200
         [20613919306] delta:538
         [20613919353] delta:47
         [20613919980] delta:627
         [20613920296] delta:316
         [20613920571] delta:275
         [20613920862] delta:291
         [20613921152] delta:290
         [20613921464] delta:312
         [20613921464] delta:0 TIME EXTEND
         [20613921464] delta:0
      
      This happened more than once, always as an off-by-one result. It also
      started happening after commit aafe104a was added.
      
      Cc: stable@vger.kernel.org
      Fixes: aafe104a ("tracing: Restructure trace_clock_global() to never block")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Steven Rostedt (VMware)
      tracing: Do not stop recording comms if the trace file is being read · b313bd94
      Steven Rostedt (VMware) authored
      commit 4fdd595e upstream.
      
      A while ago, when the "trace" file was opened, tracing was stopped, and
      code was added to stop recording the comms to saved_cmdlines, for mapping
      of the pids to the task name.
      
      Since then, code has been added that only records the comm if a trace
      event occurred, so there's no reason not to record it while the trace
      file is open.
      
      Cc: stable@vger.kernel.org
      Fixes: 7ffbd48d ("tracing: Cache comms only after an event occurred")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Steven Rostedt (VMware)
      tracing: Do not stop recording cmdlines when tracing is off · adb3849e
      Steven Rostedt (VMware) authored
      commit 85550c83 upstream.
      
      The saved_cmdlines is used to map pids to the task name, such that the
      output of the tracing does not just show pids, but also gives a human
      readable name for the task.
      
      If the name is not mapped, the output looks like this:
      
          <...>-1316          [005] ...2   132.044039: ...
      
      Instead of this:
      
          gnome-shell-1316    [005] ...2   132.044039: ...
      
      The names are updated while tracing is running, but are skipped if
      tracing is stopped. Unfortunately, this stops the recording of names
      whenever the top-level tracer is stopped, even if other tracers are
      still active.

      The recording of a name only happens when a new event is written into a
      ring buffer, so there is no need to test whether tracing is on. If
      tracing is off, no event is written, and there is nothing to record.
      
      Remove the check, as it hides the names of tasks for events in the
      instance buffers.
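      A toy model of tying name recording to the event-write path, using a
      hypothetical fixed-size pid table (the real saved_cmdlines buffer is
      organized differently). Because record_cmdline() is only reached when an
      event is actually written, no separate "is tracing on?" test is needed,
      and events written to instance buffers record names too.

```c
#include <assert.h>
#include <string.h>

#define MAX_PID 64 /* tiny table for the model */

static char saved_comm[MAX_PID][16]; /* pid -> task name */

/* Called only from the event-write path, so tracing is necessarily on. */
static void record_cmdline(int pid, const char *comm)
{
	if (pid < 0 || pid >= MAX_PID)
		return;
	strncpy(saved_comm[pid], comm, sizeof(saved_comm[pid]) - 1);
	saved_comm[pid][sizeof(saved_comm[pid]) - 1] = '\0';
}

/* Returns the cached name, or "<...>" when the pid was never recorded. */
static const char *lookup_cmdline(int pid)
{
	if (pid < 0 || pid >= MAX_PID || saved_comm[pid][0] == '\0')
		return "<...>";
	return saved_comm[pid];
}

/* An event written into any ring buffer (top-level or instance). */
static void write_event(int pid, const char *comm)
{
	/* ... the event payload would be written here ... */
	record_cmdline(pid, comm);
}
```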
      
      Cc: stable@vger.kernel.org
      Fixes: 7ffbd48d ("tracing: Cache comms only after an event occurred")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Breno Lima
      usb: chipidea: imx: Fix Battery Charger 1.2 CDP detection · 1a91fafa
      Breno Lima authored
      commit c6d580d9 upstream.
      
      i.MX8MM cannot detect certain CDP USB hubs. The usbmisc_imx.c driver
      does not follow the CDP timing requirements defined in section 3.2.4
      (Detection Timing CDP) of the USB BC 1.2 specification.

      During Primary Detection, the i.MX device should turn on VDP_SRC and
      IDM_SINK for a minimum of 40ms (TVDPSRC_ON). After TVDPSRC_ON has
      elapsed, the i.MX is allowed to check the status of the D- line. The
      current implementation waits only between 1ms and 2ms, so certain BC 1.2
      compliant USB hubs cannot be detected. Increase the delay to 40ms to
      allow enough time for primary detection.
      
      During secondary detection the i.MX is required to disable VDP_SRC and
      IDM_SNK, and enable VDM_SRC and IDP_SINK for at least 40ms (TVDMSRC_ON).
      
      The current implementation does not disable VDP_SRC and IDM_SNK;
      introduce a disable sequence in imx7d_charger_secondary_detection().
      
      VDM_SRC and IDP_SINK should be enabled for at least 40ms (TVDMSRC_ON).
      Increase delay allowing enough time for detection.
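      The BC 1.2 sequence described above, sketched with hypothetical bit
      positions for VDP_SRC, IDM_SINK, VDM_SRC and IDP_SINK (the real
      usbmisc_imx.c register layout differs) and a caller-supplied delay
      function, so the waits can be checked rather than slept. Reading the
      D-/D+ line status is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions in a charger-detection control register. */
#define VDP_SRC  (1u << 0)
#define IDM_SINK (1u << 1)
#define VDM_SRC  (1u << 2)
#define IDP_SINK (1u << 3)

#define TVDPSRC_ON_MS 40 /* BC 1.2 minimum for primary detection */
#define TVDMSRC_ON_MS 40 /* BC 1.2 minimum for secondary detection */

typedef void (*delay_fn)(unsigned int ms);

static uint32_t primary_detection(uint32_t reg, delay_fn delay_ms)
{
	reg |= VDP_SRC | IDM_SINK;
	delay_ms(TVDPSRC_ON_MS); /* previously only ~1-2 ms */
	return reg;
}

static uint32_t secondary_detection(uint32_t reg, delay_fn delay_ms)
{
	reg &= ~(VDP_SRC | IDM_SINK); /* the disable step the fix introduces */
	reg |= VDM_SRC | IDP_SINK;
	delay_ms(TVDMSRC_ON_MS);
	return reg;
}

/* Test double: accumulate requested delays instead of sleeping. */
static unsigned int total_wait_ms;
static void record_delay(unsigned int ms) { total_wait_ms += ms; }
```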
      
      Cc: <stable@vger.kernel.org>
      Fixes: 746f316b ("usb: chipidea: introduce imx7d USB charger detection")
      Signed-off-by: Breno Lima <breno.lima@nxp.com>
      Signed-off-by: Jun Li <jun.li@nxp.com>
      Link: https://lore.kernel.org/r/20210614175013.495808-1-breno.lima@nxp.com
      Signed-off-by: Peter Chen <peter.chen@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Andrew Lunn
      usb: core: hub: Disable autosuspend for Cypress CY7C65632 · 576996b6
      Andrew Lunn authored
      commit a7d8d1c7 upstream.
      
      The Cypress CY7C65632 appears to have an issue with autosuspend and
      detecting devices, not too dissimilar to the SMSC 5534B hub. It is
      easiest to reproduce by connecting multiple mass storage devices to
      the hub at the same time. On a Lenovo Yoga, around 1 in 3 attempts
      result in the devices not being detected. It is however possible to
      make them appear using lsusb -v.
      
      Disabling autosuspend for this hub resolves the issue.
      
      Fixes: 1208f9e1 ("USB: hub: Fix the broken detection of USB3 device in SMSC hub")
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20210614155524.2228800-1-andrew@lunn.ch
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Pavel Skripkin
      can: mcba_usb: fix memory leak in mcba_usb · 6bd3d80d
      Pavel Skripkin authored
      commit 91c02557 upstream.
      
      Syzbot reported a memory leak in the SocketCAN driver for the Microchip
      CAN BUS Analyzer Tool. The problem was that buffers allocated with
      usb_alloc_coherent() were never freed.

      In mcba_usb_start(), 20 coherent buffers are allocated, and nothing
      frees them:
      
      1) In callback function the urb is resubmitted and that's all
      2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
         is not set (see mcba_usb_start) and this flag cannot be used with
         coherent buffers.
      
      Fail log:
      | [ 1354.053291][ T8413] mcba_usb 1-1:0.0 can0: device disconnected
      | [ 1367.059384][ T8420] kmemleak: 20 new suspected memory leaks (see /sys/kernel/debug/kmem)
      
      So all allocated buffers should be freed explicitly with
      usb_free_coherent().
      
      NOTE:
      The same pattern for allocating and freeing coherent buffers
      is used in drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c
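      A userspace model of the fix's bookkeeping: remember every coherent
      buffer at start and free each one explicitly on disconnect.
      fake_alloc_coherent() and fake_free_coherent() are hypothetical
      stand-ins for usb_alloc_coherent() and usb_free_coherent(), which in the
      driver also take the USB device and a DMA handle; the buffer size is
      made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_URB_COUNT 20 /* the commit mentions 20 coherent buffers */
#define RX_BUF_SIZE  64 /* hypothetical buffer size */

static int live_buffers; /* buffers currently allocated */

/* Hypothetical stand-in for usb_alloc_coherent(). */
static void *fake_alloc_coherent(size_t size)
{
	void *p = malloc(size);

	if (p)
		live_buffers++;
	return p;
}

/* Hypothetical stand-in for usb_free_coherent(). */
static void fake_free_coherent(void *p)
{
	if (p) {
		free(p);
		live_buffers--;
	}
}

struct priv_model {
	void *rxbuf[RX_URB_COUNT]; /* every buffer handed to an RX URB */
};

static int start_model(struct priv_model *priv)
{
	for (int i = 0; i < RX_URB_COUNT; i++)
		priv->rxbuf[i] = NULL;
	for (int i = 0; i < RX_URB_COUNT; i++) {
		priv->rxbuf[i] = fake_alloc_coherent(RX_BUF_SIZE);
		if (!priv->rxbuf[i])
			return -1; /* caller cleans up with disconnect_model() */
	}
	return 0;
}

/* After killing the URBs (omitted), free each remembered buffer. */
static void disconnect_model(struct priv_model *priv)
{
	for (int i = 0; i < RX_URB_COUNT; i++) {
		fake_free_coherent(priv->rxbuf[i]);
		priv->rxbuf[i] = NULL;
	}
}
```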
      
      Fixes: 51f3baad ("can: mcba_usb: Add support for Microchip CAN BUS Analyzer")
      Link: https://lore.kernel.org/r/20210609215833.30393-1-paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Reported-and-tested-by: <syzbot+57281c762a3922e14dfe@syzkaller.appspotmail.com>
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Oleksij Rempel
      can: j1939: fix Use-after-Free, hold skb ref while in use · 509ab6bf
      Oleksij Rempel authored
      commit 2030043e upstream.
      
      This patch fixes a Use-after-Free found by the syzbot.
      
      The problem is that an skb is taken from the per-session skb queue
      without incrementing the ref count. This leads to a Use-after-Free if
      the skb is taken concurrently from the session queue due to a CTS.
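      A single-threaded sketch of the "hold a reference while in use" rule,
      with a hypothetical refcounted struct buf standing in for an skb;
      buf_get() and buf_put() play the roles of skb_get() and kfree_skb(),
      whose real refcount is atomic.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal refcounted buffer standing in for struct sk_buff. */
struct buf {
	int refcnt;
};

static struct buf *buf_new(void)
{
	struct buf *b = malloc(sizeof(*b));

	if (b)
		b->refcnt = 1; /* reference owned by the session queue */
	return b;
}

/* Like skb_get(): take an extra reference before using the buffer. */
static struct buf *buf_get(struct buf *b)
{
	b->refcnt++;
	return b;
}

/* Like kfree_skb(): drop a reference, free on the last one. 1 if freed. */
static int buf_put(struct buf *b)
{
	assert(b->refcnt > 0);
	if (--b->refcnt == 0) {
		free(b);
		return 1;
	}
	return 0;
}
```

      With the extra reference, a concurrent dequeue that drops the queue's
      reference no longer frees the buffer out from under the sender.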
      
      Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
      Link: https://lore.kernel.org/r/20210521115720.7533-1-o.rempel@pengutronix.de
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: linux-stable <stable@vger.kernel.org>
      Reported-by: <syzbot+220c1a29987a9a490903@syzkaller.appspotmail.com>
      Reported-by: <syzbot+45199c1b73b4013525cf@syzkaller.appspotmail.com>
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Tetsuo Handa
      can: bcm/raw/isotp: use per module netdevice notifier · 0cf4b377
      Tetsuo Handa authored
      commit 8d0caedb upstream.
      
      syzbot is reporting hung tasks at register_netdevice_notifier() [1] and
      unregister_netdevice_notifier() [2], because cleanup_net() might perform
      time-consuming operations while the CAN raw/bcm/isotp modules are calling
      {register,unregister}_netdevice_notifier() for each socket.

      Change the raw/bcm/isotp modules to call register_netdevice_notifier()
      from the module's __init function and unregister_netdevice_notifier()
      from the module's __exit function, as the gw/j1939 modules already do.
      
      Link: https://syzkaller.appspot.com/bug?id=391b9498827788b3cc6830226d4ff5be87107c30 [1]
      Link: https://syzkaller.appspot.com/bug?id=1724d278c83ca6e6df100a2e320c10d991cf2bce [2]
      Link: https://lore.kernel.org/r/54a5f451-05ed-f977-8534-79e7aa2bcc8f@i-love.sakura.ne.jp
      Cc: linux-stable <stable@vger.kernel.org>
      Reported-by: syzbot <syzbot+355f8edb2ff45d5f95fa@syzkaller.appspotmail.com>
      Reported-by: syzbot <syzbot+0f1827363a305f74996f@syzkaller.appspotmail.com>
      Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Tested-by: syzbot <syzbot+355f8edb2ff45d5f95fa@syzkaller.appspotmail.com>
      Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Norbert Slusarek
      can: bcm: fix infoleak in struct bcm_msg_head · acb755be
      Norbert Slusarek authored
      commit 5e87ddbe upstream.
      
      On 64-bit systems, struct bcm_msg_head has 4 bytes of padding between
      the struct members count and ival1. Even though all struct members are
      initialized, this 4-byte hole will contain data from the kernel stack.
      This patch zeroes out struct bcm_msg_head before use, preventing
      infoleaks to userspace.
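      A simplified userspace layout modeled on struct bcm_msg_head (the
      frames[] array is left out and plain long fields stand in for the
      timeval members) shows where the hole sits and why zeroing the whole
      struct first keeps it from carrying stale bytes. The hole is 4 bytes on
      LP64 targets and absent on typical 32-bit ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins; the kernel's definitions use its own types. */
struct timeval_model {
	long tv_sec;
	long tv_usec;
};

struct msg_head_model {
	uint32_t opcode;
	uint32_t flags;
	uint32_t count;
	/* on LP64, 4 bytes of padding sit here so ival1 is 8-byte aligned */
	struct timeval_model ival1, ival2;
	uint32_t can_id;
	uint32_t nframes;
};

/* Bytes of padding between count and ival1 (4 on LP64, 0 on ILP32). */
static size_t hole_after_count(void)
{
	return offsetof(struct msg_head_model, ival1) -
	       (offsetof(struct msg_head_model, count) + sizeof(uint32_t));
}

/* Zero the whole struct first so the hole cannot carry stale stack bytes. */
static void fill_head(struct msg_head_model *head, uint32_t opcode)
{
	memset(head, 0, sizeof(*head));
	head->opcode = opcode;
	head->count = 1;
}
```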
      
      Fixes: ffd980f9 ("[CAN]: Add broadcast manager (bcm) protocol")
      Link: https://lore.kernel.org/r/trinity-7c1b2e82-e34f-4885-8060-2cd7a13769ce-1623532166177@3c-app-gmx-bs52
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Daniel Borkmann
      bpf: Do not mark insn as seen under speculative path verification · 8c82c52d
      Daniel Borkmann authored
      [ Upstream commit fe9a5ca7 ]
      
      ... in such circumstances, we do not want to mark the instruction as
      seen, given that the goal is still to jmp-1 rewrite/sanitize dead code if
      it is not reachable from the non-speculative path verification. We do,
      however, want to verify it for safety regardless.
      
      With the patch as-is all the insns that have been marked as seen before the
      patch will also be marked as seen after the patch (just with a potentially
      different non-zero count). An upcoming patch will also verify paths that are
      unreachable in the non-speculative domain, hence this extension is needed.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Daniel Borkmann
      bpf: Inherit expanded/patched seen count from old aux data · e9d27173
      Daniel Borkmann authored
      [ Upstream commit d203b0fd ]
      
      Instead of relying on current env->pass_cnt, use the seen count from the
      old aux data in adjust_insn_aux_data(), and expand it to the new range of
      patched instructions. This change is valid given we always expand 1:n
      with n>=1, so what applies to the old/original instruction needs to apply
      for the replacement as well.
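      A sketch of the 1:n expansion described above, using a reduced,
      hypothetical struct aux_model with only a seen field: instruction off is
      replaced by cnt instructions, the original aux entry moves to the end of
      the patched range, and every new slot inherits its seen count rather
      than the current pass count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct aux_model {
	unsigned int seen; /* non-zero: verifier visited this insn */
};

/*
 * Replace insn `off` of a program with old_len insns by `cnt` >= 1 insns.
 * Returns a new aux array of old_len + cnt - 1 entries, or NULL.
 */
static struct aux_model *expand_aux(const struct aux_model *old, size_t old_len,
				    size_t off, size_t cnt)
{
	size_t new_len = old_len + cnt - 1;
	struct aux_model *new_aux;

	assert(cnt >= 1 && off < old_len);
	new_aux = calloc(new_len, sizeof(*new_aux));
	if (!new_aux)
		return NULL;
	/* insns before the patch site are unchanged */
	memcpy(new_aux, old, off * sizeof(*old));
	/* the original insn's aux and everything after it shift by cnt - 1 */
	memcpy(new_aux + off + cnt - 1, old + off, (old_len - off) * sizeof(*old));
	/* new slots inherit the old seen count, not the current pass count */
	for (size_t i = off; i < off + cnt - 1; i++)
		new_aux[i].seen = old[off].seen;
	return new_aux;
}
```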
      
      Not relying on env->pass_cnt is a prerequisite for a later change where we
      want to avoid marking an instruction seen when verified under speculative
      execution path.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Marc Zyngier
      irqchip/gic-v3: Workaround inconsistent PMR setting on NMI entry · ed423d80
      Marc Zyngier authored
      [ Upstream commit 382e6e17 ]
      
      The arm64 entry code suffers from an annoying issue on taking
      an NMI, as it sets PMR to a value that actually allows IRQs
      to be acknowledged. This is done for consistency with other parts
      of the code, and is in the process of being fixed. This shouldn't
      be a problem, as we are not enabling interrupts whilst in NMI
      context.
      
      However, in the unfortunate scenario that we took a spurious NMI
      (retired before the read of IAR) *and* that there is an IRQ pending
      at the same time, we'll ack the IRQ in NMI context. Too bad.
      
      In order to avoid deadlocks while running something like perf,
      teach the GICv3 driver about this situation: if we were in
      a context where no interrupt should have fired, transiently
      set PMR to a value that only allows NMIs before acking the pending
      interrupt, and restore the original value after that.
      
      This papers over the core issue for the time being, and makes
      NMIs great again. Sort of.
      
      Fixes: 4d6a38da ("arm64: entry: always set GIC_PRIO_PSR_I_SET during entry")
      Co-developed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/lkml/20210610145731.1350460-1-maz@kernel.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Feng Tang
      mm: relocate 'write_protect_seq' in struct mm_struct · 103c4a08
      Feng Tang authored
      [ Upstream commit 2e302543 ]
      
      0day robot reported a 9.2% regression for will-it-scale mmap1 test
      case[1], caused by commit 57efa1fe ("mm/gup: prevent gup_fast from
      racing with COW during fork").
      
      Further debugging shows that the regression occurs because that commit
      changes the offset of the hot field 'mmap_lock' inside 'struct
      mm_struct', and thus its cache alignment.

      From the perf data, the contention for 'mmap_lock' is very severe and
      takes around 95% of CPU cycles. 'mmap_lock' is a rw_semaphore:
      
              struct rw_semaphore {
                      atomic_long_t count;	/* 8 bytes */
                      atomic_long_t owner;	/* 8 bytes */
                      struct optimistic_spin_queue osq; /* spinner MCS lock */
                      ...
      
      Before commit 57efa1fe added 'write_protect_seq', the structure happened
      to have a very optimal cache alignment layout, as Linus explained:
      
       "and before the addition of the 'write_protect_seq' field, the
        mmap_sem was at offset 120 in 'struct mm_struct'.
      
        Which meant that count and owner were in two different cachelines,
        and then when you have contention and spend time in
        rwsem_down_write_slowpath(), this is probably *exactly* the kind
        of layout you want.
      
        Because first the rwsem_write_trylock() will do a cmpxchg on the
        first cacheline (for the optimistic fast-path), and then in the
        case of contention, rwsem_down_write_slowpath() will just access
        the second cacheline.
      
        Which is probably just optimal for a load that spends a lot of
        time contended - new waiters touch that first cacheline, and then
        they queue themselves up on the second cacheline."
      
      After the commit, the rw_semaphore is at offset 128, which means the
      'count' and 'owner' fields are now in the same cacheline, causing more
      cache bouncing.
      
      Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which will
      affect its offset:
      
        CONFIG_MMU
        CONFIG_MEMBARRIER
        CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
      
      The layout above is for a 64-bit system with 0day's default kernel
      config (similar to RHEL-8.3's config), in which all 3 of these options
      are 'y'. The layout can vary with different kernel configs.

      Relayouting a structure is usually a double-edged sword, as it can help
      one case but hurt others. For this case, one solution is, since the
      newly added 'write_protect_seq' is a 4-byte seqcount_t (when
      CONFIG_DEBUG_LOCK_ALLOC=n), to place it into an existing 4-byte hole in
      'mm_struct', which does not change other fields' alignment while fixing
      the regression.
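      A small offsetof() experiment with hypothetical field names (not
      mm_struct's real members) illustrates the trade-off on an LP64 target:
      adding a 4-byte field ahead of a hot member shifts that member, while
      dropping the same field into an existing 4-byte hole leaves every later
      offset unchanged. Tools such as pahole report such holes in real
      structures.

```c
#include <assert.h>
#include <stddef.h>

/* Original layout: an int followed by a 4-byte alignment hole on LP64. */
struct layout_before {
	void *a;
	int b;       /* 4 bytes; a 4-byte hole follows on LP64 */
	void *c;
	long hot[2]; /* stands in for the contended lock */
};

/* New 4-byte field placed right before the hot member: shifts it. */
struct layout_appended {
	void *a;
	int b;
	void *c;
	int seq;     /* new field */
	long hot[2];
};

/* Same new field placed into the existing hole: hot stays put. */
struct layout_in_hole {
	void *a;
	int b;
	int seq;     /* fills the former hole */
	void *c;
	long hot[2];
};
```

      On LP64, hot sits at offset 24 in the original and hole-filled layouts
      but moves to offset 32 when the field is added in front of it.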
      
      Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/ [1]
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Riwen Lu
      hwmon: (scpi-hwmon) shows the negative temperature properly · a87abba0
      Riwen Lu authored
      [ Upstream commit 78d13552 ]
      
      The scpi hwmon driver shows sub-zero temperatures as unsigned integers,
      which confuses users when the machine operates in a low-temperature
      environment. This patch shows sub-zero temperatures as signed values,
      so users can read them correctly from sensors.
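      A sketch of the difference, assuming the raw reading arrives as a 64-bit
      two's-complement value in millidegrees Celsius (the hwmon sysfs unit);
      the helper name and signature are hypothetical. Printing the same bits
      as unsigned turns -5 °C into a huge positive number.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical show() helper: raw is the 64-bit sensor reading. When
 * is_signed is set, the bits are reinterpreted as signed so values below
 * zero print with a minus sign.
 */
static int format_temp(char *buf, size_t len, uint64_t raw, int is_signed)
{
	if (is_signed)
		return snprintf(buf, len, "%lld\n", (long long)(int64_t)raw);
	return snprintf(buf, len, "%llu\n", (unsigned long long)raw);
}
```

      For a raw value of -5000 (-5 °C), the signed path prints "-5000" while
      the unsigned path prints 18446744073709546616.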
      
      Signed-off-by: Riwen Lu <luriwen@kylinos.cn>
      Tested-by: Xin Chen <chenxin@kylinos.cn>
      Link: https://lore.kernel.org/r/20210604030959.736379-1-luriwen@kylinos.cn
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Chen Li
      radeon: use memcpy_to/fromio for UVD fw upload · 57b21ef1
      Chen Li authored
      [ Upstream commit ab8363d3 ]
      
      I recently hit a GPU address bug, and the kernel log showed that the PC
      was in memcpy/memset and the link register was in radeon_uvd_resume.

      As we know, on some architectures, optimized memcpy/memset may not work
      well on device memory. Using the trivial memcpy_toio/memset_io helpers
      fixes this problem.

      BTW, amdgpu has already done this in commit ba0b2275 ("drm/amdgpu: use
      memcpy_to/fromio for UVD fw upload"), which is why it does not have this
      issue on the same GPU and platform.
      
      Signed-off-by: Chen Li <chenli@uniontech.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Srinivasa Rao Mandadapu
      ASoC: qcom: lpass-cpu: Fix pop noise during audio capture begin · 3e4b0fbb
      Srinivasa Rao Mandadapu authored
      [ Upstream commit c8a4556d ]
      
      This patch fixes pop noise lasting around 15ms observed at the beginning
      of audio capture. It enables BCLK and LRCLK in the snd_soc_dai_ops
      prepare callback to introduce some delay before capture starts.
      
      (am from https://patchwork.kernel.org/patch/12276369/)
      (also found at https://lore.kernel.org/r/20210524142114.18676-1-srivasam@codeaurora.org)
      
      Co-developed-by: Judy Hsiao <judyhsiao@chromium.org>
      Signed-off-by: Judy Hsiao <judyhsiao@chromium.org>
      Signed-off-by: Srinivasa Rao Mandadapu <srivasam@codeaurora.org>
      Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Link: https://lore.kernel.org/r/20210604154545.1198337-1-judyhsiao@chromium.org
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Saravana Kannan
      drm/sun4i: dw-hdmi: Make HDMI PHY into a platform device · 360609fc
      Saravana Kannan authored
      [ Upstream commit 9bf37977 ]
      
      On sunxi boards that use HDMI output, the HDMI device probe is
      deferred indefinitely, with these repeated messages in dmesg:
      
        platform 1ee0000.hdmi: probe deferral - supplier 1ef0000.hdmi-phy
          not ready
      
      With fw_devlink=on, a fwnode_link is created between the hdmi and
      hdmi-phy nodes, because both nodes have a 'compatible' property set.
      
      The fw_devlink code assumes that a node with a compatible property
      will eventually have a device associated with it by some driver.
      This is not the case with the current sun8i-hdmi driver.
      
      This commit makes sun8i-hdmi-phy into a proper platform device
      and fixes the display pipeline probe on sunxi boards that use HDMI.
      
      More context: https://lkml.org/lkml/2021/5/16/203
      
      Signed-off-by: Saravana Kannan <saravanak@google.com>
      Signed-off-by: Ondrej Jirman <megous@megous.com>
      Tested-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Maxime Ripard <maxime@cerno.tech>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210607085836.2827429-1-megous@megous.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Sergio Paracuellos
      pinctrl: ralink: rt2880: avoid to error in calls is pin is already enabled · 5bd6bcb3
      Sergio Paracuellos authored
      [ Upstream commit eb367d87 ]
      
      In 'rt2880_pmx_group_enable' the driver prints an error and returns
      -EBUSY if a pin has already been enabled. This produces annoying
      messages in the caller when it happens, like the following:
      
      rt2880-pinmux pinctrl: pcie is already enabled
      mt7621-pci 1e140000.pcie: Error applying setting, reverse things back
      
      To avoid this, just print the "already enabled" message in the
      pinctrl driver and return 0 instead, so the user is not misled
      into thinking there is a real problem.
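      The behaviour change can be sketched in user space as follows. This is a
      minimal illustration, not the driver code: struct pin_group and
      group_enable_sketch() are hypothetical names, and the real
      rt2880_pmx_group_enable tracks its state differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-group state; the real driver stores this differently. */
struct pin_group {
	const char *name;
	bool enabled;
};

/*
 * Sketch of the post-fix behaviour: enabling a group that is already
 * enabled prints an informational message and reports success (0)
 * instead of returning -EBUSY, so callers do not treat it as a failure.
 */
static int group_enable_sketch(struct pin_group *grp)
{
	assert(grp != NULL);

	if (grp->enabled) {
		printf("%s is already enabled\n", grp->name);
		return 0;	/* before the fix: return -EBUSY; */
	}

	grp->enabled = true;
	return 0;
}
```

      Calling group_enable_sketch() twice on the same group now returns 0 both
      times, so a caller such as mt7621-pci no longer reverts its settings.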
      
      Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
      Link: https://lore.kernel.org/r/20210604055337.20407-1-sergio.paracuellos@gmail.com
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Oder Chiou
      ASoC: rt5682: Fix the fast discharge for headset unplugging in soundwire mode · 6d0dc1b3
      Oder Chiou authored
      [ Upstream commit 49783c6f ]
      
      Based on commit 5a15cd7f, the same setting should also be applied
      in SoundWire mode.
      
      Signed-off-by: Oder Chiou <oder_chiou@realtek.com>
      Link: https://lore.kernel.org/r/20210604063150.29925-1-oder_chiou@realtek.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>