Skip to content
  1. Aug 26, 2021
  2. Aug 15, 2021
    • Greg Kroah-Hartman's avatar
    • YueHaibing's avatar
      net: xilinx_emaclite: Do not print real IOMEM pointer · 93224014
      YueHaibing authored
      commit d0d62baa
      
       upstream.
      
      Printing kernel pointers is discouraged because they might leak kernel
      memory layout.  This fixes smatch warning:
      
      drivers/net/ethernet/xilinx/xilinx_emaclite.c:1191 xemaclite_of_probe() warn:
       argument 4 to %08lX specifier is cast from pointer
      
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarPavel Machek (CIP) <pavel@denx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93224014
    • Miklos Szeredi's avatar
      ovl: prevent private clone if bind mount is not allowed · 963d85d6
      Miklos Szeredi authored
      commit 427215d8
      
       upstream.
      
      Add the following checks from __do_loopback() to clone_private_mount() as
      well:
      
       - verify that the mount is in the current namespace
      
       - verify that there are no locked children
      
      Reported-by: default avatarAlois Wohlschlager <alois1@gmx-topmail.de>
      Fixes: c771d683
      
       ("vfs: introduce clone_private_mount()")
      Cc: <stable@vger.kernel.org> # v3.18
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      963d85d6
    • Pali Rohár's avatar
      ppp: Fix generating ppp unit id when ifname is not specified · ef8e4a33
      Pali Rohár authored
      commit 3125f26c
      
       upstream.
      
      When registering new ppp interface via PPPIOCNEWUNIT ioctl then kernel has
      to choose interface name as this ioctl API does not support specifying it.
      
      Kernel in this case register new interface with name "ppp<id>" where <id>
      is the ppp unit id, which can be obtained via PPPIOCGUNIT ioctl. This
      applies also in the case when registering new ppp interface via rtnl
      without supplying IFLA_IFNAME.
      
      PPPIOCNEWUNIT ioctl allows to specify own ppp unit id which will kernel
      assign to ppp interface, in case this ppp id is not already used by other
      ppp interface.
      
      In case user does not specify ppp unit id then kernel choose the first free
      ppp unit id. This applies also for case when creating ppp interface via
      rtnl method as it does not provide a way for specifying own ppp unit id.
      
      If some network interface (does not have to be ppp) has name "ppp<id>"
      with this first free ppp id then PPPIOCNEWUNIT ioctl or rtnl call fails.
      
      And registering new ppp interface is not possible anymore, until interface
      which holds conflicting name is renamed. Or when using rtnl method with
      custom interface name in IFLA_IFNAME.
      
      As list of allocated / used ppp unit ids is not possible to retrieve from
      kernel to userspace, userspace has no idea what happens nor which interface
      is doing this conflict.
      
      So change the algorithm how ppp unit id is generated. And choose the first
      number which is not neither used as ppp unit id nor in some network
      interface with pattern "ppp<id>".
      
      This issue can be simply reproduced by following pppd call when there is no
      ppp interface registered and also no interface with name pattern "ppp<id>":
      
          pppd ifname ppp1 +ipv6 noip noauth nolock local nodetach pty "pppd +ipv6 noip noauth nolock local nodetach notty"
      
      Or by creating the one ppp interface (which gets assigned ppp unit id 0),
      renaming it to "ppp1" and then trying to create a new ppp interface (which
      will always fails as next free ppp unit id is 1, but network interface with
      name "ppp1" exists).
      
      This patch fixes above described issue by generating new and new ppp unit
      id until some non-conflicting id with network interfaces is generated.
      
      Signed-off-by: default avatarPali Rohár <pali@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef8e4a33
    • Longfang Liu's avatar
      USB:ehci:fix Kunpeng920 ehci hardware problem · 6b862aa3
      Longfang Liu authored
      commit 26b75952
      
       upstream.
      
      Kunpeng920's EHCI controller does not have SBRN register.
      Reading the SBRN register when the controller driver is
      initialized will get 0.
      
      When rebooting the EHCI driver, ehci_shutdown() will be called.
      if the sbrn flag is 0, ehci_shutdown() will return directly.
      The sbrn flag being 0 will cause the EHCI interrupt signal to
      not be turned off after reboot. this interrupt that is not closed
      will cause an exception to the device sharing the interrupt.
      
      Therefore, the EHCI controller of Kunpeng920 needs to skip
      the read operation of the SBRN register.
      
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarLongfang Liu <liulongfang@huawei.com>
      Link: https://lore.kernel.org/r/1617958081-17999-1-git-send-email-liulongfang@huawei.com
      
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b862aa3
    • Lai Jiangshan's avatar
      KVM: X86: MMU: Use the correct inherited permissions to get shadow page · 4c07e701
      Lai Jiangshan authored
      commit b1bd5cba upstream.
      
      When computing the access permissions of a shadow page, use the effective
      permissions of the walk up to that point, i.e. the logic AND of its parents'
      permissions.  Two guest PxE entries that point at the same table gfn need to
      be shadowed with different shadow pages if their parents' permissions are
      different.  KVM currently uses the effective permissions of the last
      non-leaf entry for all non-leaf entries.  Because all non-leaf SPTEs have
      full ("uwx") permissions, and the effective permissions are recorded only
      in role.access and merged into the leaves, this can lead to incorrect
      reuse of a shadow page and eventually to a missing guest protection page
      fault.
      
      For example, here is a shared pagetable:
      
         pgd[]   pud[]        pmd[]            virtual address pointers
                           /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
              /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
         pgd-|           (shared pmd[] as above)
              \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                           \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
      
        pud1 and pud2 point to the same pmd table, so:
        - ptr1 and ptr3 points to the same page.
        - ptr2 and ptr4 points to the same page.
      
      (pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
      
      - First, the guest reads from ptr1 first and KVM prepares a shadow
        page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
        "u--" comes from the effective permissions of pgd, pud1 and
        pmd1, which are stored in pt->access.  "u--" is used also to get
        the pagetable for pud1, instead of "uw-".
      
      - Then the guest writes to ptr2 and KVM reuses pud1 which is present.
        The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
        even though the pud1 pmd (because of the incorrect argument to
        kvm_mmu_get_page in the previous step) has role.access="u--".
      
      - Then the guest reads from ptr3.  The hypervisor reuses pud1's
        shadow pmd for pud2, because both use "u--" for their permissions.
        Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
      
      - At last, the guest writes to ptr4.  This causes no vmexit or pagefault,
        because pud1's shadow page structures included an "uw-" page even though
        its role.access was "u--".
      
      Any kind of shared pagetable might have the similar problem when in
      virtual machine without TDP enabled if the permissions are different
      from different ancestors.
      
      In order to fix the problem, we change pt->access to be an array, and
      any access in it will not include permissions ANDed from child ptes.
      
      The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
      Remember to test it with TDP disabled.
      
      The problem had existed long before the commit 41074d07
      
       ("KVM: MMU:
      Fix inherited permissions for emulated guest pte updates"), and it
      is hard to find which is the culprit.  So there is no fixes tag here.
      
      Signed-off-by: default avatarLai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
      Cc: stable@vger.kernel.org
      Fixes: cea0f0e7
      
       ("[PATCH] KVM: MMU: Shadow page table caching")
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [OP: - apply arch/x86/kvm/mmu/* changes to arch/x86/kvm
           - apply documentation changes to Documentation/virtual/kvm/mmu.txt
           - adjusted context in arch/x86/kvm/paging_tmpl.h]
      Signed-off-by: default avatarOvidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c07e701
    • Daniel Borkmann's avatar
      bpf, selftests: Adjust few selftest outcomes wrt unreachable code · c15b3877
      Daniel Borkmann authored
      commit 973377ff
      
       upstream.
      
      In almost all cases from test_verifier that have been changed in here, we've
      had an unreachable path with a load from a register which has an invalid
      address on purpose. This was basically to make sure that we never walk this
      path and to have the verifier complain if it would otherwise. Change it to
      match on the right error for unprivileged given we now test these paths
      under speculative execution.
      
      There's one case where we match on exact # of insns_processed. Due to the
      extra path, this will of course mismatch on unprivileged. Thus, restrict the
      test->insn_processed check to privileged-only.
      
      In one other case, we result in a 'pointer comparison prohibited' error. This
      is similarly due to verifying an 'invalid' branch where we end up with a value
      pointer on one side of the comparison.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [OP: ignore changes to tests that do not exist in 4.19]
      Signed-off-by: default avatarOvidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c15b3877
    • Daniel Borkmann's avatar
      bpf: Fix leakage under speculation on mispredicted branches · 9df311b2
      Daniel Borkmann authored
      commit 9183671a upstream.
      
      The verifier only enumerates valid control-flow paths and skips paths that
      are unreachable in the non-speculative domain. And so it can miss issues
      under speculative execution on mispredicted branches.
      
      For example, a type confusion has been demonstrated with the following
      crafted program:
      
        // r0 = pointer to a map array entry
        // r6 = pointer to readable stack slot
        // r9 = scalar controlled by attacker
        1: r0 = *(u64 *)(r0) // cache miss
        2: if r0 != 0x0 goto line 4
        3: r6 = r9
        4: if r0 != 0x1 goto line 6
        5: r9 = *(u8 *)(r6)
        6: // leak r9
      
      Since line 3 runs iff r0 == 0 and line 5 runs iff r0 == 1, the verifier
      concludes that the pointer dereference on line 5 is safe. But: if the
      attacker trains both the branches to fall-through, such that the following
      is speculatively executed ...
      
        r6 = r9
        r9 = *(u8 *)(r6)
        // leak r9
      
      ... then the program will dereference an attacker-controlled value and could
      leak its content under speculative execution via side-channel. This requires
      to mistrain the branch predictor, which can be rather tricky, because the
      branches are mutually exclusive. However such training can be done at
      congruent addresses in user space using different branches that are not
      mutually exclusive. That is, by training branches in user space ...
      
        A:  if r0 != 0x0 goto line C
        B:  ...
        C:  if r0 != 0x0 goto line D
        D:  ...
      
      ... such that addresses A and C collide to the same CPU branch prediction
      entries in the PHT (pattern history table) as those of the BPF program's
      lines 2 and 4, respectively. A non-privileged attacker could simply brute
      force such collisions in the PHT until observing the attack succeeding.
      
      Alternative methods to mistrain the branch predictor are also possible that
      avoid brute forcing the collisions in the PHT. A reliable attack has been
      demonstrated, for example, using the following crafted program:
      
        // r0 = pointer to a [control] map array entry
        // r7 = *(u64 *)(r0 + 0), training/attack phase
        // r8 = *(u64 *)(r0 + 8), oob address
        // [...]
        // r0 = pointer to a [data] map array entry
        1: if r7 == 0x3 goto line 3
        2: r8 = r0
        // crafted sequence of conditional jumps to separate the conditional
        // branch in line 193 from the current execution flow
        3: if r0 != 0x0 goto line 5
        4: if r0 == 0x0 goto exit
        5: if r0 != 0x0 goto line 7
        6: if r0 == 0x0 goto exit
        [...]
        187: if r0 != 0x0 goto line 189
        188: if r0 == 0x0 goto exit
        // load any slowly-loaded value (due to cache miss in phase 3) ...
        189: r3 = *(u64 *)(r0 + 0x1200)
        // ... and turn it into known zero for verifier, while preserving slowly-
        // loaded dependency when executing:
        190: r3 &= 1
        191: r3 &= 2
        // speculatively bypassed phase dependency
        192: r7 += r3
        193: if r7 == 0x3 goto exit
        194: r4 = *(u8 *)(r8 + 0)
        // leak r4
      
      As can be seen, in training phase (phase != 0x3), the condition in line 1
      turns into false and therefore r8 with the oob address is overridden with
      the valid map value address, which in line 194 we can read out without
      issues. However, in attack phase, line 2 is skipped, and due to the cache
      miss in line 189 where the map value is (zeroed and later) added to the
      phase register, the condition in line 193 takes the fall-through path due
      to prior branch predictor training, where under speculation, it'll load the
      byte at oob address r8 (unknown scalar type at that point) which could then
      be leaked via side-channel.
      
      One way to mitigate these is to 'branch off' an unreachable path, meaning,
      the current verification path keeps following the is_branch_taken() path
      and we push the other branch to the verification stack. Given this is
      unreachable from the non-speculative domain, this branch's vstate is
      explicitly marked as speculative. This is needed for two reasons: i) if
      this path is solely seen from speculative execution, then we later on still
      want the dead code elimination to kick in in order to sanitize these
      instructions with jmp-1s, and ii) to ensure that paths walked in the
      non-speculative domain are not pruned from earlier walks of paths walked in
      the speculative domain. Additionally, for robustness, we mark the registers
      which have been part of the conditional as unknown in the speculative path
      given there should be no assumptions made on their content.
      
      The fix in here mitigates type confusion attacks described earlier due to
      i) all code paths in the BPF program being explored and ii) existing
      verifier logic already ensuring that given memory access instruction
      references one specific data structure.
      
      An alternative to this fix that has also been looked at in this scope was to
      mark aux->alu_state at the jump instruction with a BPF_JMP_TAKEN state as
      well as direction encoding (always-goto, always-fallthrough, unknown), such
      that mixing of different always-* directions themselves as well as mixing of
      always-* with unknown directions would cause a program rejection by the
      verifier, e.g. programs with constructs like 'if ([...]) { x = 0; } else
      { x = 1; }' with subsequent 'if (x == 1) { [...] }'. For unprivileged, this
      would result in only single direction always-* taken paths, and unknown taken
      paths being allowed, such that the former could be patched from a conditional
      jump to an unconditional jump (ja). Compared to this approach here, it would
      have two downsides: i) valid programs that otherwise are not performing any
      pointer arithmetic, etc, would potentially be rejected/broken, and ii) we are
      required to turn off path pruning for unprivileged, where both can be avoided
      in this work through pushing the invalid branch to the verification stack.
      
      The issue was originally discovered by Adam and Ofek, and later independently
      discovered and reported as a result of Benedict and Piotr's research work.
      
      Fixes: b2157399
      
       ("bpf: prevent out-of-bounds speculation")
      Reported-by: default avatarAdam Morrison <mad@cs.tau.ac.il>
      Reported-by: default avatarOfek Kirzner <ofekkir@gmail.com>
      Reported-by: default avatarBenedict Schlueter <benedict.schlueter@rub.de>
      Reported-by: default avatarPiotr Krysiuk <piotras@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Reviewed-by: default avatarBenedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: default avatarPiotr Krysiuk <piotras@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [OP: use allow_ptr_leaks instead of bypass_spec_v1]
      Signed-off-by: default avatarOvidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9df311b2
    • Daniel Borkmann's avatar
      bpf: Do not mark insn as seen under speculative path verification · c510c184
      Daniel Borkmann authored
      commit fe9a5ca7
      
       upstream.
      
      ... in such circumstances, we do not want to mark the instruction as seen given
      the goal is still to jmp-1 rewrite/sanitize dead code, if it is not reachable
      from the non-speculative path verification. We do however want to verify it for
      safety regardless.
      
      With the patch as-is all the insns that have been marked as seen before the
      patch will also be marked as seen after the patch (just with a potentially
      different non-zero count). An upcoming patch will also verify paths that are
      unreachable in the non-speculative domain, hence this extension is needed.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Reviewed-by: default avatarBenedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: default avatarPiotr Krysiuk <piotras@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [OP: - env->pass_cnt is not used in 4.19, so adjust sanitize_mark_insn_seen()
             to assign "true" instead
           - drop sanitize_insn_aux_data() comment changes, as the function is not
             present in 4.19]
      Signed-off-by: default avatarOvidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c510c184
    • Daniel Borkmann's avatar
      bpf: Inherit expanded/patched seen count from old aux data · 0abc8c97
      Daniel Borkmann authored
      commit d203b0fd
      
       upstream.
      
      Instead of relying on current env->pass_cnt, use the seen count from the
      old aux data in adjust_insn_aux_data(), and expand it to the new range of
      patched instructions. This change is valid given we always expand 1:n
      with n>=1, so what applies to the old/original instruction needs to apply
      for the replacement as well.
      
      Not relying on env->pass_cnt is a prerequisite for a later change where we
      want to avoid marking an instruction seen when verified under speculative
      execution path.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Reviewed-by: default avatarBenedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: default avatarPiotr Krysiuk <piotras@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [OP: - declare old_data as bool instead of u32 (struct bpf_insn_aux_data.seen
           is bool in 5.4)
           - adjusted context for 4.19]
      Signed-off-by: default avatarOvidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0abc8c97
    • Masami Hiramatsu's avatar
      tracing: Reject string operand in the histogram expression · 7c165d58
      Masami Hiramatsu authored
      commit a9d10ca4 upstream.
      
      Since the string type can not be the target of the addition / subtraction
      operation, it must be rejected. Without this fix, the string type silently
      converted to digits.
      
      Link: https://lkml.kernel.org/r/162742654278.290973.1523000673366456634.stgit@devnote2
      
      Cc: stable@vger.kernel.org
      Fixes: 100719dc
      
       ("tracing: Add simple expression support to hist triggers")
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c165d58
    • Sean Christopherson's avatar
      KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB · 17b9e2da
      Sean Christopherson authored
      [ Upstream commit 179c6c27 ]
      
      Use the raw ASID, not ASID-1, when nullifying the last used VMCB when
      freeing an SEV ASID.  The consumer, pre_sev_run(), indexes the array by
      the raw ASID, thus KVM could get a false negative when checking for a
      different VMCB if KVM manages to reallocate the same ASID+VMCB combo for
      a new VM.
      
      Note, this cannot cause a functional issue _in the current code_, as
      pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and
      last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is
      guaranteed to mismatch on the first VMRUN.  However, prior to commit
      8a14fe4f ("kvm: x86: Move last_cpu into kvm_vcpu_arch as
      last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the
      last_cpu variable.  Thus it's theoretically possible that older versions
      of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID
      and VMCB exactly match those of a prior VM.
      
      Fixes: 70cd94e6
      
       ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled")
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      17b9e2da
  3. Aug 12, 2021