Skip to content
  1. Feb 10, 2021
  2. Feb 07, 2021
  3. Feb 04, 2021
    • Jay Zhou's avatar
      KVM: x86: get smi pending status correctly · e895a39a
      Jay Zhou authored
      
      
      commit 1f7becf1 upstream.
      
      The injection process of smi has two steps:
      
          Qemu                        KVM
      Step1:
          cpu->interrupt_request &= \
              ~CPU_INTERRUPT_SMI;
          kvm_vcpu_ioctl(cpu, KVM_SMI)
      
                                      call kvm_vcpu_ioctl_smi() and
                                      kvm_make_request(KVM_REQ_SMI, vcpu);
      
      Step2:
          kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
      
                                      call process_smi() if
                                      kvm_check_request(KVM_REQ_SMI, vcpu) is
                                      true, mark vcpu->arch.smi_pending = true;
      
      The vcpu->arch.smi_pending will be set true in step2, unfortunately if
      vcpu paused between step1 and step2, the kvm_run->immediate_exit will be
      set and vcpu has to exit to Qemu immediately during step2 before mark
      vcpu->arch.smi_pending true.
      During VM migration, Qemu will get the smi pending status from KVM using
      KVM_GET_VCPU_EVENTS ioctl at the downtime, then the smi pending status
      will be lost.
      
      Signed-off-by: default avatarJay Zhou <jianjay.zhou@huawei.com>
      Signed-off-by: default avatarShengen Zhuang <zhuangshengen@huawei.com>
      Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e895a39a
    • Maxim Levitsky's avatar
      KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration · 427adbb3
      Maxim Levitsky authored
      
      
      commit d51e1d3f upstream.
      
      Even when we are outside the nested guest, some vmcs02 fields
      may not be in sync vs vmcs12.  This is intentional, even across
      nested VM-exit, because the sync can be delayed until the nested
      hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those
      rarely accessed fields.
      
      However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to
      be able to restore it.  To fix that, call copy_vmcs02_to_vmcs12_rare()
      before the vmcs12 contents are copied to userspace.
      
      Fixes: 7952d769 ("KVM: nVMX: Sync rarely accessed guest fields only when needed")
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210114205449.8715-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      427adbb3
    • Paolo Bonzini's avatar
      KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX · cffcb5e0
      Paolo Bonzini authored
      
      
      commit 9a78e158 upstream.
      
      VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
      which may need to be loaded outside guest mode.  Therefore we cannot
      WARN in that case.
      
      However, that part of nested_get_vmcs12_pages is _not_ needed at
      vmentry time.  Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
      so that both vmentry and migration (and in the latter case, independent
      of is_guest_mode) do the parts that are needed.
      
      Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3b: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
      Cc: <stable@vger.kernel.org> # 5.10.x
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cffcb5e0
    • Maxim Levitsky's avatar
      KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit · 0faceb7d
      Maxim Levitsky authored
      
      
      commit f2c7ef3b upstream.
      
      It is possible to exit the nested guest mode, entered by
      svm_set_nested_state prior to first vm entry to it (e.g due to pending event)
      if the nested run was not pending during the migration.
      
      In this case we must not switch to the nested msr permission bitmap.
      Also add a warning to catch similar cases in the future.
      
      Fixes: a7d5c7ce ("KVM: nSVM: delay MSR permission processing to first nested VM run")
      
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0faceb7d
    • Like Xu's avatar
      KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh() · a519d980
      Like Xu authored
      
      
      commit e61ab2a3 upstream.
      
      Since we know vPMU will not work properly when (1) the guest bit_width(s)
      of the [gp|fixed] counters are greater than the host ones, or (2) guest
      requested architectural events exceeds the range supported by the host, so
      we can setup a smaller left shift value and refresh the guest cpuid entry,
      thus fixing the following UBSAN shift-out-of-bounds warning:
      
      shift exponent 197 is too large for 64-bit type 'long long unsigned int'
      
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
       intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
       kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
       kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
       kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
       kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported-by: default avatar <syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com>
      Signed-off-by: default avatarLike Xu <like.xu@linux.intel.com>
      Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a519d980
    • Like Xu's avatar
      KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[] · 0517693d
      Like Xu authored
      
      
      commit 98dd2f10 upstream.
      
      The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as
      0x0300 in the intel_perfmon_event_map[]. Correct its usage.
      
      Fixes: 62079d8a ("KVM: PMU: add proper support for fixed counter 2")
      Signed-off-by: default avatarLike Xu <like.xu@linux.intel.com>
      Message-Id: <20201230081916.63417-1-like.xu@linux.intel.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0517693d
    • Nick Desaulniers's avatar
      x86/entry: Emit a symbol for register restoring thunk · 4c973f75
      Nick Desaulniers authored
      
      
      commit 5e6dca82 upstream.
      
      Arnd found a randconfig that produces the warning:
      
        arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at
        offset 0x3e
      
      when building with LLVM_IAS=1 (Clang's integrated assembler). Josh
      notes:
      
        With the LLVM assembler not generating section symbols, objtool has no
        way to reference this code when it generates ORC unwinder entries,
        because this code is outside of any ELF function.
      
        The limitation now being imposed by objtool is that all code must be
        contained in an ELF symbol.  And .L symbols don't create such symbols.
      
        So basically, you can use an .L symbol *inside* a function or a code
        segment, you just can't use the .L symbol to contain the code using a
        SYM_*_START/END annotation pair.
      
      Fangrui notes that this optimization is helpful for reducing image size
      when compiling with -ffunction-sections and -fdata-sections. I have
      observed on the order of tens of thousands of symbols for the kernel
      images built with those flags.
      
      A patch has been authored against GNU binutils to match this behavior
      of not generating unused section symbols ([1]), so this will
      also become a problem for users of GNU binutils once they upgrade to 2.36.
      
      Omit the .L prefix on a label so that the assembler will emit an entry
      into the symbol table for the label, with STB_LOCAL binding. This
      enables objtool to generate proper unwind info here with LLVM_IAS=1 or
      GNU binutils 2.36+.
      
       [ bp: Massage commit message. ]
      
      Reported-by: default avatarArnd Bergmann <arnd@arndb.de>
      Suggested-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Suggested-by: default avatarBorislav Petkov <bp@alien8.de>
      Suggested-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Link: https://lkml.kernel.org/r/20210112194625.4181814-1-ndesaulniers@google.com
      Link: https://github.com/ClangBuiltLinux/linux/issues/1209
      Link: https://reviews.llvm.org/D93783
      Link: https://sourceware.org/binutils/docs/as/Symbol-Names.html
      Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1408485ce69f844dcd7ded093a8
      
       [1]
      Cc: Chris Clayton <chris2553@googlemail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c973f75
    • Juergen Gross's avatar
      x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled · b444b86e
      Juergen Gross authored
      
      
      commit 2e924936 upstream.
      
      When booting a kernel which has been built with CONFIG_AMD_MEM_ENCRYPT
      enabled as a Xen pv guest a warning is issued for each processor:
      
      [    5.964347] ------------[ cut here ]------------
      [    5.968314] WARNING: CPU: 0 PID: 1 at /home/gross/linux/head/arch/x86/xen/enlighten_pv.c:660 get_trap_addr+0x59/0x90
      [    5.972321] Modules linked in:
      [    5.976313] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.11.0-rc5-default #75
      [    5.980313] Hardware name: Dell Inc. OptiPlex 9020/0PC5F7, BIOS A05 12/05/2013
      [    5.984313] RIP: e030:get_trap_addr+0x59/0x90
      [    5.988313] Code: 42 10 83 f0 01 85 f6 74 04 84 c0 75 1d b8 01 00 00 00 c3 48 3d 00 80 83 82 72 08 48 3d 20 81 83 82 72 0c b8 01 00 00 00 eb db <0f> 0b 31 c0 c3 48 2d 00 80 83 82 48 ba 72 1c c7 71 1c c7 71 1c 48
      [    5.992313] RSP: e02b:ffffc90040033d38 EFLAGS: 00010202
      [    5.996313] RAX: 0000000000000001 RBX: ffffffff82a141d0 RCX: ffffffff8222ec38
      [    6.000312] RDX: ffffffff8222ec38 RSI: 0000000000000005 RDI: ffffc90040033d40
      [    6.004313] RBP: ffff8881003984a0 R08: 0000000000000007 R09: ffff888100398000
      [    6.008312] R10: 0000000000000007 R11: ffffc90040246000 R12: ffff8884082182a8
      [    6.012313] R13: 0000000000000100 R14: 000000000000001d R15: ffff8881003982d0
      [    6.016316] FS:  0000000000000000(0000) GS:ffff888408200000(0000) knlGS:0000000000000000
      [    6.020313] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    6.024313] CR2: ffffc900020ef000 CR3: 000000000220a000 CR4: 0000000000050660
      [    6.028314] Call Trace:
      [    6.032313]  cvt_gate_to_trap.part.7+0x3f/0x90
      [    6.036313]  ? asm_exc_double_fault+0x30/0x30
      [    6.040313]  xen_convert_trap_info+0x87/0xd0
      [    6.044313]  xen_pv_cpu_up+0x17a/0x450
      [    6.048313]  bringup_cpu+0x2b/0xc0
      [    6.052313]  ? cpus_read_trylock+0x50/0x50
      [    6.056313]  cpuhp_invoke_callback+0x80/0x4c0
      [    6.060313]  _cpu_up+0xa7/0x140
      [    6.064313]  cpu_up+0x98/0xd0
      [    6.068313]  bringup_nonboot_cpus+0x4f/0x60
      [    6.072313]  smp_init+0x26/0x79
      [    6.076313]  kernel_init_freeable+0x103/0x258
      [    6.080313]  ? rest_init+0xd0/0xd0
      [    6.084313]  kernel_init+0xa/0x110
      [    6.088313]  ret_from_fork+0x1f/0x30
      [    6.092313] ---[ end trace be9ecf17dceeb4f3 ]---
      
      Reason is that there is no Xen pv trap entry for X86_TRAP_VC.
      
      Fix that by adding a generic trap handler for unknown traps and wire all
      unknown bare metal handlers to this generic handler, which will just
      crash the system in case such a trap will ever happen.
      
      Fixes: 0786138c ("x86/sev-es: Add a Runtime #VC Exception Handler")
      Cc: <stable@vger.kernel.org> # v5.10
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarAndrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b444b86e
  4. Jan 27, 2021
  5. Jan 23, 2021
    • Dexuan Cui's avatar
      x86/hyperv: Initialize clockevents after LAPIC is initialized · 4473923b
      Dexuan Cui authored
      
      
      [ Upstream commit fff7b5e6 ]
      
      With commit 4df4cb9e, the Hyper-V direct-mode STIMER is actually
      initialized before LAPIC is initialized: see
      
        apic_intr_mode_init()
      
          x86_platform.apic_post_init()
            hyperv_init()
              hv_stimer_alloc()
      
          apic_bsp_setup()
            setup_local_APIC()
      
      setup_local_APIC() temporarily disables LAPIC, initializes it and
      re-eanble it.  The direct-mode STIMER depends on LAPIC, and when it's
      registered, it can be programmed immediately and the timer can fire
      very soon:
      
        hv_stimer_init
          clockevents_config_and_register
            clockevents_register_device
              tick_check_new_device
                tick_setup_device
                  tick_setup_periodic(), tick_setup_oneshot()
                    clockevents_program_event
      
      When the timer fires in the hypervisor, if the LAPIC is in the
      disabled state, new versions of Hyper-V ignore the event and don't inject
      the timer interrupt into the VM, and hence the VM hangs when it boots.
      
      Note: when the VM starts/reboots, the LAPIC is pre-enabled by the
      firmware, so the window of LAPIC being temporarily disabled is pretty
      small, and the issue can only happen once out of 100~200 reboots for
      a 40-vCPU VM on one dev host, and on another host the issue doesn't
      reproduce after 2000 reboots.
      
      The issue is more noticeable for kdump/kexec, because the LAPIC is
      disabled by the first kernel, and stays disabled until the kdump/kexec
      kernel enables it. This is especially an issue to a Generation-2 VM
      (for which Hyper-V doesn't emulate the PIT timer) when CONFIG_HZ=1000
      (rather than CONFIG_HZ=250) is used.
      
      Fix the issue by moving hv_stimer_alloc() to a later place where the
      LAPIC timer is initialized.
      
      Fixes: 4df4cb9e ("x86/hyperv: Initialize clockevents earlier in CPU onlining")
      Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
      Reviewed-by: default avatarMichael Kelley <mikelley@microsoft.com>
      Link: https://lore.kernel.org/r/20210116223136.13892-1-decui@microsoft.com
      
      
      Signed-off-by: default avatarWei Liu <wei.liu@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4473923b
  6. Jan 20, 2021
  7. Jan 17, 2021
  8. Jan 13, 2021