Skip to content
  1. Nov 18, 2021
    • David Woodhouse's avatar
      KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS check · 7d0172b3
      David Woodhouse authored
      
      
      Kill another mostly gratuitous kvm_vcpu_map() which could just use the
      userspace HVA for it.
      
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20211115165030.7422-6-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7d0172b3
    • David Woodhouse's avatar
      KVM: x86/xen: Use sizeof_field() instead of open-coding it · 6a834754
      David Woodhouse authored
      
      
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20211115165030.7422-4-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6a834754
    • David Woodhouse's avatar
      KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12 · 297d597a
      David Woodhouse authored
      
      
      Using kvm_vcpu_map() for reading from the guest is entirely gratuitous,
      when all we do is a single memcpy and unmap it again. Fix it up to use
      kvm_read_guest()... but in fact I couldn't bring myself to do that
      without also making it use a gfn_to_hva_cache for both that *and* the
      copy in the other direction.
      
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20211115165030.7422-5-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      297d597a
    • David Woodhouse's avatar
      KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO · 4e843647
      David Woodhouse authored
      In commit 319afe68 ("KVM: xen: do not use struct gfn_to_hva_cache") we
      stopped storing this in-kernel as a GPA, and started storing it as a GFN.
      Which means we probably should have stopped calling gpa_to_gfn() on it
      when userspace asks for it back.
      
      Cc: stable@vger.kernel.org
      Fixes: 319afe68
      
       ("KVM: xen: do not use struct gfn_to_hva_cache")
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20211115165030.7422-2-dwmw2@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4e843647
    • Maxim Levitsky's avatar
      KVM: x86/mmu: include EFER.LMA in extended mmu role · b8453cdc
      Maxim Levitsky authored
      
      
      Incorporate EFER.LMA into kvm_mmu_extended_role, as it used to compute the
      guest root level and is not reflected in kvm_mmu_page_role.level when TDP
      is in use.  When simply running the guest, it is impossible for EFER.LMA
      and kvm_mmu.root_level to get out of sync, as the guest cannot transition
      from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without
      first bouncing through a different MMU context.  And stuffing guest state
      via KVM_SET_SREGS{,2} also ensures a full MMU context reset.
      
      However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to
      set guest state when migrating the VM while L2 is active, the vCPU state
      will reflect L2, not L1.  If L1 is using TDP for L2, then root_mmu will
      have been configured using L2's state, despite not being used for L2.  If
      L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
      be configured for guest PAE paging, but will match the mmu_role for 64-bit
      paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit.
      
      Alternatively, the root_mmu's role could be invalidated after a successful
      KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
      i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily
      tricky, and not even needed if L1 and L2 do have the same role (e.g., they
      are both 64-bit guests and run with the same CR4).
      
      Suggested-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b8453cdc
    • Maxim Levitsky's avatar
      KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state load · af957eeb
      Maxim Levitsky authored
      
      
      When loading nested state, don't use check vcpu->arch.efer to get the
      L1 host's 64-bit vs. 32-bit state and don't check it for consistency
      with respect to VM_EXIT_HOST_ADDR_SPACE_SIZE, as register state in vCPU
      may be stale when KVM_SET_NESTED_STATE is called---and architecturally
      does not exist.  When restoring L2 state in KVM, the CPU is placed in
      non-root where nested VMX code has no snapshot of L1 host state: VMX
      (conditionally) loads host state fields loaded on VM-exit, but they need
      not correspond to the state before entry.  A simple case occurs in KVM
      itself, where the host RIP field points to vmx_vmexit rather than the
      instruction following vmlaunch/vmresume.
      
      However, for the particular case of L1 being in 32- or 64-bit mode
      on entry, the exit controls can be treated instead as the source of
      truth regarding the state of L1 on entry, and can be used to check
      that vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE matches vmcs12.HOST_EFER if
      vmcs12.VM_EXIT_LOAD_IA32_EFER is set.  The consistency check on CPU
      EFER vs. vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE, instead, happens only
      on VM-Enter.  That's because, again, there's conceptually no "current"
      L1 EFER to check on KVM_SET_NESTED_STATE.
      
      Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20211115131837.195527-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      af957eeb
    • David Woodhouse's avatar
      KVM: Fix steal time asm constraints · 964b7aa0
      David Woodhouse authored
      In 64-bit mode, x86 instruction encoding allows us to use the low 8 bits
      of any GPR as an 8-bit operand. In 32-bit mode, however, we can only use
      the [abcd] registers. For which, GCC has the "q" constraint instead of
      the less restrictive "r".
      
      Also fix st->preempted, which is an input/output operand rather than an
      input.
      
      Fixes: 7e2175eb
      
       ("KVM: x86: Fix recording of guest steal time / preempted status")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <89bf72db1b859990355f9c40713a34e0d2d86c98.camel@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      964b7aa0
    • Paul Durrant's avatar
      cpuid: kvm_find_kvm_cpuid_features() should be declared 'static' · dc23a511
      Paul Durrant authored
      
      
      The lack a static declaration currently results in:
      
      arch/x86/kvm/cpuid.c:128:26: warning: no previous prototype for function 'kvm_find_kvm_cpuid_features'
      
      when compiling with "W=1".
      
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Fixes: 760849b1
      
       ("KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES")
      Signed-off-by: default avatarPaul Durrant <pdurrant@amazon.com>
      Message-Id: <20211115144131.5943-1-pdurrant@amazon.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      dc23a511
  2. Nov 16, 2021
    • 黄乐's avatar
      KVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap() · c5adbb3a
      黄乐 authored
      In vcpu_load_eoi_exitmap(), currently the eoi_exit_bitmap[4] array is
      initialized only when Hyper-V context is available, in other path it is
      just passed to kvm_x86_ops.load_eoi_exitmap() directly from on the stack,
      which would cause unexpected interrupt delivery/handling issues, e.g. an
      *old* linux kernel that relies on PIT to do clock calibration on KVM might
      randomly fail to boot.
      
      Fix it by passing ioapic_handled_vectors to load_eoi_exitmap() when Hyper-V
      context is not available.
      
      Fixes: f2bc14b6
      
       ("KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context")
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarHuang Le <huangle1@jd.com>
      Message-Id: <62115b277dab49ea97da5633f8522daf@jd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c5adbb3a
  3. Nov 11, 2021
    • Vitaly Kuznetsov's avatar
      KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS · da1bfd52
      Vitaly Kuznetsov authored
      
      
      KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
      VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
      either num_online_cpus() or num_present_cpus(), s390 has multiple
      constants depending on hardware features. On x86, KVM reports an
      arbitrary value of '710' which is supposed to be the maximum tested
      value but it's possible to test all KVM_MAX_VCPUS even when there are
      less physical CPUs available.
      
      Drop the arbitrary '710' value and return num_online_cpus() on x86 as
      well. The recommendation will match other architectures and will mean
      'no CPU overcommit'.
      
      For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
      when the requested vCPU number exceeds it. The static limit of '710'
      is quite weird as smaller systems with just a few physical CPUs should
      certainly "recommend" less.
      
      Suggested-by: default avatarEduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211111134733.86601-1-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      da1bfd52
    • Vipin Sharma's avatar
      KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid() · 796c83c5
      Vipin Sharma authored
      
      
      Handle #GP on INVPCID due to an invalid type in the common switch
      statement instead of relying on the callers (VMX and SVM) to manually
      validate the type.
      
      Unlike INVVPID and INVEPT, INVPCID is not explicitly documented to check
      the type before reading the operand from memory, so deferring the
      type validity check until after that point is architecturally allowed.
      
      Signed-off-by: default avatarVipin Sharma <vipinsh@google.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211109174426.2350547-3-vipinsh@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      796c83c5
    • Vipin Sharma's avatar
      KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT · 329bd56c
      Vipin Sharma authored
      
      
      handle_invept(), handle_invvpid(), handle_invpcid() read the same reg2
      field in vmcs.VMX_INSTRUCTION_INFO to get the index of the GPR that
      holds the invalidation type. Add a helper to retrieve reg2 from VMX
      instruction info to consolidate and document the shift+mask magic.
      
      Signed-off-by: default avatarVipin Sharma <vipinsh@google.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211109174426.2350547-2-vipinsh@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      329bd56c
    • Sean Christopherson's avatar
      KVM: nVMX: Clean up x2APIC MSR handling for L2 · a5e0c252
      Sean Christopherson authored
      
      
      Clean up the x2APIC MSR bitmap intereption code for L2, which is the last
      holdout of open coded bitmap manipulations.  Freshen up the SDM/PRM
      comment, rename the function to make it abundantly clear the funky
      behavior is x2APIC specific, and explain _why_ vmcs01's bitmap is ignored
      (the previous comment was flat out wrong for x2APIC behavior).
      
      No functional change intended.
      
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211109013047.2041518-5-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a5e0c252
    • Sean Christopherson's avatar
      KVM: VMX: Macrofy the MSR bitmap getters and setters · 0cacb80b
      Sean Christopherson authored
      
      
      Add builder macros to generate the MSR bitmap helpers to reduce the
      amount of copy-paste code, especially with respect to all the magic
      numbers needed to calc the correct bit location.
      
      No functional change intended.
      
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211109013047.2041518-4-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0cacb80b
    • Sean Christopherson's avatar
      KVM: nVMX: Handle dynamic MSR intercept toggling · 67f4b996
      Sean Christopherson authored
      Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
      and always update the relevant bits in vmcs02.  This fixes two distinct,
      but intertwined bugs related to dynamic MSR bitmap modifications.
      
      The first issue is that KVM fails to enable MSR interception in vmcs02
      for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
      and later enables interception.
      
      The second issue is that KVM fails to honor userspace MSR filtering when
      preparing vmcs02.
      
      Fix both issues simultaneous as fixing only one of the issues (doesn't
      matter which) would create a mess that no one should have to bisect.
      Fixing only the first bug would exacerbate the MSR filtering issue as
      userspace would see inconsistent behavior depending on the whims of L1.
      Fixing only the second bug (MSR filtering) effectively requires fixing
      the first, as the nVMX code only knows how to transition vmcs02's
      bitmap from 1->0.
      
      Move the various accessor/mutators that are currently buried in vmx.c
      into vmx.h so that they can be shared by the nested code.
      
      Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
      Fixes: d69129b4
      
       ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
      Cc: stable@vger.kernel.org
      Cc: Alexander Graf <graf@amazon.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211109013047.2041518-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      67f4b996
    • Sean Christopherson's avatar
      KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use · 7dfbc624
      Sean Christopherson authored
      Check the current VMCS controls to determine if an MSR write will be
      intercepted due to MSR bitmaps being disabled.  In the nested VMX case,
      KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
      if KVM can't map L1's bitmaps for whatever reason.
      
      Note, the bad behavior is relatively benign in the current code base as
      KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
      only if L0 KVM also disables interception of an MSR, and only uses the
      buggy helper for MSR_IA32_SPEC_CTRL.  Because KVM explicitly tests WRMSR
      before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
      will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
      isn't strictly necessary.
      
      Tag the fix for stable in case a future fix wants to use
      msr_write_intercepted(), in which case a buggy implementation in older
      kernels could prove subtly problematic.
      
      Fixes: d28b387f
      
       ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211109013047.2041518-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7dfbc624
    • Vitaly Kuznetsov's avatar
      KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was... · afd67ee3
      Vitaly Kuznetsov authored
      
      KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN
      
      When kvm_gfn_to_hva_cache_init() call from kvm_lapic_set_pv_eoi() fails,
      MSR write to MSR_KVM_PV_EOI_EN results in #GP so it is reasonable to
      expect that the value we keep internally in KVM wasn't updated.
      
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211108152819.12485-3-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      afd67ee3
    • Vitaly Kuznetsov's avatar
      KVM: x86: Rename kvm_lapic_enable_pv_eoi() · 77c3323f
      Vitaly Kuznetsov authored
      
      
      kvm_lapic_enable_pv_eoi() is a misnomer as the function is also
      used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi().
      
      No functional change intended.
      
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211108152819.12485-2-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      77c3323f
    • Paul Durrant's avatar
      KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES · 760849b1
      Paul Durrant authored
      
      
      Currently when kvm_update_cpuid_runtime() runs, it assumes that the
      KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
      however, if Hyper-V support is enabled. In this case the KVM leaves will
      be offset.
      
      This patch introdues as new 'kvm_cpuid_base' field into struct
      kvm_vcpu_arch to track the location of the KVM leaves and function
      kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
      leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
      definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now
      target the correct leaf.
      
      NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is intoduced
            into processor.h to avoid having duplicate code for the iteration
            over possible hypervisor base leaves.
      
      Signed-off-by: default avatarPaul Durrant <pdurrant@amazon.com>
      Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      760849b1
    • Sean Christopherson's avatar
      KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows · 8b44b174
      Sean Christopherson authored
      
      
      Move the core logic of SET_CPUID and SET_CPUID2 to a common helper, the
      only difference between the two ioctls() is the format of the userspace
      struct.  A future fix will add yet more code to the core logic.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211105095101.5384-2-pdurrant@amazon.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8b44b174
    • Junaid Shahid's avatar
      kvm: mmu: Use fast PF path for access tracking of huge pages when possible · 10c30de0
      Junaid Shahid authored
      
      
      The fast page fault path bails out on write faults to huge pages in
      order to accommodate dirty logging. This change adds a check to do that
      only when dirty logging is actually enabled, so that access tracking for
      huge pages can still use the fast path for write faults in the common
      case.
      
      Signed-off-by: default avatarJunaid Shahid <junaids@google.com>
      Reviewed-by: default avatarBen Gardon <bgardon@google.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211104003359.2201967-1-junaids@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      10c30de0
    • Sean Christopherson's avatar
      KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iterator · c435d4b7
      Sean Christopherson authored
      Wrap the read of iter->sptep in tdp_mmu_map_handle_target_level() with
      rcu_dereference().  Shadow pages in the TDP MMU, and thus their SPTEs,
      are protected by rcu.
      
      This fixes a Sparse warning at tdp_mmu.c:900:51:
        warning: incorrect type in argument 1 (different address spaces)
        expected unsigned long long [usertype] *sptep
        got unsigned long long [noderef] [usertype] __rcu *[usertype] sptep
      
      Fixes: 7158bee4
      
       ("KVM: MMU: pass kvm_mmu_page struct to make_spte")
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211103161833.3769487-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c435d4b7
    • Maxim Levitsky's avatar
      KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active · cae72dcc
      Maxim Levitsky authored
      KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
      standard kvm's inject_pending_event, and not via APICv/AVIC.
      
      Since this is a debug feature, just inhibit APICv/AVIC while
      KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.
      
      Fixes: 61e5f69e
      
       ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
      
      Reported-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Tested-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      cae72dcc
    • Jim Mattson's avatar
      kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool · e6cd31f1
      Jim Mattson authored
      
      
      These function names sound like predicates, and they have siblings,
      *is_valid_msr(), which _are_ predicates. Moreover, there are comments
      that essentially warn that these functions behave unexpectedly.
      
      Flip the polarity of the return values, so that they become
      predicates, and convert the boolean result to a success/failure code
      at the outer call site.
      
      Suggested-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20211105202058.1048757-1-jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e6cd31f1
    • David Woodhouse's avatar
      KVM: x86: Fix recording of guest steal time / preempted status · 7e2175eb
      David Woodhouse authored
      In commit b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
      not missed") we switched to using a gfn_to_pfn_cache for accessing the
      guest steal time structure in order to allow for an atomic xchg of the
      preempted field. This has a couple of problems.
      
      Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
      atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
      guest vCPU using an IOMEM page for its steal time would never have its
      preempted field set.
      
      Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
      should have been. There are two stages to the GFN->PFN conversion;
      first the GFN is converted to a userspace HVA, and then that HVA is
      looked up in the process page tables to find the underlying host PFN.
      Correct invalidation of the latter would require being hooked up to the
      MMU notifiers, but that doesn't happen---so it just keeps mapping and
      unmapping the *wrong* PFN after the userspace page tables change.
      
      In the !IOMEM case at least the stale page *is* pinned all the time it's
      cached, so it won't be freed and reused by anyone else while still
      receiving the steal time updates. The map/unmap dance only takes care
      of the KVM administrivia such as marking the page dirty.
      
      Until the gfn_to_pfn cache handles the remapping automatically by
      integrating with the MMU notifiers, we might as well not get a
      kernel mapping of it, and use the perfectly serviceable userspace HVA
      that we already have.  We just need to implement the atomic xchg on
      the userspace address with appropriate exception handling, which is
      fairly trivial.
      
      Cc: stable@vger.kernel.org
      Fixes: b0431382
      
       ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
      [I didn't entirely agree with David's assessment of the
       usefulness of the gfn_to_pfn cache, and integrated the outcome
       of the discussion in the above commit message. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7e2175eb
  4. Nov 02, 2021
  5. Nov 01, 2021
    • Bixuan Cui's avatar
      RISC-V: KVM: fix boolreturn.cocci warnings · bbd5ba8d
      Bixuan Cui authored
      
      
      Fix boolreturn.cocci warnings:
      ./arch/riscv/kvm/mmu.c:603:9-10: WARNING: return of 0/1 in function
      'kvm_age_gfn' with return type bool
      ./arch/riscv/kvm/mmu.c:582:9-10: WARNING: return of 0/1 in function
      'kvm_set_spte_gfn' with return type bool
      ./arch/riscv/kvm/mmu.c:621:9-10: WARNING: return of 0/1 in function
      'kvm_test_age_gfn' with return type bool
      ./arch/riscv/kvm/mmu.c:568:9-10: WARNING: return of 0/1 in function
      'kvm_unmap_gfn_range' with return type bool
      
      Signed-off-by: default avatarBixuan Cui <cuibixuan@linux.alibaba.com>
      Signed-off-by: default avatarAnup Patel <anup.patel@wdc.com>
      bbd5ba8d
    • ran jianping's avatar
      RISC-V: KVM: remove unneeded semicolon · 7b161d9c
      ran jianping authored
      
      
       Elimate the following coccinelle check warning:
       ./arch/riscv/kvm/vcpu_sbi.c:169:2-3: Unneeded semicolon
       ./arch/riscv/kvm/vcpu_exit.c:397:2-3: Unneeded semicolon
       ./arch/riscv/kvm/vcpu_exit.c:687:2-3: Unneeded semicolon
       ./arch/riscv/kvm/vcpu_exit.c:645:2-3: Unneeded semicolon
       ./arch/riscv/kvm/vcpu.c:247:2-3: Unneeded semicolon
       ./arch/riscv/kvm/vcpu.c:284:2-3: Unneeded semicolon
       ./arch/riscv/kvm/vcpu_timer.c:123:2-3: Unneeded semicolon
       ./arch/riscv/kvm/vcpu_timer.c:170:2-3: Unneeded semicolon
      
      Reported-by: default avatarZeal Robot <zealci@zte.com.cn>
      Signed-off-by: default avatarran jianping <ran.jianping@zte.com.cn>
      Signed-off-by: default avatarAnup Patel <anup.patel@wdc.com>
      7b161d9c
  6. Oct 31, 2021
    • Paolo Bonzini's avatar
      Merge tag 'kvm-s390-next-5.16-1' of... · 9c6eb531
      Paolo Bonzini authored
      Merge tag 'kvm-s390-next-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: s390: Fixes and Features for 5.16
      
      - SIGP Fixes
      - initial preparations for lazy destroy of secure VMs
      - storage key improvements/fixes
      - Log the guest CPNC
      9c6eb531
    • Anup Patel's avatar
      RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions · 7c8de080
      Anup Patel authored
      The parameter passed to HFENCE.GVMA instruction in rs1 register
      is guest physical address right shifted by 2 (i.e. divided by 4).
      
      Unfortunately, we overlooked the semantics of rs1 registers for
      HFENCE.GVMA instruction and never right shifted guest physical
      address by 2. This issue did not manifest for hypervisors till
      now because:
        1) Currently, only __kvm_riscv_hfence_gvma_all() and SBI
           HFENCE calls are used to invalidate TLB.
        2) All H-extension implementations (such as QEMU, Spike,
           Rocket Core FPGA, etc) that we tried till now were
           conservatively flushing everything upon any HFENCE.GVMA
           instruction.
      
      This patch fixes GPA passed to __kvm_riscv_hfence_gvma_vmid_gpa()
      and __kvm_riscv_hfence_gvma_gpa() functions.
      
      Fixes: fd7bb4a2
      
       ("RISC-V: KVM: Implement VMID allocator")
      Reported-by: default avatarIan Huang <ihuang@ventanamicro.com>
      Signed-off-by: default avatarAnup Patel <anup.patel@wdc.com>
      Message-Id: <20211026170136.2147619-4-anup.patel@wdc.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7c8de080
    • Anup Patel's avatar
      RISC-V: KVM: Factor-out FP virtualization into separate sources · 0a86512d
      Anup Patel authored
      
      
      The timer and SBI virtualization is already in separate sources.
      In future, we will have vector and AIA virtualization also added
      as separate sources.
      
      To align with above described modularity, we factor-out FP
      virtualization into separate sources.
      
      Signed-off-by: default avatarAnup Patel <anup.patel@wdc.com>
      Message-Id: <20211026170136.2147619-3-anup.patel@wdc.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0a86512d
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 4e338684
      Paolo Bonzini authored
      KVM/arm64 updates for Linux 5.16
      
      - More progress on the protected VM front, now with the full
        fixed feature set as well as the limitation of some hypercalls
        after initialisation.
      
      - Cleanup of the RAZ/WI sysreg handling, which was pointlessly
        complicated
      
      - Fixes for the vgic placement in the IPA space, together with a
        bunch of selftests
      
      - More memcg accounting of the memory allocated on behalf of a guest
      
      - Timer and vgic selftests
      
      - Workarounds for the Apple M1 broken vgic implementation
      
      - KConfig cleanups
      
      - New kvmarm.mode=none option, for those who really dislike us
      4e338684
  7. Oct 27, 2021
  8. Oct 25, 2021