  1. Oct 23, 2020
    • kvm: x86/mmu: NX largepage recovery for TDP MMU · 29cf0f50
      Ben Gardon authored
      
      
      When KVM maps a largepage backed region at a lower level in order to
      make it executable (i.e. NX large page shattering), it reduces the TLB
      performance of that region. In order to avoid making this degradation
      permanent, KVM must periodically reclaim shattered NX largepages by
      zapping them and allowing them to be rebuilt in the page fault handler.
      
      With this patch, the TDP MMU does not respect KVM's rate limiting on
      reclaim. It traverses the entire TDP structure every time. This will be
      addressed in a future patch.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-21-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Don't clear write flooding count for direct roots · daa5b6c1
      Ben Gardon authored
      
      
      Direct roots don't have a write flooding count because the guest can't
      affect that paging structure. Thus there's no need to clear the write
      flooding count on a fast CR3 switch for direct roots.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-20-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Support MMIO in the TDP MMU · 95fb5b02
      Ben Gardon authored
      
      
      In order to support MMIO, KVM must be able to walk the TDP paging
      structures to find mappings for a given GFN. Support this walk for
      the TDP MMU.
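The walk described above can be illustrated with a small model. This is only a sketch, not the kernel code: the dict-based table layout, the `("table", ...)` / `("leaf", ...)` entry tuples, and the name `walk_sptes` are all assumptions made for the example.

```python
ENTRIES_PER_TABLE = 512  # 9 index bits per level, as on x86-64


def walk_sptes(root, root_level, gfn):
    """Collect the entry seen at each level while walking towards `gfn`.

    Hypothetical model: a table is a dict mapping index -> entry, and an
    entry is ("table", child_dict) or ("leaf", value). Missing keys model
    non-present entries.
    """
    sptes = {}
    table = root
    for level in range(root_level, 0, -1):
        index = (gfn >> (9 * (level - 1))) % ENTRIES_PER_TABLE
        entry = table.get(index)
        sptes[level] = entry
        # Stop at a non-present entry or at a leaf; only tables are descended.
        if entry is None or entry[0] != "table":
            break
        table = entry[1]
    return sptes
```

A caller looking for an MMIO mapping would inspect the lowest level returned; if the walk stopped early on a non-present entry, there is nothing mapped for that GFN.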
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      v2: Thanks to Dan Carpenter and kernel test robot for finding that root
      was used uninitialized in get_mmio_spte.
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Message-Id: <20201014182700.2888246-19-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Support write protection for nesting in tdp MMU · 46044f72
      Ben Gardon authored
      
      
      To support nested virtualization, KVM will sometimes need to write
      protect pages which are part of a shadowed paging structure or are not
      writable in the shadowed paging structure. Add a function to write
      protect GFN mappings for this purpose.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-18-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Support disabling dirty logging for the tdp MMU · 14881998
      Ben Gardon authored
      
      
      Dirty logging ultimately breaks down MMU mappings to 4k granularity.
      When dirty logging is no longer needed, these granular mappings
      represent a useless performance penalty. When dirty logging is disabled,
      search the paging structure for mappings that could be re-constituted
      into a large page mapping. Zap those mappings so that they can be
      faulted in again at a higher mapping level.
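The search-and-zap step can be sketched as follows. This is a simplified model, not the kernel implementation: the set-based representation, the `hugepage_backed` input, and the function name are assumptions for illustration.

```python
PAGES_PER_HUGEPAGE = 512  # 2 MiB huge page / 4 KiB base page on x86-64


def zap_collapsible_mappings(mappings, hugepage_backed):
    """Remove 4K mappings that could be rebuilt as a large page.

    mappings:        set of GFNs currently mapped at 4K (modified in place).
    hugepage_backed: set of huge-page-aligned base GFNs whose host backing
                     is assumed to be a huge page (hypothetical input).
    Returns the set of GFNs that were zapped.
    """
    zapped = {
        gfn for gfn in mappings
        if gfn - (gfn % PAGES_PER_HUGEPAGE) in hugepage_backed
    }
    mappings -= zapped
    return zapped
```

Zapped GFNs are simply left unmapped; the next access faults, and the fault handler is then free to install a mapping at the larger level.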
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-17-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Support dirty logging for the TDP MMU · a6a0b05d
      Ben Gardon authored
      
      
      Dirty logging is a key feature of the KVM MMU and must be supported by
      the TDP MMU. Add support for both the write protection and PML dirty
      logging modes.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-16-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Support changed pte notifier in tdp MMU · 1d8dd6b3
      Ben Gardon authored
      
      
      In order to interoperate correctly with the rest of KVM and other Linux
      subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
      a hook and handle the change_pte MMU notifier.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-15-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Add access tracking for tdp_mmu · f8e14497
      Ben Gardon authored
      
      
      In order to interoperate correctly with the rest of KVM and other Linux
      subsystems, the TDP MMU must correctly handle various MMU notifiers. The
      main Linux MM uses the access tracking MMU notifiers for swap and other
      features. Add hooks to handle the test/flush HVA (range) family of
      MMU notifiers.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-14-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU · 063afacd
      Ben Gardon authored
      
      
      In order to interoperate correctly with the rest of KVM and other Linux
      subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
      hooks to handle the invalidate range family of MMU notifiers.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-13-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU · 89c0fd49
      Ben Gardon authored
      
      
      Attach struct kvm_mmu_pages to every page in the TDP MMU to track
      metadata, facilitate NX reclaim, and enable improved parallelism of MMU
      operations in future patches.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-12-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Add TDP MMU PF handler · bb18842e
      Ben Gardon authored
      
      
      Add functions to handle page faults in the TDP MMU. These page faults
      are currently handled in much the same way as the x86 shadow paging
      based MMU, however the ordering of some operations is slightly
      different. Future patches will add eager NX splitting, a fast page fault
      handler, and parallel page faults.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-11-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. Oct 22, 2020
    • kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg · 7d945312
      Ben Gardon authored
      
      
      In order to avoid creating executable hugepages in the TDP MMU PF
      handler, remove the dependency between disallowed_hugepage_adjust and
      the shadow_walk_iterator. This will open the function up to being used
      by the TDP MMU PF handler in a future patch.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-10-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Support zapping SPTEs in the TDP MMU · faaf05b0
      Ben Gardon authored
      
      
      Add functions to zap SPTEs to the TDP MMU. These are needed to tear down
      TDP MMU roots properly and implement other MMU functions which require
      tearing down mappings. Future patches will add functions to populate the
      page tables, but as of this patch there will not be any work for these
      functions to do.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-8-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Cache as_id in kvm_memory_slot · 9e9eb226
      Peter Xu authored
      
      
      Cache the address space ID just like the slot ID.  It will be used in
      order to fill in the dirty ring entries.
      
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20201014182700.2888246-7-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Add functions to handle changed TDP SPTEs · 2f2fad08
      Ben Gardon authored
      
      
      The existing bookkeeping done by KVM when a PTE is changed is spread
      around several functions. This makes it difficult to remember all the
      stats, bitmaps, and other subsystems that need to be updated whenever a
      PTE is modified. When a non-leaf PTE is marked non-present or becomes a
      leaf PTE, page table memory must also be freed. To simplify the MMU and
      facilitate the use of atomic operations on SPTEs in future patches, create
      functions to handle some of the bookkeeping required as a result of
      a change.
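The idea of funnelling every SPTE change through one bookkeeping function can be sketched like this. It is a toy model under stated assumptions: the `Spte` and `Bookkeeping` classes, their fields, and the exact dirty-tracking rule are hypothetical and far simpler than the real kernel structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Spte:
    present: bool = False
    leaf: bool = False
    dirty: bool = False
    child_table: Optional[str] = None  # page-table page a non-leaf points to


@dataclass
class Bookkeeping:
    mapped_leaves: int = 0
    dirty_gfns: List[int] = field(default_factory=list)
    freed_tables: List[str] = field(default_factory=list)


def handle_changed_spte(book, gfn, old, new):
    """Single place that updates all derived state for an old -> new change."""
    was_leaf = old.present and old.leaf
    is_leaf = new.present and new.leaf

    # Keep the leaf-mapping statistic in sync.
    book.mapped_leaves += int(is_leaf) - int(was_leaf)

    # A dirty leaf that disappears or loses its dirty state must be recorded.
    if was_leaf and old.dirty and not (is_leaf and new.dirty):
        book.dirty_gfns.append(gfn)

    # A non-leaf entry that becomes non-present or a leaf orphans its child
    # table, so that page-table memory has to be freed.
    if old.present and not old.leaf and (not new.present or new.leaf):
        book.freed_tables.append(old.child_table)
```

Because callers only ever report the old and new values, the same function can later be called after a successful atomic update without the caller knowing which counters or bitmaps are affected.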
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Allocate and free TDP MMU roots · 02c00b3a
      Ben Gardon authored
      
      
      The TDP MMU must be able to allocate paging structure root pages and track
      the usage of those pages. Implement a similar, but separate system for root
      page allocation to that of the x86 shadow paging implementation. When
      future patches add synchronization model changes to allow for parallel
      page faults, these pages will need to be handled differently from the
      x86 shadow paging based MMU's root pages.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Init / Uninit the TDP MMU · fe5db27d
      Ben Gardon authored
      
      
      The TDP MMU offers an alternative mode of operation to the x86 shadow
      paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU
      will require new fields that need to be initialized and torn down. Add
      hooks into the existing KVM MMU initialization process to do that
      initialization / cleanup. Currently the initialization and cleanup
      functions do not do very much, however more operations will be added in
      future patches.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-4-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Introduce tdp_iter · c9180b72
      Ben Gardon authored
      
      
      The TDP iterator implements a pre-order traversal of a TDP paging
      structure. This iterator will be used in future patches to create
      an efficient implementation of the KVM MMU for the TDP case.
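A pre-order traversal of a paging structure visits each entry before anything it points to. The generator below is a minimal sketch of that order, assuming a hypothetical nested-list table model and a tiny table size; it is not the kernel's `tdp_iter`.

```python
from typing import Iterator, Tuple, Union

# Hypothetical model: a table is a list of entries. An entry is None
# (non-present), an int (leaf mapping), or another list (child table).
Entry = Union[None, int, list]

ENTRIES_PER_TABLE = 4  # real x86 tables hold 512 entries; 4 keeps examples small


def tdp_iter(table: list, level: int, base_gfn: int = 0) -> Iterator[Tuple[int, int, Entry]]:
    """Yield (level, gfn, entry) for every entry, parents before children."""
    pages_per_entry = ENTRIES_PER_TABLE ** (level - 1)
    for index, entry in enumerate(table):
        gfn = base_gfn + index * pages_per_entry
        yield level, gfn, entry          # visit the entry itself first
        if isinstance(entry, list):      # then descend into its child table
            yield from tdp_iter(entry, level - 1, gfn)
```

Visiting parents first means a caller can decide what to do with a whole subtree, for example zap it, before any time is spent on its children.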
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: mmu: extract spte.h and spte.c · 5a9624af
      Paolo Bonzini authored
      
      
      The SPTE format will be common to both the shadow and the TDP MMU.
      
      Extract code that implements the format to a separate module, as a
      first step towards adding the TDP MMU and putting mmu.c on a diet.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp · cb3eedab
      Paolo Bonzini authored
      
      
      The TDP MMU's own function for the changed-PTE notifier will need to
      update a PTE in the exact same way as the shadow MMU.  Rather than
      re-implementing this logic, factor the SPTE creation out of kvm_set_pte_rmapp.
      
      Extracted out of a patch by Ben Gardon <bgardon@google.com>.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Separate making SPTEs from set_spte · 799a4190
      Ben Gardon authored
      
      
      Separate the functions for generating leaf page table entries from the
      function that inserts them into the paging structure. This refactoring
      will facilitate changes to the MMU synchronization model to use atomic
      compare / exchanges (which are not guaranteed to succeed) instead of a
      monolithic MMU lock.
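The value of the split shows up once installation can fail. The sketch below assumes a made-up bit layout and uses a Python lock as a stand-in for a hardware compare-and-exchange; `SpteSlot`, `make_spte`, and `install_spte` are illustrative names, not the kernel's functions.

```python
import threading

# Hypothetical bit layout, for illustration only.
PRESENT = 1 << 0
WRITABLE = 1 << 1
PFN_SHIFT = 12


class SpteSlot:
    """One page-table slot; a lock stands in for the atomic instruction."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        return self._value

    def cmpxchg(self, expected, new):
        """Store `new` only if the slot still holds `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def make_spte(pfn, writable):
    """Compute the leaf entry value without touching any table."""
    spte = (pfn << PFN_SHIFT) | PRESENT
    if writable:
        spte |= WRITABLE
    return spte


def install_spte(slot, old, new):
    """Try to publish `new`; False means another updater won the race."""
    return slot.cmpxchg(old, new)
```

Since `make_spte` has no side effects, a caller that loses the race can re-read the slot and retry or give up without having to undo anything.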
      
      No functional change expected.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This commit introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: mmu: Separate making non-leaf sptes from link_shadow_page · cc4674d0
      Ben Gardon authored
      
      
      The TDP MMU page fault handler will need to be able to create non-leaf
      SPTEs to build up the paging structures. Rather than re-implementing the
      function, factor the SPTE creation out of link_shadow_page.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20200925212302.3979661-9-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge branch 'kvm-fixes' into 'next' · c0623f5e
      Paolo Bonzini authored
      Pick up bugfixes from 5.9, otherwise various tests fail.
    • KVM: PPC: Book3S HV: Make struct kernel_param_ops definition const · a4f1d94e
      Joe Perches authored
      
      
      This should be const, so make it so.
      
      Signed-off-by: Joe Perches <joe@perches.com>
      Message-Id: <d130e88dd4c82a12d979da747cc0365c72c3ba15.1601770305.git.joe@perches.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Let the guest own CR4.FSGSBASE · 30031c2b
      Lai Jiangshan authored
      
      
      Add FSGSBASE to the set of possible guest-owned CR4 bits, i.e. let the
      guest own it on VMX.  KVM never queries the guest's CR4.FSGSBASE value,
      thus there is no reason to force VM-Exit on FSGSBASE being toggled.
      
      Note, because FSGSBASE is conditionally available, this is dependent on
      recent changes to intercept reserved CR4 bits and to update the CR4
      guest/host mask in response to guest CPUID changes.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      [sean: added justification in changelog]
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200930041659.28181-6-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Intercept guest reserved CR4 bits to inject #GP fault · 2ed41aa6
      Sean Christopherson authored
      
      
      Intercept CR4 bits that are guest reserved so that KVM correctly injects
      a #GP fault if the guest attempts to set a reserved bit.  If a feature
      is supported by the CPU but is not exposed to the guest, and its
      associated CR4 bit is not intercepted by KVM by default, then KVM will
      fail to inject a #GP if the guest sets the CR4 bit without triggering
      an exit, e.g. by toggling only the bit in question.
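The failure mode and the fix can be modelled with plain bit masks. This is a rough sketch: the two functions and the three outcome strings are hypothetical, and the exit condition is a simplification of how the CR4 guest/host mask behaves on real hardware. The bit positions are the architectural ones.

```python
X86_CR4_PGE = 1 << 7
X86_CR4_FSGSBASE = 1 << 16


def guest_owned_bits(possible_owned, guest_reserved):
    """Bits the guest may change without a VM-Exit: never reserved ones."""
    return possible_owned & ~guest_reserved


def guest_write_cr4(old, new, owned, reserved):
    """Classify a guest CR4 write in this simplified model."""
    changed = old ^ new
    if not (changed & ~owned):
        return "no-exit"    # only guest-owned bits changed; KVM never sees it
    if new & reserved:
        return "#GP"        # KVM sees the write and rejects the reserved bit
    return "emulated"
```

With FSGSBASE hidden from the guest, passing the unfiltered mask yields `"no-exit"` for a write that sets it, which is the missed #GP described above; filtering the mask through `guest_owned_bits` turns the same write into `"#GP"`.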
      
      Note, KVM doesn't give the guest direct access to any CR4 bits that are
      also dependent on guest CPUID.  Yet.
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200930041659.28181-5-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move call to update_exception_bitmap() into VMX code · a6337a35
      Sean Christopherson authored
      
      
      Now that vcpu_after_set_cpuid() and update_exception_bitmap() are called
      back-to-back, subsume the exception bitmap update into the common CPUID
      update.  Drop the SVM invocation entirely as SVM's exception bitmap
      doesn't vary with respect to guest CPUID.
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200930041659.28181-4-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Invoke vendor's vcpu_after_set_cpuid() after all common updates · c44d9b34
      Sean Christopherson authored
      Move the call to kvm_x86_ops.vcpu_after_set_cpuid() to the very end of
      kvm_vcpu_after_set_cpuid() to allow the vendor implementation to react
      to changes made by the common code.  In the near future, this will be
      used by VMX to update its CR4 guest/host masks to account for reserved
      bits.  In the long term, SGX support will update the allowed XCR0 mask
      for enclaves based on the vCPU's allowed XCR0.
      
      vcpu_after_set_cpuid() (nee kvm_update_cpuid()) was originally added by
      commit 2acf923e ("KVM: VMX: Enable XSAVE/XRSTOR for guest"), and was
      called separately after kvm_x86_ops.vcpu_after_set_cpuid() (nee
      kvm_x86_ops->cpuid_update()).  There is no indication that the placement
      of the common code updates after the vendor updates was anything more
      than a "new function at the end" decision.
      
      Inspection of the current code reveals no dependency on kvm_x86_ops'
      vcpu_after_set_cpuid() in kvm_vcpu_after_set_cpuid() or any of its
      helpers.  The bulk of the common code depends only on the guest's CPUID
      configuration, kvm_mmu_reset_context() does not consume dynamic vendor
      state, and there are no collisions between kvm_pmu_refresh() and VMX's
      update of PT state.
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200930041659.28181-3-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Intercept LA57 to inject #GP fault when it's reserved · 6e1d849f
      Lai Jiangshan authored
      Unconditionally intercept changes to CR4.LA57 so that KVM correctly
      injects a #GP fault if the guest attempts to set CR4.LA57 when it's
      supported in hardware but not exposed to the guest.
      
      Long term, KVM needs to properly handle CR4 bits that can be under guest
      control but also may be reserved from the guest's perspective.  But, KVM
      currently sets the CR4 guest/host mask only during vCPU creation, and
      reworking flows to change that will take a bit of elbow grease.
      
      Even if/when generic support for intercepting reserved bits exists, it's
      probably not worth letting the guest set CR4.LA57 directly.  LA57 can't
      be toggled while long mode is enabled, thus it's all but guaranteed to
      be set once (maybe twice, e.g. by BIOS and kernel) during boot and never
      touched again.  On the flip side, letting the guest own CR4.LA57 may
      incur extra VMREADs.  In other words, this temporary "hack" is probably
      also the right long term fix.
      
      Fixes: fd8cb433 ("KVM: MMU: Expose the LA57 feature to VM.")
      Cc: stable@vger.kernel.org
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      [sean: rewrote changelog]
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200930041659.28181-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Initialize prev_ga_tag before use · f6426ab9
      Suravee Suthikulpanit authored
      The function amd_ir_set_vcpu_affinity makes use of the parameter struct
      amd_iommu_pi_data.prev_ga_tag to determine if it should delete struct
      amd_iommu_pi_data from a list when not running in AVIC mode.
      
      However, prev_ga_tag is initialized only when AVIC is enabled. A non-zero
      uninitialized value can send the function down an unintended code path,
      which ends up using struct vcpu_svm.ir_list and ir_list_lock before they
      are initialized (they are intended only for the AVIC case).
      
      This triggers a NULL pointer dereference in the function vm_ir_list_del
      with the following call trace:
      
          svm_update_pi_irte+0x3c2/0x550 [kvm_amd]
          ? proc_create_single_data+0x41/0x50
          kvm_arch_irq_bypass_add_producer+0x40/0x60 [kvm]
          __connect+0x5f/0xb0 [irqbypass]
          irq_bypass_register_producer+0xf8/0x120 [irqbypass]
          vfio_msi_set_vector_signal+0x1de/0x2d0 [vfio_pci]
          vfio_msi_set_block+0x77/0xe0 [vfio_pci]
          vfio_pci_set_msi_trigger+0x25c/0x2f0 [vfio_pci]
          vfio_pci_set_irqs_ioctl+0x88/0xb0 [vfio_pci]
          vfio_pci_ioctl+0x2ea/0xed0 [vfio_pci]
          ? alloc_file_pseudo+0xa5/0x100
          vfio_device_fops_unl_ioctl+0x26/0x30 [vfio]
          ? vfio_device_fops_unl_ioctl+0x26/0x30 [vfio]
          __x64_sys_ioctl+0x96/0xd0
          do_syscall_64+0x37/0x80
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Therefore, initialize prev_ga_tag to zero before use. This should be safe
      because ga_tag value 0 is invalid (see function avic_vm_init).
      
      Fixes: dfa20099 ("KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()")
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Message-Id: <20201003232707.4662-1-suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: implement on demand allocation of the nested state · 2fcf4876
      Maxim Levitsky authored
      
      
      This way we don't waste memory on VMs which don't use nested
      virtualization even when the host enabled it for them.
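On-demand allocation can be sketched as below. The trigger used here, the guest setting EFER.SVME, is an assumption about when nesting "starts"; the class, the `can_allocate` flag, and the 4 KiB buffer are likewise invented for illustration. The sketch also shows why an EFER write needs a way to report failure.

```python
import errno

EFER_SVME = 1 << 12  # architectural bit position of EFER.SVME


class SvmVcpu:
    """Toy vCPU; `can_allocate` simulates allocation success or failure."""

    def __init__(self, can_allocate=True):
        self.efer = 0
        self.nested_state = None   # nothing is allocated up front
        self.can_allocate = can_allocate


def set_efer(vcpu, efer):
    """Allocate nested state lazily; return 0 or a negative errno."""
    if efer & EFER_SVME:
        if vcpu.nested_state is None:
            if not vcpu.can_allocate:
                return -errno.ENOMEM   # leave EFER unchanged on failure
            vcpu.nested_state = bytearray(4096)
    else:
        vcpu.nested_state = None       # release the memory when nesting is off
    vcpu.efer = efer
    return 0
```

A vCPU that never enables nesting never pays for the buffer, at the cost of an allocation that can now fail in the middle of an MSR write.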
      
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20201001112954.6258-5-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: allow kvm_x86_ops.set_efer to return an error value · 72f211ec
      Maxim Levitsky authored
      
      
      This will be used to signal an error to userspace in case the vendor
      code fails while handling this MSR (e.g. -ENOMEM).
      
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: report negative values from wrmsr emulation to userspace · 7dffecaf
      Maxim Levitsky authored
      
      
      This will allow KVM to report such errors (e.g. -ENOMEM)
      to userspace.
      
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20201001112954.6258-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: xen_hvm_config: cleanup return values · 36385ccc
      Maxim Levitsky authored
      
      
      Return 1 on errors that are caused by wrong guest behavior
      (which will inject #GP into the guest).
      
      Return a negative error value on issues that are
      the kernel's fault (e.g. -ENOMEM).
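The return-value convention can be summarised in a few lines. Both functions below are hypothetical stand-ins written for this illustration; only the meaning of 0, 1, and negative values comes from the text above.

```python
import errno


def write_hypercall_page(page_index, num_pages, alloc_ok=True):
    """Toy MSR write handler following the convention described above."""
    if page_index >= num_pages:
        return 1                   # guest passed a bad value: guest's fault
    if not alloc_ok:
        return -errno.ENOMEM       # host could not allocate: kernel's fault
    return 0


def dispatch_wrmsr_result(ret):
    """Map a handler's return value to what happens next."""
    if ret == 0:
        return "resume guest"
    if ret > 0:
        return "inject #GP"
    return "return error to userspace"
```

Keeping the two failure classes apart means a guest cannot turn a host-side resource shortage into a spurious #GP, and userspace gets to see the real error code.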
      
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20201001112954.6258-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm x86/mmu: Make struct kernel_param_ops definitions const · d5d6c18d
      Joe Perches authored
      
      
      These should be const, so make them so.
      
      Signed-off-by: Joe Perches <joe@perches.com>
      Message-Id: <ed95eef4f10fc1317b66936c05bc7dd8f943a6d5.1601770305.git.joe@perches.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: bump KVM_MAX_CPUID_ENTRIES · 3f4e3eb4
      Vitaly Kuznetsov authored
      
      
      As vcpu->arch.cpuid_entries is now allocated dynamically, the only
      remaining use for KVM_MAX_CPUID_ENTRIES is to check KVM_SET_CPUID/
      KVM_SET_CPUID2 input for sanity. Since it was reported that the
      current limit (80) is insufficient for some CPUs, bump
      KVM_MAX_CPUID_ENTRIES and use an arbitrary value '256' as the new
      limit.
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20201001130541.1398392-4-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: allocate vcpu->arch.cpuid_entries dynamically · 255cbecf
      Vitaly Kuznetsov authored
      
      
      The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80)
      is reported to be insufficient, but before we bump it, let's switch to
      allocating the vcpu->arch.cpuid_entries[] array dynamically. Currently,
      'struct kvm_cpuid_entry2' is 40 bytes, so vcpu->arch.cpuid_entries is
      3200 bytes, which accounts for 1/4 of the whole 'struct kvm_vcpu_arch',
      but having it pre-allocated (for all vCPUs, which we also pre-allocate)
      gives us no real benefit.

      Another plus of dynamic allocation is that we now run the
      kvm_check_cpuid() check before we assign anything to
      vcpu->arch.cpuid_nent/cpuid_entries, so no changes are made if the
      check fails.
      
      Opportunistically remove unneeded 'out' labels from
      kvm_vcpu_ioctl_set_cpuid()/kvm_vcpu_ioctl_set_cpuid2() and return
      directly whenever possible.
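
      The validate-before-commit pattern described above can be sketched
      in userspace C. Everything prefixed with demo_ is hypothetical: the
      struct only mirrors the ten 32-bit fields that give the 40-byte
      size, and the padding check stands in for the real validation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Ten 32-bit fields, matching the 40-byte size quoted above. */
struct demo_cpuid_entry2 {
    uint32_t function, index, flags;
    uint32_t eax, ebx, ecx, edx;
    uint32_t padding[3];
};
_Static_assert(sizeof(struct demo_cpuid_entry2) == 40, "entry is 40 bytes");
_Static_assert(80 * sizeof(struct demo_cpuid_entry2) == 3200,
               "80 entries take 3200 bytes");

struct demo_vcpu {
    struct demo_cpuid_entry2 *entries;  /* allocated on demand */
    uint32_t nent;
};

/* Hypothetical validity rule: padding must be zero in every entry. */
static int demo_check_cpuid(const struct demo_cpuid_entry2 *e, uint32_t nent)
{
    for (uint32_t i = 0; i < nent; i++)
        if (e[i].padding[0] || e[i].padding[1] || e[i].padding[2])
            return -EINVAL;
    return 0;
}

/* Copy, validate the copy, and only then replace the vCPU state. */
static int demo_set_cpuid2(struct demo_vcpu *vcpu,
                           const struct demo_cpuid_entry2 *src, uint32_t nent)
{
    struct demo_cpuid_entry2 *copy = NULL;
    int r;

    if (nent) {
        copy = malloc(nent * sizeof(*copy));
        if (!copy)
            return -ENOMEM;
        memcpy(copy, src, nent * sizeof(*copy));
    }

    r = demo_check_cpuid(copy, nent);
    if (r) {
        free(copy);     /* vcpu is untouched on failure */
        return r;
    }

    free(vcpu->entries);
    vcpu->entries = copy;
    vcpu->nent = nent;
    return 0;
}
```

      Because the check runs on the private copy, a rejected call leaves
      the previously installed entries in place.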
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20201001130541.1398392-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
    • Vitaly Kuznetsov's avatar
      KVM: x86: disconnect kvm_check_cpuid() from vcpu->arch.cpuid_entries · f69858fc
      Vitaly Kuznetsov authored
      
      
      As a preparatory step to allocating vcpu->arch.cpuid_entries dynamically,
      make kvm_check_cpuid() work with an arbitrary 'struct kvm_cpuid_entry2'
      array.

      Currently, when kvm_check_cpuid() fails we reset vcpu->arch.cpuid_nent to
      0, which is kind of weird: one would expect CPUIDs to remain
      unchanged when a KVM_SET_CPUID[2] call fails.

      No functional change intended. It would've been possible to move the updated
      kvm_check_cpuid() into kvm_vcpu_ioctl_set_cpuid2() and check the supplied
      input before we start updating vcpu->arch.cpuid_entries/nent, but we
      can't do the same in kvm_vcpu_ioctl_set_cpuid() as we'll have to copy
      'struct kvm_cpuid_entry' entries first. The change will be made when
      the vcpu->arch.cpuid_entries[] array becomes allocated dynamically.
      
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20201001130541.1398392-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Oliver Upton's avatar
      Documentation: kvm: fix some typos in cpuid.rst · 3ee6fb49
      Oliver Upton authored
      
      
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Change-Id: I0c6355b09fedf8f9cc4cc5f51be418e2c1c82b7b
      Message-Id: <20200818152429.1923996-5-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Oliver Upton's avatar
      kvm: x86: only provide PV features if enabled in guest's CPUID · 66570e96
      Oliver Upton authored
      
      
      KVM unconditionally provides PV features to the guest, regardless of the
      configured CPUID. An unwitting guest that doesn't check
      KVM_CPUID_FEATURES before use could access paravirt features that
      userspace did not intend to provide. Fix this by checking the guest's
      CPUID before performing any paravirtual operations.
      
      Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the
      aforementioned enforcement. Migrating a VM from a host without this
      patch to a host with this patch could silently change the ABI exposed
      to the guest, so we default to the old behavior and make the new one
      opt-in.
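
      The gating logic can be sketched as follows. This is an assumption
      about the shape of the check, not the kernel implementation:
      struct demo_pv_vcpu, the DEMO_PV_FEATURE_* bits, and
      demo_pv_feature_allowed() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits that userspace may place in the guest's CPUID. */
#define DEMO_PV_FEATURE_A (1u << 0)
#define DEMO_PV_FEATURE_B (1u << 1)

struct demo_pv_vcpu {
    bool enforce;       /* opt-in; false preserves the old behavior */
    uint32_t features;  /* PV feature bits exposed in guest CPUID */
};

/*
 * With enforcement off, every PV feature is permitted (old ABI).
 * With enforcement on, only features present in guest CPUID are permitted.
 */
static bool demo_pv_feature_allowed(const struct demo_pv_vcpu *vcpu,
                                    uint32_t feature)
{
    if (!vcpu->enforce)
        return true;
    return (vcpu->features & feature) != 0;
}
```

      Defaulting enforce to false is what keeps the guest-visible ABI
      unchanged for a VM migrated from a host that lacks the check.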
      
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Change-Id: I202a0926f65035b872bfe8ad15307c026de59a98
      Message-Id: <20200818152429.1923996-4-oupton@google.com>
      Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>