  1. Oct 19, 2021
    • selftests: KVM: Add test for KVM_{GET,SET}_CLOCK · 61fb1c54
      Oliver Upton authored
      
      
      Add a selftest for the newly introduced KVM clock UAPI. Ensure
      that the KVM clock is consistent between userspace and the guest, and
      that the difference in realtime will only ever cause the KVM clock to
      advance forward.
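
      A minimal userspace sketch of the invariant this selftest checks (not the
      selftest itself; run_guest_and_sample_clock() is an assumed, illustrative
      helper): a guest kvmclock sample taken between two KVM_GET_CLOCK calls
      must fall between the two host readings.

      #include <stdint.h>
      #include <assert.h>
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      /* Hypothetical helper: runs the guest and returns its kvmclock sample. */
      extern uint64_t run_guest_and_sample_clock(void);

      static void check_kvm_clock_consistency(int vm_fd)
      {
              struct kvm_clock_data before, after;
              uint64_t guest_ns;

              assert(!ioctl(vm_fd, KVM_GET_CLOCK, &before));
              guest_ns = run_guest_and_sample_clock();
              assert(!ioctl(vm_fd, KVM_GET_CLOCK, &after));

              /* Userspace and guest views must agree, and never move backwards. */
              assert(before.clock <= guest_ns && guest_ns <= after.clock);
      }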
      
      Cc: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Message-Id: <20210916181555.973085-3-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      61fb1c54
    • tools: arch: x86: pull in pvclock headers · 50006539
      Oliver Upton authored
      
      
      Copy over mostly clean versions of the pvclock headers into tools, and
      reconcile the headers/symbols that are missing in tools and not needed.
      
      Signed-off-by: Oliver Upton <oupton@google.com>
      Message-Id: <20210916181555.973085-2-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      50006539
    • KVM: x86: Expose TSC offset controls to userspace · 828ca896
      Oliver Upton authored
      
      
      To date, VMM-directed TSC synchronization and migration has been a bit
      messy. KVM has some baked-in heuristics around TSC writes to infer if
      the VMM is attempting to synchronize. This is problematic, as it depends
      on host userspace writing to the guest's TSC within 1 second of the last
      write.
      
      A much cleaner approach to configuring the guest's views of the TSC is to
      simply migrate the TSC offset for every vCPU. Offsets are idempotent,
      and thus not subject to change depending on when the VMM actually
      reads/writes values from/to KVM. The VMM can then read the TSC once with
      KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
      the guest is paused.
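
      A hedged sketch of how a VMM might use the new per-vCPU attribute to
      migrate the offset (attribute names as introduced by this series; error
      handling reduced to asserts):

      #include <stdint.h>
      #include <assert.h>
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      static uint64_t get_tsc_offset(int vcpu_fd)
      {
              uint64_t offset;
              struct kvm_device_attr attr = {
                      .group = KVM_VCPU_TSC_CTRL,
                      .attr  = KVM_VCPU_TSC_OFFSET,
                      .addr  = (uint64_t)(uintptr_t)&offset,
              };

              assert(!ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr));
              return offset;
      }

      static void set_tsc_offset(int vcpu_fd, uint64_t offset)
      {
              struct kvm_device_attr attr = {
                      .group = KVM_VCPU_TSC_CTRL,
                      .attr  = KVM_VCPU_TSC_OFFSET,
                      .addr  = (uint64_t)(uintptr_t)&offset,
              };

              assert(!ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr));
      }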
      
      Cc: David Matlack <dmatlack@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-8-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      828ca896
    • KVM: x86: Refactor tsc synchronization code · 58d4277b
      Oliver Upton authored
      
      
      Refactor kvm_synchronize_tsc to make a new function that allows callers
      to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
      for the sake of participating in TSC synchronization.
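
      A hedged sketch of the refactored helper's shape (simplified, not the
      verbatim kernel code): callers compute the offset/value/ns themselves and
      pass them in, and the helper only applies them under tsc_write_lock.

      static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
                                        u64 ns, bool matched)
      {
              struct kvm *kvm = vcpu->kvm;

              lockdep_assert_held(&kvm->arch.tsc_write_lock);

              if (!matched) {
                      /* Start a new synchronization generation. */
                      kvm->arch.cur_tsc_generation++;
                      kvm->arch.cur_tsc_nsec = ns;
                      kvm->arch.cur_tsc_write = tsc;
                      kvm->arch.cur_tsc_offset = offset;
              }

              /*
               * Keep the per-vCPU generation in step with the VM-wide one even
               * when 'matched' is false (the fix noted in the bracketed comment
               * below).
               */
              vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
              vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
              vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;

              kvm_vcpu_write_tsc_offset(vcpu, offset);
      }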
      
      Signed-off-by: Oliver Upton <oupton@google.com>
      Message-Id: <20210916181538.968978-7-oupton@google.com>
      [Make sure kvm->arch.cur_tsc_generation and vcpu->arch.this_tsc_generation are
       equal at the end of __kvm_synchronize_tsc, if matched is false. Reported by
       Maxim Levitsky. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      58d4277b
    • kvm: x86: protect masterclock with a seqcount · 869b4421
      Paolo Bonzini authored
      
      
      Protect the reference point for kvmclock with a seqcount, so that
      kvmclock updates for all vCPUs can proceed in parallel.  Xen runstate
      updates will also run in parallel and not bounce the kvmclock cacheline.
      
      Of the variables that were protected by pvclock_gtod_sync_lock,
      nr_vcpus_matched_tsc is different because it is updated outside
      pvclock_update_vm_gtod_copy and read inside it.  Therefore, we
      need to keep it protected by a spinlock.  In fact it must now
      be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
      write-side of a seqcount, is non-preemptible.  Since we already
      have tsc_write_lock which is a raw spinlock, we can just use
      tsc_write_lock as the lock that protects the write-side of the
      seqcount.
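
      A hedged, self-contained sketch of the locking pattern described above
      (struct and field names are illustrative; in KVM the seqcount lives in
      struct kvm_arch and is associated with the existing tsc_write_lock):

      #include <linux/seqlock.h>
      #include <linux/spinlock.h>

      struct masterclock_ref {
              raw_spinlock_t tsc_write_lock;
              seqcount_raw_spinlock_t pvclock_sc;   /* write side under tsc_write_lock */
              u64 master_kernel_ns;
              u64 master_cycle_now;
      };
      /* init: seqcount_raw_spinlock_init(&ref->pvclock_sc, &ref->tsc_write_lock); */

      static void update_masterclock(struct masterclock_ref *ref, u64 ns, u64 cycles)
      {
              unsigned long flags;

              raw_spin_lock_irqsave(&ref->tsc_write_lock, flags);
              write_seqcount_begin(&ref->pvclock_sc);        /* non-preemptible */
              ref->master_kernel_ns = ns;
              ref->master_cycle_now = cycles;
              write_seqcount_end(&ref->pvclock_sc);
              raw_spin_unlock_irqrestore(&ref->tsc_write_lock, flags);
      }

      static u64 read_masterclock_ns(struct masterclock_ref *ref)
      {
              unsigned int seq;
              u64 ns;

              do {
                      seq = read_seqcount_begin(&ref->pvclock_sc);
                      ns = ref->master_kernel_ns;
              } while (read_seqcount_retry(&ref->pvclock_sc, seq));

              return ns;
      }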
      
      Co-developed-by: Oliver Upton <oupton@google.com>
      Message-Id: <20210916181538.968978-6-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      869b4421
    • KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK · c68dc1b5
      Oliver Upton authored
      
      
      Handling the migration of TSCs correctly is difficult, in part because
      Linux does not provide userspace with the ability to retrieve a (TSC,
      realtime) clock pair for a single instant in time. In lieu of a more
      convenient facility, KVM can report similar information in the kvm_clock
      structure.
      
      Provide userspace with a host TSC & realtime pair iff the realtime clock
      is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
      realtime value, advance the KVM clock by the amount of elapsed time. Do
      not step the KVM clock backwards, though, as it is a monotonic
      oscillator.
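
      A hedged sketch of the VMM-side migration flow this enables (illustrative
      only; error handling reduced to asserts):

      #include <assert.h>
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      /* Source: capture kvmclock together with the realtime timestamp. */
      static void save_clock(int src_vm_fd, struct kvm_clock_data *data)
      {
              assert(!ioctl(src_vm_fd, KVM_GET_CLOCK, data));

              /* realtime/host_tsc are only valid when the flags say so, i.e.
               * when the host realtime clock is based on the TSC. */
              assert(data->flags & KVM_CLOCK_REALTIME);
      }

      /* Destination: hand the pair back; KVM advances data->clock by the
       * realtime that elapsed since save_clock(), but never steps backwards. */
      static void restore_clock(int dst_vm_fd, struct kvm_clock_data *data)
      {
              assert(!ioctl(dst_vm_fd, KVM_SET_CLOCK, data));
      }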
      
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-5-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c68dc1b5
    • KVM: x86: avoid warning with -Wbitwise-instead-of-logical · 3d5e7a28
      Paolo Bonzini authored
      
      
      This is a new warning in clang top-of-tree (will be clang 14):
      
      In file included from arch/x86/kvm/mmu/mmu.c:27:
      arch/x86/kvm/mmu/spte.h:318:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
              return __is_bad_mt_xwr(rsvd_check, spte) |
                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                       ||
      arch/x86/kvm/mmu/spte.h:318:9: note: cast one or both operands to int to silence this warning
      
      The code is fine, but change it anyway to shut up this clever clogs
      of a compiler.
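
      Illustratively, the change amounts to switching the bitwise OR in the
      quoted return statement to a logical OR (second operand sketched from
      spte.h, not quoted verbatim):

      /* before: bitwise '|' on bool operands triggers the new clang warning */
      return __is_bad_mt_xwr(rsvd_check, spte) |
             __is_rsvd_bits_set(rsvd_check, spte, level);

      /* after: logical '||', same result for booleans, no warning */
      return __is_bad_mt_xwr(rsvd_check, spte) ||
             __is_rsvd_bits_set(rsvd_check, spte, level);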
      
      Reported-by: <torvic9@mailbox.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3d5e7a28
    • Paolo Bonzini · a25c78d0
    • KVM: X86: fix lazy allocation of rmaps · fa13843d
      Paolo Bonzini authored
      
      
      If allocation of rmaps fails, but some of the pointers have already been written,
      those pointers can be cleaned up when the memslot is freed, or even reused later
      for another attempt at allocating the rmaps.  Therefore there is no need to
      WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
      skipped, lest KVM overwrite the previous pointer and indeed leak memory.
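
      A hedged sketch of the resulting allocation loop (simplified from
      memslot_rmap_alloc; the lpages array stands in for the per-level entry
      counts computed by the real code):

      static int memslot_rmap_alloc_sketch(struct kvm_memory_slot *slot,
                                           const unsigned long *lpages)
      {
              int i;

              for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
                      /* Left over from a previous attempt: reuse it, don't leak it. */
                      if (slot->arch.rmap[i])
                              continue;

                      slot->arch.rmap[i] = kvcalloc(lpages[i],
                                                    sizeof(*slot->arch.rmap[0]),
                                                    GFP_KERNEL_ACCOUNT);
                      if (!slot->arch.rmap[i])
                              return -ENOMEM;
              }

              return 0;
      }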
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fa13843d
  2. Oct 18, 2021
  3. Oct 15, 2021
  4. Oct 05, 2021
    • KVM: arm64: Release mmap_lock when using VM_SHARED with MTE · 6e6a8ef0
      Quentin Perret authored
      VM_SHARED mappings are currently forbidden in a memslot with MTE to
      prevent two VMs racing to sanitise the same page. However, this check
      is performed while holding current->mm's mmap_lock, but fails to release
      it. Fix this by releasing the lock when needed.
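
      A simplified sketch of the pattern being fixed (function and helper names
      are illustrative, not the actual arm64 code): an error path taken while
      mmap_lock is held must drop the lock instead of returning directly.

      /* Hypothetical stand-in for the MTE/VM_SHARED validation. */
      bool slot_has_shared_vma_with_mte(struct kvm *kvm, struct kvm_memory_slot *slot);

      static int check_memslot_vmas(struct kvm *kvm, struct kvm_memory_slot *slot)
      {
              int ret = 0;

              mmap_read_lock(current->mm);

              if (slot_has_shared_vma_with_mte(kvm, slot)) {
                      /* Previously this path returned directly, leaking the lock. */
                      ret = -EINVAL;
                      goto out_unlock;
              }

              /* ... further VMA checks would go here ... */

      out_unlock:
              mmap_read_unlock(current->mm);
              return ret;
      }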
      
      Fixes: ea7fc1bb
      
       ("KVM: arm64: Introduce MTE VM feature")
      Signed-off-by: Quentin Perret <qperret@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211005122031.809857-1-qperret@google.com
      6e6a8ef0
    • KVM: arm64: Report corrupted refcount at EL2 · 7615c2a5
      Quentin Perret authored
      
      
      Some of the refcount manipulation helpers used at EL2 are instrumented
      to catch a corrupted state, but not all of them are treated equally. Let's
      make things more consistent by instrumenting hyp_page_ref_dec_and_test()
      as well.
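
      A hedged sketch of the instrumented helper (simplified): decrementing a
      refcount that is already zero indicates corruption, so catch it loudly
      instead of silently wrapping.

      static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
      {
              BUG_ON(!p->refcount);   /* corrupted refcount at EL2 */
              p->refcount--;
              return (p->refcount == 0);
      }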
      
      Acked-by: Will Deacon <will@kernel.org>
      Suggested-by: Will Deacon <will@kernel.org>
      Signed-off-by: Quentin Perret <qperret@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211005090155.734578-6-qperret@google.com
      7615c2a5
    • KVM: arm64: Fix host stage-2 PGD refcount · 1d58a17e
      Quentin Perret authored
      The KVM page-table library refcounts the pages of concatenated stage-2
      PGDs individually. However, when running KVM in protected mode, the
      host's stage-2 PGD is currently managed by EL2 as a single high-order
      compound page, which can cause the refcount of the tail pages to reach 0
      when they shouldn't, hence corrupting the page-table.
      
      Fix this by introducing a new hyp_split_page() helper in the EL2 page
      allocator (matching the kernel's split_page() function), and make use of
      it from host_s2_zalloc_pages_exact().
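
      A hedged sketch of what the new helper does (simplified): break a
      high-order EL2 allocation into order-0 pages so every tail page carries
      its own refcount and can no longer underflow a shared one.

      static void hyp_split_page(struct hyp_page *p)
      {
              unsigned short order = p->order;
              unsigned int i;

              p->order = 0;
              for (i = 1; i < (1 << order); i++) {
                      struct hyp_page *tail = p + i;

                      tail->order = 0;
                      tail->refcount = 1;   /* each tail page owns its refcount */
              }
      }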
      
      Fixes: 1025c8c0
      
       ("KVM: arm64: Wrap the host with a stage 2")
      Acked-by: Will Deacon <will@kernel.org>
      Suggested-by: Will Deacon <will@kernel.org>
      Signed-off-by: Quentin Perret <qperret@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211005090155.734578-5-qperret@google.com
      1d58a17e
    • Merge tag 'kvm-riscv-5.16-1' of git://github.com/kvm-riscv/linux into HEAD · 542a2640
      Paolo Bonzini authored
      Initial KVM RISC-V support
      
      The following features are supported by the initial KVM RISC-V support:
      1. No RISC-V specific KVM IOCTL
      2. Loadable KVM RISC-V module
      3. Minimal possible KVM world-switch which touches only GPRs and a few CSRs
      4. Works on both RV64 and RV32 host
      5. Full Guest/VM switch via vcpu_get/vcpu_put infrastructure
      6. KVM ONE_REG interface for VCPU register access from KVM user-space
      7. Interrupt controller emulation in KVM user-space
      8. Timer and IPI emulation in kernel
      9. Both Sv39x4 and Sv48x4 supported for RV64 host
      10. MMU notifiers supported
      11. Generic dirty log supported
      12. FP lazy save/restore supported
      13. SBI v0.1 emulation for Guest/VM
      14. Forward unhandled SBI calls to KVM user-space
      15. Hugepage support for Guest/VM
      16. IOEVENTFD support for Vhost
      542a2640
  5. Oct 04, 2021
  6. Oct 01, 2021
    • KVM: x86: only allocate gfn_track when necessary · deae4a10
      David Stevens authored
      
      
      Avoid allocating the gfn_track arrays if nothing needs them. If there
      are no users of the API external to KVM (i.e. no GVT-g), then page
      tracking is only needed for shadow page tables. This means that when tdp
      is enabled and there are no external users, then the gfn_track arrays
      can be lazily allocated when the shadow MMU is actually used. This avoids
      allocations equal to 0.05% of guest memory when nested virtualization is
      not used, if the kernel is compiled without GVT-g.
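
      A hedged, simplified sketch of the condition (kvm_shadow_mmu_in_use() is a
      hypothetical stand-in for the real bookkeeping; the config symbol comes
      from the companion patch below):

      bool kvm_shadow_mmu_in_use(struct kvm *kvm);   /* hypothetical */

      static bool gfn_track_needed(struct kvm *kvm)
      {
              /* External users (e.g. GVT-g) always need write tracking. */
              if (IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING))
                      return true;

              /* Otherwise it is only needed once shadow paging is in use. */
              return !tdp_enabled || kvm_shadow_mmu_in_use(kvm);
      }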
      
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Message-Id: <20210922045859.2011227-3-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      deae4a10
    • KVM: x86: add config for non-kvm users of page tracking · e9d0c0c4
      David Stevens authored
      
      
      Add a config option that allows kvm to determine whether or not there
      are any external users of page tracking.
      
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Message-Id: <20210922045859.2011227-2-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e9d0c0c4
    • nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB · 174a921b
      Krish Sadhukhan authored
      
      
      According to section "TLB Flush" in APM vol 2,
      
          "Support for TLB_CONTROL commands other than the first two, is
           optional and is indicated by CPUID Fn8000_000A_EDX[FlushByAsid].
      
           All encodings of TLB_CONTROL not defined in the APM are reserved."
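
      A hedged sketch of the consistency check this implies (simplified; the
      function name is illustrative): only the first two encodings are always
      valid, the ASID-flush encodings require FlushByAsid in the guest's CPUID,
      and everything else is reserved.

      static bool nested_tlb_ctl_is_valid(struct kvm_vcpu *vcpu, u8 tlb_ctl)
      {
              switch (tlb_ctl) {
              case TLB_CONTROL_DO_NOTHING:
              case TLB_CONTROL_FLUSH_ALL_ASID:
                      return true;
              case TLB_CONTROL_FLUSH_ASID:
              case TLB_CONTROL_FLUSH_ASID_LOCAL:
                      return guest_cpuid_has(vcpu, X86_FEATURE_FLUSHBYASID);
              default:
                      return false;   /* reserved encoding */
              }
      }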
      
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Message-Id: <20210920235134.101970-3-krish.sadhukhan@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      174a921b
    • kvm: use kvfree() in kvm_arch_free_vm() · 78b497f2
      Juergen Gross authored
      
      
      By switching from kfree() to kvfree() in kvm_arch_free_vm() Arm64 can
      use the common variant. This can be accomplished by adding another
      macro __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now.
      
      Further simplification can be achieved by adding __kvm_arch_free_vm()
      doing the common part.
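
      A hedged sketch of the resulting arrangement in common code (simplified):

      static inline void __kvm_arch_free_vm(struct kvm *kvm)
      {
              kvfree(kvm);    /* works for both kmalloc'd and vmalloc'd VMs */
      }

      #ifndef __KVM_HAVE_ARCH_VM_FREE
      static inline void kvm_arch_free_vm(struct kvm *kvm)
      {
              __kvm_arch_free_vm(kvm);
      }
      #endif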
      
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Message-Id: <20210903130808.30142-5-jgross@suse.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      78b497f2
    • KVM: x86: Expose Predictive Store Forwarding Disable · b73a5432
      Babu Moger authored
      
      
      Predictive Store Forwarding: AMD Zen3 processors feature a new
      technology called Predictive Store Forwarding (PSF).
      
      PSF is a hardware-based micro-architectural optimization designed
      to improve the performance of code execution by predicting address
      dependencies between loads and stores.
      
      How PSF works:
      
      It is very common for a CPU to execute a load instruction to an address
      that was recently written by a store. Modern CPUs implement a technique
      known as Store-To-Load-Forwarding (STLF) to improve performance in such
      cases. With STLF, data from the store is forwarded directly to the load
      without having to wait for it to be written to memory. In a typical CPU,
      STLF occurs after the address of both the load and store are calculated
      and determined to match.
      
      PSF expands on this by speculating on the relationship between loads and
      stores without waiting for the address calculation to complete. With PSF,
      the CPU learns over time the relationship between loads and stores. If
      STLF typically occurs between a particular store and load, the CPU will
      remember this.
      
      In typical code, PSF provides a performance benefit by speculating on
      the load result and allowing later instructions to begin execution
      sooner than they otherwise would be able to.
      
      The security analysis of AMD Predictive Store Forwarding is documented here:
      https://www.amd.com/system/files/documents/security-analysis-predictive-store-forwarding.pdf
      
      Predictive Store Forwarding controls:
      There are two hardware control bits which influence the PSF feature:
      - MSR 48h bit 2 – Speculative Store Bypass Disable (SSBD)
      - MSR 48h bit 7 – Predictive Store Forwarding Disable (PSFD)
      
      The PSF feature is disabled if either of these bits are set.  These bits
      are controllable on a per-thread basis in an SMT system. By default, both
      SSBD and PSFD are 0 meaning that the speculation features are enabled.
      
      While the SSBD bit disables PSF and speculative store bypass, PSFD only
      disables PSF.
      
      PSFD may be desirable for software which is concerned with the
      speculative behavior of PSF but desires a smaller performance impact than
      setting SSBD.
      
      Support for PSFD is indicated in CPUID Fn8000_0008 EBX[28].
      All processors that support PSF will also support PSFD.
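
      An illustrative, hedged sketch of detecting PSFD and disabling PSF on the
      current CPU (the leaf/bit positions follow the text above; the local
      defines are spelled out in case no symbolic names are available):

      #include <linux/bits.h>
      #include <asm/msr.h>
      #include <asm/processor.h>

      #define PSFD_CPUID_LEAF    0x80000008
      #define PSFD_CPUID_BIT     28    /* CPUID Fn8000_0008 EBX[28] */
      #define SPEC_CTRL_PSFD_BIT 7     /* MSR 48h bit 7 */

      static void disable_psf_on_this_cpu(void)
      {
              u64 spec_ctrl;

              if (!(cpuid_ebx(PSFD_CPUID_LEAF) & BIT(PSFD_CPUID_BIT)))
                      return;          /* PSFD not supported */

              rdmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
              wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl | BIT_ULL(SPEC_CTRL_PSFD_BIT));
      }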
      
      The Linux kernel does not yet have an interface to enable/disable PSFD. The
      plan here is to expose PSFD to KVM so that guest kernels can make use of it
      if they wish to.
      
      Signed-off-by: Babu Moger <Babu.Moger@amd.com>
      Message-Id: <163244601049.30292.5855870305350227855.stgit@bmoger-ubuntu>
      [Keep feature private to KVM, as requested by Borislav Petkov. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b73a5432
    • KVM: x86/mmu: Avoid memslot lookup in make_spte and mmu_try_to_unsync_pages · 53597858
      David Matlack authored
      
      
      mmu_try_to_unsync_pages checks if page tracking is active for the given
      gfn, which requires knowing the memslot. We can pass down the memslot
      via make_spte to avoid this lookup.
      
      The memslot is also handy for make_spte's marking of the gfn as dirty:
      we can test whether dirty page tracking is enabled, and if so ensure that
      pages are mapped as writable with 4K granularity.  Apart from the warning,
      no functional change is intended.
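
      A hedged, simplified illustration of the idea (names and signatures are
      hypothetical, not the kernel's verbatim code): the caller already holds
      the memslot, so pass it down rather than re-deriving it from the gfn.

      bool slot_gfn_is_write_tracked(const struct kvm_memory_slot *slot, gfn_t gfn);

      /* before: every call re-derives the slot from the gfn */
      static bool page_is_write_tracked_old(struct kvm_vcpu *vcpu, gfn_t gfn)
      {
              struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);

              return slot && slot_gfn_is_write_tracked(slot, gfn);
      }

      /* after: the slot travels down the call chain (make_spte -> unsync check) */
      static bool page_is_write_tracked_new(const struct kvm_memory_slot *slot, gfn_t gfn)
      {
              return slot && slot_gfn_is_write_tracked(slot, gfn);
      }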
      
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210813203504.2742757-7-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      53597858